Multinode - Rapidly build distributed cloud applications in Python

Resource management

Workers and autoscaling

Multinode provisions the appropriate amount of resources to match demand, while safeguarding against accidental overspending.


Functions

A function maintains a warm pool of workers, with each worker staying alive between successive function calls. A function scales the number of workers in its pool to match the number of in-flight invocations, subject to certain constraints, which you can specify using the max_workers, spare_workers and cooldown_period keyword arguments.

import multinode as mn

# The @mn.function decorator form is assumed here; the constraint values
# match those referenced in the explanation below.
@mn.function(max_workers=30, spare_workers=2, cooldown_period=300)
def square(x):
    return x ** 2

In the example above:

  • If there are 13 function invocations in flight at the current moment, the framework will aim to scale the pool to 15 workers (13 workers to meet current demand, plus the 2 spare_workers). Keeping spare workers alive is an effective way to avoid the increased latency caused by cold starts.
  • After a worker has completed a function invocation, it will remain alive for at least another 300 seconds (the cooldown_period), in anticipation of the load increasing again in the future.
  • When the system is idle, the number of workers falls to 2 (the number of spare_workers).
  • Under intense load, the number of workers can rise to up to 30 (the number of max_workers).
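The scaling rule described by the bullet points above amounts to a simple clamp. The sketch below is a minimal illustration under that assumption, not Multinode's actual implementation; desired_workers is a hypothetical helper:

```python
def desired_workers(in_flight, spare_workers=0, max_workers=10):
    """Target pool size: current demand plus spare capacity,
    capped at max_workers. An idle system keeps spare_workers alive."""
    return min(in_flight + spare_workers, max_workers)

desired_workers(13, spare_workers=2, max_workers=30)   # 15 workers
desired_workers(0, spare_workers=2, max_workers=30)    # idle: 2 workers
desired_workers(100, spare_workers=2, max_workers=30)  # capped: 30 workers
```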

If cold starts are not a significant concern, set spare_workers to zero to save costs.

The default values are max_workers=10, spare_workers=0 and cooldown_period=60. The highest allowed value for max_workers is 100.


Jobs

A job provisions a new worker for each job execution, so the concepts of spare workers and cooldown periods do not apply. However, you can set a limit on the maximum number of concurrent executions of a job as a safeguarding measure.

import multinode as mn

# The @mn.job decorator form is assumed; max_concurrency caps the number
# of simultaneous executions, as described below.
@mn.job(max_concurrency=10)
def run_job(x):
    ...

The default value for max_concurrency is 10 and the maximum allowed value is 100.
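The effect of a concurrency cap can be illustrated with a semaphore. This is a hedged sketch of the behaviour, not Multinode's implementation; ConcurrencyLimiter is a hypothetical class:

```python
import threading

class ConcurrencyLimiter:
    """Illustrative stand-in for a max_concurrency cap: calls beyond the
    limit block until a running execution releases its slot."""

    def __init__(self, max_concurrency=10):
        self._slots = threading.Semaphore(max_concurrency)

    def run(self, fn, *args, **kwargs):
        with self._slots:  # blocks while max_concurrency runs are in flight
            return fn(*args, **kwargs)

limiter = ConcurrencyLimiter(max_concurrency=2)
limiter.run(lambda x: x * x, 7)  # returns 49
```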


Services

A service maintains a fleet of identical replica processes, each running the same API code behind the service's load balancer. The service scales the number of replicas in order to maintain a target average CPU utilization across its replicas.

The autoscaling behaviour can be configured using the min_replicas, max_replicas and target_cpu_utilization keyword arguments.

import multinode as mn

# The @mn.service decorator form is assumed; the keyword arguments are
# those described above.
@mn.service(min_replicas=2, max_replicas=10, target_cpu_utilization=0.5)
def api():
    ...

Note that the default value for min_replicas is 2. This is to provide high availability, in case one replica suffers an unexpected hardware failure. If high availability is not a key requirement for your application, then set min_replicas to 1 to reduce costs.

The highest allowed value for max_replicas is 100. The target_cpu_utilization can take any decimal value between 0.05 and 0.95.
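CPU-targeted replica scaling typically follows a proportional rule. Multinode's exact formula is not documented here; the sketch below uses the standard approach, with desired_replicas as a hypothetical helper:

```python
import math

def desired_replicas(current_replicas, avg_cpu, target_cpu=0.5,
                     min_replicas=2, max_replicas=10):
    """Proportional autoscaling: grow or shrink the fleet so that average
    CPU utilization moves toward the target, clamped to the allowed range."""
    desired = math.ceil(current_replicas * avg_cpu / target_cpu)
    return max(min_replicas, min(desired, max_replicas))

desired_replicas(4, avg_cpu=0.9, target_cpu=0.6)  # overloaded: scale up to 6
desired_replicas(4, avg_cpu=0.1, target_cpu=0.5)  # idle: shrink to min_replicas (2)
```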

Scheduled tasks

A scheduled task provisions a new worker for each task execution, so there is nothing to configure.


Daemons

A daemon runs on a single process, so there is nothing to configure here either.

CPUs, GPUs and memory