Resource management

Quotas

There are quotas on the total amount of CPU, memory and GPU resources that your application can consume across all of its workloads at a single point in time.

For example, if your Total CPU quota is 100, then at any moment in time, it is possible to run a service with 100 replicas using 1 CPU each, or 10 function calls using 10 CPU each, or any other combination such that the total CPU utilization across all workloads does not exceed 100 CPUs.
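The arithmetic in this example can be sketched in plain Python. This is an illustration only and not part of the Multinode API: the Workload class and the fits_quota function are hypothetical names, and the check assumes that usage is simply the sum of instances times CPU per instance across all workloads.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of one workload's CPU footprint."""
    name: str
    instances: int           # replicas, workers, or concurrent function calls
    cpu_per_instance: float  # CPUs requested by each instance


def total_cpu(workloads):
    """Sum the CPU requested by every instance of every workload."""
    return sum(w.instances * w.cpu_per_instance for w in workloads)


def fits_quota(workloads, cpu_quota):
    """Return True if the combined CPU request does not exceed the quota."""
    return total_cpu(workloads) <= cpu_quota


service = Workload("service", instances=100, cpu_per_instance=1)
function_calls = Workload("function", instances=10, cpu_per_instance=10)

print(fits_quota([service], 100))                  # the service alone fits
print(fits_quota([function_calls], 100))           # the function calls alone fit
print(fits_quota([service, function_calls], 100))  # both together need 200 CPUs
```

Either workload fits on its own with a Total CPU quota of 100, but the two together would need 200 CPUs and so could not run at full size at the same time.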

These global quotas take priority over the max_workers, max_replicas and max_concurrency settings for individual workloads. So be careful when requesting a large amount of resources for any single workload, since this may result in your other workloads not being able to start or scale to their desired capacity.
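To make the interaction concrete, here is a rough sketch of how a global CPU quota could cap a workload below its own max_replicas setting. The function max_startable_replicas and its parameters are hypothetical, and the platform's real scheduling logic may differ; the sketch only assumes that a replica can start when enough unused quota remains for it.

```python
def max_startable_replicas(max_replicas, cpu_per_replica, cpu_quota, cpu_in_use):
    """Estimate how many replicas can run given the remaining global CPU quota.

    Hypothetical helper for illustration; not a Multinode function.
    """
    remaining = max(0, cpu_quota - cpu_in_use)
    # Number of whole replicas that fit in the remaining quota.
    fit_in_quota = int(remaining // cpu_per_replica)
    # The global quota takes priority over the per-workload setting.
    return min(max_replicas, fit_in_quota)


# A service asks for up to 10 replicas of 4 CPUs each, but other workloads
# already use 48 of the 64 CPUs in the quota.
print(max_startable_replicas(max_replicas=10, cpu_per_replica=4,
                             cpu_quota=64, cpu_in_use=48))
```

In this scenario only 16 CPUs remain, so the service can reach 4 replicas rather than the 10 it is configured for.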

You can check your quotas by running the multinode quotas CLI command. The output should look something like this:

Resource        Max quota
Total CPU       64
Total memory    128GiB
Total GPU       coming soon

Contact support if you need to increase them. We are very flexible about our quotas. Quotas are there to prevent accidents, not to get in your way.
