Quotas | Multinode

There are quotas on the total amount of CPU, memory and GPU resources that your application can consume across all of its workloads at a single point in time.

For example, if your Total CPU quota is 100, then at any moment you can run a service with 100 replicas using 1 CPU each, or 10 function calls using 10 CPUs each, or any other combination such that the total CPU requested across all workloads does not exceed 100 CPUs.
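The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the rule, not part of the Multinode API: the Workload class and the fits_within_quota function are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical description of a workload, for illustration only.
    name: str
    instances: int    # replicas, workers or concurrent function calls
    cpu_each: float   # CPUs requested by each instance


def total_cpu(workloads):
    """Sum the CPUs requested by all instances of all workloads."""
    return sum(w.instances * w.cpu_each for w in workloads)


def fits_within_quota(workloads, total_cpu_quota):
    """Return True if the combined CPU request does not exceed the quota."""
    return total_cpu(workloads) <= total_cpu_quota


service = Workload("service", instances=100, cpu_each=1)
function_calls = Workload("function", instances=10, cpu_each=10)

print(fits_within_quota([service], 100))                  # one workload alone
print(fits_within_quota([function_calls], 100))           # the other alone
print(fits_within_quota([service, function_calls], 100))  # both together
```

Each workload fits within a quota of 100 CPUs on its own, but the two together request 200 CPUs and do not.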

These global quotas take priority over the max_workers, max_replicas and max_concurrency settings for individual workloads. Be careful when requesting a large amount of resources for any single workload, since doing so may prevent your other workloads from starting or from scaling to their desired capacity.
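To see why a global quota can override a per-workload setting, here is a rough Python sketch of the arithmetic. It assumes that a replica can only start if its full CPU request fits in the remaining quota. How Multinode actually schedules workloads is not described here, and reachable_replicas is a hypothetical helper, not a Multinode function.

```python
def reachable_replicas(max_replicas, cpu_per_replica, cpu_used_elsewhere, total_cpu_quota):
    """Estimate how many replicas can run, given what other workloads already use.

    Assumption for this sketch: a replica starts only if its whole CPU
    request fits within the quota that is still unused.
    """
    remaining = max(total_cpu_quota - cpu_used_elsewhere, 0)
    allowed_by_quota = int(remaining // cpu_per_replica)
    return min(max_replicas, allowed_by_quota)


# Quota of 100 CPUs, with other workloads already holding 90 CPUs.
# A service configured with max_replicas=20 at 1 CPU per replica
# is limited by the quota rather than by its own setting.
print(reachable_replicas(max_replicas=20, cpu_per_replica=1,
                         cpu_used_elsewhere=90, total_cpu_quota=100))
```

In this scenario only 10 CPUs remain, so the service can reach 10 replicas even though max_replicas allows 20.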

You can check your quotas by running the multinode quotas CLI command. The output should look something like this:

Resource	Max quota
Total CPU	64
Total memory	128GiB
Total GPU	coming soon

Contact support if you need your quotas increased. We are very flexible about them: quotas are there to prevent accidents, not to get in your way.