Resource management
Quotas
There are quotas on the total amount of CPU, memory and GPU resources that your application can consume across all of its workloads at a single point in time.
For example, if your Total CPU quota is 100, then at any moment you can run a service with 100 replicas using 1 CPU each, or 10 function calls using 10 CPUs each, or any other combination whose total CPU utilization across all workloads does not exceed 100 CPUs.
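The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of how the totals add up; the names `TOTAL_CPU_QUOTA`, `total_cpu` and `fits_quota` are hypothetical and are not part of the multinode API.

```python
# Illustrative arithmetic only; none of these names come from multinode.
TOTAL_CPU_QUOTA = 100  # the example quota used in the text above


def total_cpu(workloads):
    """Sum CPU usage over (instance_count, cpu_per_instance) pairs."""
    return sum(count * cpu for count, cpu in workloads)


def fits_quota(workloads, quota=TOTAL_CPU_QUOTA):
    """Return True if the combined CPU usage does not exceed the quota."""
    return total_cpu(workloads) <= quota


service_only = [(100, 1)]          # one service: 100 replicas x 1 CPU
functions_only = [(10, 10)]        # 10 function calls x 10 CPUs
mixed = [(40, 1), (6, 10)]         # 40 replicas x 1 CPU + 6 calls x 10 CPUs
over_quota = [(100, 1), (1, 10)]   # 110 CPUs in total

for name, workloads in [
    ("service_only", service_only),
    ("functions_only", functions_only),
    ("mixed", mixed),
    ("over_quota", over_quota),
]:
    print(name, total_cpu(workloads), fits_quota(workloads))
```

The first three combinations each add up to exactly 100 CPUs and fit within the quota; the last one needs 110 CPUs and does not.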
These global quotas take priority over the max_workers, max_replicas and max_concurrency settings for individual workloads. Be careful when requesting a large amount of resources for any single workload, since this may prevent your other workloads from starting or scaling to their desired capacity.
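To see why one large workload can hold back the others, here is a rough sketch of how much room is left for a service once part of the quota is already in use. It assumes, purely for illustration, that a service can only scale in whole replicas up to the remaining CPU headroom; the actual scheduling behaviour is not described here, and `replicas_that_fit` is a hypothetical helper, not a multinode function.

```python
# Hypothetical helper for illustration; not part of multinode.
def replicas_that_fit(max_replicas, cpu_per_replica, quota, cpu_in_use):
    """Estimate how many replicas can run given the remaining CPU headroom.

    Assumes integer CPU counts and that only whole replicas can be started.
    """
    headroom = max(quota - cpu_in_use, 0)
    return min(max_replicas, headroom // cpu_per_replica)


QUOTA = 64  # Total CPU quota, as in the sample CLI output

# Nothing else is running: the service can reach its max_replicas of 10.
print(replicas_that_fit(max_replicas=10, cpu_per_replica=1, quota=QUOTA, cpu_in_use=0))

# Another workload already holds 60 CPUs: only 4 replicas fit.
print(replicas_that_fit(max_replicas=10, cpu_per_replica=1, quota=QUOTA, cpu_in_use=60))

# The quota is fully consumed: the service cannot start any replicas.
print(replicas_that_fit(max_replicas=10, cpu_per_replica=1, quota=QUOTA, cpu_in_use=64))
```

Under these assumptions the service gets 10, 4 and 0 replicas respectively, even though its max_replicas setting is 10 in every case.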
You can check your quotas by running the multinode quotas CLI command. The output should look something like this:
Resource | Max quota |
---|---|
Total CPU | 64 |
Total memory | 128GiB |
Total GPU | coming soon |
Contact support if you need to increase your quotas. We are very flexible about them: quotas are there to prevent accidents, not to get in your way.