how-to-manage-quotas.md

title: Manage resources and quotas
titleSuffix: Azure Machine Learning
description: Learn about the quotas and limits on resources for Azure Machine Learning and how to request quota increases.
services: machine-learning
ms.service: machine-learning
ms.subservice: core
author: SimranArora904
ms.author: siarora
ms.reviewer: larryfr
ms.date: 11/28/2022
ms.topic: how-to
ms.custom: troubleshooting, contperf-fy20q4, contperf-fy21q2, event-tier1-build-2022

Manage and increase quotas for resources with Azure Machine Learning

Azure uses limits and quotas to prevent budget overruns due to fraud, and to honor Azure capacity constraints. Consider these limits as you scale for production workloads. In this article, you learn about:

[!div class="checklist"]

  • Default limits on Azure resources related to Azure Machine Learning.
  • Creating workspace-level quotas.
  • Viewing your quotas and limits.
  • Requesting quota increases.

Along with managing quotas, you can learn how to plan and manage costs for Azure Machine Learning or learn about the service limits in Azure Machine Learning.

Special considerations

  • A quota is a credit limit, not a capacity guarantee. If you have large-scale capacity needs, contact Azure support to increase your quota.

  • A quota is shared across all the services in your subscriptions, including Azure Machine Learning. Calculate usage across all services when you're evaluating capacity.

    Azure Machine Learning compute is an exception. It has a separate quota from the core compute quota.

  • Default limits vary by offer category type, such as free trial, pay-as-you-go, and virtual machine (VM) series (such as Dv2, F, and G).

Default resource quotas

In this section, you learn about the default and maximum quota limits for the following resources:

  • Azure Machine Learning assets
    • Azure Machine Learning compute
    • Azure Machine Learning managed online endpoints
    • Azure Machine Learning pipelines
  • Virtual machines
  • Azure Container Instances
  • Azure Storage

Important

Limits are subject to change. For the latest information, see Service limits in Azure Machine Learning.

Azure Machine Learning assets

The following limits on assets apply on a per-workspace basis.

| Resource | Maximum limit |
| --- | --- |
| Datasets | 10 million |
| Runs | 10 million |
| Models | 10 million |
| Artifacts | 10 million |

In addition, the maximum run time is 30 days and the maximum number of metrics logged per run is 1 million.

Azure Machine Learning Compute

Azure Machine Learning Compute has a default quota limit on both the number of cores (split by VM family and by cumulative total) and the number of unique compute resources allowed per region in a subscription. This quota is separate from the VM core quota described in the Virtual machines section, because it applies only to the managed compute resources of Azure Machine Learning.

Request a quota increase to raise the limits for the VM family core quotas, total subscription core quotas, cluster quota, and other resources in this section.

Available resources:

  • Dedicated cores per region have a default limit of 24 to 300, depending on your subscription offer type. You can increase the number of dedicated cores per subscription for each VM family. Specialized VM families like NCv2, NCv3, or ND series start with a default of zero cores. GPUs also default to zero cores.

  • Low-priority cores per region have a default limit of 100 to 3,000, depending on your subscription offer type. The number of low-priority cores per subscription can be increased and is a single value across VM families.

  • Clusters per region have a default limit of 200. This limit is shared between training clusters, compute instances, and managed online endpoint deployments. (A compute instance counts as a single-node cluster for quota purposes.) You can increase the cluster quota up to 500 per region within a given subscription.

Tip

To learn more about which VM family to request a quota increase for, check out virtual machine sizes in Azure. For instance, GPU VM families start with an "N" in their family name (for example, the NCv3 series).

The following table shows more limits in the platform. Reach out to the Azure Machine Learning product team through a technical support ticket to request an exception.

| Resource or action | Maximum limit |
| --- | --- |
| Workspaces per resource group | 800 |
| Nodes in a single Azure Machine Learning Compute (AmlCompute) cluster set up as a non communication-enabled pool (that is, can't run MPI jobs) | 100 nodes but configurable up to 65,000 nodes |
| Nodes in a single Parallel Run Step run on an Azure Machine Learning Compute (AmlCompute) cluster | 100 nodes but configurable up to 65,000 nodes if your cluster is set up to scale per above |
| Nodes in a single Azure Machine Learning Compute (AmlCompute) cluster set up as a communication-enabled pool | 300 nodes but configurable up to 4,000 nodes |
| Nodes in a single Azure Machine Learning Compute (AmlCompute) cluster set up as a communication-enabled pool on an RDMA enabled VM Family | 100 nodes |
| Nodes in a single MPI run on an Azure Machine Learning Compute (AmlCompute) cluster | 100 nodes but can be increased to 300 nodes |
| Job lifetime | 21 days<sup>1</sup> |
| Job lifetime on a low-priority node | 7 days<sup>2</sup> |
| Parameter servers per node | 1 |

1 Maximum lifetime is the duration between when a job starts and when it finishes. Completed jobs persist indefinitely. Data for jobs not completed within the maximum lifetime isn't accessible.

2 Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you implement checkpoints in your job.
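
The checkpoint recommendation in footnote 2 can be illustrated with a minimal sketch. It assumes the job's progress fits in a small JSON-serializable dictionary and that the checkpoint path lives on storage that survives preemption; the function names, the state layout, and the `stop_after` parameter (which only simulates a preemption) are hypothetical and not part of any Azure Machine Learning API.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a preemption mid-write can't corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(path, num_steps, stop_after=None):
    """Run the remaining steps, resuming from the last checkpoint.

    stop_after is a hypothetical hook that simulates a preemption at that step.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], num_steps):
        if stop_after is not None and step >= stop_after:
            return state  # simulated preemption: exit without finishing
        state["total"] += step  # stand-in for one unit of real work
        state["next_step"] = step + 1
        save_checkpoint(path, state)
    return state
```

A resubmitted job that calls `run` with the same path continues from the last saved step instead of starting over, which matters when a low-priority node is reclaimed partway through.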

Azure Machine Learning managed online endpoints

Azure Machine Learning managed online endpoints have limits described in the following table.

| Resource | Limit |
| --- | --- |
| Endpoint name | Endpoint names must begin with a letter, be 3-32 characters in length, and only consist of letters and numbers<sup>1</sup> |
| Deployment name | Deployment names must begin with a letter, be 3-32 characters in length, and only consist of letters and numbers<sup>1</sup> |
| Number of endpoints per subscription | 50 |
| Number of deployments per subscription | 200 |
| Number of deployments per endpoint | 20 |
| Number of instances per deployment | 20<sup>2</sup> |
| Max request time-out at endpoint level | 90 seconds |
| Total requests per second at endpoint level for all deployments | 500<sup>3</sup> |
| Total connections per second at endpoint level for all deployments | 500<sup>3</sup> |
| Total connections active at endpoint level for all deployments | 500<sup>3</sup> |
| Total bandwidth at endpoint level for all deployments | 5 MBPS<sup>3</sup> |

    1 Single dashes, as in my-endpoint-name, are accepted in endpoint and deployment names.

    2 We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in a deployment, you must have a quota for 12. Otherwise, you'll receive an error.

    3 If you request a limit increase, be sure to calculate related limit increases you might need. For example, if you request a limit increase for requests per second, you might also want to compute the required connections and bandwidth limits and include these limit increases in the same request.
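
The endpoint and deployment naming rules, together with footnote 1, can be checked locally before you submit a request. The following is a rough sketch, not the service's own validation: it assumes that "single dashes" means no consecutive dashes and no trailing dash, and the function name is hypothetical.

```python
import re

# A letter first, then letters or digits, each optionally preceded by ONE dash.
# Assumption: consecutive dashes and a trailing dash are rejected.
_NAME_PATTERN = re.compile(r"[A-Za-z](?:-?[A-Za-z0-9])*")


def is_valid_endpoint_name(name: str) -> bool:
    """Check a name against the documented rules: starts with a letter, 3-32 characters."""
    if not 3 <= len(name) <= 32:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None
```

The same check applies to deployment names, because the table lists identical rules for both.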

    To determine the current usage for an endpoint, view the metrics.

    To request an exception from the Azure Machine Learning product team, use the steps in the Request quota increases section.
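
The 20% upgrade reserve in footnote 2 is easy to miscalculate when planning a deployment. A minimal sketch of the arithmetic follows. It assumes fractional results round up to the next whole instance; the article only gives the 10-to-12 example, so the rounding rule and the function name are assumptions.

```python
UPGRADE_RESERVE_PERCENT = 20  # extra compute reserved for upgrades (footnote 2)


def required_instance_quota(requested_instances: int) -> int:
    """Return the quota needed to deploy the requested number of instances.

    Uses integer arithmetic to compute ceil(requested * 1.20) without
    floating-point rounding surprises. Rounding up is an assumption.
    """
    if requested_instances < 1:
        raise ValueError("requested_instances must be at least 1")
    return (requested_instances * (100 + UPGRADE_RESERVE_PERCENT) + 99) // 100
```

For the article's example, `required_instance_quota(10)` returns 12, so a deployment of 10 instances needs quota for 12.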

    Azure Machine Learning pipelines

    Azure Machine Learning pipelines have the following limits.

    | Resource | Limit |
    | --- | --- |
    | Steps in a pipeline | 30,000 |
    | Workspaces per resource group | 800 |

    Azure Machine Learning integration with Synapse

    Synapse Spark clusters have a default limit of 12-2000, depending on your subscription offer type. To increase this limit, submit a support ticket and request a quota increase under the "Machine Learning Service: Spark vCore Quota" category.

    :::image type="content" source="./media/how-to-manage-quotas/spark-vcore-quota-increase.png" alt-text="Screenshot of the quota increase form with the Spark vCore Quota category selected.":::

    Virtual machines

    Each Azure subscription has a limit on the number of virtual machines across all services. Virtual machine cores have a regional total limit and a regional limit per size series. Both limits are separately enforced.

    For example, consider a subscription with a US East total VM core limit of 30, an A series core limit of 30, and a D series core limit of 30. This subscription would be allowed to deploy 30 A1 VMs, or 30 D1 VMs, or a combination of the two that doesn't exceed a total of 30 cores.
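
Because the regional total and the per-series limits are enforced separately, a deployment must pass both checks. The sketch below models the example above in plain Python. The function name, the dictionary layout, and the `in_use` parameter are hypothetical, and the code doesn't call any Azure API.

```python
def fits_core_limits(request, series_limits, regional_total_limit, in_use=None):
    """Return True if the requested cores fit both kinds of limits.

    request and in_use map a VM series name to a number of cores.
    series_limits maps a VM series name to its regional core limit.
    """
    in_use = in_use or {}
    # Check 1: each series must stay within its own limit.
    for series, cores in request.items():
        if in_use.get(series, 0) + cores > series_limits.get(series, 0):
            return False
    # Check 2: all series together must stay within the regional total.
    total = sum(in_use.values()) + sum(request.values())
    return total <= regional_total_limit


limits = {"A": 30, "D": 30}
print(fits_core_limits({"A": 15, "D": 15}, limits, 30))  # within both limits
print(fits_core_limits({"A": 20, "D": 20}, limits, 30))  # exceeds the regional total
```

The second request passes each per-series check but still fails, because 40 cores exceed the regional total of 30.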

    You can't raise limits for virtual machines above the values shown in the following table.

    [!INCLUDE azure-subscription-limits-azure-resource-manager]

    Container Instances

    For more information, see Container Instances limits.

    Storage

    Azure Storage has a limit of 250 storage accounts per region, per subscription. This limit includes both Standard and Premium storage accounts.

    To increase the limit, make a request through Azure Support. The Azure Storage team will review your case and can approve up to 250 storage accounts for a region.

    Workspace-level quotas

    Use workspace-level quotas to manage Azure Machine Learning compute target allocation between multiple workspaces in the same subscription.

    By default, all workspaces share the same quota as the subscription-level quota for VM families. However, you can set a maximum quota for individual VM families on workspaces in a subscription. This lets you share capacity and avoid resource contention issues.

    1. Go to any workspace in your subscription.
    2. In the left pane, select Usage + quotas.
    3. Select the Configure quotas tab to view the quotas.
    4. Expand a VM family.
    5. Set a quota limit on any workspace listed under that VM family.

    You can't set a negative value or a value higher than the subscription-level quota.
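
The constraint on workspace-level values can be expressed as a small validation step, for example in a script that prepares quota settings before someone enters them in the portal. This is only a sketch of the rule stated above; the function name and parameters are hypothetical.

```python
def validate_workspace_quota(workspace_quota: int, subscription_quota: int) -> int:
    """Return the workspace quota if it's allowed, otherwise raise ValueError.

    A workspace-level quota can't be negative and can't exceed the
    subscription-level quota for the same VM family.
    """
    if workspace_quota < 0:
        raise ValueError("workspace quota can't be negative")
    if workspace_quota > subscription_quota:
        raise ValueError("workspace quota can't exceed the subscription-level quota")
    return workspace_quota
```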

    Screenshot that shows an Azure Machine Learning workspace-level quota.

    Note

    You need subscription-level permissions to set a quota at the workspace level.

    View quotas in the studio

    1. When you create a new compute resource, by default you'll see only VM sizes that you already have quota to use. Switch the view to Select from all options.

      :::image type="content" source="media/how-to-manage-quotas/select-all-options.png" alt-text="Screenshot shows select all options to see compute resources that need more quota":::

    2. Scroll down until you see the list of VM sizes you don't have quota for.

      :::image type="content" source="media/how-to-manage-quotas/scroll-to-zero-quota.png" alt-text="Screenshot shows list of zero quota":::

    3. Use the link to go directly to the online customer support request for more quota.

    View your usage and quotas in the Azure portal

    To view your quota for various Azure resources like virtual machines, storage, or network, use the Azure portal:

    1. On the left pane, select All services and then select Subscriptions under the General category.

    2. From the list of subscriptions, select the subscription whose quota you're looking for.

    3. Select Usage + quotas to view your current quota limits and usage. Use the filters to select the provider and locations.

    You manage the Azure Machine Learning compute quota on your subscription separately from other Azure quotas:

    1. Go to your Azure Machine Learning workspace in the Azure portal.

    2. On the left pane, in the Support + troubleshooting section, select Usage + quotas to view your current quota limits and usage.

    3. Select a subscription to view the quota limits. Filter to the region you're interested in.

    4. You can switch between a subscription-level view and a workspace-level view.

    Request quota increases

    To raise the limit or VM quota above the default limit, open an online customer support request at no charge.

    You can't raise limits above the maximum values shown in the preceding tables. If there's no maximum limit, you can't adjust the limit for the resource.

    When you request a quota increase, select the service that the increase applies to. For example, select Machine Learning Service, Container Instances, or Storage. For Azure Machine Learning endpoints, you can select the Request Quota button while viewing the quota in the preceding steps.

    1. Scroll to Machine Learning Service: Virtual Machine Quota.

      :::image type="content" source="./media/how-to-manage-quotas/virtual-machine-quota.png" lightbox="./media/how-to-manage-quotas/virtual-machine-quota.png" alt-text="Screenshot of the VM quota details form.":::

    2. Under Additional Details, specify the request details, including the number of additional vCPUs required to run your Machine Learning endpoint.

      :::image type="content" source="./media/how-to-manage-quotas/vm-quota-request-additional-info.png" lightbox="./media/how-to-manage-quotas/vm-quota-request-additional-info.png" alt-text="Screenshot of the VM quota additional details form.":::

    Note

    Free trial subscriptions are not eligible for limit or quota increases. If you have a free trial subscription, you can upgrade to a pay-as-you-go subscription. For more information, see Upgrade Azure free trial to pay-as-you-go and Azure free account FAQ.

    Endpoint quota increases

    When requesting the quota increase, provide the following information:

    1. When opening the support request, select Machine Learning Service: Endpoint Limits as the Quota type.

    2. On the Additional details tab, select Enter details, and then provide the quota you'd like to increase and its new value, the reason for the request, and the location(s) where you need the increase. Finally, select Save and continue.

      :::image type="content" source="./media/how-to-manage-quotas/quota-details.png" lightbox="./media/how-to-manage-quotas/quota-details.png" alt-text="Screenshot of the endpoint quota details form.":::

    Next steps