[General workload issue]: Query is throttled API limitation #469

Open
1 task done
8ast1en opened this issue Dec 19, 2024 · 5 comments
Assignees
Labels
enhancement New feature or request Pattern: ALZ 🚁 Issues / PR's related to the ALZ Pattern

Comments

@8ast1en

8ast1en commented Dec 19, 2024

Check for previous/existing GitHub issues

  • I have checked for previous/existing GitHub issues

Issue Type?

Question

Description

Hello all,

I hope you are doing well today,

I'm currently using the EPAC module to manage the ALZ and AMBA alerting policies, and I have a question about the AMBA alert evaluation interval.

I'm seeing a few errors, for example on the ResourceHealth alerts, where I regularly receive the error message {"title":"### Query is throttled",""}

I can see that most alerts have a PT5M interval, which causes an API limitation problem ->

Image

Do you have a suggestion for another time interval I could use for all alerts? Or could I replace the custom log search alerts deployed by policy with my own metric alerts, so that the API is not overloaded?

I'm currently working with this policy assignment ->

Image

Thanks a lot,

@8ast1en 8ast1en added the AMBA Core Issues / PR's related AMBA Core label Dec 19, 2024
@judyer28 judyer28 assigned judyer28 and unassigned JoeyBarnes Dec 19, 2024
@paulgrimley paulgrimley assigned Brunoga-MS and tagolovina and unassigned judyer28 Dec 19, 2024
@paulgrimley paulgrimley added Pattern: ALZ 🚁 Issues / PR's related to the ALZ Pattern and removed AMBA Core Issues / PR's related AMBA Core labels Dec 19, 2024
@sean-vancity

Was just discussing this with the Azure Resource Graph team where it seems the throttling is happening. They indicated the log alert queries are hitting the quota hard due to a lack of staggering and retries by log search alerts. One thing they suggested is to limit the number of arg() calls in a query.

This section from a log query deployed by AMBA hits the resource graph 3 times:
let excludedResources = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| project _ResourceId = id, tags
| where parse_json(tostring(tags.AmbaDisable)) in~ ("true"));
let excludedVMSSNodes = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| extend isVMSS = isnotempty(properties.virtualMachineScaleSet)
| where isVMSS
| project id, name);
let overridenResource = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| project _ResourceId = tolower(id), tags
| where tags contains "_amba-WriteBytesPerSecond-Data-threshold-override_");

Perhaps there's a way to rewrite the queries to use a single arg() call, which would cut the calls from three to one (a reduction of 2/3).
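As a rough way to check how many arg() calls a query text contains, here is a minimal Python sketch. The helper name `count_arg_calls` and the shortened query strings are illustrative assumptions, not part of AMBA. It only counts occurrences in the text, which is not necessarily the number of requests Azure Resource Graph actually receives at run time.

```python
import re

# Matches an arg("...") cross-service call as written in the query above.
# The word boundary keeps functions such as arg_max() from being counted.
ARG_CALL = re.compile(r'\barg\s*\(\s*"[^"]*"\s*\)')


def count_arg_calls(query: str) -> int:
    """Return how many arg() calls appear in the query text (hypothetical helper)."""
    return len(ARG_CALL.findall(query))


# Shortened stand-ins for the three-call query and a single-call rewrite.
original = '''
let excludedResources = (arg("").resources | where type =~ "Microsoft.Compute/virtualMachines");
let excludedVMSSNodes = (arg("").resources | where isnotempty(properties.virtualMachineScaleSet));
let overridenResource = (arg("").resources | where tags contains "threshold-override");
'''
single = 'let allResources = arg("").resources;'

print(count_arg_calls(original), count_arg_calls(single))  # prints "3 1"
```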

@sean-vancity

Maybe something like this:

let allResources = arg("").resources;
let policyThresholdString = "10000000";
let excludedResources = allResources
| where type =~ "Microsoft.Compute/virtualMachines"
| project _ResourceId = id, tags
| where parse_json(tostring(tags.AmbaDisable)) in~ ("true");
let excludedVMSSNodes = allResources
| where type =~ "Microsoft.Compute/virtualMachines"
| extend isVMSS = isnotempty(properties.virtualMachineScaleSet)
| where isVMSS
| project id, name;
let overridenResource = allResources
| where type =~ "Microsoft.Compute/virtualMachines"
| project _ResourceId = tolower(id), tags
| where tags contains "_amba-WriteBytesPerSecond-Data-threshold-override_";

@Brunoga-MS Brunoga-MS added the enhancement New feature or request label Dec 20, 2024
@Brunoga-MS
Contributor

Hello @8ast1en ,
thanks for your feedback. We recognized the need for query optimization and have added this request to our backlog for January 2025. Unfortunately, there is no way to implement staggering as documented at "Staggering queries": since we are using the API and policy ARM templates, we cannot use C# code in AMBA.

Thanks,
Bruno.
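For readers unfamiliar with the term, the idea behind staggering can be sketched in a few lines of Python. This is only an illustration of the concept on the client side, not something AMBA can apply to log search alerts. The function name `stagger_offsets` is hypothetical, and the default of 15 queries per 5-second window is the figure quoted from the Resource Graph documentation later in this thread.

```python
from typing import List


def stagger_offsets(query_count: int,
                    max_per_window: int = 15,
                    window_seconds: float = 5.0) -> List[float]:
    """Assign each query a start offset (in seconds) so that no more than
    max_per_window queries start within the same window."""
    if query_count < 0 or max_per_window <= 0 or window_seconds <= 0:
        raise ValueError("counts and window length must be positive")
    # Queries 0..14 start at 0 s, 15..29 at 5 s, 30..44 at 10 s, and so on.
    return [(i // max_per_window) * window_seconds for i in range(query_count)]


offsets = stagger_offsets(40)
print(offsets.count(0.0), offsets.count(5.0), offsets.count(10.0))  # prints "15 15 10"
```

Without staggering, all 40 queries in this example would start at offset 0 and exceed the per-window limit.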

@8ast1en
Author

8ast1en commented Jan 13, 2025

Hello @Brunoga-MS,

After some investigation: with the EPAC module, the same managed identity is used to deploy all the log alert rules, and this is causing my "Query is throttled" issue.

To make sure the alerts are evaluated, we need to split them across more than one managed identity.

However, with the EPAC module, the same managed identity is used per assignment, so I can't deploy the newest alerting schema with only one managed identity.

We can see in the Microsoft documentation that "a user can send at most 15 queries within every 5-second window without being throttled" -> https://learn.microsoft.com/en-us/azure/governance/resource-graph/concepts/guidance-for-throttled-requests#understand-throttling-headers
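To put that limit into numbers, here is a small back-of-the-envelope sketch in Python. It assumes that every arg() call counts as one Resource Graph query, that all alert rules under one identity are evaluated at the same moment, and it uses a hypothetical figure of 60 alert rules. None of these assumptions is confirmed in this thread.

```python
import math

MAX_QUERIES_PER_WINDOW = 15  # limit quoted from the documentation above
WINDOW_SECONDS = 5


def windows_needed(rule_count: int, arg_calls_per_rule: int) -> int:
    """Number of 5-second windows needed to run every call without throttling."""
    total_calls = rule_count * arg_calls_per_rule
    return math.ceil(total_calls / MAX_QUERIES_PER_WINDOW)


rules = 60  # hypothetical number of log alert rules sharing one identity

for calls in (3, 1):
    windows = windows_needed(rules, calls)
    print(calls, windows, windows * WINDOW_SECONDS)
# prints "3 12 60" then "1 4 20"
```

Under these assumptions, going from three arg() calls per rule to one shortens the time needed to stay under the limit from 60 seconds to 20 seconds, but a burst of 60 simultaneous evaluations would still exceed a single window.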

Maybe we could split the alerting across a mix of metric alerts, log alert rules and so on, or add a new managed identity.

What do you think?

I can't raise an issue on the EPAC GitHub -> https://github.com/Azure/enterprise-azure-policy-as-code/issues

Thanks,

@Brunoga-MS
Contributor

Hello @8ast1en ,
We are also following up on this with the Azure Resource Graph PG to ensure it is resolved outside of AMBA, should that be the case. In the meantime, we are working to optimize our queries so that each query calls ARG only once. Below is an example of the new approach:

let policyThresholdString = "30";
let resourceTagging = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| where isempty(properties.virtualMachineScaleSet)
| where tags.["Monitor-Disable"] !in~ ("true", "Test", "Dev", "Sandbox")
| project _ResourceId = tolower(id), resourceTags = tags);
InsightsMetrics
| where _ResourceId has "Microsoft.Compute/virtualMachines"
| where Origin == "vm.azm.ms"
| where Namespace == "LogicalDisk" and Name == "WriteLatencyMs"
| extend Disk=tostring(todynamic(Tags)["vm.azm.ms/mountId"])
| where Disk in ("C:", "/")
| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, Disk
| join hint.remote=left kind=inner (resourceTagging) on _ResourceId
| project-away _ResourceId1
| extend newThresholdString = tostring(resourceTags.["_amba-ReadLatencyMs-OS-threshold-Override_"])
| extend appliedThreshold = iif(isempty(newThresholdString), toint(policyThresholdString), toint(newThresholdString))
| where AggregatedValue > appliedThreshold
| project TimeGenerated, Computer, _ResourceId, Disk, AggregatedValue, appliedThreshold

Feel free to give it a try and let me know.

Thanks,
Bruno.
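The threshold selection at the end of that query (the `iif(isempty(...))` step) can be read as "use the per-resource tag value if present, otherwise the policy default". A minimal Python sketch of that logic follows. The function name `applied_threshold` is hypothetical; the tag name is copied from the query above. One difference to be aware of: this sketch raises `ValueError` on a non-numeric tag value, whereas KQL `toint()` yields null.

```python
from typing import Mapping, Optional

# Tag name as it appears in the example query above.
OVERRIDE_TAG = "_amba-ReadLatencyMs-OS-threshold-Override_"


def applied_threshold(policy_threshold: str,
                      tags: Optional[Mapping[str, str]],
                      override_tag: str = OVERRIDE_TAG) -> int:
    """Return the tag override as an int if set, else the policy default."""
    override = (tags or {}).get(override_tag, "")
    # An empty or missing tag falls back to the policy-level threshold.
    return int(override) if override else int(policy_threshold)


print(applied_threshold("30", {}))                    # prints 30
print(applied_threshold("30", {OVERRIDE_TAG: "50"}))  # prints 50
```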


7 participants