Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.13](backport #38294) [Metricbeat][Azure] Azure monitor/storage: Fix 429 Too Many Requests error #38457

Merged
merged 1 commit into from
Mar 20, 2024

Conversation

mergify[bot]
Copy link
Contributor

@mergify mergify bot commented Mar 20, 2024

Proposed commit message

Please see the issue elastic/integrations#8575.

Metricsets affected by this change: storage and monitor.

This PR fixes the issue the following way:

  1. We check if the error produced has the header "Retry After".
  2. If this header is present then we use time.sleep for the seconds given by the header and try again after the time is up.
  3. If the header is not present, then it is likely another error and we return.

This is also the retry suggestion given by Retry Usage Guidance from Azure.

For monitor metricset, the previous implementation also had a different problem affecting the number of calls: we were trying to get the metrics definition for every metric, without checking if we had done that already for the combination namespace + resource ID. So, for example, this configuration:

    - resource_query: "resourceType eq 'Microsoft.ContainerRegistry/registries'"
      metrics:
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"
        - name: [ "StorageUsed" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1H"
          dimensions:
            - name: "Geolocation"
              value: "*"

Would lead to 2 calls for each resource, even though the namespace is the same for every metric group. This is also fixed in this PR.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Enable azure monitor metricset, like in this example configuration.

To reproduce the error you can change the code a bit to force it to make many requests. You can put the getMetricsDefinitionsWithRetry inside a for cycle for that.

I added some printing lines to the code. You should see an output like this:

[...]
>> Trying request  400
>> Trying request  401
>> Trying request  402
>> Metricbeat will sleep for 309 seconds and retry again.
>> Trying request  402  <---------- Same request as before
>> Trying request  403
[...]

Please notice that running this takes around 7 minutes to first encounter the error. After that time you have to wait for the sleep time, and only then you start making requests again.

Related issues

Closes elastic/integrations#8575.


This is an automatic backport of pull request #38294 done by [Mergify](https://mergify.com).

…error (#38294)

* Fiz azure too many requests error

Signed-off-by: constanca <[email protected]>

* Add CHANGELOG.next.asciidoc entry

Signed-off-by: constanca <[email protected]>

* Run mage check

Signed-off-by: constanca <[email protected]>

* Get metrics 1 time for each namespace/id

Signed-off-by: constanca <[email protected]>

* Add error comparison

Signed-off-by: constanca <[email protected]>

* Move retry to monitor service

Signed-off-by: constanca <[email protected]>

* Remove GetMetricDefinitions without retry

Signed-off-by: constanca <[email protected]>

* fix storage unit tests

Signed-off-by: constanca <[email protected]>

* fix monitor unit tests

Signed-off-by: constanca <[email protected]>

* Remove unnecessary for cycle

Signed-off-by: constanca <[email protected]>

---------

Signed-off-by: constanca <[email protected]>
(cherry picked from commit 10ff992)
@mergify mergify bot requested a review from a team as a code owner March 20, 2024 08:08
@mergify mergify bot added the backport label Mar 20, 2024
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 20, 2024
@botelastic
Copy link

botelastic bot commented Mar 20, 2024

This pull request doesn't have a Team:<team> label.

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 55 min 16 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@constanca-m constanca-m merged commit 2516537 into 8.13 Mar 20, 2024
21 checks passed
@constanca-m constanca-m deleted the mergify/bp/8.13/pr-38294 branch March 20, 2024 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport needs_team Indicates that the issue/PR needs a Team:* label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants