Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MetricsAdvisor] Added RetryOnTooManyRequestsPolicy to tests #20653

Merged
merged 3 commits into from
Apr 28, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
using System.Collections.Generic;
using Azure.AI.MetricsAdvisor.Administration;
using Azure.AI.MetricsAdvisor.Models;
using Azure.Core;
using Azure.Core.TestFramework;
using NUnit.Framework;

Expand Down Expand Up @@ -38,7 +39,7 @@ public MetricsAdvisorLiveTestBase(bool isAsync) : base(isAsync)
public MetricsAdvisorAdministrationClient GetMetricsAdvisorAdministrationClient(bool useTokenCredential = false)
{
var endpoint = new Uri(TestEnvironment.MetricsAdvisorUri);
var instrumentedOptions = InstrumentClientOptions(new MetricsAdvisorClientsOptions());
var instrumentedOptions = GetInstrumentedOptions();

MetricsAdvisorAdministrationClient client = useTokenCredential
? new(endpoint, TestEnvironment.Credential, instrumentedOptions)
Expand All @@ -50,7 +51,7 @@ public MetricsAdvisorAdministrationClient GetMetricsAdvisorAdministrationClient(
public MetricsAdvisorClient GetMetricsAdvisorClient(bool useTokenCredential = false)
{
var endpoint = new Uri(TestEnvironment.MetricsAdvisorUri);
var instrumentedOptions = InstrumentClientOptions(new MetricsAdvisorClientsOptions());
var instrumentedOptions = GetInstrumentedOptions();

MetricsAdvisorClient client = useTokenCredential
? new(endpoint, TestEnvironment.Credential, instrumentedOptions)
Expand Down Expand Up @@ -88,5 +89,14 @@ protected void ValidateGroupKey(DimensionKey groupKey)
Assert.That(column.Value, Is.Not.Null.And.Not.Empty);
}
}

private MetricsAdvisorClientsOptions GetInstrumentedOptions()
{
var options = new MetricsAdvisorClientsOptions();

options.Retry.MaxRetries = 6;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doubling the default maximum number of retries allowed by our retry policy (default = 3).

At first, my original approach was trying to use an HttpPipelinePolicy to keep retrying as long as we got 429 errors back from the service (you can check it here). It worked just fine.

However, I have changed the approach to something simpler (and a bit less deterministic). The HttpPipelinePolicy could end up in an endless loop if the service kept returning a 429 for any reason (for example, a bug on their side).

Increasing the maximum number of retries seems to be enough to solve the problem. Here is the list of status codes affected by this change. These are not critical errors and will be retried 3 more times in case of failure.

I have run the live tests pipeline 3 times with the MaxRetries approach and had zero failures, which is great given that we were hitting the problem in 100% of the runs. We can easily increase the number if we start seeing the error again, or consider using the HttpPipelinePolicy again if the error persists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, are the other languages facing this too? And are the solutions the same? I ask because this seems generic enough that according to how their pipelines are set, it will apply to everyone else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised this concern during standup some weeks ago when the issue started, but no other languages seemed to be facing it at the time. It depends on how the pipelines are set like you said, and on the number of tests as well.


return InstrumentClientOptions(options);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,7 @@ public async Task UpdateHookAsync()
}

[Test]
[Ignore("https://github.com/Azure/azure-sdk-for-net/issues/20677")]
Copy link
Member Author

@kinelski kinelski Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue has been present for more than a week and is not related to these changes. This one is from Apr 19th: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=849790&view=logs&j=68082d65-ebf7-5f3f-494d-d3fd09e80dd6&t=6f6444ca-c101-5878-5224-de85502bb056&l=89

public async Task GetHooksAsync()
{
string endpoint = MetricsAdvisorUri;
Expand Down