Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flakyness of DnsGetHostAddresses_PostCancelledToken_Throws #70044

Merged
merged 2 commits into from
Jun 1, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -184,13 +184,17 @@ public async Task DnsGetHostAddresses_PostCancelledToken_Throws()
using var cts = new CancellationTokenSource();

Task task = Dns.GetHostAddressesAsync(TestSettings.UncachedHost, cts.Token);

// This test might flake if the cancellation token takes too long to trigger:
// It's a race between the DNS server getting back to us and the cancellation processing.
cts.Cancel();

OperationCanceledException oce = await Assert.ThrowsAnyAsync<OperationCanceledException>(() => task);
Assert.Equal(cts.Token, oce.CancellationToken);
// This test is nondeterministic, the cancellation may take too long to trigger:
// It's a race between the DNS server getting back to us and the cancellation processing.
// since the host does not exist, both cases throw an exception.
Exception ex = await Assert.ThrowsAnyAsync<Exception>(() => task);
if (ex is OperationCanceledException oce)
{
Comment on lines +193 to +194
Copy link
Member

@antonfirsov antonfirsov Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit late to the party, but the test will now no longer fail if the implementation lacks actual cancellation support, so it became mostly useless.

In theory, [Collection(nameof(DisableParallelization))] was here to somewhat mitigate the race. I wonder if there is something specific about the OS-es/machines/queues where this is failing?

I see Ubuntu failures in kusto, which doesn't make sense to me, since the test is marked with [ActiveIssue(***, TestPlatforms.AnyUnix)] which AFAIK should disable the test on Linux.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have re-enabled the tests in #70009, since the issue mentioned was already closed (and the feature presumably implemented)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is something specific about the OS-es/machines/queues where this is failing?

It is failing in jitstress, which may increase the chance of the cancelling thread losing the race.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the test will now no longer fail if the implementation lacks actual cancellation support, so it became mostly useless.

I have missed that :/, do you have any suggestions how to bring it back without the flakyness?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is failing in jitstress, which may increase the chance of the cancelling thread losing the race.

Is there a way to detect if we are in a jitstress run, and skip the test in those cases only? It feels to me like jitstress is fundamentally incompatible with timing-critical networking tests, which we have many in fact.

I have re-enabled the tests in #70009, since the issue mentioned was already closed (and the feature presumably implemented)

#34633 has been reverted, there is still no async name resolution and cancellation support on Linux.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I have fallen to another trap... I will revert that PR then.

We should be able to detect jitstress by looking at envvars, I will take a look

Copy link
Member

@antonfirsov antonfirsov Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was super-confusing. We may consider reopening #33378, but it's very unlikely that we'll address in the near future, so probably it's better to just skip the test on Unix with [PlatformSpecific] + a comment.

// canceled in time
Assert.Equal(cts.Token, oce.CancellationToken);
}
}

// This is a regression test for https://github.com/dotnet/runtime/issues/63552
Expand Down