Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky test - TestAuditOn #12101

Merged
merged 14 commits into from
Apr 21, 2022
Merged

Fix flaky test - TestAuditOn #12101

merged 14 commits into from
Apr 21, 2022

Conversation

Joerger
Copy link
Contributor

@Joerger Joerger commented Apr 20, 2022

There is a race condition between the UploadCompleter and Uploader. If both attempt to upload the same file, and Uploader loses, then Uploader will return an error.

This race condition became possible with the replacement of the 24 hour grace period in the UploadCompleter with the session tracker - #11551.

I'm not yet sure how to remove the race condition entirely, but this PR will make it much much more unlikely to unblock integration tests in CI.

This PR reduces the likelihood of this race condition by decoupling the UploadCompleter and Uploader, so that the UploacCompleter will always run on a ten minute interval as opposed to the 5 second interval of Uploaders.

Additionally, the proxy forwarding server does not (before this PR) use the session tracker service, so the UploadCompleter could not see whether an upload's session was active and would always try to upload session recordings prematurely.

@Joerger Joerger changed the title Fix TestAuditOn race conditions Fix flaky test - TestAuditOn Apr 20, 2022
@Joerger Joerger force-pushed the joerger/fix-upload-race-condition branch from f5ea5d8 to 1a164e5 Compare April 20, 2022 01:18
@Joerger Joerger marked this pull request as ready for review April 20, 2022 01:47
@Joerger Joerger requested a review from zmb3 April 20, 2022 01:47
@github-actions github-actions bot added the audit-log Issues related to Teleports Audit Log label Apr 20, 2022
@github-actions github-actions bot requested review from jakule and smallinsky April 20, 2022 01:47
Copy link
Collaborator

@zmb3 zmb3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me again, what's the difference between Uploader and UploadCompleter? I wonder if we can rename one of them to make things more obvious.

lib/events/complete.go Show resolved Hide resolved
lib/srv/sess.go Outdated Show resolved Hide resolved
@jakule
Copy link
Contributor

jakule commented Apr 20, 2022

Looks like now TestUploadCompleterCompletesAbandonedUploads is failing. Is this another flaky test or you think it's related to this change?

FAIL: github.com/gravitational/teleport/lib/events.TestUploadCompleterCompletesAbandonedUploads
===================================================
OUTPUT github.com/gravitational/teleport/lib/events.TestUploadCompleterCompletesAbandonedUploads
===================================================
=== RUN   TestUploadCompleterCompletesAbandonedUploads
2022-04-20T01:53:21Z DEBU [AUTH:COMP] Upload e56b79d3-c13d-4c5a-a096-61204e4367aa was abandoned, trying to complete. events/complete.go:163
2022-04-20T01:53:21Z DEBU [AUTH:COMP] Completed upload Upload(session=0be77e76-a95e-4abe-ada5-debba0133269, id=e56b79d3-c13d-4c5a-a096-61204e4367aa). events/complete.go:167
2022-04-20T01:53:21Z DEBU [AUTH:COMP] Found 1 active uploads, completed 1. events/complete.go:144
    complete_test.go:74: 
        	Error Trace:	complete_test.go:74
        	Error:      	Should be false
        	Test:       	TestUploadCompleterCompletesAbandonedUploads
--- FAIL: TestUploadCompleterCompletesAbandonedUploads (0.00s)

lib/events/complete_test.go Outdated Show resolved Hide resolved
lib/events/complete_test.go Outdated Show resolved Hide resolved
@russjones
Copy link
Contributor

@Joerger In my opinion, it's worth spending the extra week or two to eliminate flakiness.

We can pay the cost for 1-2 weeks of dedicated dev time or months of time in aggregate across the team.

@r0mant @zmb3 What do you think?

@Joerger Joerger force-pushed the joerger/fix-upload-race-condition branch from 38a539d to e54e496 Compare April 20, 2022 18:01
@Joerger Joerger force-pushed the joerger/fix-upload-race-condition branch from e54e496 to 61221a2 Compare April 20, 2022 18:02
@Joerger Joerger force-pushed the joerger/fix-upload-race-condition branch from 32a7435 to 5193cef Compare April 20, 2022 18:34
@Joerger Joerger requested review from jakule and zmb3 April 20, 2022 20:13
@Joerger
Copy link
Contributor Author

Joerger commented Apr 20, 2022

@zmb3 @jakule This is passing CI now, can you review again?

@Joerger Joerger enabled auto-merge (squash) April 21, 2022 00:55
@Joerger Joerger merged commit 93f6f61 into master Apr 21, 2022
@Joerger Joerger deleted the joerger/fix-upload-race-condition branch April 21, 2022 01:20
Joerger added a commit that referenced this pull request Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audit-log Issues related to Teleports Audit Log backport-required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants