Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unit and integration test flakes #811

Merged
merged 6 commits into from
Feb 23, 2024
Merged

Fix unit and integration test flakes #811

merged 6 commits into from
Feb 23, 2024

Conversation

guggero
Copy link
Member

@guggero guggero commented Feb 23, 2024

Fixes #808.
Fixes #810.

Copy link
Contributor

@ffranr ffranr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing flakes is hard! Nice job. 🎆

Copy link
Contributor

@jharveyb jharveyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch with the custodian fixes!

Verified that the race is fixed with 50 iterations + the non-mock implementations of ReceiveProof do return copies.

@guggero guggero enabled auto-merge February 23, 2024 16:12
This commit is a small optimization in that we only start watching for a
proof on startup if we did receive a confirmation for it.
This fixes a unit test race condition where two goroutines would be
operating on the same proof because they both received the same
reference from the mock. This would not happen in production as the
proof archive always creates a new proof object.
So we fix the race by also creating a unique copy of the mocked proof
whenever it is requested.
Fixes #808.

This fixes an issue with errors not being wrapped correctly in some
places. That lead to the error not being recognized as a Postgres
serialization error, which meant the re-try mechanism wouldn't take
over.
Fixes #807.

If we're syncing a node to a universe that is still in the process of
getting proofs pushed to, we sometimes receive leaves with a root that
doesn't match the root we fetched at the beginning of the sync process.
We should re-try the sync, which we'll do in a separate PR.
For now we just want to fix the flake, which we can do by waiting to
start a new node until the existing one has finished syncing.
@guggero guggero added this pull request to the merge queue Feb 23, 2024
Merged via the queue into main with commit 8eff420 Feb 23, 2024
14 checks passed
@guggero guggero deleted the flake-fix branch February 23, 2024 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

universe: re-try sync if root mismatches [bug]: unit flake: race in custodian from proof import
3 participants