[TESTING] - Investigate Restart Test Failures #3651

Closed
bfish713 opened this issue Sep 5, 2024 · 1 comment · Fixed by #3669

Comments

@bfish713
Collaborator

bfish713 commented Sep 5, 2024

What is this task and why do we need to work on it?

test_all_restart_one_da has been very flaky on CI.
What we know so far:
1.) There is a timing issue on startup that causes some nodes to time out on view 1 at first; the view eventually passes.
2.) When view 1 times out initially, some of our logic fails and we get a subtraction overflow.
3.) Lowering next_view_timeout causes some random views to fail; see if we can find the bottleneck or bug.

What work will need to be done to complete this task?

Investigate the 3 issues above. The subtraction overflow is a pretty easy fix to stop the test from crashing, but we still want to know why we get into this scenario with shorter view timeouts.
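
For reference, a minimal sketch of the kind of guard that stops a subtraction overflow from crashing. The issue does not say which expression overflows, so this assumes it is a difference between two unsigned view numbers; `views_elapsed` and `views_elapsed_checked` are hypothetical names, not functions from the codebase.

```rust
/// Hypothetical helper (not from the codebase): how many views have passed
/// since `last_seen_view`. Plain `current_view - last_seen_view` on `u64`
/// panics in debug builds when `last_seen_view > current_view`, which is the
/// kind of state an early view-1 timeout could plausibly produce.
fn views_elapsed(current_view: u64, last_seen_view: u64) -> u64 {
    // Clamp at zero instead of panicking on underflow.
    current_view.saturating_sub(last_seen_view)
}

/// Hypothetical variant that surfaces the unexpected ordering to the caller
/// instead of hiding it, so it can be logged while investigating.
fn views_elapsed_checked(current_view: u64, last_seen_view: u64) -> Option<u64> {
    // `None` means `last_seen_view` was ahead of `current_view`.
    current_view.checked_sub(last_seen_view)
}
```

The saturating form keeps the test from crashing, while the checked form may be more useful during the investigation because it makes the unexpected state visible rather than masking it.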

Are there any other details to include?

No response

What are the acceptance criteria to close this issue?

No flaky restart tests

Branch work will be merged to (if not the default branch)

No response

@lukeiannucci
Contributor

PR 1: some cleanup to prevent the overflow
#3668 (review)
