Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(mempool/cat): fix flaky TestTxPool_BroadcastQueue #1321

Merged
merged 3 commits into from
May 7, 2024

Conversation

tuantran1702
Copy link
Contributor

@tuantran1702 tuantran1702 commented Apr 23, 2024

Description

Closes #1261

@tuantran1702 tuantran1702 requested a review from a team as a code owner April 23, 2024 19:52
@tuantran1702 tuantran1702 requested review from rootulp and cmwaters and removed request for a team April 23, 2024 19:52
@rootulp rootulp changed the title fix(tests): fix flaky TestTxPool_BroadcastQueue test test(mempool/cat): fix flaky TestTxPool_BroadcastQueue Apr 24, 2024
Copy link
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't get the flake to reproduce locally before or after this change. @tropicaldog do you mind explaining why this fixes the flake?

@tuantran1702
Copy link
Contributor Author

tuantran1702 commented Apr 24, 2024

@rootulp I think the original code might lead to a race condition, where the goroutine may start consuming transactions before all of them are added. Using time.Sleep(10 * time.Millisecond) to introduce a delay is not a reliable solution imo.
The command I use to reproduce the flaky behavior is:
go test -timeout 30s -run ^TestTxPool_BroadcastQueue$ github.com/cometbft/cometbft/mempool/cat -race -count 10
Interestingly, the flaky behavior only occurs on my M2 Mac, where the test fails 50% of the time. When I tried running the same command several times on my Ubuntu 22.04 PC, no flakiness was observed.

Copy link
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! I'm on an M1 Mac and the test is still passing on my machine after your fix so LGTM.

Copy link
Member

@rach-id rach-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unable to reproduce the flake locally on Mac, but this fix looks reasonable to me. There is no need to wait to introduce the delay where we can just broadcast all the transactions before starting to consume them

Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @tropicaldog

@rootulp rootulp enabled auto-merge (squash) May 7, 2024 13:20
@evan-forbes
Copy link
Member

@tropicaldog we have to update this branch by merging main, perhaps you could give one of us perms to do that on your fork so we could merge this? either that, or you will have to merge main again 😅

@tuantran1702
Copy link
Contributor Author

@tropicaldog we have to update this branch by merging main, perhaps you could give one of us perms to do that on your fork so we could merge this? either that, or you will have to merge main again 😅

Yeah I guess I didn't have enough permission. Anyway I will merge main now, thanks for your reviews!

@rootulp rootulp merged commit 485332b into celestiaorg:main May 7, 2024
14 of 15 checks passed
mergify bot pushed a commit that referenced this pull request May 7, 2024
## Description
Closes #1261

(cherry picked from commit 485332b)
evan-forbes pushed a commit that referenced this pull request May 7, 2024
… (#1345)

## Description
Closes #1261 

<hr>This is an automatic backport of pull request #1321 done by
[Mergify](https://mergify.com).

Co-authored-by: Tuan Tran <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flake in TestTxPool_BroadcastQueue
4 participants