
test: mempool: Add unit and integration tests #8017

Merged: 6 commits merged into master from bloxico/mempool_tests on Feb 14, 2022

Conversation

@brdji (Contributor) commented on Feb 2, 2022

Related Issues

There were no existing messagepool integration tests; the pool was only being tested as a side effect of other features and their integration tests. Adding tests that target only the messagepool makes debugging easier, improves code safety, and improves coverage.

Proposed Changes

Add more unit and integration tests covering untested behaviors:

  • Uncovered lines (e.g. check_test.go, and the Add-message tests in messagepool_test.go)
  • Uncovered integration behaviors: batch push, remove, etc.

Checklist

Before you mark the PR ready for review, please make sure that:

  • All commits have a clear commit message.
  • The PR title is in the form of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, INTERFACE BREAKING CHANGE, CONSENSUS BREAKING, build, chore, ci, docs, perf, refactor, revert, style, test
    • area: api, chain, state, vm, data transfer, market, mempool, message, block production, multisig, networking, paychan, proving, sealing, wallet, deps
  • This PR has tests for new functionality or change in behaviour
  • If new user-facing features are introduced, clear usage guidelines and / or documentation updates should be included in https://lotus.filecoin.io or Discussion Tutorials.
  • CI is green

@brdji requested a review from a team as a code owner on February 2, 2022, 17:09
codecov bot commented on Feb 8, 2022

Codecov Report

Merging #8017 (3438732) into master (2e22781) will increase coverage by 0.08%.
The diff coverage is 81.81%.


@@            Coverage Diff             @@
##           master    #8017      +/-   ##
==========================================
+ Coverage   39.13%   39.22%   +0.08%     
==========================================
  Files         662      663       +1     
  Lines       72160    72171      +11     
==========================================
+ Hits        28242    28310      +68     
+ Misses      39008    38953      -55     
+ Partials     4910     4908       -2     
Impacted Files Coverage Δ
itests/kit/circuit.go 81.81% <81.81%> (ø)
markets/loggers/loggers.go 88.88% <0.00%> (-11.12%) ⬇️
chain/events/observer.go 71.64% <0.00%> (-6.72%) ⬇️
storage/wdpost_sched.go 75.49% <0.00%> (-5.89%) ⬇️
node/hello/hello.go 63.63% <0.00%> (-3.41%) ⬇️
extern/sector-storage/stores/local.go 57.22% <0.00%> (-2.78%) ⬇️
chain/store/store.go 63.00% <0.00%> (-2.50%) ⬇️
extern/storage-sealing/fsm.go 56.44% <0.00%> (-2.44%) ⬇️
miner/miner.go 54.75% <0.00%> (-1.97%) ⬇️
chain/gen/gen.go 68.19% <0.00%> (-1.23%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e22781...3438732.

@ZenGround0 (Contributor) left a comment

I'm newer to this area of the code, but LGTM. If I'm missing a logic problem or a flaky-test introduction, we can always remove tests as they become problems.

Per @vyzo's feedback, we shouldn't parse error messages but should figure out a smarter way to do this. I updated the code to just check for error existence; @brdji should figure out what to do next.
@ZenGround0 (Contributor) commented

@TheMenko tests are failing now

@ZenGround0 self-requested a review on February 10, 2022, 19:40
@ZenGround0 (Contributor) left a comment

Since your last commit had no logic change, I think this means one of your tests is flaky.

@TheDivic (Contributor) commented

@ZenGround0 Seems we have a flake, I may have just died a little bit inside. 🥲
Do you have any advice on how I can manually rerun the tests on CI to check whether I've fixed it, or do I have to push changes to trigger them?

@ZenGround0 (Contributor) commented

I have the ability to rerun failures, but since the test is flaky that won't do much good: a failure always means you need another push, and it's rerunning successes that you're interested in.

In general I would run it locally, under the right conditions and enough times to reproduce a failure, make the change that removes the flake, and try to reproduce again if I were unsure. Usually understanding the root cause of the flake and convincing myself that it's gone is enough for me.

In the rare case where local reproduction just isn't happening, then after making the change I was convinced removed the flake, I would make a bunch of no-op changes and push enough times to convince myself, with high probability, that the flake was gone.

@TheDivic (Contributor) commented

@ZenGround0 Thanks for the detailed explanation! We will find a way to reproduce and fix it locally, and we'll get back to you afterwards with an explanation! 🙂

The flake was caused by improper waiting for certain chain operations
to finish:

- We didn't wait for messages to be registered as pushed.
- We waited for a fixed time (10 seconds) for messages to be mined, which
in the best case waits longer than necessary and in the worst case causes
the test to break.

What I did:
- Fixed the flake by waiting in a loop for "just enough time". This also made
the test run faster on average, because there is no unnecessary waiting.
- Added a "circuit breaker" so the wait loop times out after 10 seconds.

Using the same pattern described in my previous commit,
I also added the CircuitBreaker helper to the itests kit, as it may be useful
for other integration tests when debugging flakiness caused by timeouts.
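
To make the pattern concrete, here is a hypothetical before/after sketch (not the actual test code from this PR). The test name and the messagesMined placeholder are invented for illustration; only the kit.CircuitBreaker signature comes from the diff discussed below.

package kit_test

import (
	"testing"
	"time"

	"github.com/filecoin-project/lotus/itests/kit"
)

func TestWaitForMinedMessages(t *testing.T) {
	// Placeholder condition; the real test would query the chain/mempool here.
	messagesMined := func() bool { return true }

	// Before: a fixed sleep always pays the full 10 seconds and still breaks
	// whenever mining happens to take longer than that:
	//   time.Sleep(10 * time.Second)

	// After: re-check every 100ms, return as soon as the condition holds,
	// and fail the test with a labelled error if 10 seconds elapse first.
	kit.CircuitBreaker(t, "wait for messages to be mined",
		100*time.Millisecond, 10*time.Second, messagesMined)
}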
@TheDivic (Contributor) commented

@ZenGround0 I think this PR is ready to be merged now.
The flake was caused by improper waiting for async operations to complete, using time.Sleep.
I reproduced it locally by running it many times with go test -count=N. I used the same approach to validate that it's now fixed.
The solution is explained in detail in my commit messages.
I also extended itests/kit with a helper function for debugging and fixing flakes with the same root cause, since it may be useful in other tests.

The reviewed excerpt from itests/kit/circuit.go (the end of the helper's doc comment and its signature):

You can use it if t.Deadline() is not "granular" enough and you want to know which specific piece of code timed out,
or you need to set different deadlines in the same test.
*/
func CircuitBreaker(t *testing.T, label string, throttle, timeout time.Duration, cb func() bool) {
A contributor commented:

Nice!
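
For readers without the diff open, here is a minimal sketch of what a helper with the signature above could look like, assuming it simply polls cb every throttle interval and fails the test with the given label once timeout elapses. The actual implementation in itests/kit/circuit.go may differ in detail.

package kit

import (
	"testing"
	"time"
)

// CircuitBreaker polls cb every throttle until it returns true, failing the
// test with the given label if timeout elapses first. (Sketch only.)
func CircuitBreaker(t *testing.T, label string, throttle, timeout time.Duration, cb func() bool) {
	t.Helper()

	deadline := time.Now().Add(timeout)
	for !cb() {
		if time.Now().After(deadline) {
			t.Fatalf("%s: timed out after %s", label, timeout)
		}
		time.Sleep(throttle) // back off before re-checking the condition
	}
}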

@ZenGround0 merged commit 3b5b55d into master on Feb 14, 2022
@ZenGround0 deleted the bloxico/mempool_tests branch on February 14, 2022, 13:46

5 participants