[DX Streamline]: Flaky Tests in GitHub Actions #12001

Open
10 of 15 tasks
rjan90 opened this issue May 16, 2024 · 16 comments

@rjan90
Contributor

rjan90 commented May 16, 2024

Description

This tracking issue is to monitor the investigation and resolution of flaky/failing tests observed only in GitHub Actions. These tests have shown at most 2 failures in 54 runs. (Ref: #11786)

List of Flaky Tests

Tasks

@jennijuju
Member

@aarshkshah1992 with your remove market PR, will itest-deals_pricing be gone as well?

@rjan90
Contributor Author

rjan90 commented May 28, 2024

Some additional notes on a couple of these tests:

  • itest-harmonydb can probably be removed once Curio has migrated successfully to their own repo.
  • multicore-sdr: the PR that added it noted that "the test works when run individually, but broken when it runs with any other tests as proofs expect the call to FilInitLogFd to be the very first proofs call."
  • itest-sector_finalize_early tests finalizing a sector (i.e. moving it to long-term storage) before the ProveCommit message is sent. The config option was added, iirc, because there were a lot of issues early on that could cause the FinalizeStep (moving a sealed sector to its LTS) to fail, which created the need for it. But afaict, the finalize early config is not set by default today.

@rjan90 rjan90 changed the title Tracking Issue: Flaky Tests in GitHub Actions [DX Streamline]: Flaky Tests in GitHub Actions Jun 4, 2024
@rjan90 rjan90 added this to the DX-Streamline milestone Jun 4, 2024
@rjan90
Contributor Author

rjan90 commented Jun 17, 2024

With the removal of markets in Lotus/Lotus-Miner, these tests have been removed:

  • itest-path_type_filters
  • itest-deals_pricing

Therefore I'm setting these as completed. Ref: #12099

@aarshkshah1992
Contributor

aarshkshah1992 commented Jul 9, 2024

Fixed a couple of flaky tests as part of

#12203 [eth_filters_itest]
#12200 [eth_legacy_transaction_itest]

So marking them as done.

@aarshkshah1992
Contributor

@rvagg
Member

rvagg commented Jul 16, 2024

This one's new, unit-rest: https://github.com/filecoin-project/lotus/actions/runs/9950234508/job/27487773104?pr=12229#logs

Not sure I want to register this as a high priority flaky because it's the first time I've seen it and I can't even see in the output what the failure is because so many tests are mixed up.

@ribasushi
Collaborator

@rvagg
Member

rvagg commented Jul 31, 2024

Added TestTraceFilter which is a new test. @snissn can you quickly have a look at https://github.com/filecoin-project/lotus/actions/runs/10172996595/job/28136413489#step:10:4128 and see if you can suggest why it might be failing? It's getting 4 traces instead of 3 at require.EqualValues(t, len(traces), 3), which is weird. I'd guess it's a race if it was 2 instead of 3 but one more? What could that be finding?

@rvagg
Member

rvagg commented Aug 8, 2024

I've seen more instances of the above failure now.

Plus another failure in the same itest: https://github.com/filecoin-project/lotus/actions/runs/10298551550/job/28504189068?pr=12327

        	Error Trace:	/home/runner/work/lotus/lotus/itests/eth_transactions_test.go:701
        	Error:      	Received unexpected error:
        	            	cannot get trace for block 14: failed to get tipset: requested a future epoch (beyond 'latest')
        	Test:       	TestTraceFilter

@snissn we're going to need your help on these I think.

@rvagg
Member

rvagg commented Aug 9, 2024

manual-onboarding flaky TestManualSectorOnboarding/WithRealProofs: https://github.com/filecoin-project/lotus/actions/runs/10300409613/job/28509755121?pr=12327
Looks like a disagreement between the blockminer and the manual miner about when PoST is supposed to be submitted: the blockminer pauses mining to wait for the message, while the manual miner doesn't seem to think it needs one. Seems like an unaccounted-for edge case?

@ribasushi
Collaborator

@rvagg
Member

rvagg commented Sep 11, 2024

I seem to have introduced a flaky test in gateway when looking at rate limits: https://github.com/filecoin-project/lotus/actions/runs/10820940825/job/30022021725#step:9:998

    gateway_test.go:398: expected end: 2024-09-11 23:04:13.504329557 +0000 UTC m=+20.756356531, now: 2024-09-11 23:04:09.391624721 +0000 UTC m=+16.643651706, allowPad: 800ms, actual delta: -4.112704738s
    gateway_test.go:399: 
        	Error Trace:	/home/runner/work/lotus/lotus/itests/gateway_test.go:399
        	Error:      	Max difference between 2024-09-11 23:04:13.504329557 +0000 UTC m=+20.756356531 and 2024-09-11 23:04:09.39166259 +0000 UTC m=+16.643689571 allowed is 800ms, but difference was 4.11266696s
        	Test:       	TestGatewayRateLimits

That's saying that it's completing a series of requests ~4s faster than it should, the max allowed padding is 800ms so it's way faster than even the outer bounds of the timing. The test sets up an environment where it should slow down requests in a fairly predictable way. It's still got lots of real-world effects feeding into it that make it variable, so something's in the way.

@ribasushi
Collaborator

rvagg added a commit that referenced this issue Oct 1, 2024
Flaky reported multiple times in #12001
and we need some hints on why it's showing up as 4 traces instead of 3 sometimes.

Also fix the assertions while I'm in there.
rvagg added a commit that referenced this issue Oct 2, 2024
Flaky reported multiple times in #12001
and we need some hints on why it's showing up as 4 traces instead of 3 sometimes.

Also fix the assertions while I'm in there.
@rvagg
Member

rvagg commented Oct 3, 2024

TestContractInvocationMultiple is flaky, here's the latest: https://github.com/filecoin-project/lotus/actions/runs/11142239872/job/30964794231

Sadly my fault again #12431, but I was testing something in a way that hasn't been properly tested so it makes me sus about the underlying behaviour.

@masih
Member

masih commented Oct 17, 2024

TestEthBlockHashesCorrect_MultiBlockTipset in itest-eth_block_hash seems to be flaky:
