Upon initial investigation, the test appears to be flaky. I reran it 5 times locally and was not able to reproduce the failure.
go test -count=5 -timeout 30m -race -run ^TestSnapshotWithPruning$ github.com/cosmos/cosmos-sdk/baseapp
ok github.com/cosmos/cosmos-sdk/baseapp 372.122s
Making this issue to keep track of the problem, in case it happens again.
This is the conclusion from the initial investigation:
According to the logs, everything works as expected besides snapshot store pruning.
m.logger.Info("completed state snapshot", "height", height, "format", snapshot.Format)
if m.opts.KeepRecent > 0 {
	m.logger.Debug("pruning state snapshots")
	pruned, err := m.Prune(m.opts.KeepRecent)
	if err != nil {
		m.logger.Error("Failed to prune state snapshots", "err", err)
		return
	}
	m.logger.Debug("pruned state snapshots", "pruned", pruned)
}
When we list snapshots in tests, we expect only the latest one (at height 20) to be returned. However, the list contains both height 15 and height 20, indicating that the snapshot at height 15 is still present when it shouldn't be.
If you check out the logs and compare them to the code linked above, there is a log for:
2022-05-10T14:54:21Z INFO completed state snapshot format=2 height=20 module=sdk/app
but no log saying:
m.logger.Error("Failed to prune state snapshots", "err", err)
which should have shown up at the info level if pruning had actually returned an error. I think the test is simply flaky: we end up listing snapshots before pruning completes, so we get 2 snapshots rather than 1. Most likely, a 200-500 ms sleep before listing would prevent this.
To keep this from happening again, we can:
add a sleep statement before listing snapshots here: cosmos-sdk/baseapp/baseapp_test.go, line 1987 in 6872cae
Summary of Bug
A test-race failure was observed in CI: https://github.com/cosmos/cosmos-sdk/runs/6372160448?check_suite_focus=true
cosmos-sdk/snapshots/manager.go
Lines 446 to 458 in 6872cae
cosmos-sdk/baseapp/baseapp_test.go
Line 1987 in 6872cae
cosmos-sdk/snapshots/manager.go
Lines 446 to 458 in 6872cae
Steps to Reproduce
I wasn't able to reproduce the failure, but the failed CI job is linked above.