Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvserver: TestTruncateLog failed #79719

Closed
cockroach-teamcity opened this issue Apr 9, 2022 · 19 comments
Closed

kv/kvserver: TestTruncateLog failed #79719

cockroach-teamcity opened this issue Apr 9, 2022 · 19 comments
Assignees
Labels
branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data)

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Apr 9, 2022

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ 9b76da708050aab88c24eaa74e7cae05aa5ddc70:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog3631979790
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:810: -- test log scope end --
--- FAIL: TestTruncateLog (0.63s)
=== RUN   TestTruncateLog/loosely-coupled=false
    raft_log_queue_test.go:771: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=false (0.29s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-14994

@cockroach-teamcity cockroach-teamcity added branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Apr 9, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Apr 9, 2022
@erikgrinaker
Copy link
Contributor

@sumeerbhola What's your take on this?

@tbg
Copy link
Member

tbg commented Apr 11, 2022

Sibling issue on master: #77825

@erikgrinaker erikgrinaker added T-kv-replication and removed T-kv KV Team labels Apr 11, 2022
@sumeerbhola
Copy link
Collaborator

This failure is in loosely-coupled=false, unlike the TestTerm issue (which I was unable to reproduce and seems to not have recurred for 2 weeks).

    --- FAIL: TestTruncateLog/loosely-coupled=false (0.29s)
    --- PASS: TestTruncateLog/loosely-coupled=true (0.30s)

Trying to reproduce.
I see the following in the logs, so maybe the problem is in the test which got unlucky with an unexpected merge.

I220409 07:14:24.781245 630552 kv/kvserver/batcheval/cmd_truncate_log.go:56 ⋮ [s1,r1/1:‹/M{in-ax}›] 1  attempting to truncate raft logs for another range: r2. Normally this is due to a merge and can be ignored.

This is contradicting the place of the failure which is the error below, which suggest more has been truncated (not less).

		// The term of the last truncated entry is still available.
		tc.repl.mu.Lock()
		term, err := tc.repl.raftTermRLocked(indexes[4])
		tc.repl.mu.Unlock()
		if err != nil {
			t.Fatal(err)
		}

@sumeerbhola
Copy link
Collaborator

640,000 runs so far with no failure :(

@rail
Copy link
Member

rail commented May 25, 2022

Manually synced with Jira

@jlinder jlinder removed the sync-me-3 label May 27, 2022
@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ d64792534335d20233269318e2b1b4ce694d394e:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog2479753635
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:810: -- test log scope end --
--- FAIL: TestTruncateLog (0.75s)
=== RUN   TestTruncateLog/loosely-coupled=true
    raft_log_queue_test.go:771: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=true (0.35s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

This test on roachdash | Improve this report!

@tbg tbg added the S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data) label May 30, 2022
@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ ae7edece60a9d0d89a86a6fc6408eb253c371881:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog4246184072
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:805: -- test log scope end --
--- FAIL: TestTruncateLog (0.73s)
=== RUN   TestTruncateLog/loosely-coupled=false
    raft_log_queue_test.go:766: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=false (0.32s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ 3875aa648b3aeb78aa458b3513e41bdbc52004e3:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog3712744608
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:805: -- test log scope end --
--- FAIL: TestTruncateLog (0.67s)
=== RUN   TestTruncateLog/loosely-coupled=false
    raft_log_queue_test.go:766: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=false (0.31s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ d21a4a188562e73172c66fa9d57eb1baa6ab149c:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog684204320
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:805: -- test log scope end --
--- FAIL: TestTruncateLog (0.68s)
=== RUN   TestTruncateLog/loosely-coupled=true
    raft_log_queue_test.go:766: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=true (0.33s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ 0468eb91d6fa303a4f016e99aff7bb2824c38f1c:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog2078562329
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:805: -- test log scope end --
--- FAIL: TestTruncateLog (0.82s)
=== RUN   TestTruncateLog/loosely-coupled=false
    raft_log_queue_test.go:766: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=false (0.36s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ 21fc175f98866bd9b18ead1606867bdb933878e4:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog268465716
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:805: -- test log scope end --
--- FAIL: TestTruncateLog (0.82s)
=== RUN   TestTruncateLog/loosely-coupled=false
    raft_log_queue_test.go:766: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=false (0.37s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ afad258bd2690d6ff8b0b8636224775dabbd6d98:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog2791901721
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:805: -- test log scope end --
--- FAIL: TestTruncateLog (0.80s)
=== RUN   TestTruncateLog/loosely-coupled=true
    raft_log_queue_test.go:766: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=true (0.39s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestTruncateLog failed with artifacts on release-22.1 @ 9ab632f3bedfb653dacf571fb4b10e02de51ef77:

=== RUN   TestTruncateLog
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestTruncateLog2261802216
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestTruncateLog
    raft_log_queue_test.go:805: -- test log scope end --
--- FAIL: TestTruncateLog (0.88s)
=== RUN   TestTruncateLog/loosely-coupled=false
    raft_log_queue_test.go:766: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=false (0.38s)
=== RUN   TestTruncateLog/loosely-coupled=true
    raft_log_queue_test.go:766: requested index is unavailable due to compaction
    --- FAIL: TestTruncateLog/loosely-coupled=true (0.45s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@irfansharif
Copy link
Contributor

It's started happening more frequently, also on #100449.

@erikgrinaker
Copy link
Contributor

Yeah, these are on our slate, just have to deal with some GA blockers first.

@erikgrinaker erikgrinaker self-assigned this Apr 13, 2023
@erikgrinaker
Copy link
Contributor

Looks like this and TestTerm are only flaky on release-22.1, the master builds haven't flaked for months. We could consider just skipping it, since this is almost certainly a test flake and 22.1 is out of support in a month anyway.

@erikgrinaker
Copy link
Contributor

There were some related bugfixes in this area in June 2022 that weren't backported to 22.1:

However, we did see flakes on master a few days after those merged. Will see if I can find any other fixes in that area.

@erikgrinaker
Copy link
Contributor

Haven't been able to reproduce after ~100k stress runs on release-22.1, of both TestTruncateLog and TestTerm, regular and under race (to provoke races via slowness). Adding sleeps in the test and code did not reproduce either.

This test has been flaky as far back as 2019. Since this isn't flaking in later releases, and 22.1 is out of support in a month, I'm going to cut our losses and skip this on release-22.1.

@erikgrinaker
Copy link
Contributor

Resolved by #101435.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data)
Projects
None yet
Development

No branches or pull requests

7 participants