roachtest: ycsb/F/nodes=3/cpu=32 failed [replica inconsistency] #105697

Closed
cockroach-teamcity opened this issue Jun 28, 2023 · 4 comments
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-storage Storage Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Jun 28, 2023

roachtest.ycsb/F/nodes=3/cpu=32 failed with artifacts on master @ fa47de054aa94a7207f3140cc8dfd87b36f8ec03:

(cluster.go:2279).Run: output in run_081342.894117559_n4_workload-run-ycsb-in: ./workload run ycsb --init --insert-count=1000000 --workload=F --concurrency=144 --splits=3 --histograms=perf/stats.json --select-for-update=true --ramp=2m --duration=30m {pgurl:1-3} returned: COMMAND_PROBLEM: exit status 1
(monitor.go:137).Wait: monitor failure: monitor task failed: t.Fatal() was called
(test_runner.go:1122).func1: 1 dead node(s) detected
test artifacts and logs in: /artifacts/ycsb/F/nodes=3/cpu=32/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=aws , ROACHTEST_cpu=32 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-29159

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 28, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Jun 28, 2023
@blathers-crl blathers-crl bot added the T-testeng TestEng Team label Jun 28, 2023
@tbg
Member

tbg commented Jun 28, 2023

replica inconsistency

@tbg tbg changed the title from "roachtest: ycsb/F/nodes=3/cpu=32 failed" to "roachtest: ycsb/F/nodes=3/cpu=32 failed [replica inconsistency]" Jun 28, 2023
@tbg tbg self-assigned this Jun 28, 2023
@tbg tbg added T-kv-replication and removed T-testeng TestEng Team labels Jun 28, 2023
@blathers-crl

blathers-crl bot commented Jun 28, 2023

cc @cockroachdb/replication

@tbg
Member

tbg commented Jun 28, 2023

87dfa2a Merge #104539 🔴 known bad
3c2d5a4 Merge #105383 #105641 🟢 50 passed acceptance/gossip/peerings
bdf2a64 Merge #105316 #105589 #105630
1f2f004 Merge #105514
6ba0fd5 Merge #104198
6ae93f4 Merge #105553
08343f5 Merge #105629
12ac5d8 Merge #105583
e8c7bdc Merge #97779 🟢 80+ passed acceptance/gossip/peerings
45be076 Merge #104620 #105482 #105579 #105581 #105596
c7965ed Merge #105536
19b8fca Merge #105272
4135fd1 Merge #104669
8d5aa76 Merge #105575
dca6ef2 Merge #105132
d74f69d Merge #105566
0cb06fc Merge #105126
57ed034 Merge #104755
0b0a212 Merge #105555
5fdd740 Merge #105588
c767b9e Merge #105522
961a2e6 Merge #105587
5b9824f Merge #105399 🟢 100 passed acceptance/gossip/peerings
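
The list above reads as a bisection over merge commits, newest first: 87dfa2a is known bad, and the commits marked green passed repeated runs of acceptance/gossip/peerings. A minimal Go sketch of such a search follows. The `bisect` helper and the `passes` predicate are hypothetical, not part of roachtest, and the sketch assumes that once a commit is bad every newer commit is bad too. For a flaky failure like this one, `passes` would have to run the test many times before declaring a commit good.

```go
package main

import "fmt"

// bisect returns the index of the first bad commit. commits must be ordered
// oldest to newest, commits[0] must be good, and the last commit must be bad.
// passes is a hypothetical predicate that reports whether a commit is good.
func bisect(commits []string, passes func(commit string) bool) int {
	lo, hi := 0, len(commits)-1 // invariant: commits[lo] is good, commits[hi] is bad
	for hi-lo > 1 {
		mid := lo + (hi-lo)/2
		if passes(commits[mid]) {
			lo = mid
		} else {
			hi = mid
		}
	}
	return hi
}

func main() {
	// Oldest to newest; only the commits annotated in the list above.
	commits := []string{"5b9824f", "e8c7bdc", "3c2d5a4", "87dfa2a"}
	// Stand-in result table instead of actually running the test.
	bad := map[string]bool{"87dfa2a": true}
	first := bisect(commits, func(c string) bool { return !bad[c] })
	fmt.Println("first bad commit:", commits[first])
}
```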

@erikgrinaker erikgrinaker added A-storage Relating to our storage engine (Pebble) on-disk storage. and removed T-kv-replication labels Jun 28, 2023
@blathers-crl blathers-crl bot added the T-storage Storage Team label Jun 28, 2023
@erikgrinaker erikgrinaker assigned jbowens and unassigned tbg Jun 28, 2023
craig bot pushed a commit that referenced this issue Jun 28, 2023
105618: cluster-ui: update database pages stories r=THardy98 a=THardy98

Epic: None

This change updates the stories for the databases, database details, and database table pages.

Release note: None

105645: roachtest: advance predecessor version to v23.1.2 r=THardy98 a=THardy98

Epic: None
Release note: None
Release justification: version bump

105683: protoutil: prefer MarshalToSizedBuffer over MarshalTo where possible r=erikgrinaker a=nvanbenschoten

This commit switches a number of callers of `protoutil.MarshalTo` over to `protoutil.MarshalToSizedBuffer`. The latter function is more strict in that it requires the destination buffer to have a length of exactly `pb.Size()` bytes, as opposed to only requiring it to have a capacity of at least `pb.Size()` bytes. In return, it avoids a call to `pb.Size()`.

The three performance-sensitive callers that are changed here are:
- `roachpb.Value.SetProto`
- `raftlog.EncodeCommand`
- `storage.putBuffer.marshalMeta`

The commit also adds some testing-only assertions to verify that callers are using the functions correctly.

Epic: None
Release note: None
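
The difference between the two contracts described in the commit message above can be sketched as follows. This is an illustration only: `record`, `marshalTo`, `marshalToSizedBuffer`, and `assertSizes` are hypothetical stand-ins written for this example, not the `protoutil` functions or their signatures.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// record is a hypothetical stand-in for a generated protobuf message.
type record struct {
	id uint32
}

// Size reports the encoded size in bytes (fixed here; real messages compute it).
func (r record) Size() int { return 4 }

// encodeInto writes the encoding into buf, which must be exactly Size() bytes long.
func (r record) encodeInto(buf []byte) {
	binary.BigEndian.PutUint32(buf, r.id)
}

// marshalTo models the looser contract: dest only needs capacity >= Size().
// It has to call Size() to learn how many bytes to use.
func marshalTo(r record, dest []byte) (int, error) {
	n := r.Size()
	if cap(dest) < n {
		return 0, errors.New("destination capacity too small")
	}
	r.encodeInto(dest[:n])
	return n, nil
}

// assertSizes stands in for the testing-only assertions mentioned above.
var assertSizes = true

// marshalToSizedBuffer models the stricter contract: the caller guarantees
// len(dest) == Size(), so outside of assertions no Size() call is needed.
func marshalToSizedBuffer(r record, dest []byte) (int, error) {
	if assertSizes && len(dest) != r.Size() {
		return 0, fmt.Errorf("buffer length %d != message size %d", len(dest), r.Size())
	}
	r.encodeInto(dest)
	return len(dest), nil
}

func main() {
	r := record{id: 7}

	// The caller sized the buffer itself, so the strict variant applies.
	buf := make([]byte, r.Size())
	n, err := marshalToSizedBuffer(r, buf)
	fmt.Println(n, err, buf)

	// A zero-length buffer with spare capacity satisfies only the loose variant.
	spare := make([]byte, 0, 16)
	n, err = marshalTo(r, spare)
	fmt.Println(n, err)
	_, err = marshalToSizedBuffer(r, spare)
	fmt.Println(err)
}
```

The strict variant pays off when the caller has already computed the size to allocate the buffer, since the size would otherwise be computed twice.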

105712: sql: only skip import worker failure under race,stress,deadlock r=cucaroach a=cucaroach

Better to have the test running some of the time to catch regressions.

Informs: 102839
Epic: None
Release note: None


105718: Revert "storage: use size-carrying point tombstones" r=jbowens a=jbowens

This reverts commit 8edbf0f.

Epic: none
Release note: none
Informs #105700.
Informs #105697.

Co-authored-by: Thomas Hardy
Co-authored-by: Nathan VanBenschoten
Co-authored-by: Tommy Reilly
Co-authored-by: Jackson Owens
@jbowens
Collaborator

jbowens commented Jul 5, 2023

The reverted commit enabled use of a new sstable format with two changes:

  1. The new format allows DELSIZED point tombstones that carry the size of the deleted value.
  2. The new format introduces an 'obsolete bit' that marks obsolete keys, allowing iteration to efficiently skip over obsolete keys as necessary.

The corruption was caused by the second feature, which in some cases could cause an iterator to incorrectly skip keys. This was masked by an issue in the Pebble metamorphic tests that was failing to exercise the new sstable format version. The issue with the metamorphic tests was fixed and the 'obsolete bit' disabled temporarily in cockroachdb/pebble#2691. The Cockroach Pebble version was bumped to disable the 'obsolete bit' in #106043. Fixing the 'obsolete bit' is tracked in cockroachdb/pebble#2705. I'm reverting the revert of the point tombstone change itself in #106177.
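
To make the failure mode concrete, here is a toy Go model of an iterator that skips entries flagged as obsolete. It is not Pebble's code and does not reproduce the actual bug, whose details are tracked in cockroachdb/pebble#2705; the `entry` type, its `obsolete` field, and `visibleKeys` are hypothetical. It only shows why trusting a wrongly set flag makes a live key disappear from a scan, which is the kind of divergence a replica consistency check would report.

```go
package main

import "fmt"

// entry is a hypothetical, simplified key version inside one sstable.
type entry struct {
	key      string
	seq      int  // higher means newer
	obsolete bool // writer's claim that a newer version of key shadows this one
}

// visibleKeys scans entries sorted by key ascending, then seq descending.
// With hideObsolete it trusts the flag and skips flagged entries without
// comparing keys. Without it, it skips an entry only when the preceding
// entry has the same key, that is, when a newer version was already seen.
func visibleKeys(entries []entry, hideObsolete bool) []string {
	var out []string
	for i, e := range entries {
		if hideObsolete {
			if e.obsolete {
				continue
			}
		} else if i > 0 && entries[i-1].key == e.key {
			continue
		}
		out = append(out, e.key)
	}
	return out
}

func main() {
	correct := []entry{
		{"a", 5, false},
		{"a", 3, true}, // shadowed by a@5
		{"b", 4, false},
	}
	// Same data, but b@4 is wrongly flagged although nothing shadows it.
	buggy := []entry{
		{"a", 5, false},
		{"a", 3, true},
		{"b", 4, true},
	}
	fmt.Println(visibleKeys(correct, true)) // [a b]
	fmt.Println(visibleKeys(buggy, true))   // [a]
	fmt.Println(visibleKeys(buggy, false))  // [a b]
}
```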
