-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [replica inconsistency] #69414
Comments
on n7 (not n5 as indicated by the message - it's getting confused because of the gaps in the node list)
We've seen this before, in #67471. |
Uh-oh:
====== diff(f8449b7052dcb7045aefe6309381393c6715d3408fb571e9dfd30d88ec7a7f07819b997ada537108433b50daaec786f8fb7dc22109e1ffc1267b189faeb85c54, [minority]) ======
--- leaseholder
+++ follower
+0,0 ‹/Table/61/1/1686/4881/0›
+ ts:1970-01-01 00:00:00 +0000 UTC
+ value:‹1629979313.521491293,2 {Txn:id=a181b3d4 key=/Table/55/1/1686/2/0 pri=0.02746475 epo=4 ts=1629979313.521491293,2 min=1629979155.112560857,0 seq=5 Timestamp:1629979313.521491293,2 Deleted:false KeyBytes:12 ValBytes:306 RawBytes:[] IntentHistory:[] MergeTimestamp:<nil> TxnDidNotUpdateMeta:<nil>}›
+ raw mvcc_key/value: ‹c589f70696f713118800› ‹0a3d0a10a181b3d4ff05487885ec309606895b5e1a07bf89f706968a8820042a0c08ddfabbadcbb1b6cf16100230e9ff2338054a0a08d9919a9efdacb6cf16120c08ddfabbadcbb1b6cf1610021800200c28b202›
consistency check failed I'm uploading the full artifacts to https://drive.google.com/drive/folders/1z_gpHX39QwKMC4x7aYWMriRxV6mV5cTC?usp=sharing (CRL only). |
https://gist.github.com/4d458af340e47fa9c1f549297eb747f9 |
The timestamp of the separated intent missing on the leaseholder n6 is Thursday, August 26, 2021 12:01:53 PM The log timestamps at the time of the consistency check are close to an hour later, around 12:52. So whatever went wrong likely went wrong around 12:0[01] on r723. Unfortunately, there's radio silence around this time:
I'm also not seeing anything obviously wrong around the 12:00 mark:
|
The txnid also makes no appearance in the logs. Neither does the full pretty-printed key. |
If this is reproducible, then one thing to try would be to change the code to disable the SINGLEDEL optimization. btw, there is a case where SINGLEDEL can have 2 SETs under it.
|
@sumeerbhola how ironclad is the testing of single deletions on the pebble side? I expect this to be very very hard to reproduce, so if we can do any work on the pebble side to make sure that single deletes are very well tested in the presence of concurrent compactions, ingestions, etc that would be helpful. |
@AlexTalks will start trying to reproduce this tomorrow, with help from yours truly. I'll need to take another look at the test failure to get an idea of whether we can skip parts of the test or specialize it somehow, but we probably want to run with something like fd9aa5a and also more stringent consistency checks, perhaps even specialized checks on txn records we know have been single-del'ed. |
It is quite thoroughly tested using the Pebble metamorphic test which generates single delete just like other operations to be used in batches and file ingestion. |
@nicktrav will start working on enhancing the Pebble metamorphic test for this case + make it behave randomly in terms of sometimes using Del instead of SingleDel for keys that are eligible for SingleDel. |
The timeline in the test is:
Given all of this, I think we can make some assumptions:
In the past, such inconsistencies would usually come down to a problem at the storage layer (i.e. the "singledelete didn't do its job" theory the storage team is looking into by adding more testing) and the kv replication layer (for example, we once had a bug where cached raft entries at the leader were not reliably evicted when an uncommitted part of the raft log was discarded, so we'd sometimes send the wrong entry to a follower and it would thus diverge. #61990 (comment) is the most recent example of this). The import phase of the test takes around 30 minutes only, and we're interested in overload. I would thus say we try a repro cycle where we stay multi-region (in case that exacerbated anything), and we run the import but skip the rebalancing phase. When we then hit the cluster with 2000 warehouse TPCC, it will most likely overload the cluster (if that proves to be false, we start at a higher warehouse count like 2200). We run with an aggressive consistency check interval, too, as it is very valuable to stop the test close to where the issue happens. Since I expect that it will take a while to get a repro, we'll also put in the pebble ArchiveCleaner commit referenced above. We need to check that we're not running out of disk space; if that becomes an issue maybe we can hack around it by provisioning larger disks, or by catching disk-full conditions and ignoring the outcome. |
Going to see how many of the clusters for this test we can put into andrei-jepsen
|
^-- 10 clusters worked fine, so going to try making a few more. 💸 |
Currently, the metamorphic tests randomly generate a set of operations to perform on the Pebble instance. Support for single deletes exists, though the keys that have been deleted are not considered for reuse. Track keys that have been singly deleted, and randomly reuse these keys when generating subsequent operations to perform. Prior to this patch, the generation operation log would resemble the following: ``` db.Set("foo", "bar") ... batch54.SingleDelete("foo") ... // No more operations on key "foo". ``` With this patch, the following seqeunce of operations is permissible: ``` db.Set("foo", "bar") ... db.SingleDelete("foo") ... db.Set("foo", "baz") ... db.Merge("foo", "bam") ... db.Set("foo", "boom") ... // Subsequent operations on key "foo" are permissible. ``` Related to cockroachdb/cockroach#69414.
Generated SINGLEDEL operations are eligible for transformation into less restrictive DELETE operations. Transform a fixed percentage of SINGLEDEL operations at generation time into DELETEs to further exercise the delete execution paths. Related to cockroachdb/cockroach#69414.
Currently, sequences of operations for metamorphic tests are generated for certain specific cases (e.g. a SINGLEDEL follows a key that has been SET once). Additional test sequences require maintaining the "state" of the sequence in the `generator`, which clutters the struct with various fields required for the state management. Add a `sequenceGenerator` struct, which uses a "transition map" (a mapping from a current state to next state, along with a corresponding output for the transition) to model a state machine. The state machine can be used to generate random sequences of operations that adhere to certain rules (e.g. output a GET following a SET for a given key). A `generator` can be constructed containing one or more `sequenceGenerator`s that, when selected by the random number generator (i.e. the "deck"), generate the next operation in the sequence governed by the state machine and place the operation into the operation log. Related to cockroachdb/cockroach#69414.
Add a `sequenceGenerator` instance with a transition map that generates the following sequence of operations, similar to the problematic sequence generated for cockroachdb/cockroach#69414: ``` ((SET -> GET)+ -> DELETE -> GET)+ -> SINGLEDEL -> (GET)+ ``` See also cockroachdb/cockroach#69891.
Updating this to release-blocker, as discussed on the release triage meeting yesterday. This will mean no beta will be released until this issue is addressed. |
I haven't looked at this code in a while, but while my assumption was that if a ResolveIntent comes in at a newer epoch, we would keep the existing intent and rewrite it to that epoch, but apparently we "just" remove it. This behavior might make sense in practice (now that our concurrency control is cooperative), but it was not what I remembered, and plays a prominent role in the correctness bug cockroachdb#69414. Still need to figure out what to do with this. Comments definitely refer to the behavior I remember. It's possible that we changed this by accident. Will investigate more. Release justification: testing improvement related to release blocker cockroachdb#69414. Release note: None
Fixes cockroachdb#69414. Only use SingleDel if - the meta tracking allows it (i.e. the sole previous condition), and - epoch zero, and - no ignored txn seqno ranges. These three together imply that the engine history of the metadata key is a single `Set`, so we can safely use `SingleDel`. This is a below-Raft change, but no mixed-version problems are expected. This is because using a Del vs using a SingleDel (if it does not cause an anomaly) is not visible at the replica state level. In particular, a deleted key looks the same to the consistency checker as a single-deleted key. Release justification: fix for a release blocker Release note: None
Fixes cockroachdb#69414. Only use SingleDel if - the meta tracking allows it (i.e. the sole previous condition), and - epoch zero, and - no ignored txn seqno ranges. These three together imply that the engine history of the metadata key is a single `Set`, so we can safely use `SingleDel`. This is a below-Raft change, but no mixed-version problems are expected. This is because using a Del vs using a SingleDel (if it does not cause an anomaly) is not visible at the replica state level. In particular, a deleted key looks the same to the consistency checker as a single-deleted key. Release justification: fix for a release blocker Release note: None
Generated SINGLEDEL operations are eligible for transformation into less restrictive DELETE operations. Transform a fixed percentage of SINGLEDEL operations at generation time into DELETEs to further exercise the delete execution paths. Related to cockroachdb/cockroach#69414.
Generated SINGLEDEL operations are eligible for transformation into less restrictive DELETE operations. Transform a fixed percentage of SINGLEDEL operations at generation time into DELETEs to further exercise the delete execution paths. Related to cockroachdb/cockroach#69414.
Currently, by default, the metamorphic tests use a MemFS-backed DB by default. The `-disk` flag can be used to override all tests to use a disk-backed filesystem. The behavior is "all or nothing". To better simulate use of Pebble with a real, disk-backed filesystem, randomly generate test configurations that use disk. Tweak the flags to allow specifying an override filesystem type. In the case that "mem" or "disk" are specified, all metamorphic tests will use that FS type, ignoring the randomly generated FS type. The default is to use the randomly generated FS types. Disk-backed filesystems are used 10% of the time for the randomly generated configurations. Related to cockroachdb/cockroach#69414.
Fixes cockroachdb#69414. Only use SingleDel if - the meta tracking allows it (i.e. the sole previous condition), and - epoch zero, and - no ignored txn seqno ranges. These three together imply that the engine history of the metadata key is a single `Set`, so we can safely use `SingleDel`. Release justification: fix for a release blocker Release note: None
69806: kv/kvserver: use generalized engine keys in debug printing r=AlexTalks a=AlexTalks Previously the CLI debug command only printed `MVCCKey`s, resulting in the debug printer to error on key decoding whenever a `LockTableKey` was encountered. By switching to the more generalized `EngineKey` (and utilizing the existing MVCC key formatting whenever the key has an MVCC version), we can increase visibility into our debug logs while investigating issues. Related to #69414 Release justification: Non-production bug fix Release note: None 69944: stats: add a histogram version number r=rytaft a=rytaft This commit adds a histogram version number to the `HistogramData` proto. This will allow us to identify what logic was used to construct a particular histogram and possibly debug future issues. Release note: None Release justification: Low risk, high benefit change to existing functionality. 69963: sql: skip reset sql stats in TestRandomSyntaxFunctions r=maryliag,rafiss a=Azhng Previously, crdb_internal.reset_sql_stats() causes timeout in TestRandomSyntaxFunctions. This is very unlikely due to implementation of the function, and it is likely caused by contentions. This commit skip the tests for crdb_internal.reset_sql_stats() to prevent nightly failures. Related #69731 Release justification: Non-production code changes Release note: None 69967: vendor: bump Pebble to 6c12d67b83e6 r=jbowens a=jbowens ``` 6c12d67 internal/metamorphic: randomize FormatMajorVersion e82fb10 db: randomize format major version in unit tests 535b8d6 db: add FormatUpgrade event to EventListener 53dda0f db: introduce format major version 8ec1a49 vfs/atomicfs: add ReadMarker daf93f0 sstable: Free cache value when checksum type is corrupt d89613d metamorphic: randomly use disk for tests e3b6bec metamorphic: transform percentage of SINGLEDEL ops to DELETE ops 41239f8 db: add test demonstrating current SINGLEDEL behavior ``` Release note: none Release justification: non-production code changes, and bug fix necessary for a release blocker. 69974: backupccl: set sqlstats testing knobs for scheduled job test r=maryliag,miretskiy a=Azhng Previsouly, backupccl tests did not set sql stats AOST testing knob to override the AOST behavior. This causes sql stats error stack trace to show up in backupccl tests. This commit added sql stats testing knobs for backupccl test helpers to mitigate this. Release justification: Non-production code changes Release note: None Co-authored-by: Alex Sarkesian <[email protected]> Co-authored-by: Rebecca Taft <[email protected]> Co-authored-by: Azhng <[email protected]> Co-authored-by: Jackson Owens <[email protected]>
69923: storage: narrow down use of SingleDel to avoid anomalies r=sumeerbhola a=tbg Fixes #69414. Only use SingleDel if - the meta tracking allows it (i.e. the sole previous condition), and - epoch zero, and - no ignored txn seqno ranges. These three together imply that the engine history of the metadata key is a single `Set`, so we can safely use `SingleDel`. Release justification: fix for a release blocker Release note: None Co-authored-by: Tobias Grieger <[email protected]>
Fixes #69414. Only use SingleDel if - the meta tracking allows it (i.e. the sole previous condition), and - epoch zero, and - no ignored txn seqno ranges. These three together imply that the engine history of the metadata key is a single `Set`, so we can safely use `SingleDel`. Release justification: fix for a release blocker Release note: None
69658: spanconfig: disable infrastructure unless envvar is set r=irfansharif a=irfansharif Cluster settings are too easy a switch to reach for to enable the new span configs machinery. Let's gate it behind a necessary envvar as well and use cluster settings to selectively toggle individual components. This commit also fixes a mighty silly bug introduced in #69047; for the two methods we intended to gate use `spanconfig.experimental_kvaccessor.enabled`, we were checking the opposite condition or not checking it at all. Oops. Release note: None Release justification: non-production code changes 69809: kv/kvserver: use proper formatting when debug printing intents r=AlexTalks a=AlexTalks This commit changes the formatting used when printing intents via the CLI debug command from the default generated Protobuf formatter to our custom `MVCCMetadata` formatter implementation. Additionally, the `MergeTimestamp` and `TxnDidNotUpdateMetadata` fields have been added to the output. This changes the debug formatting from the former example: ``` 0,0 /Local/RangeID/203/r/RangePriorReadSummary (0x0169f6cb727270727300): {Txn:<nil> Timestamp:0,0 Deleted:false KeyBytes:0 ValBytes:0 RawBytes:[230 123 85 161 3 10 12 10 10 8 146 229 195 204 139 135 186 208 22 18 12 10 10 8 146 229 195 204 139 135 186 208 22] IntentHistory:[] Me rgeTimestamp:<nil> TxnDidNotUpdateMeta:<nil>} /Local/Lock/Intent/Table/56/1/1319/6/3055/0 0361fea07d3f0d40ba8f44dc4ee46cbdc2 (0x017a6b12c089f705278ef70bef880001000361fea07d3f0d40ba8f44dc4ee46cbdc212): 1630559399.176502568,0 {Txn:id=61fea07d key=/Table/57/1/1319/6/0 pri=0.01718258 epo=0 ts=1630559399.176502568,0 min=1630559399.176502568,0 seq=4 Timestamp:1630559399.176502568,0 Deleted:false KeyBytes:12 ValBytes:5 RawBytes:[] IntentHistory:[] MergeTimestamp:<nil> TxnDidNotUpdateMeta:0xc0016059b0} ``` to the following example: ``` 0,0 /Local/RangeID/203/r/RangePriorReadSummary (0x0169f6cb727270727300): txn={<nil>} ts=0,0 del=false klen=0 vlen=0 raw=/BYTES/0x0a0c0a0a0892e5c3cc8b87bad016120c0a0a0892e5c3cc8b87bad016 mergeTs=<nil> txnDidNotUpdateMeta=false /Local/Lock/Intent/Table/56/1/1319/6/3055/0 0361fea07d3f0d40ba8f44dc4ee46cbdc2 (0x017a6b12c089f705278ef70bef880001000361fea07d3f0d40ba8f44dc4ee46cbdc212): 1630559399.176502568,0 txn={id=61fea07d key=/Table/57/1/1319/6/0 pri=0.01718258 epo=0 ts=1630559399.176502568,0 min=1630559399.176502568,0 seq=4} ts=1630559399.176502568,0 del=false klen=12 vlen=5 mergeTs=<nil> txnDidNotUpdateMeta=true ``` Related to #69414 Release justification: Bug fix Release note: None Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Alex Sarkesian <[email protected]>
Fixes #69414. Only use SingleDel if - the meta tracking allows it (i.e. the sole previous condition), and - epoch zero, and - no ignored txn seqno ranges. These three together imply that the engine history of the metadata key is a single `Set`, so we can safely use `SingleDel`. Release justification: fix for a release blocker Release note: None
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ ab1fc343c9a1140191f96353995258e609a84d02:
Reproduce
See: roachtest README
This test on roachdash | Improve this report!
The text was updated successfully, but these errors were encountered: