storage: write at provisional commit ts, not orig ts #32487

benesch · 2018-11-19T22:19:16Z

The fact that transactions write at their original timestamp, and not
their provisional commit timestamp, allows leaving an intent under a
read. The timestamp cache will ensure that the transaction can't
actually commit unless it can bump its intents above the read, but it
will still leave an intent under the read in the meantime.

This can lead to starvation. Intents are meant to function as a sort of
lock on a key. Once a writer lays down an intent, no readers should be
allowed to read above that intent until that intent is resolved.
Otherwise a continual stream of readers could prevent the writer from
ever managing to commit by continually bumping the timestamp cache.

Now consider how CDC's poller works: it reads (tsmin, ts1], then (ts1,
ts2], then (ts2, ts3], and so on, in a tight loop. Since it uses a
time-bound iterator under the hood, reading (ts2, ts3], for example,
cannot return an intent written at ts2. But the idea was that we
inductively guranteed that we never read above an intent. If an intent
was written at ts2, even though the read from (ts2, ts3] would fail to
observe it, the previous read from (ts1, ts2] would have.

Of course, since transactions write at their original timestamp, a
transaction with an original timestamp of ts2 can write an intent at ts2
after the CDC poller has read (ts1, ts2]. (The transaction will be
forced to commit at ts3 or later to be sequenced after the CDC poller's
read, but it will leave the intent at ts2.) The CDC poller's next read,
from (ts2, ts3], thus won't see the intent, nor will any future reads at
higher timestamps. And so the CDC poller will continually bump the
timestamp cache, completely starving the writer.

Fix the problem by writing at the transaction's provisional commit
timestamp (i.e., the timestamp that has been forwarded by the timestamp
cache, if necessary) instead of the transaction's original timestamp.
Writing at the original timestamp was only necessary to prevent a lost
update in snapshot isolation mode, which is no longer supported. In
serializable mode, the anomaly is protected against by the read refresh
mechanism.

Besides fixing the starvation problem, the new behavior is more
intuitive than the old behavior. It also might have some performance
benefits, as it is less likely that intents will need to be bumped at
commit time, which saves on RocksDB writes.

Touches #32433.

Release note: None

cockroach-teamcity · 2018-11-19T22:19:23Z

This change is

benesch · 2018-11-19T22:28:28Z

The big downside of this change is that we really want TBIs to be useful for CDC in 2.1. CDC in 2.2 will make more use of range feeds and less use of polling so TBIs will be less necessary. But this change is definitely too scary to backport to 2.1.

tbg · 2018-11-19T22:45:43Z

Are we exploring ripping out TBIs and pushing logic to C++? That sounded like the least scary venue.

benesch · 2018-11-19T22:59:30Z

Are we exploring ripping out TBIs and pushing logic to C++? That sounded like the least scary venue.

My gut says that that won't be comparable to the benefit you get when you only need to scan the memtable and some L0 files. There is a fundamental limit to how fast scanning n keys can be, and TBIs are currently our only way of reducing n rather than the constant factor.

Perhaps we'd get lucky and things would be fast enough though. It's definitely worth an experiment.

bdarnell · 2018-11-19T23:38:00Z

The CDC poller's next read,
from (ts2, ts3], thus won't see the intent, nor will any future reads at
higher timestamps. And so the CDC poller will continually bump the
timestamp cache, completely starving the writer.

Oh, this is subtle. The CDC poller doesn't actually care about uncommitted intents, and it will see the committed value at its new timestamp when and if it is able to commit. But the tight loop starves out the transaction so it can't commit.

How does this change solve the problem? We write above the poller's read, but what stops the poller from advancing the timestamp cache again? Is it just that the poller stops advancing while there is a pending intent?

We have some tests that use SNAPSHOT isolation; these tests will (hopefully) become flaky with a change like this. We should get rid of tests that use SNAPSHOT before we make changes that will break them.

But this change is definitely too scary to backport to 2.1.

We can't break SNAPSHOT in 2.1.x; clusters being upgraded from 2.0 may still perform snapshot-isolation transactions in a mixed-version configuration.

If reads still use OrigTimestamp but writes use Timestamp, how does that affect commands that do both like ConditionalPut?

benesch · 2018-11-20T01:42:18Z

Oh, this is subtle. The CDC poller doesn't actually care about uncommitted intents, and it will see the committed value at its new timestamp when and if it is able to commit. But the tight loop starves out the transaction so it can't commit.

I think it's actually possible that the CDC poller misses the value entirely. Say the poller reads at (ts1, ts2], and a concurrent transaction writes at ts1 (its orig timestamp) while getting pushed to ts3. If that transaction takes a while to send its commit RPC, the CDC poller might read (ts2, ts3] before the intent at ts3 gets resolved. And then it will be reading (ts3, ts4] by the time the intent gets resolved and never see the value committed at ts3.

I could be off base here, but I think the intuition is that an intent is a lock on [t_intent, ∞). You can't say that you've read everything for t ≥ t_intent until that intent is resolved because the intent could commit at any such t.

How does this change solve the problem? We write above the poller's read, but what stops the poller from advancing the timestamp cache again? Is it just that the poller stops advancing while there is a pending intent?

Right. If the poller's ExportRequest catches a WriteIntentError, it won't update the timestamp cache, and so the intent can get bumped.

We can't break SNAPSHOT in 2.1.x; clusters being upgraded from 2.0 may still perform snapshot-isolation transactions in a mixed-version configuration.

Good point.

If reads still use OrigTimestamp but writes use Timestamp, how does that affect commands that do both like ConditionalPut?

I don't know. I suspect it works fine because all reads get refreshed to the commit timestamp anyway, so we're just doing extra work. But I need to think about this more.

andreimatei

LGTM mod the fact that this really needs tests for all the changes in applyTimestampCache.

I think we should couple this patch with a change to PushTxn which makes a push succeed if it's trying to push the txn at a ts <= its current timestamp (i.e. the timestamps known by the txn record). Like, if txn1 wrote at ts 5, and then it got bumped to ts 10, and now we have a read at ts 7, that read should succeed.
Otherwise, with this change, we can have situations where, because a txn now is leaving intents at different timestamps, whether or not two txn conflict can depend on which intent from the first the second one runs into.
In other words, I'm not so sure about the "the new behavior is more
intuitive than the old behavior" claim; the old one seems more intuitive to me. But otherwise I do think I like this change - there are benefits to writing at timestamps as high as possible.

andreimatei · 2018-11-20T20:04:06Z

pkg/roachpb/data.proto

-  // but to avoid being starved by short-lived transactions on that key which
-  // would otherwise not have to go through conflict resolution with txn1.
-  //
-  // Again, keep in mind that, when the transaction commits, all the intents are


this part of the comment should be kept somewhere. The comment on the timestamp field should be expended to explain how its used for writes, and how a txn might be laying down intents at different timestamps as it goes along.

andreimatei · 2018-11-20T20:05:03Z

pkg/storage/replica.go

@@ -2746,6 +2746,7 @@ func (r *Replica) applyTimestampCache(
 						txn := ba.Txn.Clone()
 						bumped = txn.Timestamp.Forward(nextTS) || bumped
 						ba.Txn = &txn
+						ba.Timestamp = txn.Timestamp


please add a comment at this change and below, reiterating that we want to write at the bumped ts.

andreimatei · 2018-11-20T20:16:04Z

Oh I wanted to say something else too - let's also couple this patch with deleting every trace of snapshot txns from our code. You can do it in a different commit if you want to backport the first (although the backport I guess would need to be put behind a cluster version per Ben's comments).

bdarnell · 2018-11-20T20:18:28Z

although the backport I guess would need to be put behind a cluster version per Ben's comments

And backports aren't allowed to introduce cluster versions unless it's a really exceptional case.

andreimatei · 2018-11-20T20:22:22Z

And backports aren't allowed to introduce cluster versions unless it's a really exceptional case.

We were just discussing offline how you don't really need a new cluster version, you could just gate on the 2.1 one.

Snapshot isolation was disabled in v2.1, so it is safe to remove all code paths that handle snapshot isolation in v2.2. This is a necessary precursor to cockroachdb#32487. Release note: None

Snapshot isolation was disabled in v2.1, so it is safe to remove all code paths that handle snapshot isolation in v2.2. Note that the change to the MVCCMetadata checksum in pkg/storage/below_raft_protos_test.go is safe, because the actual encoding of the protobuf is not affected by this change. What is affected is NewPopulatedMVCCMetadata, which no longer returns a TxnMeta with a non-zero IsolationType, since the IsolationType field has been removed. This is a necessary precursor to cockroachdb#32487. Release note: None

32860: kv,storage,sql: purge snapshot isolation r=andreimatei,nvanbenschoten,bdarnell a=benesch This touches a lot so I cast a wide net for reviewers. Feel free to remove yourself! --- Snapshot isolation was disabled in v2.1, so it is safe to remove all code paths that handle snapshot isolation in v2.2. This is a necessary precursor to #32487. Release note: None Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>

benesch

Ok, PTAL. I added a test and updated some more comments.

LGTM mod the fact that this really needs tests for all the changes in applyTimestampCache.

Done.

Oh I wanted to say something else too - let's also couple this patch with deleting every trace of snapshot txns from our code. You can do it in a different commit if you want to backport the first (although the backport I guess would need to be put behind a cluster version per Ben's comments)

Done in #32860.

I think we should couple this patch with a change to PushTxn which makes a push succeed if it's trying to push the txn at a ts <= its current timestamp (i.e. the timestamps known by the txn record). Like, if txn1 wrote at ts 5, and then it got bumped to ts 10, and now we have a read at ts 7, that read should succeed.

Isn't this already the case?

cockroach/pkg/storage/batcheval/cmd_push_txn.go

Lines 161 to 166 in 2aef042

    
           // If we're trying to move the timestamp forward, and it's already 
        
           // far enough forward, return success. 
        
           if args.PushType == roachpb.PUSH_TIMESTAMP && args.PushTo.Less(reply.PusheeTxn.Timestamp) { 
        
           	// Trivial noop. 
        
           	return result.Result{}, nil 
        
           }

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/replica.go, line 2749 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please add a comment at this change and below, reiterating that we want to write at the bumped ts.

Done.

benesch · 2018-12-18T04:36:55Z

It's greeeeen! 🍏📗💚🥗 Thank goodness.

bdarnell

Reviewed 12 of 12 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/engine/mvcc.go, line 1310 at r3 (raw file):

		// A txn writes intents at its provisional commit timestamp. See the
		// comment on the txn.Timestamp field definition for rationale.
		writeTimestamp.Forward(txn.Timestamp)

Should it be an error for txn.Timestamp to be less than timestamp?

pkg/storage/engine/mvcc_test.go, line 436 at r3 (raw file):

	txn := *txn1
	txn.Sequence++
	txn.Timestamp = hlc.Timestamp{WallTime: 1}

The tests would pass without this, right? (because of the way .Forward is used instead of just overwriting timestamp with txn.Timestamp)

pkg/storage/engine/mvcc_test.go, line 1114 at r3 (raw file):

	ts := []hlc.Timestamp{{Logical: 1}, {Logical: 2}, {Logical: 3}, {Logical: 4}, {Logical: 5}, {Logical: 6}}

	txn1ts := *txn1

Maybe the txn1 global should be replaced with a factory function since it's no longer used as-is anywhere (or is it?).

andreimatei

LGTM mod a comment on the interface of MVCCPut.

And a question: What is the timestamp used for populating the write timestamp cache (by DelRange)?

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/replica_test.go, line 2621 at r3 (raw file):

}

func TestReplicaTSCacheForwardsIntentTS(t *testing.T) {

pls add a comment to the test with some words

pkg/storage/replica_test.go, line 2666 at r3 (raw file):

				t.Fatal(pErr)
			}
			iter := tc.engine.NewIterator(engine.IterOptions{Prefix: true})

respect for going medieval

pkg/storage/engine/mvcc.go, line 1310 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Should it be an error for txn.Timestamp to be less than timestamp?

yeah, I also wonder. And this function's interface, taking both a txn and a timestamp, is confusing. Please add some comments to these args. Maybe the timestamp can be documented and asserted to be nil if a txn is passed.

pkg/storage/engine/enginepb/mvcc3.proto, line 63 at r3 (raw file):

  // original timestamp:
  //
  //    1. This timestamp is forwarded by the timestamp cache when this

good comment

benesch

Whew, ok, addressed all the feedback. PTAL as the required changes to the storage/engine tests were rather large.

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/replica_test.go, line 2621 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls add a comment to the test with some words

Done.

pkg/storage/engine/mvcc.go, line 1310 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

yeah, I also wonder. And this function's interface, taking both a txn and a timestamp, is confusing. Please add some comments to these args. Maybe the timestamp can be documented and asserted to be nil if a txn is passed.

Yes, it's extremely confusing. We can't use nil because hlc.Timestamp is passed by value. We don't really want to use hlc.Timestamp{} because that already means inline put. (Yes, technically you can't do an inline put within a txn, but seems bad to overload that sentinel.)

The best thing I came up with was to mandate that timestamp matches txn.OrigTimestamp (or txn.RefreshedTimestamp if that's set). This is quite redundant, but it matches today's usage and I can't come up with anything better that doesn't require a major refactor.

Note that enforcing this actually required a ton of changes to the tests in storage/engine, since those tests were frequently violating this new timestamp-must-be-the-txn's-read-timestamp rule.

pkg/storage/engine/mvcc_test.go, line 436 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The tests would pass without this, right? (because of the way .Forward is used instead of just overwriting timestamp with txn.Timestamp)

That's true in the revision you made this comment on, but is no longer true in the latest revision. See comment above for details.

pkg/storage/engine/mvcc_test.go, line 1114 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Maybe the txn1 global should be replaced with a factory function since it's no longer used as-is anywhere (or is it?).

There's still a bunch of tests that use it directly. (Many tests are happy with txn1's default timestamp.)

pkg/storage/engine/enginepb/mvcc3.proto, line 63 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

good comment

Thank you! :)

pkg/storage/replica.go, line 2749 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

Done.

No longer relevant.

andreimatei

I was asking:

What is the timestamp used for populating the write timestamp cache (by DelRange)?

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/engine/mvcc.go, line 1267 at r4 (raw file):

// for inline puts.)
//
// TODO(benesch): explore alternate function signatures that reduce

umm... I don't think you should be putting your name in TODOs any more (unless you've reconsidered :P).

I'd encourage you to add an options struct to this method that lets one clearly express inline vs intent vs non-transactional, and boil the ocean :)

benesch

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/engine/mvcc.go, line 1267 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

umm... I don't think you should be putting your name in TODOs any more (unless you've reconsidered :P).

I'd encourage you to add an options struct to this method that lets one clearly express inline vs intent vs non-transactional, and boil the ocean :)

Can I put your name on the TODO then? :sinceretroll:

I've got a bunch of merge stuff that's higher priority than this refactor, so I'm going to merge this as-is unless you object strongly. I think there's an 80% chance or so that I can actually get back to this refactor before I leave. (I've already done the refactor for MVCCGet (#33213) and MVCCScan (#32389).)

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/engine/mvcc.go, line 1267 at r4 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

Can I put your name on the TODO then? :sinceretroll:

I've got a bunch of merge stuff that's higher priority than this refactor, so I'm going to merge this as-is unless you object strongly. I think there's an 80% chance or so that I can actually get back to this refactor before I leave. (I've already done the refactor for MVCCGet (#33213) and MVCCScan (#32389).)

yeah, use my name

The fact that transactions write at their original timestamp, and not their provisional commit timestamp, allows leaving an intent under a read. The timestamp cache will ensure that the transaction can't actually commit unless it can bump its intents above the read, but it will still leave an intent under the read in the meantime. This can lead to starvation. Intents are meant to function as a sort of lock on a key. Once a writer lays down an intent, no readers should be allowed to read above that intent until that intent is resolved. Otherwise a continual stream of readers could prevent the writer from ever managing to commit by continually bumping the timestamp cache. Now consider how CDC's poller works: it reads (tsmin, ts1], then (ts1, ts2], then (ts2, ts3], and so on, in a tight loop. Since it uses a time-bound iterator under the hood, reading (ts2, ts3], for example, cannot return an intent written at ts2. But the idea was that we inductively guranteed that we never read above an intent. If an intent was written at ts2, even though the read from (ts2, ts3] would fail to observe it, the previous read from (ts1, ts2] would have. Of course, since transactions write at their original timestamp, a transaction with an original timestamp of ts2 can write an intent at ts2 *after* the CDC poller has read (ts1, ts2]. (The transaction will be forced to commit at ts3 or later to be sequenced after the CDC poller's read, but it will leave the intent at ts2.) The CDC poller's next read, from (ts2, ts3], thus won't see the intent, nor will any future reads at higher timestamps. And so the CDC poller will continually bump the timestamp cache, completely starving the writer. Fix the problem by writing at the transaction's provisional commit timestamp (i.e., the timestamp that has been forwarded by the timestamp cache, if necessary) instead of the transaction's original timestamp. Writing at the original timestamp was only necessary to prevent a lost update in snapshot isolation mode, which is no longer supported. In serializable mode, the anomaly is protected against by the read refresh mechanism. Besides fixing the starvation problem, the new behavior is more intuitive than the old behavior. It also might have some performance benefits, as it is less likely that intents will need to be bumped at commit time, which saves on RocksDB writes. Note that one conceptual oddity surrounds transactions reading their own writes: a transaction that is reading at ts1 and writing at ts2 will be able to read its own writes, even though simple timestamp comparison would make those writes invisible. This oddity is not introduced by this commit, though this commit does make this situation more common. Touches cockroachdb#32433. Release note: None

benesch · 2018-12-19T22:10:13Z

🎉 🎉 🎉

bors r=andreimatei,bdarnell

32487: storage: write at provisional commit ts, not orig ts r=andreimatei,bdarnell a=benesch The fact that transactions write at their original timestamp, and not their provisional commit timestamp, allows leaving an intent under a read. The timestamp cache will ensure that the transaction can't actually commit unless it can bump its intents above the read, but it will still leave an intent under the read in the meantime. This can lead to starvation. Intents are meant to function as a sort of lock on a key. Once a writer lays down an intent, no readers should be allowed to read above that intent until that intent is resolved. Otherwise a continual stream of readers could prevent the writer from ever managing to commit by continually bumping the timestamp cache. Now consider how CDC's poller works: it reads (tsmin, ts1], then (ts1, ts2], then (ts2, ts3], and so on, in a tight loop. Since it uses a time-bound iterator under the hood, reading (ts2, ts3], for example, cannot return an intent written at ts2. But the idea was that we inductively guranteed that we never read above an intent. If an intent was written at ts2, even though the read from (ts2, ts3] would fail to observe it, the previous read from (ts1, ts2] would have. Of course, since transactions write at their original timestamp, a transaction with an original timestamp of ts2 can write an intent at ts2 *after* the CDC poller has read (ts1, ts2]. (The transaction will be forced to commit at ts3 or later to be sequenced after the CDC poller's read, but it will leave the intent at ts2.) The CDC poller's next read, from (ts2, ts3], thus won't see the intent, nor will any future reads at higher timestamps. And so the CDC poller will continually bump the timestamp cache, completely starving the writer. Fix the problem by writing at the transaction's provisional commit timestamp (i.e., the timestamp that has been forwarded by the timestamp cache, if necessary) instead of the transaction's original timestamp. Writing at the original timestamp was only necessary to prevent a lost update in snapshot isolation mode, which is no longer supported. In serializable mode, the anomaly is protected against by the read refresh mechanism. Besides fixing the starvation problem, the new behavior is more intuitive than the old behavior. It also might have some performance benefits, as it is less likely that intents will need to be bumped at commit time, which saves on RocksDB writes. Touches #32433. Release note: None Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>

craig · 2018-12-19T22:28:40Z

Build succeeded

GitHub CI (Cockroach)

33534: Revert "Revert "engineccl: ignore intents beneath start in MVCCIncrem… r=bdarnell,tbg a=benesch …entalIterator"" This reverts commit ceb4072. This previously caused flakiness in some CDC tests, but that's been fixed by #32487. Release note: None Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>

…rrors Fixes cockroachdb#34853. This change fixes cockroachdb#34853, which appears to have been broken by cockroachdb@57d0201#diff-41e9e1b7829f8e25daeb564f42c26017L1514. That change made it so that intents are now written at a transaction's provisional commit timestamp instead of its original timestamp (which is fantastic!). In doing so, it removed special purpose code to forward the `MVCCLogicalOpDetails` timestamp to the provisional commit timestamp of the transaction because it assumed that the new `writeTimestamp` variable that the change introduced already assumed this role. Unfortunately, the change allowed this `writeTimestamp` variable to regress below the transaction's provisional commit timestamp in subtle cases where a committed value ended up between the transaction's original timestamp and its provisional commit timestamp (a case that generates `WriteTooOldError`s). In this case, we were replacing the `writeTimestamp` with the committed value's timestamp + 1 logical tick. This regression allowed `MVCCWriteIntentOp`s to look like they were ignoring closed timestamps. The fix was to either introduce the special purpose code again or to avoid letting `writeTimestamp` regress in this `WriteTooOldError` code path. This PR chose the latter solution. In doing so, this PR also improves upon the benefit already provided by cockroachdb#32487 - it reduces the number of intents that need to be moved on transaction commit. This is because it ensures that if a transaction has an original (read) timestamp below a committed value and a provisional commit (write) timestamp above a committed value, an intent left by that transaction will be in a "committable" state without a need to rewrite it during intent resolution. Release note: None

35081: sql: Fix test flake r=andy-kimball a=andy-kimball Filter out PushTxn/ResolveIntent trace messages, since they represent intent cleanup from earlier operations that interferes with subsequent tests. This was already being done in other variations, so this commit extends that to all the variations that are doing filtering of messages. Fixes #34911 Release note: None 35086: mvcc: don't write below provisional commit timestamp on WriteTooOld errors r=tbg a=nvanbenschoten Fixes #34853. This change fixes #34853, which appears to have been broken by 57d0201#diff-41e9e1b7829f8e25daeb564f42c26017L1514. That change made it so that intents are now written at a transaction's provisional commit timestamp instead of its original timestamp (which is fantastic!). In doing so, it removed special-purpose code to forward the `MVCCLogicalOpDetails` timestamp to the provisional commit timestamp of the transaction because it assumed that the new `writeTimestamp` variable that the change introduced already assumed this role. Unfortunately, the change allowed this `writeTimestamp` variable to regress below the transaction's provisional commit timestamp in subtle cases where a committed value ended up between the transaction's original timestamp and its provisional commit timestamp (a case that generates `WriteTooOldError`s). In this case, we were replacing the `writeTimestamp` with the committed value's timestamp + 1 logical tick. This regression allowed `MVCCWriteIntentOp`s to look like they were ignoring closed timestamps. The fix was to either introduce the special purpose code again or to avoid letting `writeTimestamp` regress in this `WriteTooOldError` code path. This PR chose the latter solution. In doing so, this PR also improves upon the benefit already provided by #32487 - it reduces the number of intents that need to be moved on transaction commit. This is because it ensures that if a transaction has an original (read) timestamp below a committed value and a provisional commit (write) timestamp above a committed value, an intent left by that transaction will be in a "committable" state without a need to rewrite it during intent resolution. Release note: None 35112: kv: assert an absence of retries in TestTxnCoordSenderRetries r=tbg,andreimatei a=nvanbenschoten The test was only asserting that cases that should observe retries did, not the cases that should not observe retries did not. The PR also fixes `TestPropagateTxnOnError`, which wasn't really testing anything. We now set txn.Writing on the client, so it wasn't verifying what it thought it was. We switch it to assert that observed timestamps are propagated on error instead. 35118: storage: Downgrade a spammy log message r=tbg a=bdarnell This log message only occurs on clusters with multiple stores per node, but on those clusters it fires multiple times per second. Release note: None Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Ben Darnell <ben@bendarnell.com>

benesch requested review from bdarnell, andreimatei, tbg and a team November 19, 2018 22:19

benesch changed the title ~~storage: write at provisional commit ts, not orig ts~~ [semiwip] storage: write at provisional commit ts, not orig ts Nov 19, 2018

andreimatei suggested changes Nov 20, 2018

View reviewed changes

benesch mentioned this pull request Dec 5, 2018

kv,storage,sql: purge snapshot isolation #32860

Merged

benesch force-pushed the wts branch 2 times, most recently from a801911 to 26ea8f2 Compare December 10, 2018 22:47

benesch commented Dec 10, 2018

View reviewed changes

benesch changed the title ~~[semiwip] storage: write at provisional commit ts, not orig ts~~ storage: write at provisional commit ts, not orig ts Dec 10, 2018

benesch mentioned this pull request Dec 12, 2018

storage: remove decoy batch timestamp handler #33079

Merged

benesch force-pushed the wts branch from 26ea8f2 to d125c37 Compare December 14, 2018 20:29

benesch requested review from a team December 14, 2018 20:29

benesch force-pushed the wts branch from d91baa0 to 4fddb13 Compare December 18, 2018 04:06

bdarnell approved these changes Dec 18, 2018

View reviewed changes

andreimatei approved these changes Dec 18, 2018

View reviewed changes

benesch force-pushed the wts branch 4 times, most recently from 39dba20 to 50869b1 Compare December 19, 2018 06:32

benesch commented Dec 19, 2018

View reviewed changes

benesch force-pushed the wts branch from 50869b1 to 1313362 Compare December 19, 2018 15:08

benesch requested a review from a team December 19, 2018 15:08

andreimatei approved these changes Dec 19, 2018

View reviewed changes

benesch force-pushed the wts branch from 1313362 to 0e620dc Compare December 19, 2018 21:36

benesch commented Dec 19, 2018

View reviewed changes

andreimatei approved these changes Dec 19, 2018

View reviewed changes

benesch force-pushed the wts branch from 0e620dc to 57d0201 Compare December 19, 2018 22:09

craig bot merged commit 57d0201 into cockroachdb:master Dec 19, 2018

benesch deleted the wts branch December 19, 2018 22:43

benesch mentioned this pull request Dec 19, 2018

teamcity: failed test: TestChangefeedSchemaChangeNoBackfill #32433

Closed

benesch mentioned this pull request Jan 6, 2019

Revert "Revert "engineccl: ignore intents beneath start in MVCCIncrem… #33534

Merged

nvanbenschoten mentioned this pull request Feb 20, 2019

mvcc: don't write below provisional commit timestamp on WriteTooOld errors #35086

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: write at provisional commit ts, not orig ts #32487

storage: write at provisional commit ts, not orig ts #32487

benesch commented Nov 19, 2018 •

edited

Loading

cockroach-teamcity commented Nov 19, 2018

benesch commented Nov 19, 2018

tbg commented Nov 19, 2018

benesch commented Nov 19, 2018

bdarnell commented Nov 19, 2018

benesch commented Nov 20, 2018 •

edited

Loading

andreimatei left a comment

andreimatei Nov 20, 2018

andreimatei Nov 20, 2018

andreimatei commented Nov 20, 2018

bdarnell commented Nov 20, 2018

andreimatei commented Nov 20, 2018

benesch left a comment

benesch commented Dec 18, 2018

bdarnell left a comment

andreimatei left a comment

benesch left a comment

andreimatei left a comment

benesch left a comment

andreimatei left a comment

benesch commented Dec 19, 2018

craig bot commented Dec 19, 2018

	// If we're trying to move the timestamp forward, and it's already
	// far enough forward, return success.
	if args.PushType == roachpb.PUSH_TIMESTAMP && args.PushTo.Less(reply.PusheeTxn.Timestamp) {
	// Trivial noop.
	return result.Result{}, nil
	}

storage: write at provisional commit ts, not orig ts #32487

storage: write at provisional commit ts, not orig ts #32487

Conversation

benesch commented Nov 19, 2018 • edited Loading

cockroach-teamcity commented Nov 19, 2018

benesch commented Nov 19, 2018

tbg commented Nov 19, 2018

benesch commented Nov 19, 2018

bdarnell commented Nov 19, 2018

benesch commented Nov 20, 2018 • edited Loading

andreimatei left a comment

Choose a reason for hiding this comment

andreimatei Nov 20, 2018

Choose a reason for hiding this comment

andreimatei Nov 20, 2018

Choose a reason for hiding this comment

andreimatei commented Nov 20, 2018

bdarnell commented Nov 20, 2018

andreimatei commented Nov 20, 2018

benesch left a comment

Choose a reason for hiding this comment

benesch commented Dec 18, 2018

bdarnell left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

benesch left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

benesch left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

benesch commented Dec 19, 2018

craig bot commented Dec 19, 2018

Build succeeded

benesch commented Nov 19, 2018 •

edited

Loading

benesch commented Nov 20, 2018 •

edited

Loading