Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hlc: introduce synthetic flag on timestamps #56373

Merged
merged 5 commits into from
Nov 13, 2020

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Nov 6, 2020

Informs #52745.
Informs #36431.

This commit introduces an 8-bit flags field on the hlc timestamp struct. The flags are used to provide details about the timestamp and its meaning. They do not affect the sort order of Timestamps.

The commit then introduces the first flag: SYNTHETIC. As discussed in #52745, a synthetic timestamp is defined as a timestamp that makes no claim about the value of clocks in the system. While standard timestamps are pulled from HLC clocks and indicate that some node in the system has a clock with a reading equal to or above its value, a synthetic timestamp makes no such indication. By avoiding a connection to "real time", synthetic timestamps can be used to write values at a future time and to indicate that observed timestamps do not apply to such writes for the purposes of tracking causality between the write and its observers. Observed timestamps will be a critical part of implementing non-blocking transactions (#52745) and fixing the interaction between observed timestamps and transaction refreshing (#36431).

The original plan was to reserve the high-order bit in the logical portion of a timestamp as a "synthetic bit". This is how I began implementing things, but was turned off for a few reasons. First, it was fairly subtle and seemed too easy to get wrong. Using a separate field is more explicit and avoids a class of bugs. Second, I began to have serious concerns about how the synthetic bit would impact timestamp ordering. Every timestamp comparison would need to mask out the bit or risk being incorrect. This was even true of the LSM custom comparator. This seemed difficult to get right and seemed particularly concerning since we're planning on marking only some of a transaction's committed values as synthetic to fix #36431, so if we weren't careful, we could get atomicity violations. There were also minor backwards compatibility concerns.

But a separate field is more expensive in theory, so we need to be careful. However, it turns out that a separate field is mostly free in each case that we care about. In memory, the separate field is effectively free because the Timestamp struct was previously 12 bytes but was always padded out to 16 bytes when included as a field in any other struct. This means that the flags field is replacing existing padding. Over the wire, the field will not be included when zero and will use a varint encoding when not zero, so again, it is mostly free. In the engine key encoding, the field is also not included when zero, and takes up only 1 byte when non-zero, so it is mostly free.


First three commits from #56477.

@sumeerbhola I'm hoping you can take a look at the engine-level changes in the introduce synthetic flag on timestamps commit (4th commit as of the time of writing). I think the key encoding added here makes sense, but want to make sure you're on board. One possible concern is that we introduce a new 13-byte suffix, which means that combined with a 4-byte sequence number (see #41720 (comment)), we'd collide with the 17 byte engineKeyVersionLockTableLen.

@tbg do you mind being the primary reviewer here? I think you know the most about the motivations for this change and will have a good sense of whether this is the best way to introduce additional state on timestamps.

@nvanbenschoten nvanbenschoten requested review from tbg, sumeerbhola and a team November 6, 2020 17:50
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner November 6, 2020 17:50
@nvanbenschoten nvanbenschoten requested review from miretskiy and removed request for a team November 6, 2020 17:50
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten removed request for miretskiy and a team November 6, 2020 17:51
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/timestampMask branch from fc19b2a to 0cc3a6a Compare November 6, 2020 19:16
@tbg
Copy link
Member

tbg commented Nov 6, 2020 via email

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Nov 9, 2020
…rved timestamps

This commit introduces a new package which contains logic and documentation
related to the observed timestamp system, which allows transactions to track
causality between themselves and other, possibly-concurrent, transactions in
order to avoid uncertainty related restarts.

This ties into our efforts to better document the kv layer.

It also ties into synthetic timestamps (cockroachdb#56373), which are important for
non-blocking transactions (cockroachdb#52745) and for fixing the interaction between
observed timestamps and transaction refreshing (cockroachdb#36431). The proposed fix is to
split out a new EffectiveMaxTimestamp field from the MaxTimestamp field on a
transaction and set the EffectiveMaxTimestamp in LimitTxnMaxTimestamp. The
EffectiveMaxTimestamp field will dictate the uncertainty interval for values
with normal timestamps while the original MaxTimestamp will dictate the
uncertainty interval for values with synthetic timestamps. In practice, this
will mean that observed timestamps do not apply to values with synthetic
timestamps. This commit gives us a package to house this new complexity.
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments only -- I haven't yet looked at the last 2 commits which tbg will be reviewing, but I'll try to follow along.

Reviewed 12 of 12 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, 27 of 29 files at r4, 1 of 1 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/roachpb/data.go, line 1101 at r4 (raw file):

		}

		if t.ReadTimestamp == o.ReadTimestamp {

why is the synthetic bit relevant here?


pkg/util/hlc/legacy_timestamp.proto, line 41 at r4 (raw file):

  // == operator). Consider use of the EqOrdering method when testing for
  // equality.
  optional uint32 flags = 3;

is this nullable because it will be serialized in MVCCMetadata and we don't want to pay the serialized cost when it is not set?


pkg/util/hlc/timestamp.go, line 272 at r4 (raw file):

func (t *Timestamp) Backward(s Timestamp) {
	// TODO(nvanbenschoten): if either timestamp is non-synthetic, we can remove
	// the synthetic bit.

Maybe phrase this TODO as: remove the synthetic bit from the older value if the younger one is non-synthetic.


pkg/util/hlc/timestamp.go, line 283 at r4 (raw file):

}

// ToLegacyTimestamp converts a Timestamp to a LegacyTimestamp.

Is it possible to get rid of LegacyTimestamp?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/timestampMask branch from 0cc3a6a to 10d9549 Compare November 10, 2020 03:21
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)


pkg/roachpb/data.go, line 1101 at r4 (raw file):

Previously, sumeerbhola wrote…

why is the synthetic bit relevant here?

It isn't really relevant. This is just switching from Equal to ==, which has the same semantics but it more idiomatic to test for structural equality.


pkg/util/hlc/legacy_timestamp.proto, line 41 at r4 (raw file):

Previously, sumeerbhola wrote…

is this nullable because it will be serialized in MVCCMetadata and we don't want to pay the serialized cost when it is not set?

Kind of. It's nullable because we need identical encoding below Raft to ensure that all replicas stay bytewise identical even in a mixed version cluster. This is the same reason why we've never been able to completely remove LegacyTimestamp. So since the field won't even exist on v20.2 nodes, we need to make sure it doesn't get serialized on v21.1 nodes.


pkg/util/hlc/timestamp.go, line 272 at r4 (raw file):

Previously, sumeerbhola wrote…

Maybe phrase this TODO as: remove the synthetic bit from the older value if the younger one is non-synthetic.

Done.


pkg/util/hlc/timestamp.go, line 283 at r4 (raw file):

Previously, sumeerbhola wrote…

Is it possible to get rid of LegacyTimestamp?

I'm trying, but it's putting up a fight: #56190. It's also used in MVCCMetadata, which is written to "below Raft" in a number of places. I think we're going to need to run an online migration if we ever want to remove it entirely.

@tbg tbg requested a review from sumeerbhola November 10, 2020 14:13
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still anxious that synthetic and regular timestamps will have an opportunity to mix when they shouldn't, given how much existing code is out there. In particular, .Forward() and .Backward() seem susceptible. I wish we had proper enum types.
From plowing through the code so far, are you worried at all? And/or do you have ideas on getting at least some amount of separation between them back (clearly separate types are not an option)?

Reviewed 41 of 41 files at r6, 4 of 4 files at r7, 1 of 1 files at r8, 5 of 5 files at r9, 35 of 35 files at r10, 2 of 2 files at r11, 53 of 53 files at r12, 8 of 8 files at r13.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/util/hlc/legacy_timestamp.proto, line 41 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Kind of. It's nullable because we need identical encoding below Raft to ensure that all replicas stay bytewise identical even in a mixed version cluster. This is the same reason why we've never been able to completely remove LegacyTimestamp. So since the field won't even exist on v20.2 nodes, we need to make sure it doesn't get serialized on v21.1 nodes.

Mind putting that in the comment?


pkg/util/hlc/timestamp.go, line 257 at r10 (raw file):

// Forward updates the timestamp from the one given, if that moves it forwards
// in time. Returns true if the timestamp was adjusted and false otherwise.

Clarify the language in the comment:

// Forward replaces the receiver with the argument, if that moves it forwards in time.


pkg/util/hlc/timestamp.go, line 269 at r10 (raw file):

// Backward updates the timestamp from the one given, if that moves it
// backwards in time.

ditto about clarifying language


pkg/util/hlc/timestamp.proto, line 51 at r10 (raw file):

  // timestamp makes no such indication. By avoiding a connection to "real
  // time", synthetic timestamps can be used to write values at a future
  // time and to indicate that observed timestamps do not apply to such

I feel that nobody will understand this in the future. Perhaps

// Synthetic timestamps are central to non-blocking transactions, which essentially write at "future timestamps". They are also used in observed timestamps, where they indicate versions that were moved from the timestamp at which they were originally written; only synthetic timestamps require observing the full uncertainty interval, whereas readings off the node's clock can tighten it for non-synthetic versions.


pkg/util/hlc/timestamp_test.go, line 46 at r10 (raw file):

	}
	a = makeTS(1, 1)
	if a.EqOrdering(b) {

curious why you didn't like require.False(t, a.EqOrdering(b), "%+v == %+v", a, b)


pkg/util/hlc/timestamp_test.go, line 207 at r10 (raw file):

		assert.Equal(t, c.exp, c.ts.String())

		// Flags don't round-trip through parsing, so ignore them.

Shouldn't they? If not, at least update comment on ParseTimestamp.


pkg/util/span/frontier.go, line 63 at r13 (raw file):

// Less implements heap.Interface.
func (h frontierHeap) Less(i, j int) bool {
	if h[i].ts.EqOrdering(h[j].ts) {

Can you leave that _ func() field on the struct to disable explicit comparisons going forward on both timestamp types?

bytes no_equality = 3 [(gogoproto.nullable) = true, (gogoproto.customname) = "_"];

worked nicely in my local testing.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/timestampMask branch from 10d9549 to 3d517fc Compare November 10, 2020 23:25
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

I'm still anxious that synthetic and regular timestamps will have an opportunity to mix when they shouldn't, given how much existing code is out there. In particular, .Forward() and .Backward() seem susceptible. I wish we had proper enum types.
From plowing through the code so far, are you worried at all? And/or do you have ideas on getting at least some amount of separation between them back (clearly separate types are not an option)?

My current plan is to never allow synthetic timestamps to do any harm by fataling if we ever call hlc.Update with a synthetic timestamp. This effectively means that we need to separate out the "transaction time domain" from the "clock time domain", and we'll be made aware pretty quickly if these ever mix. We already have this separation in some places. For instance, we have a BatchRequest.Timestamp and a BatchResponse.Now, which are independent from the timestamps passed around in Transaction objects.

My other thought is that I'm going to split hlc.Update into two methods – hlc.UpdateForStability and hlc.UpdateForCausality. This doesn't directly tie in to the discussion here, but it means that we're going to need to be a little more explicit about when, where, and why we update the hlc clock. I expect that this will uncover cases where timestamps from the "transaction time domain" and leaking back into the "clock time domain".

One more aggressive idea to enforce this separation is to use another flag to indicate which domain a timestamp is part of. A conversion from a "clock timestamp" to a "transaction timestamp" would be allowed, but not the other way around. We could then assert that no "transaction timestamp" (synthetic or otherwise) ever makes its way back into a clock. My primary concern with this would be its impact on performance. We'd probably want to make the transaction domain the zero value for the flag.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)


pkg/util/hlc/legacy_timestamp.proto, line 41 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Mind putting that in the comment?

Done.


pkg/util/hlc/timestamp.go, line 257 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Clarify the language in the comment:

// Forward replaces the receiver with the argument, if that moves it forwards in time.

Done.


pkg/util/hlc/timestamp.go, line 269 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

ditto about clarifying language

Done.


pkg/util/hlc/timestamp.proto, line 51 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I feel that nobody will understand this in the future. Perhaps

// Synthetic timestamps are central to non-blocking transactions, which essentially write at "future timestamps". They are also used in observed timestamps, where they indicate versions that were moved from the timestamp at which they were originally written; only synthetic timestamps require observing the full uncertainty interval, whereas readings off the node's clock can tighten it for non-synthetic versions.

I combined the two comments. I wanted to keep some more general definition so that we aren't limited in where we use these timestamps, but I like how clear your examples are.


pkg/util/hlc/timestamp_test.go, line 46 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

curious why you didn't like require.False(t, a.EqOrdering(b), "%+v == %+v", a, b)

Just to stay consistent with the rest of the tests in this file.


pkg/util/hlc/timestamp_test.go, line 207 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Shouldn't they? If not, at least update comment on ParseTimestamp.

Yeah, I should have just done this to begin with. Done.


pkg/util/span/frontier.go, line 63 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Can you leave that _ func() field on the struct to disable explicit comparisons going forward on both timestamp types?

bytes no_equality = 3 [(gogoproto.nullable) = true, (gogoproto.customname) = "_"];

worked nicely in my local testing.

I went back and forth on it. This won't work if we want to continue using proto to generate the struct def:

pkg/util/hlc/timestamp.pb.go:139:22: cannot refer to blank field or method
pkg/util/hlc/timestamp.pb.go:174:10: cannot refer to blank field or method
pkg/util/hlc/timestamp.pb.go:177:50: cannot refer to blank field or method
pkg/util/hlc/timestamp.pb.go:178:24: cannot refer to blank field or method
pkg/util/hlc/timestamp.pb.go:204:6: cannot refer to blank field or method
pkg/util/hlc/timestamp.pb.go:206:7: cannot refer to blank field or method
pkg/util/hlc/timestamp.pb.go:300:11: cannot refer to blank field or method
pkg/util/hlc/timestamp.pb.go:432:5: cannot refer to blank field or method
pkg/util/hlc/timestamp.pb.go:433:8: cannot refer to blank field or method

But we could do so ourselves using option (gogoproto.typedecl) = false;. But I shied away from this for two reasons:

  1. it increases the struct size from 16 bytes to 40 bytes. We can do better with a _ func(), which limits to growth to 24 bytes, but that's still runtime overhead. And it's more pointers for the GC to chase.
  2. it adds yet another dependency on gogoproto that we'll never be able to break away from.

So in the end, I decided it wasn't worth it. It turns out that we don't end up checking for timestamp equality in very many places anyway – we pretty much always use Less or LessEq.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/timestampMask branch from 3d517fc to 8cfda31 Compare November 11, 2020 02:05
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: for the parts I reviewed

Reviewed 1 of 36 files at r15, 21 of 54 files at r17, 2 of 8 files at r18.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)

This file was removed when RocksDB was removed.
Informs cockroachdb#52745.
Informs cockroachdb#36431.

This commit introduces an 8-bit flags field on the hlc timestamp struct.
The flags are used to provide details about the timestamp and its
meaning. They do not affect the sort order of Timestamps.

The commit then introduces the first flag: SYNTHETIC. As discussed
in cockroachdb#52745, a synthetic timestamp is defined as a timestamp that makes no
claim about the value of clocks in the system. While standard timestamps
are pulled from HLC clocks and indicate that some node in the system has
a clock with a reading equal to or above its value, a synthetic
timestamp makes no such indication. By avoiding a connection to "real
time", synthetic timestamps can be used to write values at a future time
and to indicate that observed timestamps do not apply to such writes for
the purposes of tracking causality between the write and its observers.
Observed timestamps will be a critical part of implementing non-blocking
transactions (cockroachdb#52745) and fixing the interaction between observed
timestamps and transaction refreshing (cockroachdb#36431).

The original plan was to reserve the high-order bit in the logical
portion of a timestamp as a "synthetic bit". This is how I began
implementing things, but was turned off for a few reasons. First, it was
fairly subtle and seemed too easy to get wrong. Using a separate field
is more explicit and avoids a class of bugs. Second, I began to have
serious concerns about how the synthetic bit would impact timestamp
ordering. Every timestamp comparison would need to mask out the bit or
risk being incorrect. This was even true of the LSM custom comparator.
This seemed difficult to get right, and seemed particularly concerning
since we're planning on marking only some of a transaction's committed
values as synthetic to fix cockroachdb#36431, so if we weren't careful, we could
get atomicity violations. There were also minor backwards compatibility
concerns.

But a separate field is more expensive in theory, so we need to be
careful. However, it turns out that a separate field is mostly free in
each case that we care about. In memory, the separate field is
effectively free because the Timestamp struct was previously 12 bytes
but was always padded out to 16 bytes when included as a field in any
other struct. This means that the flags field is replacing existing
padding. Over the wire, the field will not be included when zero and
will use a varint encoding when not zero, so again, it is mostly free.
In the engine key encoding, the field is also not included when zero,
and takes up only 1 byte when non-zero, so it is mostly free.
Now that structural equality is different than semantic equality because
of the new flags pointer in LegacyTimestamp, we need to be careful about
comparing by value.

These seem to be the only issues because of how rarely LegacyTimestamp
is used. I detected these by adding a `_ func()` field to the struct and
observed what fell out when compiling.
Be more explicit and avoid any issues with structural equality.
Now that structural equality is different than semantic equality because
of the new flags field in Timestamp, we need to be careful about
comparing by value.

I detected these by adding a `_ func()` field to the struct and observed
what fell out when compiling.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/timestampMask branch from 8cfda31 to db9ef7e Compare November 12, 2020 18:34
@nvanbenschoten
Copy link
Member Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 12, 2020

Build failed (retrying...):

@nvanbenschoten
Copy link
Member Author

Strange failure in version-upgrade. Don't have time to look now, but canceling this build.

bors r-

@craig
Copy link
Contributor

craig bot commented Nov 12, 2020

Canceled.

@nvanbenschoten
Copy link
Member Author

I can't explain the failure other than some kind of network flake. The "dead" node never actually dies, it just hits a network error:

E201112 20:56:24.291372 1 cli/error.go:398 ⋮ ‹ERROR: connection lost.›
‹admin-set-ui-data: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [propagate])›

We also see this in the stderr:

*
* WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED!
* 
* This mode is intended for non-production testing only.
* 
* In this mode:
* - Your cluster is open to any client that can access 127.0.0.1.
* - Intruders with access to your machine or network can observe client-server traffic.
* - Intruders can log in without password and read or write any data in the cluster.
* - Intruders can consume all your server's resources and cause unavailability.
*
*
* INFO: To start a secure server without mandating TLS for clients,
* consider --accept-sql-without-tls instead. For other options, see:
* 
* - https://go.crdb.dev/issue-v/53404/v20.2
* - https://www.cockroachlabs.com/docs/v20.2/secure-a-cluster.html
*
*
* WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED!
* 
* This mode is intended for non-production testing only.
* 
* In this mode:
* - Your cluster is open to any client that can access 127.0.0.1.
* - Intruders with access to your machine or network can observe client-server traffic.
* - Intruders can log in without password and read or write any data in the cluster.
* - Intruders can consume all your server's resources and cause unavailability.
*
*
* INFO: To start a secure server without mandating TLS for clients,
* consider --accept-sql-without-tls instead. For other options, see:
* 
* - https://go.crdb.dev/issue-v/53404/v21.1
* - https://www.cockroachlabs.com/docs/v21.1/secure-a-cluster.html
*
*
* ERROR: ERROR: connection lost.
* 
* admin-set-ui-data: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [propagate])
*
ERROR: connection lost.

admin-set-ui-data: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [propagate])
Failed running "start"

I ran acceptance/version-upgrade 60 times on this PR and never hit a similar issue, so I don't believe that it has to do with this change.

bors r+

craig bot pushed a commit that referenced this pull request Nov 13, 2020
56373: hlc: introduce synthetic flag on timestamps r=nvanbenschoten a=nvanbenschoten

Informs #52745.
Informs #36431.

This commit introduces an 8-bit `flags` field on the hlc timestamp struct. The flags are used to provide details about the timestamp and its meaning. They do not affect the sort order of Timestamps.

The commit then introduces the first flag: SYNTHETIC. As discussed in #52745, a synthetic timestamp is defined as a timestamp that makes no claim about the value of clocks in the system. While standard timestamps are pulled from HLC clocks and indicate that some node in the system has a clock with a reading equal to or above its value, a synthetic timestamp makes no such indication. By avoiding a connection to "real time", synthetic timestamps can be used to write values at a future time and to indicate that observed timestamps do not apply to such writes for the purposes of tracking causality between the write and its observers. Observed timestamps will be a critical part of implementing non-blocking transactions (#52745) and fixing the interaction between observed timestamps and transaction refreshing (#36431).

The original plan was to reserve the high-order bit in the logical portion of a timestamp as a "synthetic bit". This is how I began implementing things, but was turned off for a few reasons. First, it was fairly subtle and seemed too easy to get wrong. Using a separate field is more explicit and avoids a class of bugs. Second, I began to have serious concerns about how the synthetic bit would impact timestamp ordering. Every timestamp comparison would need to mask out the bit or risk being incorrect. This was even true of the LSM custom comparator. This seemed difficult to get right and seemed particularly concerning since we're planning on marking only some of a transaction's committed values as synthetic to fix #36431, so if we weren't careful, we could get atomicity violations. There were also minor backwards compatibility concerns.

But a separate field is more expensive in theory, so we need to be careful. However, it turns out that a separate field is mostly free in each case that we care about. In memory, the separate field is effectively free because the Timestamp struct was previously 12 bytes but was always padded out to 16 bytes when included as a field in any other struct. This means that the flags field is replacing existing padding. Over the wire, the field will not be included when zero and will use a varint encoding when not zero, so again, it is mostly free. In the engine key encoding, the field is also not included when zero, and takes up only 1 byte when non-zero, so it is mostly free.

----

First three commits from #56477.

@sumeerbhola I'm hoping you can take a look at the engine-level changes in the `introduce synthetic flag on timestamps` commit (4th commit as of the time of writing). I think the key encoding added here makes sense, but want to make sure you're on board. One possible concern is that we introduce a new 13-byte suffix, which means that combined with a 4-byte sequence number (see #41720 (comment)), we'd collide with the 17 byte `engineKeyVersionLockTableLen`.

@tbg do you mind being the primary reviewer here? I think you know the most about the motivations for this change and will have a good sense of whether this is the best way to introduce additional state on timestamps.

56437: cli, ui: dismiss release notes signup banner per environment variable r=knz,dhartunian a=nkodali

Previously, the signup banner could only be dismissed manually.
For internal testing purposes, this banner is unnecessary. This
change provides a way to dismiss the signup banner upon start of
a cluster via the cli by setting the environment variable
COCKROACH_UI_RELEASE_NOTES_SIGNUP_DISMISSED=true.

Resolves #46998

Release note: none

56627: sql: rework SHOW REGIONS to SHOW REGIONS FROM CLUSTER r=ajstorm a=otan

Resolves #56331 

Release note (sql change): SHOW REGIONS functionality is now deferred to
SHOW REGIONS FROM CLUSTER.

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Namrata Kodali <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Nov 13, 2020
…rved timestamps

This commit introduces a new package which contains logic and documentation
related to the observed timestamp system, which allows transactions to track
causality between themselves and other, possibly-concurrent, transactions in
order to avoid uncertainty related restarts.

This ties into our efforts to better document the kv layer.

It also ties into synthetic timestamps (cockroachdb#56373), which are important for
non-blocking transactions (cockroachdb#52745) and for fixing the interaction between
observed timestamps and transaction refreshing (cockroachdb#36431). The proposed fix is to
split out a new EffectiveMaxTimestamp field from the MaxTimestamp field on a
transaction and set the EffectiveMaxTimestamp in LimitTxnMaxTimestamp. The
EffectiveMaxTimestamp field will dictate the uncertainty interval for values
with normal timestamps while the original MaxTimestamp will dictate the
uncertainty interval for values with synthetic timestamps. In practice, this
will mean that observed timestamps do not apply to values with synthetic
timestamps. This commit gives us a package to house this new complexity.
@craig
Copy link
Contributor

craig bot commented Nov 13, 2020

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Nov 13, 2020

Build succeeded:

@craig craig bot merged commit 1930679 into cockroachdb:master Nov 13, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/timestampMask branch November 13, 2020 19:57
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Nov 13, 2020
…rved timestamps

This commit introduces a new package which contains logic and documentation
related to the observed timestamp system, which allows transactions to track
causality between themselves and other, possibly-concurrent, transactions in
order to avoid uncertainty related restarts.

This ties into our efforts to better document the kv layer.

It also ties into synthetic timestamps (cockroachdb#56373), which are important for
non-blocking transactions (cockroachdb#52745) and for fixing the interaction between
observed timestamps and transaction refreshing (cockroachdb#36431). The proposed fix is to
split out a new EffectiveMaxTimestamp field from the MaxTimestamp field on a
transaction and set the EffectiveMaxTimestamp in LimitTxnMaxTimestamp. The
EffectiveMaxTimestamp field will dictate the uncertainty interval for values
with normal timestamps while the original MaxTimestamp will dictate the
uncertainty interval for values with synthetic timestamps. In practice, this
will mean that observed timestamps do not apply to values with synthetic
timestamps. This commit gives us a package to house this new complexity.
craig bot pushed a commit that referenced this pull request Nov 13, 2020
55148: kvserver: don't allow raft forwarding of lease requests r=andreimatei a=andreimatei

This patch aims to improve the behavior in scenarios where a follower
replica is behind, unaware of the latest lease, and it tries to acquire
a lease in its ignorant state. That lease acquisition request is bound
to fail (because the lease that it's based on is stale), but while it
fails (or, rather, until the behind replica finds out that it failed)
local requests are blocked. This blocking can last for a very long time
in situations where a snapshot is needed to catch up the follower, and
the snapshot is queued up behind many other snapshots (e.g. after a node
has been down for a while and gets restarted).

This patch tries an opinionated solution: never allow followers to
acquire leases. If there is a leader, it's a better idea for the leader
to acquire the lease. The leader might have a lease anyway or, even if
it doesn't, having the leader acquire it saves a leadership transfer
(leadership follows the lease).
We change the proposal path to recognize lease requests and reject them
early if the current replica is a follower and the leader is known. The
rejection points to the leader, which causes the request that triggered
the lease acquisition to make its way to the leader and attempt to
acquire a lease over there.

The next commit takes this further by short-circuiting the lease
proposal even sooner - but that patch is more best-effort.

Fixes #37906

Release note (bug fix): A bug causing queries sent to a
freshly-restarted node to sometimes hang for a long time while the node
catches up with replication has been fixed.

56392: kvserver/observedts: extract package to house logic and docs for observed timestamps r=nvanbenschoten a=nvanbenschoten

This commit introduces a new package that contains logic and documentation related to the observed timestamp system, which allows transactions to track causality between themselves and other, possibly-concurrent, transactions in order to avoid uncertainty related restarts.

This ties into our efforts to better document the kv layer.

It also ties into synthetic timestamps (#56373), which are important for non-blocking transactions (#52745) and for fixing the interaction between observed timestamps and transaction refreshing (#36431). The proposed fix is to split out a new EffectiveMaxTimestamp field from the MaxTimestamp field on a transaction and set the EffectiveMaxTimestamp in LimitTxnMaxTimestamp. The EffectiveMaxTimestamp field will dictate the uncertainty interval for values with normal timestamps while the original MaxTimestamp will dictate the uncertainty interval for values with synthetic timestamps. In practice, this will mean that observed timestamps do not apply to values with synthetic timestamps. This commit gives us a package to house this new complexity.

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/util/hlc/timestamp.go, line 283 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm trying, but it's putting up a fight: #56190. It's also used in MVCCMetadata, which is written to "below Raft" in a number of places. I think we're going to need to run an online migration if we ever want to remove it entirely.

Where is MVCCMetadata written to below Raft? I am adding the bool txn_did_not_update_meta field to MVCCMetadata and I suppose I will need to make it truly optional.
I wish there was a way with gogoproto to make something optional but tell it to not use a pointer, and so avoid an allocation, and not marshal when the value is the default.

Hmm, I see this come up in gogo/protobuf#142, which you referenced in a commit 2 years ago, and proto3 was recommended as the solution. What is our story on proto3 adoption/conversion -- we are using it in some places?

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola and @tbg)


pkg/util/hlc/timestamp.go, line 283 at r4 (raw file):

Where is MVCCMetadata written to below Raft? I am adding the bool txn_did_not_update_meta field to MVCCMetadata and I suppose I will need to make it truly optional.

I believe the calls to MVCCPutProto with an empty timestamp in StateLoader are responsible for most of the uses below Raft.

I wish there was a way with gogoproto to make something optional but tell it to not use a pointer, and so avoid an allocation, and not marshal when the value is the default.

This is how proto3 works, but not proto2. We use proto3 in most places, but can't in cases where we need deterministic proto encoding for messages that are written below Raft.

One option would be to split MVCCMetadata into a proto3 MVCCMetadata without compatibility concerns and a proto2 LegacyMVCCMetadata with compatibility concerns. This would mirror what we did with Timestamp/LegacyTimestamp. I say this having not explored how difficult doing so would be.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Dec 22, 2020
…imestamp

This commit splits off a new enginepb.TxnTimestamp type (see `pkg/storage/enginepb/mvcc.go`)
from the existing hlc.Timestamp type through a type alias. While the two types
share the same memory and proto representation, they have different purposes and
properties. hlc.Timestamp continues to serve in its original role — representing
a real timestamp pulled from a clock on one of the nodes in the system.
enginepb.TxnTimestamp is introduced to serve in hlc.Timestamp's current
secondary role – representing MVCC time and the time that transactions read and
write at. In the words of cockroachdb#56373 (review),
this splits "clock-domain timestamps" from "transaction-domain timestamps", allowing
us to use to type system to help keep them separate and to make their interactions
more explicit.

This results in boiling most of the ocean, but comes with some benefits in the
end.

- It makes conversion from the clock-domain to the transaction-domain explicit.
This must pass through a call to `enginepb.ToTxnTimestamp`.

- It makes conversions from the transaction-domain to the clock-domain impossible
(not quite there yet).

- Similarly, it makes it impossible for transaction-domain timestamps to be
passed into `hlc.Clock.Update`.

- It allows us to more clearly state that clock-domain timestamps must always trail
present time but transaction-domain timestamps can move into the future. As such,
only transaction-domain timestamps will ever have the synthetic bit set.

I'm interested to get people's take on this. These benefits are nice, but the
size of this diff (it still doesn't come close to compiling) and the newly
introduced burden of needing to choose between multiple timestamp types is not.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv: stale read because intent from our past gets committed in our certain future?
4 participants