kvserver/tstracker: implement a better closedts tracker #59225

andreimatei · 2021-01-20T23:25:54Z

This patch implements a replacement for the existing minprop.Tracker.
The new tracker is not yet hooked up.

Compared to the old tracker, this new one is better in multiple ways:

it's supposed to be used at the level of a single range, not at the
level of a node
it supports lock-free concurrent inserts
it's scope is smaller, making it cleaner and more optimal at the job
of tracking requests. The old tracker combined tracking evaluating
requests with various closed timestamps considerations: it was in
charge of maintaining the closed timestamp, closing new timestamps,
bumping requests to the closed timestamp, and the tracking that it did
was relative to the closed timestamp. This new guy only deals with
tracking (it doesn't know anything about closed timestamps), and so it
will represent the tracked requests more accurately.

The implementation is intended to work with the proposal buffer's
locking model; locking is external to the tracker, but the caller is
expected to hold a "read" lock while inserting requests for tracking,
and a write lock while removing tracked requests. This matches how the
proposal buffer handles its own insertions and flushing.

The tracking is done using a two-rolling-buckets scheme inspired from
the existing one, but rationalized more.

Release note: None

cockroach-teamcity · 2021-01-20T23:26:02Z

This change is

nvanbenschoten

Nice! This really isn't bad for lock-free code.

Most of my comments are about the interface we expose from this package and whether there are ways that we can make this easier to understand for those that don't want to dig into the implementation.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)

pkg/kv/kvserver/tstracker/tracker.go, line 11 at r1 (raw file):

// licenses/APL.txt.

package tstracker

I wonder if we should try to maintain the kv/kvserver/closedts/... structure through this rewrite. Maybe we should put this in a kv/kvserver/closedts/tracker or kv/kvserver/closedts/minprop2 package?

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):

// (but the Tracker itself does not know anything about closed timestamps).

Let's expand on this. For anyone with an understanding of the old model, they're going to be surprised by what this new tracker does and does not do. So let's be very clear about that right up top.

The tracker is used to track write timestamps of requests, but all it really is is a data structure that tracks the minimum value out of a set of integers. The canonical implementation of this is a min-heap. This implementation trades some precision for performance so that it can be lock-free and memory efficient.

This indicates to me that we may actually gain some clarity by being more abstract here, so that we don't fool people into thinking that the structure does more than it actually does. What do you think about bringing an abstract interface front-and-center. Something like:

type Token interface{}

type MinTracker interface {
    Track(ts hlc.Timestamp) Token
    Untrack(token Token)
    LowerBound() hlc.Timestamp
}

We can then implement this interface using our lock-free approach and using a naive heap and test that the two behave comparably (not exactly the same, due to loss of precision). This has the benefit of allowing anyone who cares a little bit about the package to look at the interface, anyone who cares a little more about this package to look at the naive implementation to better understand expected semantics in underdocumented edge cases, and anyone who cares a lot about this package to look at the real, lock-free implementation.

Also, I think the hlc.Timestamp should be part of the interface, not integers. That's because the interface can give some room for loss of precision, but it shouldn't rely on users correctly rounding timestamps up or down in certain places to enforce its semantics. And since it doesn't look like it's always clear which way to round an HLC timestamp's logical portion, it's better hid in here.

pkg/kv/kvserver/tstracker/tracker.go, line 34 at r1 (raw file):

//
// Proposal buffer flush:
// 		externalLock.Lock()

The indentation on this seems off.

pkg/kv/kvserver/tstracker/tracker.go, line 43 at r1 (raw file):

// An accurate implementation would hold all tracked requests in a heap ordered
// by their timestamp, but that costs memory and can't be implemented in a
// lock-free manner (at least not be this author). The tracker generally doesn't

s/be/by/

pkg/kv/kvserver/tstracker/tracker.go, line 49 at r1 (raw file):

// of two buckets, with their timestamps and reference counts. Every request in
// a bucket counts as if its timestamp was the buckets timestamp - even though,
// in reality, its timestamp is generally higher.

I'd reiterate here that it can't be lower. Or remove the word "generally".

pkg/kv/kvserver/tstracker/tracker.go, line 87 at r1 (raw file):

//
// Track can be called concurrently with other Track calls.
func (t *Tracker) Track(ctx context.Context, wts int64) Token {

What constraints are placed on wts? Can it be any time?

Also, you mention rounding HLC timestamps up earlier on. Are there places (here?) where we need to round them down to establish a lower-bound instead of an upper-bound?

pkg/kv/kvserver/tstracker/tracker.go, line 93 at r1 (raw file):

	//
	//  										|		 |
	//  									  |		 |

This indentation also looks off, so it's tough to understand this diagram. Also, which way am I supposed to read this? Right to left? Could we flip it to avoid that?

pkg/kv/kvserver/tstracker/tracker.go, line 110 at r1 (raw file):

	// currently evaluates), which is b1's timestamp.
	//
	// - req1@25 is above both buckets, so it joins b2. It would be technically

This example is pretty difficult to parse, especially during a first read. I think that's because you have yet to introduce any of the invariants/semantics around what a bucket is and how it behaves. When do buckets get rotated? What does it mean to join a bucket? Buckets can have their timestamps decreased?? Can they have them increased? The reader needs to know the rules of the road before they can be expected to reason about what is and what is not "technically correct" and what would and wouldn't be a "bad idea".

pkg/kv/kvserver/tstracker/tracker.go, line 131 at r1 (raw file):

	// IMPLEMENTATION INVARIANTS:
	//
	// 1) After a bucket is initialized, its timestamp only gets lower until the

Before this, we should introduce what a "bucket's timestamp" means. My understanding is that it is the minimum timestamp of any request that has ever joined the bucket. The request may exit the bucket, but that doesn't increase the bucket's timestamp. So this is why a bucket's timestamp never increases. Is that correct?

I got tripped up on this for a bit, because I couldn't tell whether it was legal for a request to join a bucket with a lower timestamp that the request.

pkg/kv/kvserver/tstracker/tracker.go, line 227 at r1 (raw file):

// initialized returns the bucket's ts if the bucket is initialized, or 0
// otherwise.
func (b *bucket) initialized() int64 {

nit: isn't this kind of a strange name for a method that primarily is used to return the bucket's timestamp?

pkg/kv/kvserver/tstracker/tracker.go, line 264 at r1 (raw file):

// The external-facing Token hides the fact that the internal token is not
// copiable.
type Token *token

If we want to keep this on the heap, we should explore pooling these objects. We have very clean object lifetimes here.

pkg/kv/kvserver/tstracker/tracker.go, line 273 at r1 (raw file):

}

// createIfNotInitialized initializes b to ts if b is not already initialized.

We should mention what the bool return value is here. Or can we just get rid of it?

pkg/kv/kvserver/tstracker/tracker_test.go, line 62 at r1 (raw file):

// much lower than the lowest timestamp at which a request is currently
// evaluating).
func TestTrackerRandomStress(t *testing.T) {

This is a neat test! But are you at all worried about how timing dependent it is? If the consumer goroutine ever gets starved for a few ms then it's going to fail. Is that ok?

All this goes to say that I wonder if there's a need for a test in between this one and the previous one. I'm imagining single-threaded that generates a sequence of random input to stress edge cases and then asserts invariants about the output of the tracker. What kind of invariants can we assert? Certainly that the LowerBound is a true lower bound of the tracked requests. Can we say anything more without having an understanding of when buckets rotate?

pkg/kv/kvserver/tstracker/tracker_test.go, line 81 at r1 (raw file):

	const maxReqTrailingMillis = 10
	// How long will this test take?
	const testDurationMillis = 5000

5 seconds seems a bit long, doesn't it? Was this just for initial development?

pkg/kv/kvserver/tstracker/tracker_test.go, line 404 at r1 (raw file):

		// log.Infof(ctx, "lower bound error: %dms", overshotNanos/1000000)
	}
}

Any interest in adding a microbenchmark?

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)

pkg/kv/kvserver/tstracker/tracker_test.go, line 62 at r1 (raw file):

If the consumer goroutine ever gets starved for a few ms then it's going to fail.

Why?
The test fails under stress ( :S ) and I was looking forward to debugging it tomorrow, but sounds like you already know what's wrong?
The checker checks the lower bound error versus the actual lower bound (gotten from looking at the actual non-consumed requests). It doesn't measure anything against real time. And yet it fails, so I've screwed something up.
The analysis of what error can be expected is not entirely clear to me, but it seemed to me to be on the order of one evaluation.

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)

pkg/kv/kvserver/tstracker/tracker_test.go, line 62 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

If the consumer goroutine ever gets starved for a few ms then it's going to fail.

Why?
The test fails under stress ( :S ) and I was looking forward to debugging it tomorrow, but sounds like you already know what's wrong?
The checker checks the lower bound error versus the actual lower bound (gotten from looking at the actual non-consumed requests). It doesn't measure anything against real time. And yet it fails, so I've screwed something up.
The analysis of what error can be expected is not entirely clear to me, but it seemed to me to be on the order of one evaluation.

Never mind me, I wasn't thinking straight.

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)

pkg/kv/kvserver/tstracker/tracker.go, line 11 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I wonder if we should try to maintain the kv/kvserver/closedts/... structure through this rewrite. Maybe we should put this in a kv/kvserver/closedts/tracker or kv/kvserver/closedts/minprop2 package?

moved to kv/kvserver/closedts/tracker

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):

Let's expand on this. For anyone with an understanding of the old model, they're going to be surprised by what this new tracker does and does not do. So let's be very clear about that right up top.

I've put some more words in the beginning.

The tracker is used to track write timestamps of requests, but all it really is is a data structure that tracks the minimum value out of a set of integers.

You say "set of integers" here, but then you talk about wanting timestamps instead. Making it about timestamps would make it a little less abstract because it'd have to talk a bit about how the rounding works. On the other hand, this data structure is not very generic; it only suitable in conjunction with some assumptions about the uniformity of lifetimes of requests and their timestamps generally going up. So I'm not taking a particular position, just thinking.

This indicates to me that we may actually gain some clarity by being more abstract here, so that we don't fool people into thinking that the structure does more than it actually does.

I've considered it... I'd do it if there was a second implementation (and maybe there will be). But if there isn't, I think hiding behind an interface would hurt more than it'd help. The comment at the top of the struct tells you about the 3 methods that you want to use. No?

Also, I think the hlc.Timestamp should be part of the interface, not integers. That's because the interface can give some room for loss of precision, but it shouldn't rely on users correctly rounding timestamps up or down in certain places to enforce its semantics. And since it doesn't look like it's always clear which way to round an HLC timestamp's logical portion, it's better hid in here.

I've considered this, but I don't think I agree. I think I prefer the rounding to be the caller's responsibility. Like you say, it's not clear if we should be rounding up or down. It seems nice for the tracker to make the guarantee that its lower bound is never below what timestamps were put into it (or what timestamps were put into it since the last time it was empty). If we round down, we violate that. If we round up, then the definition of the "lower bound" suffers. I'll think about it some more.

pkg/kv/kvserver/tstracker/tracker.go, line 34 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The indentation on this seems off.

changed

pkg/kv/kvserver/tstracker/tracker.go, line 43 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/be/by/

done

pkg/kv/kvserver/tstracker/tracker.go, line 49 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd reiterate here that it can't be lower. Or remove the word "generally".

done

pkg/kv/kvserver/tstracker/tracker.go, line 87 at r1 (raw file):

What constraints are placed on wts? Can it be any time?

Yup, any time as far as this structure is concerned. The lower it is though, the higher the lower bound error might be in the future after the respective request is untracked, depending on dynamic conditions.

Also, you mention rounding HLC timestamps up earlier on. Are there places (here?) where we need to round them down to establish a lower-bound instead of an upper-bound?

Well so this datastructure currently provides a lower bound on the numbers you fed into it. Whatever rounding the caller does is (at least currently) outside of its concerns. If this structure got into the rounding business, then more specification would be needed.

pkg/kv/kvserver/tstracker/tracker.go, line 93 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This indentation also looks off, so it's tough to understand this diagram. Also, which way am I supposed to read this? Right to left? Could we flip it to avoid that?

see now pls

pkg/kv/kvserver/tstracker/tracker.go, line 110 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This example is pretty difficult to parse, especially during a first read. I think that's because you have yet to introduce any of the invariants/semantics around what a bucket is and how it behaves. When do buckets get rotated? What does it mean to join a bucket? Buckets can have their timestamps decreased?? Can they have them increased? The reader needs to know the rules of the road before they can be expected to reason about what is and what is not "technically correct" and what would and wouldn't be a "bad idea".

see now pls

pkg/kv/kvserver/tstracker/tracker.go, line 131 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Before this, we should introduce what a "bucket's timestamp" means. My understanding is that it is the minimum timestamp of any request that has ever joined the bucket. The request may exit the bucket, but that doesn't increase the bucket's timestamp. So this is why a bucket's timestamp never increases. Is that correct?

I got tripped up on this for a bit, because I couldn't tell whether it was legal for a request to join a bucket with a lower timestamp that the request.

everything you said is right. The text has an intro now.

pkg/kv/kvserver/tstracker/tracker.go, line 227 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: isn't this kind of a strange name for a method that primarily is used to return the bucket's timestamp?

yeah I hesitated, but I wanted to particularly emphasize the initialized/not initialized decision that can be drawn based on the returned value; I think it helped the readability of the code.

pkg/kv/kvserver/tstracker/tracker.go, line 264 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we want to keep this on the heap, we should explore pooling these objects. We have very clean object lifetimes here.

yeah, I was thinking about this. How would you do it?
I just got rid of the pointers.

pkg/kv/kvserver/tstracker/tracker.go, line 273 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should mention what the bool return value is here. Or can we just get rid of it?

got rid

pkg/kv/kvserver/tstracker/tracker_test.go, line 81 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

5 seconds seems a bit long, doesn't it? Was this just for initial development?

switched to 2s. Still skipped under Short.

pkg/kv/kvserver/tstracker/tracker_test.go, line 404 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Any interest in adding a microbenchmark?

yes, done. The padding didn't do anything, but the token allocation was big.

nvanbenschoten

Reviewed 1 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)

pkg/kv/kvserver/closedts/tracker/tracker_test.go, line 482 at r2 (raw file):

			mu.RLock()
			tok := t.Track(ctx, i)
			localToks = append(localToks, tok)

Does this work? Something about it looks off. We're appending to the local slice, but not the slice in toks[myid]. Should localToks := &toks[myid] and then *localToks = append(*localToks, tok)? Maybe I'm missing something. The nested slices combined with value semantics of slices are making my head hurt.

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):

Making it about timestamps would make it a little less abstract because it'd have to talk a bit about how the rounding works.

Would we have to talk about how rounding works? If the contract of LowerBound is that it returns... a lower bound, then we're already talking about a potential loss of precision. So then it seems very beneficial to push the complexity of rounding into the abstraction and under this contract. That way, users don't have to worry about it and can't risk getting it wrong. And it's easier to test.

On the other hand, this data structure is not very generic; it only suitable in conjunction with some assumptions about the uniformity of lifetimes of requests and their timestamps generally going up.

Why is that the case? The data structure would behave just fine if timestamps were random, wouldn't it?

I'd do it if there was a second implementation (and maybe there will be). But if there isn't, I think hiding behind an interface would hurt more than it'd help.

As a reader, I don't know that I would agree with this. Having an interface and a naive implementation that serves almost as pseudo-code would have been very helpful to me when getting familiar with this. And again, it also provides testing benefits. How would that hurt?

We've done this with a few other optimized data structures and have never regretted it. I'd go so far as to say that it's a best practice to write a naive implementation of a data structure to serve as a template whenever you write a lock-free version.

The comment at the top of the struct tells you about the 3 methods that you want to use. No?

I find the comment on its own both too specific and not specific enough, though it has improved in the latest revision. It's great for someone already familiar with the mechanism and use-case for this tracker, but it is talking about "request evaluation" and "proposal buffer flush" and all things that this structure and me as a reader should be completely unaware of. That's detrimental because now I'm asking questions like "who's responsibility is it to bump new requests' timestamps?" and "where is 5s coming from?" when I should really just be worrying about what the input and output of this tracker are. And that really was a mental burden when reading this code, because it was tough to tell what was in scope and what was out of scope! If we keep it general then we can compose complexity on top of this while forgetting about the complexities of the implementation, which was the biggest benefit of keeping this tracker simple in the first place.

I think I prefer the rounding to be the caller's responsibility. Like you say, it's not clear if we should be rounding up or down.

That's exactly the point. It's not clear because it's complex and requires someone to think deeply about the contract of this tracker to get right. But there's clearly a right and a wrong answer. So leaving it up to the user (or each user) to determine which way we should be rounding seems like a mistake.

Maybe there's a middle ground here. A TimestampTracker wrapper that performs the conversion. I hear your point about the IntTracker making a guarantee that its lower bound is never below what timestamps were put into it, but I also think that if this rounding ends up in pkg/kv/kvserver, we've done something wrong.

pkg/kv/kvserver/tstracker/tracker.go, line 131 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

everything you said is right. The text has an intro now.

Yeah, this is much better now.

pkg/kv/kvserver/tstracker/tracker.go, line 227 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

yeah I hesitated, but I wanted to particularly emphasize the initialized/not initialized decision that can be drawn based on the returned value; I think it helped the readability of the code.

Maybe add a second return value to a loadTimestamp method then? Returning t.b1.initialized() from LowerBound() in particular was pretty jarring. A method named initialized should really be returning a boolean.

pkg/kv/kvserver/tstracker/tracker.go, line 264 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

yeah, I was thinking about this. How would you do it?
I just got rid of the pointers.

And need for both Token and token now?

pkg/kv/kvserver/tstracker/tracker_test.go, line 62 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Never mind me, I wasn't thinking straight.

😃

What do you make of the idea for a single-threaded stress test so you can make slightly stronger assertions?

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):
Replying out of order.

I think I prefer the rounding to be the caller's responsibility. Like you say, it's not clear if we should be rounding up or down.

That's exactly the point. It's not clear because it's complex and requires someone to think deeply about the contract of this tracker to get right. But there's clearly a right and a wrong answer. So leaving it up to the user (or each user) to determine which way we should be rounding seems like a mistake.

"there's clearly a right and a wrong answer" - I'm not convinced that the line between "right" and "wrong" is so clear. We've talked before about how rounding up would be fine. But I think most likely we'll end up rounding down, and that'd still be fine.

Anyway, let's talk about what we'd actually do - round up or down?

If we round down, that's compatible with the lower-bound guarantee of the tracker. But, if we do that, the caller might get a lower bound below what it put in - in particular, a lower bound perhaps below the closed timestamp - and that'd be quite confusing I think. But then again, perhaps not the worst thing in the world.
If we round up, then this structure no longer returns a "lower bound". The contract would seem to be... murkier.

Maybe there's a middle ground here. TimestampTracker ... IntTracker...

I'm not a fan of this idea. Because I question the generality of the IntTracker (see below), I'd rather just have one structure, not a nested thing.

On the other hand, this data structure is not very generic; it only suitable in conjunction with some assumptions about the uniformity of lifetimes of requests and their timestamps generally going up.

Why is that the case? The data structure would behave just fine if timestamps were random, wouldn't it?

It would behave "just fine" in the sense that it'd be technically correct, but I don't think it'd be a good idea to use it. If numbers can be both randomly small, and there's also a big variance in their lifetimes, then a short-lived small number, coupled with a long-lived high number in its bucket, would make the lower bound be both large and long lived. I probably wouldn't use this structure; I'd probably find another, more robust way to summarize the set. We get away with this structure because we're making assumptions about the input.

Having an interface and a naive implementation that serves almost as pseudo-code would have been very helpful to me when getting familiar with this. And again, it also provides testing benefits. How would that hurt?

So let's talk very concretely. You're saying implement a heap, and then want? Would the heap be used anywhere? Would I write a test that uses both structures and asserts that the lower bound of one is higher than the lower bound of the other? I'm a bit reticent of that because I'm pretty happy with the existing test.
Would I write a benchmark for the heap? And if I write a benchmark and it's worse than the existing one, would I keep the benchmark in the codebase?

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):

I'm not convinced that the line between "right" and "wrong" is so clear.

One more thing here: if we move the rounding inside the Tracker, then it'd seem that rounding down is the only option. But if we keep the rounding outside, then rounding up is also an option - perhaps a nicer one.

nvanbenschoten · 2021-01-22T19:37:42Z

I’m not at a computer, so I’m not responding to everything right now, but I wanted to ask: how can rounding up be safe? If we have a request evaluating at 10,5, how can we say that the lower bound in-flight write is 11? If we then close off 11, we’d be immediately in violation, right?

andreimatei

I've reworked the explanation of the datastructure, moving away from talking about "requests" except in specific places that talk about how this thing is used. PTAL

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)

pkg/kv/kvserver/closedts/tracker/tracker_test.go, line 482 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this work? Something about it looks off. We're appending to the local slice, but not the slice in toks[myid]. Should localToks := &toks[myid] and then *localToks = append(*localToks, tok)? Maybe I'm missing something. The nested slices combined with value semantics of slices are making my head hurt.

you're right. Fixed.
This completely changed the benchmark results on a single CPU - I think because there's less (or zero) garbage to collect.

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):

I’m not at a computer, so I’m not responding to everything right now, but I wanted to ask: how can rounding up be safe? If we have a request evaluating at 10,5, how can we say that the lower bound in-flight write is 11? If we then close off 11, we’d be immediately in violation, right?

Good point. I had been confused because, when we were discussing your prototype implementation, you had mentioned that the plan was to round everything up. Perhaps I had misunderstood.

pkg/kv/kvserver/tstracker/tracker.go, line 227 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Maybe add a second return value to a loadTimestamp method then? Returning t.b1.initialized() from LowerBound() in particular was pretty jarring. A method named initialized should really be returning a boolean.

done

pkg/kv/kvserver/tstracker/tracker.go, line 264 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

And need for both Token and token now?

now gone too

pkg/kv/kvserver/tstracker/tracker_test.go, line 62 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

😃

What do you make of the idea for a single-threaded stress test so you can make slightly stronger assertions?

I forgot to reply to this, but perhaps you've noticed that I've improved the test to make it robust to consumer starvation - now we check the error against the actual "evaluation time" (which can now be arbitrarily large), not what we schedule the evaluation time to be when each request is created.

I'm pretty happy with the test tbh; what stronger assertions would you put in a single-threaded one?

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):
I've switched the interface to HLC btw.

Having an interface and a naive implementation that serves almost as pseudo-code would have been very helpful to me when getting familiar with this. And again, it also provides testing benefits. How would that hurt?

Besides the interface being an extra step in code navigation, here's a concrete way in which it hurts: the Token would also have to become an iface, meaning that returning Tokens would allocate. Unless we rely on the implementation of the Token not actually allocating cause it's as small as a pointer...

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I've switched the interface to HLC btw.

Having an interface and a naive implementation that serves almost as pseudo-code would have been very helpful to me when getting familiar with this. And again, it also provides testing benefits. How would that hurt?

Besides the interface being an extra step in code navigation, here's a concrete way in which it hurts: the Token would also have to become an iface, meaning that returning Tokens would allocate. Unless we rely on the implementation of the Token not actually allocating cause it's as small as a pointer...

I should say that I have a heap implemented, so don't you believe I'm lazy.

nvanbenschoten

This is looking good. Just a few more comments.

Reviewed 2 of 2 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)

pkg/kv/kvserver/closedts/tracker/tracker.go, line 24 at r3 (raw file):

// Tracker tracks the lower bound of a set of timestamps (called the tracked
// set) Timestamps can be added and removed from the tracked set. A conservative

"set)."

pkg/kv/kvserver/closedts/tracker/tracker.go, line 27 at r3 (raw file):

// estimation of the set's lower bound can be queried; the result will be lower
// or equal to all the tracked timestamps; it might not be equal to the lowest
// timestamp (i.e. it might not be accurate).

s/accurate/precise/

pkg/kv/kvserver/closedts/tracker/tracker.go, line 97 at r3 (raw file):

// Track inserts a timestamps into the tracked set. The returned token must
// later be passed to Untrack() to remove the number from the set.

s/number/timestamp/

pkg/kv/kvserver/closedts/tracker/tracker.go, line 187 at r3 (raw file):

	// Make sure that there's at least one bucket.
	t1 := createIfNotInitialized(ctx, b1, wts)

On a second reading of this, I don't understand why we need the createIfNotInitialized function. Could we use timestamp here, get rid of the one below, and just allow extendAndJoin to initialize any uninitialized buckets?

pkg/kv/kvserver/closedts/tracker/tracker.go, line 248 at r3 (raw file):

		// We've lost track of whether the tracked timestamps were synthetic or not,
		// so let's err on the safe side.
		Synthetic: true,

We need to be careful about v20.2 nodes getting handed synthetic timestamps, either directly or indirectly (for instance, in an encoded MVCC version). It was a good catch that we need to handle tracking synthetic timestamps. But it's also unfortunate that we will be introducing synthetic timestamps that didn't otherwise exist here. I wonder if we can do something to ensure that if a bucket has not tracked a single synthetic timestamp, that it doesn't return one.

Maybe we could add an atomic synthetic int32 to bucket that gets set in extendAndJoin when joining with a synthetic timestamp. We'd need to pessimistically set this before updating ts if we want to keep LowerBound() concurrent. But we could be a bit smarter if we wanted to mandate exclusive access in LowerBound. If so, we could set the synthetic field only if we're extending the bucket downward with a synthetic timestamp.

What do you think?

pkg/kv/kvserver/closedts/tracker/tracker_test.go, line 484 at r3 (raw file):

				toks[i] = toks[i][:0]
			}
			log.VInfof(ctx, 1, "cleared %d reqs", n)

nit: consider moving this outside of the mutex so it doesn't impact benchmark performance.

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):

I had been confused because, when we were discussing your prototype implementation, you had mentioned that the plan was to round everything up. Perhaps I had misunderstood.

I probably led you astray. I hadn't thought all the way through it at that point.

Unless we rely on the implementation of the Token not actually allocating cause it's as small as a pointer...

I'd be ok relying on that.

So let's talk very concretely. You're saying implement a heap, and then want? Would the heap be used anywhere? Would I write a test that uses both structures and asserts that the lower bound of one is higher than the lower bound of the other? I'm a bit reticent of that because I'm pretty happy with the existing test.
Would I write a benchmark for the heap? And if I write a benchmark and it's worse than the existing one, would I keep the benchmark in the codebase?

I'd say "yes" to all of this. But again, the #1 reason for the naive implementation would be documentation. But if I can't convince you, I guess we can keep it like this.

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)

pkg/kv/kvserver/closedts/tracker/tracker.go, line 24 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"set)."

done

pkg/kv/kvserver/closedts/tracker/tracker.go, line 27 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/accurate/precise/

done

pkg/kv/kvserver/closedts/tracker/tracker.go, line 97 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/number/timestamp/

done

pkg/kv/kvserver/closedts/tracker/tracker.go, line 187 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

On a second reading of this, I don't understand why we need the createIfNotInitialized function. Could we use timestamp here, get rid of the one below, and just allow extendAndJoin to initialize any uninitialized buckets?

done

pkg/kv/kvserver/closedts/tracker/tracker.go, line 248 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We need to be careful about v20.2 nodes getting handed synthetic timestamps, either directly or indirectly (for instance, in an encoded MVCC version). It was a good catch that we need to handle tracking synthetic timestamps. But it's also unfortunate that we will be introducing synthetic timestamps that didn't otherwise exist here. I wonder if we can do something to ensure that if a bucket has not tracked a single synthetic timestamp, that it doesn't return one.

Maybe we could add an atomic synthetic int32 to bucket that gets set in extendAndJoin when joining with a synthetic timestamp. We'd need to pessimistically set this before updating ts if we want to keep LowerBound() concurrent. But we could be a bit smarter if we wanted to mandate exclusive access in LowerBound. If so, we could set the synthetic field only if we're extending the bucket downward with a synthetic timestamp.

What do you think?

I've done something towards the minimum - the creator of a bucket dictates whether the bucket's lowerbound is synthetic.

pkg/kv/kvserver/closedts/tracker/tracker_test.go, line 484 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider moving this outside of the mutex so it doesn't impact benchmark performance.

done

pkg/kv/kvserver/tstracker/tracker.go, line 24 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I had been confused because, when we were discussing your prototype implementation, you had mentioned that the plan was to round everything up. Perhaps I had misunderstood.

I probably led you astray. I hadn't thought all the way through it at that point.

Unless we rely on the implementation of the Token not actually allocating cause it's as small as a pointer...

I'd be ok relying on that.

So let's talk very concretely. You're saying implement a heap, and then want? Would the heap be used anywhere? Would I write a test that uses both structures and asserts that the lower bound of one is higher than the lower bound of the other? I'm a bit reticent of that because I'm pretty happy with the existing test.
Would I write a benchmark for the heap? And if I write a benchmark and it's worse than the existing one, would I keep the benchmark in the codebase?

I'd say "yes" to all of this. But again, the #1 reason for the naive implementation would be documentation. But if I can't convince you, I guess we can keep it like this.

OK, see now.

nvanbenschoten

Reviewed 2 of 5 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)

pkg/kv/kvserver/closedts/tracker/lockfree_tracker.go, line 197 at r4 (raw file):

		WallTime:  ts,
		Logical:   0,
		Synthetic: t.b1.ts_synthetic(),

If we allow LowerBound calls to be concurrent with Track calls, don't we still have a race where a bucket is initialized with a synthetic timestamp but we read from it between the time that bucket.ts is set and bucket.synthetic is set? The way to avoid this might be to invert the synthetic flag, so that this race can only result in a synthetic timestamps where one shouldn't be instead of a non-synthetic timestamp where one should be.

Another option would be to pack the synthetic bit into the high bit of the timestamp to make everything atomic again. It's an int64, so you'll always have that bit available.

pkg/kv/kvserver/closedts/tracker/lockfree_tracker.go, line 266 at r4 (raw file):

}

// lockfreeToken implements RemovalToken.

This now allocates when being stuffed into a RemovalToken, right? The way to avoid that is:

type lockfreeToken bucket

func (l *lockfreeToken) RemovalTokenMarker() {}

// in extendAndJoin
return (*lockfreeToken)(b)

Did this show up in the benchmark?

You might want to do the same thing with the heapTracker's tocken as well.

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)

pkg/kv/kvserver/closedts/tracker/lockfree_tracker.go, line 197 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we allow LowerBound calls to be concurrent with Track calls, don't we still have a race where a bucket is initialized with a synthetic timestamp but we read from it between the time that bucket.ts is set and bucket.synthetic is set? The way to avoid this might be to invert the synthetic flag, so that this race can only result in a synthetic timestamps where one shouldn't be instead of a non-synthetic timestamp where one should be.

Another option would be to pack the synthetic bit into the high bit of the timestamp to make everything atomic again. It's an int64, so you'll always have that bit available.

Instead, I've just said that LowerBound cannot be called concurrently with Track.

pkg/kv/kvserver/closedts/tracker/lockfree_tracker.go, line 266 at r4 (raw file):

This now allocates when being stuffed into a RemovalToken, right?

Wrong. There's no allocation:

scripts/goescape pkg/kv/kvserver/closedts/tracker/lockfree_tracker.go
./lockfree_tracker.go:56:2: moved to heap: t
./lockfree_tracker.go:218:21: refcnt escapes to heap
./lockfree_tracker.go:218:74: timeutil.Unix(0, ts) escapes to heap

Things look different if I make the lockfreeToken type larger than 64 bits. I tried to see why this happens in https://github.com/golang/go/blame/master/src/runtime/iface.go but couldn't find it.
But it's got to be the same mechanism that makes putting a (64bit) pointer into an iface not allocate. What's really confusing is that we see in iface.go specific code for a fixed set of small integers to make those not allocate (by preallocating them on the heap); but I don't understand why that's necessary since any up-to-64bit int should be covered by the general mechanism?

nvanbenschoten

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)

pkg/kv/kvserver/closedts/tracker/lockfree_tracker.go, line 266 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

This now allocates when being stuffed into a RemovalToken, right?

Wrong. There's no allocation:
scripts/goescape pkg/kv/kvserver/closedts/tracker/lockfree_tracker.go
./lockfree_tracker.go:56:2: moved to heap: t
./lockfree_tracker.go:218:21: refcnt escapes to heap
./lockfree_tracker.go:218:74: timeutil.Unix(0, ts) escapes to heap
Things look different if I make the lockfreeToken type larger than 64 bits. I tried to see why this happens in https://github.com/golang/go/blame/master/src/runtime/iface.go but couldn't find it.
But it's got to be the same mechanism that makes putting a (64bit) pointer into an iface not allocate. What's really confusing is that we see in iface.go specific code for a fixed set of small integers to make those not allocate (by preallocating them on the heap); but I don't understand why that's necessary since any up-to-64bit int should be covered by the general mechanism?

I'm not positive that

andreimatei · 2021-02-02T00:09:55Z

bors r+

andreimatei

bors r-

Reproduced the stress failure after 6h

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)

craig · 2021-02-02T00:12:09Z

Canceled.

This patch implements a replacement for the existing minprop.Tracker. The new tracker is not yet hooked up. Compared to the old tracker, this new one is better in multiple ways: - it's supposed to be used at the level of a single range, not at the level of a node - it supports lock-free concurrent inserts - it's scope is smaller, making it cleaner and more optimal at the job of tracking requests. The old tracker combined tracking evaluating requests with various closed timestamps considerations: it was in charge of maintaining the closed timestamp, closing new timestamps, bumping requests to the closed timestamp, and the tracking that it did was relative to the closed timestamp. This new guy only deals with tracking (it doesn't know anything about closed timestamps), and so it will represent the tracked requests more accurately. The implementation is intended to work with the proposal buffer's locking model; locking is external to the tracker, but the caller is expected to hold a "read" lock while inserting requests for tracking, and a write lock while removing tracked requests. This matches how the proposal buffer handles its own insertions and flushing. The tracking is done using a two-rolling-buckets scheme inspired from the existing one, but rationalized more. Release note: None

andreimatei

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)

craig · 2021-02-09T02:25:48Z

Build succeeded:

GitHub CI (Cockroach)

andreimatei added the A-multiregion Related to multi-region label Jan 20, 2021

andreimatei requested review from nvanbenschoten and aayushshah15 January 20, 2021 23:25

nvanbenschoten reviewed Jan 21, 2021

View reviewed changes

andreimatei commented Jan 21, 2021

View reviewed changes

andreimatei force-pushed the closedts.tracker branch from 9bab2e7 to 29ad207 Compare January 21, 2021 23:27

andreimatei commented Jan 21, 2021

View reviewed changes

nvanbenschoten reviewed Jan 22, 2021

View reviewed changes

andreimatei commented Jan 22, 2021

View reviewed changes

aayushshah15 requested a review from nvanbenschoten January 22, 2021 21:52

andreimatei commented Jan 22, 2021

View reviewed changes

andreimatei force-pushed the closedts.tracker branch from 29ad207 to 3c21d38 Compare January 22, 2021 21:58

andreimatei commented Jan 22, 2021

View reviewed changes

nvanbenschoten reviewed Jan 23, 2021

View reviewed changes

andreimatei force-pushed the closedts.tracker branch from 3c21d38 to 2745413 Compare January 26, 2021 01:30

andreimatei commented Jan 26, 2021

View reviewed changes

nvanbenschoten reviewed Jan 26, 2021

View reviewed changes

andreimatei force-pushed the closedts.tracker branch from 2745413 to 1f98ad3 Compare January 26, 2021 22:34

andreimatei commented Jan 26, 2021

View reviewed changes

andreimatei force-pushed the closedts.tracker branch from 1f98ad3 to a0214d0 Compare January 26, 2021 22:38

nvanbenschoten approved these changes Jan 26, 2021

View reviewed changes

andreimatei mentioned this pull request Jan 29, 2021

kvserver: introduce a Raft-based transport for closedts #59566

Merged

andreimatei force-pushed the closedts.tracker branch from a0214d0 to 6e76cf0 Compare February 1, 2021 19:00

andreimatei commented Feb 2, 2021

View reviewed changes

andreimatei force-pushed the closedts.tracker branch from 6e76cf0 to 4205ece Compare February 9, 2021 00:26

andreimatei commented Feb 9, 2021

View reviewed changes

craig bot merged commit a8532bc into cockroachdb:master Feb 9, 2021

andreimatei deleted the closedts.tracker branch December 16, 2021 19:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kvserver/tstracker: implement a better closedts tracker #59225

kvserver/tstracker: implement a better closedts tracker #59225

andreimatei commented Jan 20, 2021

cockroach-teamcity commented Jan 20, 2021

nvanbenschoten left a comment

andreimatei left a comment

andreimatei left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei left a comment

andreimatei left a comment

nvanbenschoten commented Jan 22, 2021

andreimatei left a comment

andreimatei left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei commented Feb 2, 2021

andreimatei left a comment

craig bot commented Feb 2, 2021

andreimatei left a comment

craig bot commented Feb 9, 2021

kvserver/tstracker: implement a better closedts tracker #59225

kvserver/tstracker: implement a better closedts tracker #59225

Conversation

andreimatei commented Jan 20, 2021

cockroach-teamcity commented Jan 20, 2021

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten commented Jan 22, 2021

andreimatei left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei commented Feb 2, 2021

andreimatei left a comment

Choose a reason for hiding this comment

craig bot commented Feb 2, 2021

andreimatei left a comment

Choose a reason for hiding this comment

craig bot commented Feb 9, 2021