kv: Replace multiTestContext with TestCluster in consistency_queue_test #50257
Conversation
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz, @lunevalex, and @tbg)
pkg/kv/kvserver/consistency_queue_test.go, line 73 at r1 (raw file):
```go
// observe a timestamp in the past to trick the clock.
ts.Clock().Update(hlc.Timestamp{
	WallTime: ts.Clock().PhysicalNow() + store.LeaseExpiration(),
})
```
Thanks for working on this!
We should not be passing bogus future timestamps into clocks, not even in tests. The reason is that our system is built on assumptions of bounded clock skew, which is certainly violated here. The test would likely survive this (it may panic sporadically when the clock skew is detected), but as a general rule we want to avoid this pattern, at least in its naive form. Being able to expire leases is a necessary tool to get the test to run faster in some cases, though. When we do need it, we could look into running a test cluster with a very large MaxOffset (larger than the lease expiration), which would allow us to bump this at least once. I'm sure there's a can of worms there somewhere, though.
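To make the skew concern concrete, here is a toy model (not the real `hlc` package; `manualClock` and its fields are hypothetical stand-ins) of why feeding a far-future timestamp into a clock is dangerous: an HLC treats any observed timestamp more than MaxOffset ahead of physical time as evidence of clock-skew misconfiguration.

```go
package main

import "fmt"

// manualClock is a toy stand-in for an HLC: it tracks a physical time,
// a logical wall time, and the configured maximum tolerated clock offset.
type manualClock struct {
	physical  int64 // physical time, nanos
	wall      int64 // HLC wall time, nanos
	maxOffset int64 // maximum tolerated skew, nanos
}

// update ingests an observed timestamp; it reports skew when the
// observation is further ahead of physical time than maxOffset allows.
func (c *manualClock) update(observed int64) error {
	if observed > c.physical+c.maxOffset {
		return fmt.Errorf("observed timestamp %d exceeds physical time %d by more than maxOffset %d",
			observed, c.physical, c.maxOffset)
	}
	if observed > c.wall {
		c.wall = observed
	}
	return nil
}

func main() {
	c := &manualClock{physical: 1_000, maxOffset: 500}
	fmt.Println(c.update(1_400) == nil) // within maxOffset: accepted
	fmt.Println(c.update(9_000) != nil) // a lease-expiration-sized jump: rejected
}
```

This is why a test cluster would need a MaxOffset larger than the lease expiration for the bump to be accepted even once.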
This particular test would benefit from a simpler refactor: make the consistency checker's `shouldQueue` method unit testable and don't start any nodes at all.
Basically, take

cockroach/pkg/kv/kvserver/consistency_queue.go, lines 77 to 79 at 76c63de:

```go
func (q *consistencyQueue) shouldQueue(
	ctx context.Context, now hlc.Timestamp, repl *Replica, _ *config.SystemConfig,
) (bool, float64) {
```

and replace `*Replica` with a

```go
replica interface {
	getQueueLastProcessed() (hlc.Timestamp, error)
	Desc() *roachpb.RangeDescriptor
}
```

then pass a similarly tailored interface in for `NodeLiveness`, and when all is done you get to test the desired functionality directly. For inspiration, check out `raft_log_queue.go` and `replica_gc_queue.go`; both of them do this sort of thing.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)
pkg/kv/kvserver/consistency_queue_test.go, line 73 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Thank you for the suggestion, this makes sense; I will refactor and update.
Force-pushed b3aaaa7 to 3c21991
pkg/kv/kvserver/consistency_queue_test.go, line 73 at r1 (raw file): Previously, lunevalex wrote…

It ended up being messier than originally intended because of kvserver vs kvserver_test, but the test is a lot simpler now.
Force-pushed 3c21991 to 7eeb6db
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some comments, thanks!
pkg/kv/kvserver/consistency_queue.go (outdated)

```diff
@@ -49,6 +49,15 @@ type consistencyQueue struct {
 	replicaCountFn func() int
 }
 
+// A data wrapper to allow for the shouldQueue method to be easier to test
```
You're probably getting tired of this, but .
```diff
@@ -77,30 +86,50 @@ func newConsistencyQueue(store *Store, gossip *gossip.Gossip) *consistencyQueue
 func (q *consistencyQueue) shouldQueue(
 	ctx context.Context, now hlc.Timestamp, repl *Replica, _ *config.SystemConfig,
 ) (bool, float64) {
-	interval := q.interval()
-	if interval <= 0 {
+	return consistencyQueueShouldQueueImpl(ctx, now,
```
Curious why `now` isn't also on `consistencyShouldQueueData`.
My mindset was to wrap everything that we use from `Replica` in this struct, and `now` was part of the original method signature, so I left it out. No strong preference on this one; could go either way.
```diff
 }
 
+	// With a real clock its not possible to check the actual diff,
+	// so we settle for just a key check.
```
Why can't you add `diff[0].Timestamp.WallTime = 123` to the old code?
My understanding is that the timestamp is part of the raw MVCC record, which makes this fairly hard to diff. Here is an example with two consecutive runs of the test with this code restored as it was.
```
consistency_queue_test.go:257: expected:
--- leaseholder
+++ follower
+0.000000123,987 "e"
+ ts:1970-01-01 00:00:00.000000123 +0000 UTC
+ value:"\x00\x00\x00\x00\x01T"
+ raw mvcc_key/value: 6500000000000000007b000003db0d 000000000154
got:
--- leaseholder
+++ follower
+1593109825.736774000,987 "e"
+ ts:2020-06-25 18:30:25.736774 +0000 UTC
+ value:"\x00\x00\x00\x00\x01T"
+ raw mvcc_key/value: 6500161bdcf223399170000003db0d 000000000154

consistency_queue_test.go:257: expected:
--- leaseholder
+++ follower
+0.000000123,987 "e"
+ ts:1970-01-01 00:00:00.000000123 +0000 UTC
+ value:"\x00\x00\x00\x00\x01T"
+ raw mvcc_key/value: 6500000000000000007b000003db0d 000000000154
got:
--- leaseholder
+++ follower
+1593109975.875626000,987 "e"
+ ts:2020-06-25 18:32:55.875626 +0000 UTC
+ value:"\x00\x00\x00\x00\x01T"
+ raw mvcc_key/value: 6500161bdd151832a410000003db0d 000000000154
```
Specifically note `raw mvcc_key/value: 6500161bdcf223399170000003db0d 000000000154` vs `raw mvcc_key/value: 6500161bdd151832a410000003db0d 000000000154`.
Sorry, this dropped off my radar. I would "just" regexp-replace `s~(raw mvcc_key/value: ).*~\1<omitted>~g` so that we compare the parts we know should match.
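That substitution translates directly to Go's `regexp` package; `scrubDiff` and the sample input below are hypothetical names for the sake of a self-contained sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// rawKVRE matches the volatile raw key/value bytes in the rendered
// consistency diff, capturing the stable prefix.
var rawKVRE = regexp.MustCompile(`(raw mvcc_key/value: ).*`)

// scrubDiff replaces the nondeterministic tail of each raw mvcc_key/value
// line with a placeholder so the rest of the diff can be compared exactly.
func scrubDiff(s string) string {
	return rawKVRE.ReplaceAllString(s, "${1}<omitted>")
}

func main() {
	in := "+ value:\"\\x00\\x00\\x00\\x00\\x01T\"\n" +
		"+ raw mvcc_key/value: 6500161bdcf223399170000003db0d 000000000154"
	fmt.Println(scrubDiff(in))
}
```

Since `.` does not match newlines in RE2, only the tail of each matching line is rewritten; every other line of the diff is left intact.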
I think in that case the only thing that is comparable is the value, since everything else has a timestamp, so we can just compare those directly without using strings.
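Comparing the deterministic fields directly could look like the following sketch; `diffEntry`, its field names, and `sameKV` are hypothetical stand-ins for the real diff entry type, not its actual API.

```go
package main

import (
	"bytes"
	"fmt"
)

// diffEntry is a stand-in for one entry of a consistency diff. The wall
// time is nondeterministic under a real clock, so it is deliberately
// excluded from the comparison.
type diffEntry struct {
	LeaseHolder bool
	Key         []byte
	Value       []byte
	WallTime    int64 // not compared
}

// sameKV compares only the stable parts of two diff entries.
func sameKV(got, want diffEntry) bool {
	return got.LeaseHolder == want.LeaseHolder &&
		bytes.Equal(got.Key, want.Key) &&
		bytes.Equal(got.Value, want.Value)
}

func main() {
	got := diffEntry{Key: []byte("e"), Value: []byte{0, 0, 0, 0, 1, 'T'}, WallTime: 1593109825}
	want := diffEntry{Key: []byte("e"), Value: []byte{0, 0, 0, 0, 1, 'T'}, WallTime: 123}
	fmt.Println(sameKV(got, want)) // timestamps differ, stable parts match
}
```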
Hmm, I just overrode the timestamp and that seems to have solved it all. OK, going with that.
```diff
@@ -91,10 +90,18 @@ func (s *Store) ComputeMVCCStats() (enginepb.MVCCStats, error) {
 
+// ConsistencyQueueShouldQueue invokes the shouldQueue method on the
+// store's consistency queue.
-func (s *Store) ConsistencyQueueShouldQueue(
-	ctx context.Context, now hlc.Timestamp, r *Replica, cfg *config.SystemConfig,
+func ConsistencyQueueShouldQueue(
```
why is this exported?
(And why does it even exist? On master, it's called only from `TestConsistencyQueueRequiresLive`, which you've refactored.)
`TestConsistencyQueueRequiresLive` still calls this, and it lives in kvserver_test. Unfortunately we can't move that file into kvserver because of the circular dependency between TestCluster and kvserver.
Makes sense!
Force-pushed 7eeb6db to 6c243dc
It's also a crude way to test the output format. I know it's not ideal; if we test this more directly elsewhere then yeah, definitely don't bother salvaging this.
Force-pushed 6c243dc to 5171245
Force-pushed 5171245 to 59306ed
bors r+

Build succeeded
Makes progress on #8299

multiTestContext is a legacy construct that is deprecated in favor of running tests via TestCluster. This is one PR out of many to remove the usage of multiTestContext in the consistency_queue test cases.

Release note: None