-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Etcd2Topo: Use node's ModRevision consistently for in-memory topo.Version value #15847
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Matt Lord <[email protected]>
22ee6d1
to
d49f984
Compare
To clarify, the use of Since we are watching a specific key, it shouldn't matter if we use |
Quote from the description: So the way I'm reading this is that there was an unnecessary change in a previous PR which may actually cause incorrect behavior, and this PR is fixing that. |
Signed-off-by: Matt Lord <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #15847 +/- ##
==========================================
+ Coverage 68.40% 68.42% +0.01%
==========================================
Files 1556 1559 +3
Lines 195121 196537 +1416
==========================================
+ Hits 133479 134482 +1003
- Misses 61642 62055 +413 ☔ View full report in Codecov by Sentry. |
AFAIK, the logical version within watches only comes in to play when we need to resume/restart the watch: vitess/go/vt/topo/etcd2topo/watch.go Lines 79 to 110 in 7603abc
As you can see there, it's already using the revision value (etcd keyspace version) for both. BUT, in looking at that I noticed that in a relatively new usage of |
Signed-off-by: Matt Lord <[email protected]>
I checked out your branch. I seem to see Since no test is failing, I am assuming we don't really use the version value anywhere. Looking at a couple of watchers (Shard and SrvVSchema, we only look at the new contents of the key and not the version). Otherwise I would have expected some unit test to have failed with the changes in this PR or the previous one. In vitess/go/vt/topo/etcd2topo/watch.go Lines 52 to 58 in d49f984
For new values it sends Version vitess/go/vt/topo/etcd2topo/watch.go Lines 136 to 140 in d49f984
In initial: vitess/go/vt/topo/etcd2topo/watch.go Lines 176 to 182 in d49f984
watcher: vitess/go/vt/topo/etcd2topo/watch.go Lines 257 to 264 in 4054eb7
|
OK, I just saw after my comment above that you have a new commit where everything is set as |
Yes, I noticed this too in answering your previous question and I updated things to be consistent (also updated the description for further clarification).
Sure, I can add one. |
Thinking about this some more: the watchers are watching a specific key, right? So shouldn't we be passing When will using |
Needs to be re-reviewed since more changes have been committed.
vitess/go/vt/topo/etcd2topo/watch.go Lines 63 to 66 in 96c43e4
It’s worth noting why Watch actually exists. It’s not because you want to process every change of a key (it doesn’t offer those guarantees) — it’s that you want to get the new version of a key when it’s changed. In other words, you always want your in-memory representation to be up-to-date / have the latest values from the database. https://etcd.io/docs/v3.5/tutorials/how-to-watch-keys/ It only guarantees that you (eventually) get notifications about future changes to the key (with "future" being based on the revision). That’s it. If you’re not already caught up, you can miss intermediate changes. And it's important to note that the etcd client library only takes revisions on
For example, get: https://github.com/etcd-io/etcd/blob/35361955578322a55841c1ed962c53e28658d951/client/v3/kv.go#L41-L49 You're saying that you want to watch this key from this database snapshot. And this is the only way that you can coordinate across keys, whether you're using transactions, multi-key watches (as we do with It's also worth noting that I haven't yet found places where Etcd itself, in the product or in the documentation, uses the key's So perhaps the key's
Wrong is not the correct word to use here IMO. We have to define the semantics and guarantees — only then can we know the best option. I've made every attempt to explain why So I take this in total to mean that the key's |
After further investigation (see my previous comment: #15847 (comment)) I think that we should use
That one exception is when explicitly doing
The usage there makes sense to me because it's a special case when creating a key — we want to enforce that it doesn't already exist because at lower layers / within etcd it's a So I'm going to add a test to confirm and enforce that we are using |
@rohit-nayak-ps Let's also look at an example since you seem to suspect that
As you can see, the Hopefully this, in combination with my previous comments, helps make it clear why using the |
I do understand the differences between Version and ModRevision and that if a key is deleted we restart the versioning. My only context to ask about Version vs ModRevision is related to the watcher, since that is the code path this PR affects. My understanding in this context is: If we don't care about the version of the watched object (has it been changed X times) and/or if it is a newly created key then it doesn't matter which one we use. If it does matter, do we delete the watched key ever? If not, it doesn't matter which one we use. If so, does it matter to us that a key was deleted? If not, it doesn't matter which one we use. If so, we do want the So in terms of watching specific keys, I believe the Just to round off the discussion on this PR, from my side, I am fine with whatever the others decide we want to do now for the watch versioning since we don't seem to depend on the versions in this context anyway. We can always change it later if any specific functionality is needed that depends on this decision. |
Signed-off-by: Matt Lord <[email protected]>
@rohit-nayak-ps and @deepthi thank you for the reviews and discussion! I've now thoroughly documented the reason for this PR and the reasoning for the changes made. I've also now added a unit test to confirm and enforce the new expected behavior. I'm considering this work done unless I get any further review comments. Thanks again! |
Description
Etcd has several version related fields/values in each key's header:
Revision
,ModRevision
, andVersion
. These are related but distinct things:To summarize:
Revision
: a logical clock used to track changes to the etcd keyspace (a transaction / snapshot ID)Version
: a logical clock used to track the changes to an individual key since it was last created (it is not monotonic)ModRevision
: the etcd keyspace revision number for the key at that point in time (the logical version of the key)The
Revision
is what allows for consistent reads and writes of multiple keys (in a transaction). It allows you, for example, to do a consistent read / scan of N keys at that logical point-in-time.In Vitess we also have awareness of each value and read/use these values in memory in the
etcd2topo
topo server implementation. There's also a generic and opaquetopo.Version
type — which can be set to either of the two lower level node/key specific values:ModRevision
orVersion
— that we use for strict serializability.In #15701 I aligned the Vitess and
etcd2topo
lower level version related fields, variable names, and comments for watches: https://github.com/vitessio/vitess/pull/15701/files/f1fed6dd787feea26a4f4e13f660e8eaa91e64bb#diff-3f055ca1a4446966abba5a840e13c5c6a11e73121143a66b06070a356425d676In later unrelated investigations, however, I realized that the use of
Version: EtcdVersion(initial.Kvs[0].ModRevision)
inWatch()
was intentional and consistent with how we're usingModRevision
for the node'stopo.Version
value infile.go
. The reason being that we use this to enforce strict serializability when atopo.Version
value is provided — failing the write if thetopo.Version
value provided is not equal to the currentModRevision
of the key — and we return the newtopo.Version
value based on the newModRevision
value (technically we return theRevision
value from the response header, but that would also be theModRevision
value for the key we updated in the transaction) on successful update:vitess/go/vt/topo/etcd2topo/file.go
Lines 48 to 74 in f27d287
We use the
ModRevision
value there as etcd's transaction response includes the newModRevision
from the commit(again, technically it's the
Revision
value from the response header, but that would also be theModRevision
value for the key we updated in the transaction) but NOT the new node/keyVersion
. This makes sense because a transaction can update any arbitrary number of keys. So the only logical version that can be provided for that transaction is the keyspaceRevision
, which would be theModRevision
value for each of the keys updated in the transaction. This is the logical point in time for the state of the data in etcd — equivalent to a GTID position/set in MySQL.That is used e.g. with
Shard
records/keys and theShardInfo
in memory structure:vitess/go/vt/topo/shard.go
Lines 165 to 222 in f27d287
One way for us to think about this: an etcd watch is equivalent to a (filtered) binlog stream with MySQL. We don’t know the version of a table or row, we only know their state at a logical point in time — the GTID position. The revision in etcd is that logical point in time / position in the stream of events/changes.
So with all of this in mind, we should also be using
ModRevision
for thetopo.Version
value inWatch()
— and beyond that, we should be using it for alltopo.Version
values within theetcd2topo
implementation. Please see the discussion from here on for further reasons why.So in the end, this PR does just that: it uses the etcd
ModRevision
value for the logicaltopo.Version
value of keys across theetcd2topo
implementation and adds a test to check and enforce that forWatch()
.Related Issue(s)
Checklist