-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NRG (2.11): Start catchup from n.commit
& fix AppendEntry is stored at seq=ae.pindex+1
#5987
Conversation
848e1d2
to
5bd3d7e
Compare
LMK when ready for review. |
Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
…e.pindex+1 Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
4fff9ec
to
b62c058
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, let's mark for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style comments on tests
server/jetstream_cluster_4_test.go
Outdated
followerServer2.WaitForShutdown() | ||
|
||
// Although this request will time out, it will be added to the stream leader's WAL. | ||
_, err = js.Publish("foo", []byte("first")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could set a shorter timeout to make the test faster? (not sure what the default is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, default seemed to be 5 seconds, lowered to 1s.
streamLeaderServer.WaitForShutdown() | ||
|
||
// Only restart the (previous) followers. | ||
followerServer1 = c.restartServer(followerServer1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is one server variable reassigned and not the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used below to have a connection to that server:
nc, js = jsClientConnect(t, followerServer1)
The connection could be to either server, as long as it's not the (previous) leader. So only this one variable is used to setup the connection.
rs := c.randomNonStreamLeader(globalAccountName, "TEST") | ||
ts := time.Now().UnixNano() | ||
|
||
var scratch [1024]byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit may use an explanation... maybe
Manually add 3 append entries to each node's WAL, except for one node who is one behind
I actually am not sure. Your inner loop goes to 3, but then you have a break at 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That description is correct. Two servers will have 3 uncommitted entries, and one server will have 2 uncommitted entries so it needs to catchup for that third one.
Have moved that condition for that one server up, so it's a bit clearer it gets 2 iterations of that loop.
} | ||
|
||
// Check that the first two published messages came from our WAL, and | ||
// the last came from a catchup by another leader. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me you are doing the same check for all 3 entries you look at, this comment is maybe outdated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 3 checks, 2x require_Equal
and 1x require_NotEqual
. Have changed it to use require_True
with ==
and !=
instead, that seems a bit more clear.
(And it doesn't matter what the values being compared are, just that they either match or not)
if len(expected) > 0 && int(state.LastSeq-state.FirstSeq+1) != len(expected) { | ||
return fmt.Errorf("WAL is different: too many entries") | ||
} | ||
for index := state.FirstSeq; index <= state.LastSeq; index++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looped check deserves a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
n, err := s.initRaftNode(globalAccountName, cfg, pprofLabels{}) | ||
require_NoError(t, err) | ||
|
||
encode := func(ae *appendEntry) *appendEntry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment this bit of arcane magic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Maurice van Veen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Maurice van Veen <[email protected]>
…6027) Reverts the changes made in #5987, but the tests are kept. Instead opting for a simpler approach: - removing the `isNew` condition when `pterm` or `pindex` don't match, to ensure consistency even during catchup - move the `ae.pindex == n.pindex` condition up so `pterm` can be corrected (otherwise it would not be executed) Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Neil Twigg <[email protected]>
… at `seq=ae.pindex+1` (#5987) This PR makes three complementary fixes to the way how catchup and truncating is handled. Specifically: - when doing `n.loadEntry(index)` we need to pass where the AppendEntry is in terms of stream sequence, this is equal to `ae.pindex+1` since the `ae.pindex` is the value before it's stored in the stream. - start catchup from `n.commit`, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leaders - because we catchup from `n.commit`, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't match Signed-off-by: Maurice van Veen <[email protected]> --------- Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
… at `seq=ae.pindex+1` (#5987) This PR makes three complementary fixes to the way how catchup and truncating is handled. Specifically: - when doing `n.loadEntry(index)` we need to pass where the AppendEntry is in terms of stream sequence, this is equal to `ae.pindex+1` since the `ae.pindex` is the value before it's stored in the stream. - start catchup from `n.commit`, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leaders - because we catchup from `n.commit`, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't match Signed-off-by: Maurice van Veen <[email protected]> --------- Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
… at `seq=ae.pindex+1` (#5987) This PR makes three complementary fixes to the way how catchup and truncating is handled. Specifically: - when doing `n.loadEntry(index)` we need to pass where the AppendEntry is in terms of stream sequence, this is equal to `ae.pindex+1` since the `ae.pindex` is the value before it's stored in the stream. - start catchup from `n.commit`, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leaders - because we catchup from `n.commit`, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't match Signed-off-by: Maurice van Veen <[email protected]> --------- Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
… at `seq=ae.pindex+1` (#5987) This PR makes three complementary fixes to the way how catchup and truncating is handled. Specifically: - when doing `n.loadEntry(index)` we need to pass where the AppendEntry is in terms of stream sequence, this is equal to `ae.pindex+1` since the `ae.pindex` is the value before it's stored in the stream. - start catchup from `n.commit`, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leaders - because we catchup from `n.commit`, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't match Signed-off-by: Maurice van Veen <[email protected]> --------- Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
Includes the following: - #5661 - #5666 - #5671 - #5344 - #5684 - #5689 - #5691 - #5714 - #5717 - #5707 - #5792 - #5912 - #5957 - #5700 - #5975 - #5991 - #5987 - #6027 - #6038 - #6053 - #5848 - #6055 - #6056 - #6060 - #6061 - #6072 - #5832 - #6073 - #6107 Signed-off-by: Neil Twigg <[email protected]>
This PR makes three complementary fixes to the way how catchup and truncating is handled.
Specifically:
n.loadEntry(index)
we need to pass where the AppendEntry is in terms of stream sequence, this is equal toae.pindex+1
since theae.pindex
is the value before it's stored in the stream.n.commit
, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leadersn.commit
, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't matchSigned-off-by: Maurice van Veen [email protected]