[IMPROVED] Memory based streams and NRG behavior during server restarts #5506

Merged 7 commits, Jun 10, 2024

derekcollison
Member

Improvements to catchups and health checks.

Improvements to handling snapshots for memory-based WALs.
With memory-based WALs we cannot use snapshots on restart, but we do use them while the server is running.
However, if a server becomes leader with no snapshot, it is forced to step down when asked to catch up a follower. So we now inherit the leader's snapshot.

Also, when we tried to truncate on a mismatch, we needed to truncate to the previous index, not the current one.
When we fail because the previous entry was compacted away, we used to reset. We now reset the WAL to the prior index and use the truncate term and index.
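
To illustrate the truncation fix, here is a minimal sketch on a toy in-memory log. It is not the code in server/raft.go: the `entry` and `wal` types and the `truncateOnMismatch` helper are hypothetical names, and the sketch assumes contiguous 1-based indexes. It shows one reading of the rule: on a mismatch at some index, the log is cut back so that it ends at the index before it.

```go
package main

import "fmt"

// entry is a hypothetical log entry; indexes start at 1.
type entry struct {
	index uint64
	term  uint64
}

// wal is a toy in-memory log, not the NRG WAL from server/raft.go.
type wal struct {
	entries []entry
}

// last returns the term and index of the final entry, or zeros when empty.
func (w *wal) last() (term, index uint64) {
	if len(w.entries) == 0 {
		return 0, 0
	}
	e := w.entries[len(w.entries)-1]
	return e.term, e.index
}

// truncateOnMismatch drops the entry at the mismatching index and everything
// after it, so the log ends at the previous index (mismatch - 1).
func (w *wal) truncateOnMismatch(mismatch uint64) {
	kept := w.entries[:0]
	for _, e := range w.entries {
		if e.index < mismatch {
			kept = append(kept, e)
		}
	}
	w.entries = kept
}

func main() {
	w := &wal{entries: []entry{{1, 1}, {2, 1}, {3, 1}, {4, 2}}}
	// Assume the leader disagrees with our entry at index 4.
	w.truncateOnMismatch(4)
	term, index := w.last()
	fmt.Println(term, index) // 1 3
}
```

After the cut, the term and index reported by `last()` are the ones a follower would carry forward, which is what "use the truncate term and index" refers to in this sketch.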

Lastly, if we receive a heartbeat with the correct index but a newer term, just inherit the term.
For stream health checks on replicated streams, make sure that the monitor routine is running.
When waiting on consumer assignments at the beginning of the stream monitor, make sure the consumer monitor is running as well if replicated.
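
The heartbeat rule at the start of this paragraph (matching index, newer term) can be sketched as follows. This is a toy model under stated assumptions, not the NRG implementation: the `follower` and `heartbeat` types and the `onHeartbeat` method are hypothetical, and a heartbeat is modelled simply as the leader's last term and index.

```go
package main

import "fmt"

// follower holds the term and index of the last entry a node has stored.
// The type is hypothetical and only loosely echoes the PR wording.
type follower struct {
	pterm  uint64
	pindex uint64
}

// heartbeat is modelled as an append entry that carries no new entries.
type heartbeat struct {
	pterm  uint64
	pindex uint64
}

// onHeartbeat reports whether the follower is in sync after the heartbeat.
// If the index matches and only the leader's term is newer, the follower
// adopts that term instead of treating it as a mismatch.
func (f *follower) onHeartbeat(hb heartbeat) bool {
	if hb.pindex == f.pindex && hb.pterm > f.pterm {
		f.pterm = hb.pterm // inherit the newer term
	}
	return hb.pindex == f.pindex && hb.pterm == f.pterm
}

func main() {
	f := &follower{pterm: 2, pindex: 10}

	ok := f.onHeartbeat(heartbeat{pterm: 3, pindex: 10})
	fmt.Println(ok, f.pterm) // true 3

	ok = f.onHeartbeat(heartbeat{pterm: 3, pindex: 12})
	fmt.Println(ok, f.pterm) // false 3
}
```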

On a consumer snapshot, register pre-acks as needed.
On stream checkInterestState, reset an empty stream to the low ack floor from all consumers.
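
A rough sketch of the low ack floor reset, assuming the ack floors are available as plain stream sequence numbers. The `streamState` type and the `lowAckFloor` and `resetIfEmpty` helpers are hypothetical and do not mirror the actual checkInterestState code. The assumption here is that an empty stream sitting behind the lowest ack floor is moved forward to sit just past it.

```go
package main

import (
	"fmt"
	"math"
)

// streamState is a toy view of a stream: message count plus first/last sequence.
type streamState struct {
	msgs     uint64
	firstSeq uint64
	lastSeq  uint64
}

// lowAckFloor returns the smallest stream ack floor across consumers,
// and false when there are no consumers.
func lowAckFloor(floors []uint64) (uint64, bool) {
	if len(floors) == 0 {
		return 0, false
	}
	low := uint64(math.MaxUint64)
	for _, f := range floors {
		if f < low {
			low = f
		}
	}
	return low, true
}

// resetIfEmpty moves an empty stream forward so it sits just past the low
// ack floor. Streams that hold messages, or that are already at or past the
// floor, are left alone.
func resetIfEmpty(s *streamState, floors []uint64) {
	low, ok := lowAckFloor(floors)
	if !ok || s.msgs != 0 || s.lastSeq >= low {
		return
	}
	s.firstSeq, s.lastSeq = low+1, low
}

func main() {
	// An empty stream, with three consumers that have acked up to 120, 95 and 110.
	s := &streamState{msgs: 0, firstSeq: 1, lastSeq: 0}
	resetIfEmpty(s, []uint64{120, 95, 110})
	fmt.Println(s.firstSeq, s.lastSeq) // 96 95
}
```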

Finally, fix a consistency bug in memstore when skipping msgs on an empty stream, to ensure first == last + 1.
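
The invariant can be shown with a small stand-in store. This is a sketch only: the `memStore` type below and its `skipMsgs` method are simplified, hypothetical versions written for illustration, not the server's memstore. An empty store that skips sequences advances first past the skipped range, so first == last + 1 still holds.

```go
package main

import "fmt"

// memStore is a toy stand-in for an in-memory message store.
type memStore struct {
	first   uint64              // sequence of the first stored message
	last    uint64              // last sequence handed out
	msgs    map[uint64][]byte   // stored messages by sequence
	deleted map[uint64]struct{} // skipped sequences inside a non-empty store
}

// newMemStore returns an empty store, which already satisfies first == last + 1.
func newMemStore() *memStore {
	return &memStore{first: 1, msgs: map[uint64][]byte{}, deleted: map[uint64]struct{}{}}
}

// skipMsgs consumes num sequence numbers without storing anything.
func (ms *memStore) skipMsgs(num uint64) {
	start := ms.last + 1
	ms.last += num
	if len(ms.msgs) == 0 {
		// Empty store: move first past the skipped range so first == last + 1.
		ms.first = ms.last + 1
		return
	}
	// Non-empty store: remember the gaps instead of moving first.
	for seq := start; seq <= ms.last; seq++ {
		ms.deleted[seq] = struct{}{}
	}
}

func main() {
	ms := newMemStore()
	ms.skipMsgs(5)
	fmt.Println(ms.first, ms.last, ms.first == ms.last+1) // 6 5 true
}
```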

Signed-off-by: Derek Collison [email protected]

With memory-based WALs we cannot use snapshots on restart, but we do use them while the server is running.
However, if a server becomes leader with no snapshot, it is forced to step down when asked to catch up a follower.
So we now inherit the leader's snapshot.

Also, when we tried to truncate on a mismatch, we needed to truncate to the previous index, not the current one.
When we fail because the previous entry was compacted away, we used to reset. We now reset the WAL to the prior index and use the truncate term and index.

Lastly, if we receive a heartbeat with the correct index but a newer term, just inherit the term.

Signed-off-by: Derek Collison <[email protected]>
These are mostly for memory-based streams and consumers that need to rebuild complete state on a server restart.

For stream health checks on replicated streams, make sure that the monitor routine is running.
When waiting on consumer assignments at the beginning of the stream monitor, make sure the consumer monitor is running as well if replicated.
On a consumer snapshot, register pre-acks as needed.
On stream checkInterestState, reset an empty stream to the low ack floor from all consumers.

Signed-off-by: Derek Collison <[email protected]>
@derekcollison derekcollison requested a review from a team as a code owner June 10, 2024 05:47
Signed-off-by: Derek Collison <[email protected]>
…ddress in separate CL or PR

Signed-off-by: Derek Collison <[email protected]>
Member

@neilalexander neilalexander left a comment

LGTM

@derekcollison derekcollison merged commit 6f05a82 into main Jun 10, 2024
4 checks passed
@derekcollison derekcollison deleted the mem-wq-restarts branch June 10, 2024 16:50
wallyqs pushed a commit that referenced this pull request Jun 10, 2024
…ts (#5506)

wallyqs added a commit that referenced this pull request Jun 10, 2024
3 participants