c/archival_stm: do not reset _last_replicate on timeout #15677
Conversation
For same-term syncing we store the replicate command's future so that it can be awaited in the next sync call. However, the `std::exchange(_last_replicate, std::nullopt)` before the wait is problematic, as it "forgets" the last replicate. If sync times out and we retry it, it behaves as if the last replicate command succeeded. This is not necessarily true, as the newly added test assertion shows.

Use a shared future for `_last_replicate` so that it can be awaited multiple times, and reset it only after it is resolved. This guarantees that sync will only return after the last replicate command has actually resolved.

The `ignore_ready_future` call is removed because a shared future does not issue a warning if it is never consumed (basically a revert of e221a0a).
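To make the fix concrete, here is a minimal, hypothetical Seastar sketch (not the archival_stm code itself; `last_replicate` and the timings are stand-ins): with `ss::shared_future`, a timed-out wait does not consume the pending result, so a retried sync can await the very same replicate.

```cpp
// Hypothetical sketch, not the archival_stm code: with a shared_future,
// a timed-out wait does not consume the pending result, so a retried
// sync() can wait for the very same replicate to actually resolve.
#include <seastar/core/app-template.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/shared_future.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/with_timeout.hh>
#include <chrono>
#include <iostream>

using namespace std::chrono_literals;

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, []() -> seastar::future<> {
        // Stand-in for the pending replicate command (resolves after 100ms).
        seastar::shared_future<int> last_replicate(
          seastar::sleep(100ms).then([] { return 42; }));

        // First sync() attempt: 10ms budget, times out. With a plain
        // ss::future this wait would have consumed the result; the
        // shared_future keeps it alive for the retry below.
        try {
            co_await seastar::with_timeout(
              std::chrono::steady_clock::now() + 10ms,
              last_replicate.get_future());
        } catch (const seastar::timed_out_error&) {
            std::cout << "sync timed out, retrying\n";
        }

        // Retried sync(): a fresh future over the same shared state.
        auto r = co_await seastar::with_timeout(
          std::chrono::steady_clock::now() + 1s,
          last_replicate.get_future());
        std::cout << "replicate resolved with " << r << "\n";
    });
}
```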
Force-pushed from 8afb056 to 299e12d
/dt
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/42877#018c6e4c-2fb6-4178-8619-2ee6c53c4969
/cdt
If this should be in v23.3.1, then let's ensure we are backporting to v23.3.x now that it is branched.
/cdt
```diff
-    std::optional<ss::future<result<raft::replicate_result>>> _last_replicate;
+    struct last_replicate {
+        model::term_id term;
+        ss::shared_future<result<raft::replicate_result>> result;
```
nit: could you add a comment explaining the requirement of shared_future here?
Will do in a subsequent PR. Merging to get it before the next CDT run.
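For reference, such a comment might look roughly like this (a sketch assuming the types from the diff above; the wording is illustrative, not the eventual follow-up PR):

```cpp
struct last_replicate {
    model::term_id term;
    // ss::shared_future rather than ss::future: sync() can time out and be
    // retried, and can be called concurrently from several fibers, so the
    // pending replicate result must be awaitable more than once. A plain
    // future is single-consumer and would be lost after the first wait.
    ss::shared_future<result<raft::replicate_result>> result;
};
std::optional<last_replicate> _last_replicate;
```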
```diff
-    auto fut = std::exchange(_last_replicate, std::nullopt).value();
-
-    if (!fut.available()) {
+    if (!_last_replicate->result.available()) {
```
> Use a shared future for _last_replicate so that it can be awaited multiple times and reset it only after it is resolved.

Just trying to understand the remaining problematic scenarios. Does this generalize to: sync() may be called concurrently by different fibers?
Yes, it does. To give at least one example, the ListOffsets Kafka RPC causes a sync() call too.
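A hypothetical sketch of that concurrent-callers scenario in plain Seastar (names are illustrative, not archival_stm code): each caller takes its own future over the same pending replicate via `get_future()`.

```cpp
// Hypothetical sketch: two concurrent callers of sync() can both wait on
// the same pending replicate, because shared_future::get_future() hands
// each of them an independent future over the shared state.
#include <seastar/core/app-template.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/shared_future.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/when_all.hh>
#include <chrono>
#include <iostream>

using namespace std::chrono_literals;

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, []() -> seastar::future<> {
        seastar::shared_future<int> last_replicate(
          seastar::sleep(50ms).then([] { return 7; }));
        // E.g. the archival fiber and a ListOffsets-triggered sync().
        auto [a, b] = co_await seastar::when_all_succeed(
          last_replicate.get_future(), last_replicate.get_future());
        std::cout << "both waiters saw " << a << " and " << b << "\n";
    });
}
```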
/backport v23.3.x
/backport v23.2.x
Failed to create a backport PR to v23.2.x branch. I tried:
Backports Required
Release Notes