-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed issue with offset translation that may lead to offset gaps #10232
Fixed issue with offset translation that may lead to offset gaps #10232
Conversation
When `raft::offset_translator` updated state during `prefix_truncate_reset` all the users which cached the `_state` pointer wouldn't see the state updates as the assignment didn't change the underlying data structure but the pointer itself. Fixed the issue by using an assignment operator that changes pointee not the pointer. Signed-off-by: Michal Maslanka <[email protected]>
An `rm_stm` can always access the `consensus` to query the offset translator state there is no need to cache the translator state in partition. Signed-off-by: Michal Maslanka <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what a find 🔥
CI Failures: |
@@ -104,7 +104,7 @@ offset_translator::start(must_reset reset, bootstrap_state&& bootstrap) { | |||
if (reset) { | |||
vlog(_logger.info, "resetting offset translation state"); | |||
|
|||
_state = storage::offset_translator_state( | |||
*_state = storage::offset_translator_state( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When
raft::offset_translator
updated state during
prefix_truncate_reset
all the users which cached the_state
pointer
nice catch. why would they have operator=(T&&)
defined for the shared pointer? that seems really odd.
would the offset translator api be easier to use if any wanting to hold a reference to the state held a reference to the translator instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah the operator(T&&)
is indeed confusing. I was thinking about changing the interface to return a reference but i do not know all the lifecycle implications. There may be a user that is relaying on the shared pointer semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be a user that is relaying on the shared pointer semantics.
Isn't this untrue now that you removed all cases of users keeping a copy of the shared pointer?
@@ -399,7 +400,7 @@ kafka_stages partition::replicate_in_stages( | |||
} | |||
auto old_offset = r.value().last_offset; | |||
auto new_offset = kafka::offset( | |||
_translator->from_log_offset(old_offset)()); | |||
get_offset_translator_state()->from_log_offset(old_offset)()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we aren't going to cache any pointers to the state going forward, should we remove that interface and drop the first commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are cahced pointers in different places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I remember the whole point of introducing these shared ptrs was to allow caching it for the duration of the fetch request (and the offset_translator_state
object could come from two places - for a local read we got it from raft and for a cloud read we reconstructed it on the fly while reading the segments). And just translating offsets on the fly was not enough because just before returning data to the client we had to go back and translate aborted tx ranges. Ah, implementing the first version of shadow indexing, good times.
Probably this can all be simplified somehow :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I was confused because the wording on one of the commits sounded like all cases in which the pointer were cached were being removed.
Signed-off-by: Michal Maslanka <[email protected]>
33002ba
to
fad8797
Compare
/backport v23.1.x |
/backport v22.3.x |
/backport v22.2.x |
Failed to run cherry-pick command. I executed the below command:
|
Failed to run cherry-pick command. I executed the below command:
|
This check acts as a reproducer for redpanda-data#10232 and more generally as a pervasive sanity check of our offset handling and idempotency.
This check acts as a reproducer for redpanda-data#10232 and more generally as a pervasive sanity check of our offset handling and idempotency.
When
hydrate_snapshot
happens in Raft theoffset_translator_state
inoffset_translator
is reset with theprefix_truncate_reset
. The underlying data structure wasn't updated and that lead the situation in which therm_stm
accessed the old not updated version ofoffset_translator_state
.Fixes: #8289
Backports Required
Release Notes
Bug Fixes
rm_stm
on snapshot hydrate