Fixed inter-operation between follower fetching and incremental fetch requests #11748
Conversation
looks good! a nit and a question, feel free to ignore.
def no_lag():
    gr = rpk.group_describe(consumer_group)
    if gr.state != "Stable":
        # a group in any state other than Stable reports no partitions,
        # so the lag cannot be evaluated yet
        return False
Trying to understand how the consumer group state (Stable or not) can tell us about follower lag - is it the follower lag that is being checked here, and if not, what lag?
If a group is in a state different than Stable, no partitions are returned, so we cannot reason about the lag.
patch lgtm but I have a couple questions on client behavior just to be sure I'm understanding the issue correctly.
@@ -951,6 +951,10 @@ bool update_fetch_partition(
        include = true;
        partition.last_stable_offset = model::offset(resp.last_stable_offset);
    }
    if (partition.start_offset != resp.log_start_offset) {
nice find.
q: what are the repercussions of this bug? If we miss this and a subsequent request tries to fetch from an evicted offset, I see the reader throws a runtime error - is there more to it? Just making sure I understood it right.
Without this a consumer will not be able to see the start offset updates. I am not sure if there are any major issues, but I wanted to be consistent.
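For context, the broker-side rule under discussion is: when serving an incremental fetch session, a partition is added to the response only when something about it changed relative to the cached session state. Below is a minimal sketch of that delta check; the struct definitions and the high-watermark branch are illustrative stand-ins, with only the fields visible in the diff taken from the patch.

#include <cstdint>

// Illustrative stand-ins for the session-cached partition state and
// the freshly computed partition data; field names follow the diff.
struct cached_partition {
    int64_t high_watermark;
    int64_t last_stable_offset;
    int64_t start_offset;
};

struct partition_response {
    int64_t high_watermark;
    int64_t last_stable_offset;
    int64_t log_start_offset;
};

// Returns true when the partition must appear in the incremental
// fetch response. Before the fix, a change of log_start_offset alone
// did not mark the partition as changed, so a client could keep a
// stale view of where the log starts.
bool update_fetch_partition(
  const partition_response& resp, cached_partition& partition) {
    bool include = false;
    if (partition.high_watermark != resp.high_watermark) {
        include = true;
        partition.high_watermark = resp.high_watermark;
    }
    if (partition.last_stable_offset != resp.last_stable_offset) {
        include = true;
        partition.last_stable_offset = resp.last_stable_offset;
    }
    if (partition.start_offset != resp.log_start_offset) {
        include = true;
        partition.start_offset = resp.log_start_offset;
    }
    return include;
}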
# sleep long enough to cause metadata refresh on the consumer
time.sleep(30)
q: I'm curious how this test is forcing the client to do 'incremental fetches' vs the other test that looks pretty similar.
This test is using the Java Kafka client, which uses incremental fetch requests by default.
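For readers unfamiliar with incremental fetch sessions (KIP-227): the client establishes a fetch session with the broker, and follow-up requests carry only the session id, a bumped epoch, and the partitions whose desired fetch state changed; the broker in turn replies only with partitions that changed on its side. A rough, hypothetical sketch of the client-side bookkeeping (all names invented):

#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using topic_partition = std::pair<int32_t, int32_t>;

// Hypothetical model of a client-side fetch session.
struct fetch_session {
    int32_t session_id = 0; // 0 means "no session yet, ask broker to create one"
    int32_t epoch = 0;      // advanced on every follow-up request

    // partition -> fetch offset the broker already knows about
    std::map<topic_partition, int64_t> known_offsets;

    // Only partitions whose fetch offset moved need to be listed in the
    // next request; the rest are implied by the broker-side session cache.
    std::vector<topic_partition>
    changed_partitions(const std::map<topic_partition, int64_t>& desired) {
        std::vector<topic_partition> out;
        for (const auto& [tp, offset] : desired) {
            auto it = known_offsets.find(tp);
            if (it == known_offsets.end() || it->second != offset) {
                out.push_back(tp);
                known_offsets[tp] = offset;
            }
        }
        ++epoch;
        return out;
    }
};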
Signed-off-by: Michal Maslanka <[email protected]>
When replying to an incremental fetch request the broker must include a partition in the fetch response if there is a change in its metadata. In Redpanda we were missing validation of start offset changes. This could lead to situations in which a follower was not notified about changes in the start offset. Signed-off-by: Michal Maslanka <[email protected]>
…lica If a partition contains information about the preferred replica it MUST be included in the incremental fetch response. Otherwise a client would indefinitely retry asking the leader about the preferred replica and stop making progress. Signed-off-by: Michal Maslanka <[email protected]>
Signed-off-by: Michal Maslanka <[email protected]>
Added a test validating that follower fetching works correctly with incremental fetch requests. Signed-off-by: Michal Maslanka <[email protected]>
When a replica is marked as offline it should not be considered a candidate to read from. Fixes: redpanda-data#11767 Signed-off-by: Michal Maslanka <[email protected]>
Added a test checking that a consumer can continue operating if the only eligible replica in the requested rack is offline. Signed-off-by: Michal Maslanka <[email protected]>
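The replica-selection side of these two commits reduces to one rule: offline replicas are removed from the candidate set before rack-aware selection runs. A sketch of that rule, with invented types and names:

#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical view of a replica as seen by the replica selector.
struct replica_info {
    int32_t id;
    std::string rack;
    bool is_alive;
};

// Pick a preferred read replica for a consumer in client_rack. Offline
// replicas must not be candidates, otherwise the client could be
// redirected to a node it cannot fetch from.
std::optional<int32_t> select_read_replica(
  const std::vector<replica_info>& replicas, const std::string& client_rack) {
    for (const auto& r : replicas) {
        if (!r.is_alive) {
            continue; // the fix: skip replicas marked offline
        }
        if (r.rack == client_rack) {
            return r.id;
        }
    }
    return std::nullopt; // no eligible replica: fall back to the leader
}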
Force-pushed from fe7fd1b to 6dc0c9a
@@ -951,6 +951,10 @@ bool update_fetch_partition(
        include = true;
        partition.last_stable_offset = model::offset(resp.last_stable_offset);
    }
    if (partition.start_offset != resp.log_start_offset) {
        include = true;
        partition.start_offset = model::offset(resp.log_start_offset);
    }
is include = true intended to be conditionally set? Otherwise, it seems like the entire condition is unnecessary and the start_offset can be set unconditionally. Or perhaps the condition of them not being equal is interesting enough to log about?
oh maybe it's including it only if it changed?
exactly, include only if changed
If a partition contains information about a preferred replica it MUST be
included in the incremental fetch response. Otherwise a client would
indefinitely retry asking the leader about the preferred replica and
stop making progress.
Fixes: #11767
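In other words, the include-only-if-changed rule has one unconditional case: a response that carries a preferred read replica must always be delivered, whether or not any offsets moved. A sketch of that special case (field names are illustrative; -1 mirrors Kafka's "no preferred read replica" convention):

#include <cstdint>

struct partition_response {
    int64_t log_start_offset;
    int32_t preferred_read_replica = -1;
};

bool must_include(const partition_response& resp, bool metadata_changed) {
    // The preferred replica redirect is only ever delivered through the
    // fetch response, so suppressing it would leave the client asking
    // the leader forever and never making progress.
    if (resp.preferred_read_replica != -1) {
        return true;
    }
    return metadata_changed;
}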
Backports Required
Release Notes
Bug Fixes