Failure in CloudRetentionTest.test_cloud_retention
#7708
This test doesn't seem to succeed:
FAIL test: CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=20 (14/14 runs)
max_consume_rate_mb=20: max_consume_rate_mb=None:
on (arm64, VM) https://buildkite.com/redpanda/vtools/builds/5047#0185838c-0b8d-4784-abe4-d77cf7b61b3e
I think this is caused by a bug in offset translation. I ran the test with trace logging enabled in kgo-verifier, franz-go, and Redpanda, and found several messages in the client logs (newly added for debugging) like:
...after the
On the servers, it looks like we keep on fetching the same offset, but end up not returning any results, e.g.:
There's an RPC to list offsets:
It seems like a segment is truncated:
We then return the offset
The fetches thereafter are for offset
Looking at the offset translation code for segment lookup, we first do an offset lookup at model offset
When serving a fetch request by a kafka::offset, we previously used the `partition_manifest` to perform a segment lookup by `kafka::offset_cast(ko)` first in order to traverse the segments forward to find the actual segment that contained `ko`. This doesn't work when the manifest has been truncated, e.g. if the casted offset falls before the start of the manifest, we would previously return that no segment exists for the fetch. This could result in segments erroneously not being returned, and fetches erroneously being met with no data when some existed. This commit fixes the behavior to no longer use the casted segment lookup and adds some test coverage for kafka::offset lookups. Fixes redpanda-data#7708
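To make the failure mode in the commit message concrete, here is a rough Python model of the two lookup strategies. It is a sketch under stated assumptions, not the Redpanda implementation: `Segment`, `lookup_via_cast`, `lookup_by_kafka_offset`, and the two-segment manifest are hypothetical names and data, and a segment's first kafka offset is assumed to be its base model offset minus the number of non-data offsets that precede it.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    # Hypothetical, simplified stand-in for one manifest entry.
    base_offset: int   # first model (log) offset stored in the segment
    delta_offset: int  # count of non-data offsets that precede the segment

    @property
    def base_kafka_offset(self) -> int:
        # Assumption: kafka offset = model offset - preceding non-data offsets.
        return self.base_offset - self.delta_offset


def lookup_via_cast(manifest: List[Segment], ko: int) -> Optional[Segment]:
    """Model of the old behavior: reuse ko as a model offset, then walk forward."""
    mo = ko  # the "cast" keeps the number and only changes its meaning
    if not manifest or mo < manifest[0].base_offset:
        # The casted offset falls before the truncated manifest: give up.
        return None
    # Start at the last segment whose base model offset is <= mo.
    start = max(i for i, s in enumerate(manifest) if s.base_offset <= mo)
    for i in range(start, len(manifest)):
        is_last = i + 1 == len(manifest)
        if is_last or ko < manifest[i + 1].base_kafka_offset:
            return manifest[i]
    return None


def lookup_by_kafka_offset(manifest: List[Segment], ko: int) -> Optional[Segment]:
    """Model of the fixed behavior: search by kafka offset directly."""
    bases = [s.base_kafka_offset for s in manifest]
    idx = bisect.bisect_right(bases, ko) - 1
    return manifest[idx] if idx >= 0 else None


# Hypothetical manifest after prefix truncation removed model offsets 0..9.
SEG_B = Segment(base_offset=10, delta_offset=5)  # kafka offsets 5..11
SEG_C = Segment(base_offset=20, delta_offset=8)  # kafka offsets start at 12
MANIFEST = [SEG_B, SEG_C]
```

With this made-up manifest, a fetch at kafka offset 7 gets no segment from `lookup_via_cast`, because 7 is below the manifest's first model offset of 10, even though kafka offset 7 lives in the segment starting at model offset 10. `lookup_by_kafka_offset` returns that segment. For offsets at or past the manifest start, both functions agree in this model.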
Isn't that the expected behavior? The consumer tries to fetch an offset below the start offset and gets an error. Or does it just act as if the partition is empty in this case?
I don't think so. In this case, the partition has kafka offsets 84186 - 87788 (see
So basically, prefix truncation breaks segment lookup so it can't find the first segment anymore?
Right, a simple example of what I think is happening is:
We may see a fetch at kafka offset 7, and in today's code, the
When serving a fetch request by a kafka::offset, we previously used the `partition_manifest` to perform a segment lookup by `kafka::offset_cast(ko)` first in order to traverse the segments forward to find the actual segment that contained `ko`. This doesn't work when the manifest has been truncated, e.g. if the casted offset falls before the start of the manifest, we would previously return that no segment exists for the fetch. This could result in segments erroneously not being returned, and fetches erroneously being met with no data when some existed. This commit fixes the behavior to no longer use the casted segment lookup and adds some test coverage for kafka::offset lookups. Fixes redpanda-data#7708 (cherry picked from commit 10f87c1)
https://buildkite.com/redpanda/vtools/builds/4587#0184f5eb-1f40-4102-a3ec-f323deb5d8ed