Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rptest: fix cloud start kafka offset computation #10262

Merged

Conversation

VladLazar
Copy link
Contributor

@VladLazar VladLazar commented Apr 21, 2023

Previously, BucketView.kafka_start_offset computed the start Kafka offset from the manifest from the segment with the min base_offset. Since the manifest is allowed to have entries in 'segments' that are below 'start_offset', it broke when, in a recent change, we stopped uploading after the housekeeping.

This commit fixes the failures in CloudStorageTimingStressTest by correcting the util function mentioned above. Cloud retention always operats on segment boundaries. There are two cases:

  1. Start offset is advanced to the base_offset of another segment
  2. Start offset is advanced to the commited_offset + 1 of the latest segment at the time. This happens when retention decides to remove all segments from the cloud to satisfy the retention policy.

Both cases are now handled correctly.

Related #10130
Fixes #10241

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@VladLazar
Copy link
Contributor Author

/ci-repeat 5
skip-units
dt-repeat=10
tests/rptest/tests/cloud_storage_timing_stress_test.py

jcsp
jcsp previously approved these changes Apr 21, 2023
@VladLazar
Copy link
Contributor Author

Let's wait for the ci-repeat to complete. Since BucketView.kafka_start_offset is only used in cloud_storage_timing_stress_test.py, it would be safe to merge without a full test run.

Lazin
Lazin previously approved these changes Apr 21, 2023
@VladLazar
Copy link
Contributor Author

A lot of runs failed due to disk space. Triggering another ci-repeat...

Checking disk space

Disk space free: 189M

Not enough disk space free, cutoff is 5242880 🚨

Cleaning up docker resources older than 4h

Total reclaimed space: 0B

Checking disk space again

Disk space free: 189M

Not enough disk space free, cutoff is 5242880 🚨

Disk health checks failed

@VladLazar
Copy link
Contributor Author

/ci-repeat 5
skip-units
dt-repeat=10
tests/rptest/tests/cloud_storage_timing_stress_test.py

@VladLazar VladLazar dismissed stale reviews from Lazin and jcsp via b3a99d5 April 21, 2023 12:13
@VladLazar VladLazar force-pushed the fix-cloud-start-kafka-offset-computation branch from 770cf10 to b3a99d5 Compare April 21, 2023 12:13
@VladLazar
Copy link
Contributor Author

/ci-repeat 5
skip-units
dt-repeat=10
tests/rptest/tests/cloud_storage_timing_stress_test.py

@VladLazar VladLazar force-pushed the fix-cloud-start-kafka-offset-computation branch from b3a99d5 to 991e4d5 Compare April 21, 2023 14:02
@VladLazar
Copy link
Contributor Author

/ci-repeat 5
skip-units
dt-repeat=10
tests/rptest/tests/cloud_storage_timing_stress_test.py

Vlad Lazar added 2 commits April 21, 2023 15:09
Previously, `BucketView.kafka_start_offset` computed the start Kafka
offset from the manifest from the segment with the min `base_offset`.
Since the manifest is allowed to have entries in 'segments' that are
below 'start_offset', it broke when, in a recent change, we stopped
uploading after the housekeeping.

This commit fixes the failures in `CloudStorageTimingStressTest` by
correcting the util function mentioned above. Cloud retention always
operats on segment boundaries. There are two cases:
1. Start offset is advanced to the `base_offset` of another segment
2. Start offset is advanced to the `committed_offset + 1` of the latest
   segment at the time. This happens when retention decides to remove
   all segments from the cloud to satisfy the retention policy.

Both cases are now handled correctly.
@VladLazar VladLazar force-pushed the fix-cloud-start-kafka-offset-computation branch from 991e4d5 to ed5ac40 Compare April 21, 2023 15:57
@VladLazar
Copy link
Contributor Author

/ci-repeat 5
skip-units
dt-repeat=10
tests/rptest/tests/cloud_storage_timing_stress_test.py

@VladLazar VladLazar requested review from jcsp and Lazin April 21, 2023 17:36
@VladLazar
Copy link
Contributor Author

The latest CI repeat was green. This is good to merge IMO.

@andrwng andrwng merged commit d30570f into redpanda-data:dev Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloud-storage Shadow indexing subsystem area/tests
Projects
None yet
4 participants