storage: TestCreateCheckpoint_SpanConstrained skipped #100935
Previously, CreateCheckpoint would restrict a checkpoint by passing invalid keys to Pebble. These keys were unencoded `roachpb.Key`s without a version-length last byte. The unit test is skipped because it reveals another, not-yet-understood problem. Closes cockroachdb#100919. Informs cockroachdb#100935. Epic: none Release note: None
100940: storage: encode engine keys passed as checkpoint bounds r=sumeerbhola a=jbowens Previously, CreateCheckpoint would restrict a checkpoint by passing invalid keys to Pebble. These keys were unencoded `roachpb.Key`s without a version-length last byte. The unit test is skipped because it reveals another, not-yet-understood problem. Closes #100919. Informs #100935. Epic: none Release note: None

100947: randgen: add tsquery/tsvector types to RandDatumSimple r=cucaroach a=cucaroach This API is used in sqlsmith, so this exposes the new types to all our sqlsmith-based random testing. Epic: none Release note: None

Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Tommy Reilly <[email protected]>
hey @RaduBerinde do you mind taking a look at this unit test?
Looking into the checkpoint from a reproduction, I see this:
I think the record writer is more complicated internally than I thought. It might not be OK to create a new writer to append a record to an existing file.
Indeed, the record format expects full blocks except the last one. I think the fix here is to use a record reader and a record writer instead of copying the file raw; after that, it's OK to continue using the same writer to add one more record.
Looking at the record code some more, I believe the record reader only gets confused when the appended record data crosses a 32KB block boundary. This explains why the bug wasn't caught by any of the checkpoint tests in Pebble.
When a checkpoint is restricted to a set of spans, we append a record to the checkpoint's manifest removing all the excluded ssts. We append the record after copying the raw data from the existing manifest. This is not quite OK: the `record` library works by breaking records into chunks and packing the chunks into 32KB blocks. Chunks cannot straddle a 32KB boundary, but this invariant is violated by our append method. In practice, this only happens if the record we are appending is big and/or we are very unlucky and the existing manifest ends close to a 32KB boundary. To fix this, instead of doing a raw data copy of the existing manifest, we copy at the record level (using a record reader and a record writer). Then we can add the new record using the same writer. Informs cockroachdb/cockroach#100935
Backport of cockroachdb#2460 for 23.1.x. When a checkpoint is restricted to a set of spans, we append a record to the checkpoint's manifest removing all the excluded ssts. We append the record after copying the raw data from the existing manifest. This is not quite OK: the `record` library works by breaking records into chunks and packing the chunks into 32KB blocks. Chunks cannot straddle a 32KB boundary, but this invariant is violated by our append method. In practice, this only happens if the record we are appending is big and/or we are very unlucky and the existing manifest ends close to a 32KB boundary. To fix this, instead of doing a raw data copy of the existing manifest, we copy at the record level (using a record reader and a record writer). Then we can add the new record using the same writer. Informs cockroachdb/cockroach#100935
100807: jobs: add job to populate system activity tables r=j82w a=j82w The system statistics tables grew too large for the UI to quickly return results. The new activity-tables job does the aggregation and records only the top 500 rows for each of the 6 columns, so a given hour is limited to 3000 rows. This allows the UI to return results quickly and reliably. If the job detects there are fewer than 3000 rows, it simply copies all the rows to the activity tables. Epic: none Closes: #98882 Release note: none

101811: kv: deflake and unskip TestRangeLocalUncertaintyLimitAfterNewLease r=andrewbaptist a=nvanbenschoten Fixes #99527. Fixes #100491. This commit deflakes and unskips TestRangeLocalUncertaintyLimitAfterNewLease. The test was flaky because it was pausing the clock for some, but not all, nodes in the cluster: it was accidentally using a 3-node cluster instead of the intended 2-node cluster. Release note: None

101890: go.mod: bump Pebble to 6002e39ce756, enable TestCreateCheckpoint_SpanConstrained r=RaduBerinde a=RaduBerinde

#### go.mod: bump Pebble to 6002e39ce756

6002e39c db: fix bug with restricted checkpoints
7fdab075 objstorage: add mutex in sharedReadable
4bfe993a objstorage: simplify ReadAt for objects
ed8f90b0 sstable: don't try to read past end of object
bd6e947b vfs: fix MemFS ReadAt EOF behavior
914e8582 objstorage: test shared objects with TestNotExistError
101876aa *: add skip-shared iteration mode to ScanInternal
d54c3292 objstorage: fix bug in manifest
20d15bd8 tool: add EnableSharedStorage method
60cfeb46 metamorphic: export package
4e468412 sstable: include SINGLEDELs within NumDeletions table property
691db988 db: create new type for file numbers on disk
fa2c2ec6 internal/base: add parsing for human-readable INGESTSST,LOGDATA kinds
c0ccd694 internal/base: correct InternalKeyKind prefix on IngestSST key kind

Release note: None Epic: none

#### storage: enable TestCreateCheckpoint_SpanConstrained

The problem was fixed in Pebble. Fixes: #100935 Release note: None

101942: kvserver: always bump epoch when acquiring expired epoch lease r=erikgrinaker a=erikgrinaker Previously, a lease acquisition only bumped the epoch of an expired epoch leaseholder if the new lease was also an epoch lease. However, we have to bump the epoch regardless of the new lease type to properly invalidate the old leaseholder's leases. This patch bumps the epoch regardless of the new lease type. Resolves #101836. Epic: none Release note: None

Co-authored-by: j82w <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Erik Grinaker <[email protected]>
0a342ecc crl-release-23.1: db: fix bug with restricted checkpoints fb56da99 sstable: include SINGLEDELs within NumDeletions table property Informs cockroachdb#100935. Release note: None Epic: none
0a342ecc crl-release-23.1: db: fix bug with restricted checkpoints fb56da99 sstable: include SINGLEDELs within NumDeletions table property 8a10d6a0 sstable: reuse block property filterers in all cases Informs cockroachdb#100935. Release note: None Epic: none
There appears to be some kind of race in the span-constrained checkpoint code. The checkpoints produced sometimes cannot be opened.
#100919 (comment)
Jira issue: CRDB-26706