
backup: "invalid header size" error during restore #40670

Closed · Fixed by #40888
solongordon opened this issue Sep 11, 2019 · 9 comments

Assignees: pbardea
Labels: A-disaster-recovery, C-bug

Comments

@solongordon (Contributor) commented Sep 11, 2019

I encountered a concerning error while trying to restore a registration cluster backup to a roachprod cluster:

root@localhost:26257/defaultdb> RESTORE TABLE registration.* FROM 's3://cockroach-reg-backups/2019-09-01?AWS_ACCESS_KEY_ID=<redacted>&AWS_SECRET_ACCESS_KEY=<redacted>';
pq: importing 12095 ranges: importing span /Table/78/1/"\r{=RΡ\xadEj\x88iM\x03g+N\x0e"/4/1920-09-16T09:02:26.852719999Z/"SELECT _, _, _ FROM _ AS OF SYSTEM TIME _ WHERE _ = _"/1/0/"$ internal-read orphaned table leases"-k7~k\xd4N\xea\xa7\t\xa9\xe4\xa6\\\x01\x8a"/1/1920-10-04T04:48:11.013350999Z/"SELECT _, _ FROM _ WHERE (_ IN ($1, $2, __more1__)) AND (_ < $4) ORDER BY _ LIMIT _"/1/0/"$ internal-gc-jobs"}: adding to batch: /Table/71/1/"\ri\x91\x815}H\xf6\x9a\xdbDP2\x1e\xa2t"/1/1920-07-14T04:41:27.182670999Z/"UPDATE _ SET _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = -_, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _ WHERE _ = _"/0/0/"40f09bee"/0/1561058311.817775579,4 -> /TUPLE/4:4:Bytes/v2.0.2/1:5:Int/356418/1:6:False/false/5:11:Int/81/1:12:Int/0/1:13:Int/81/2:15:Float/0.5/1:16:Float/40.5/1:17:Float/0.0002460487839506174/1:18:Float/1.2843239990119444e-05/1:19:Float/0.00011604993209876539/1:20:Float/4.570659468070251e-06/1:21:Float/0.0009339686666666668/1:22:Float/0.00017130743693108405/1:23:Float/0.003720578672839506/1:24:Float/0.0030473801431714874/1:25:Float/0.005016646055555556/1:26:Float/0.005024995901326392: computing stats for SST [/Table/71/1/"\ri\x91\x815}H\xf6\x9a\xdbDP2\x1e\xa2t"/1/1920-05-05T15:38:08.254344999Z/"SELECT _, _, _, _, _, _, _, _, _ FROM _ WHERE _ IN (_, _)"/0/0/"40f09bee"/0, /Table/71/1/"\ri\x91\x815}H\xf6\x9a\xdbDP2\x1e\xa2t"/1/1920-07-14T04:41:27.178937999Z/"UPDATE _ SET _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = -_, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _ WHERE _ = _"/0/0/"40f09bee"/0/NULL): /Table/71/1/"\ri\x91\x815}H\xf6\x9a\xdbDP2\x1e\xa2t"/1/1920-05-05T21:38:06.091400999Z/"UPDATE _ SET _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ =
_, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _ WHERE _ = _"/0/0/"40f09bee"/0: invalid header size: 4

I tried the same restore on a few different cockroach versions and observed the error on v19.2.0-beta.20190826 and later, but not on v19.2.0-alpha.20190805 and earlier.

Repro steps:

CLUSTER=$USER-secure
roachprod create $CLUSTER -n 3 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
roachprod stage $CLUSTER:1-3 cockroach
roachprod start $CLUSTER:1-3 --secure
roachprod sql $CLUSTER:1 --secure

Then run the following statements, filling in sensitive info as necessary:

SET CLUSTER SETTING cluster.organization = '<redacted>';
SET CLUSTER SETTING enterprise.license = '<redacted>';
CREATE DATABASE registration;
RESTORE TABLE registration.* FROM 's3://cockroach-reg-backups/2019-09-01?AWS_ACCESS_KEY_ID=<redacted>&AWS_SECRET_ACCESS_KEY=<redacted>';

The error should appear within 30 seconds.

@solongordon added the C-bug and A-disaster-recovery labels on Sep 11, 2019
@jordanlewis (Member)

@pbardea @solongordon is this a release blocker?

@solongordon (Contributor, Author)

Yes, @lucy-zhang added it to the list this morning.

@pbardea has bisected this issue to a commit which bumped the Pebble version. So far the reg cluster backups are the only known example of the error.

@pbardea self-assigned this on Sep 12, 2019
@pbardea (Contributor) commented Sep 17, 2019

cc @petermattis
Through experimentation I found that the issue seems to be related to the introduction of the two-level index block in Pebble. Commenting out this line: https://github.com/cockroachdb/pebble/blob/master/sstable/writer.go#L434 seems to allow the import to progress. It's not clear to me yet how this relates to a value.

@pbardea (Contributor) commented Sep 18, 2019

It also seems that when the import succeeds without that line, we are able to read the data from the restore. I'm unsure if that's expected, considering that I think this means the topLevelIndex is empty (I assume it may just scan all the data blocks in the SST in this case?).

@petermattis (Collaborator)

Huh, commenting out the line you indicated seems really problematic as we would create invalid sstables (the top-level index would be broken). Can you instead try setting Options.IndexBlockSize = math.MaxInt32? That is the "correct" way to disable two-level indexes.
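For illustration, here is a minimal Go sketch of that workaround: writing an sstable through pebble/sstable with IndexBlockSize set to math.MaxInt32. The filename is made up, and the signatures follow pebble's sstable package from roughly this era, so they may differ in other versions:

```go
package main

import (
	"log"
	"math"

	"github.com/cockroachdb/pebble/sstable"
	"github.com/cockroachdb/pebble/vfs"
)

func main() {
	f, err := vfs.Default.Create("example.sst") // illustrative path
	if err != nil {
		log.Fatal(err)
	}
	// With IndexBlockSize at MaxInt32 the index block never fills up, so
	// the writer keeps a single-level index instead of splitting it into
	// a top-level index pointing at multiple index blocks.
	w := sstable.NewWriter(f, sstable.WriterOptions{
		IndexBlockSize: math.MaxInt32,
	})
	if err := w.Set([]byte("key"), []byte("value")); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```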

@pbardea (Contributor) commented Sep 18, 2019

It looks like that also resolves the issue. (The RESTORE is not yet complete, but it usually errors out quite quickly -- I'll update when the RESTORE completes.)

In this case, does it look like this is a Pebble issue? (I haven't found anything above this in the stack that looks amiss otherwise.) If so, I can file an issue and set the index block size as described above as a temporary work-around until the two-level index issue is resolved.

(For posterity: Yesterday I also noticed that the issue disappeared when I toggled https://github.com/cockroachdb/pebble/blob/master/sstable/writer.go#L364 to w.twoLevelIndex = false, which I believe would also force the usage of a single index block.)

@petermattis (Collaborator)

> In this case, does it look like this is a Pebble issue? (I haven't found anything above this in the stack that looks amiss otherwise.) If so, I can file an issue and set the index block size as described above as a temporary work-around until the two-level index issue is resolved.

Yes. Two-level indexes were only recently added to pebble. We don't actually enable them for RocksDB. Totally fine to disable them.

It will be useful for the issue you file to have reproduction instructions. Please include the SHA of cockroachdb you were running.

> (For posterity: Yesterday I also noticed that the issue disappeared when I toggled https://github.com/cockroachdb/pebble/blob/master/sstable/writer.go#L364 to w.twoLevelIndex = false, which I believe would also force the usage of a single index block.)

Right. That's the brute force way to disable two-level indexes.

pbardea added a commit to pbardea/cockroach that referenced this issue on Sep 18, 2019:
Setting the IndexBlockSize to MaxInt disables two-level indexes. Using
two-level indexes causes issues when restoring some registration
cluster backups. This change serves as a work-around until
cockroachdb/pebble#285 is resolved.

Fixes cockroachdb#40670.

Release justification: RESTOREs of registration cluster backups started
failing after two-level indexes were enabled in Pebble. This was a
release-blocking bug, and this fix allows these backups to be restored
again until the two-level index issue is investigated further.

Release note: None
@petermattis (Collaborator)

If I understand correctly, Pebble is being used to write the sstables which are then ingested into RocksDB, right? It is possible RocksDB has a bug in handling two-level indexes.

@craig (bot) closed this as completed in dd5aa30 on Sep 19, 2019
@petermattis (Collaborator)

Correcting my misunderstanding above: Pebble is being used to write the sstables, and golang/leveldb/table is then used to iterate over them in order to compute range stats. golang/leveldb/table doesn't understand two-level indexes. We should really change that code to use pebble/sstable instead, though there is also a bug in Pebble here: pebble/sstable.Writer should create LevelDB-compatible tables when asked to do so (and it wasn't).
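For contrast with golang/leveldb/table, here is a hedged sketch of iterating an sstable with pebble/sstable, which does understand two-level indexes. The filename is illustrative and the signatures again follow pebble's sstable package from roughly this era:

```go
package main

import (
	"fmt"
	"log"

	"github.com/cockroachdb/pebble/sstable"
	"github.com/cockroachdb/pebble/vfs"
)

func main() {
	f, err := vfs.Default.Open("example.sst") // illustrative path
	if err != nil {
		log.Fatal(err)
	}
	// pebble's reader resolves a two-level index transparently: the
	// top-level index locates an index block, which in turn locates the
	// data blocks.
	r, err := sstable.NewReader(f, sstable.ReaderOptions{})
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	it, err := r.NewIter(nil /* lower */, nil /* upper */)
	if err != nil {
		log.Fatal(err)
	}
	defer it.Close()

	for k, v := it.First(); k != nil; k, v = it.Next() {
		fmt.Printf("%s => %s\n", k.UserKey, v)
	}
}
```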
