[wip] Revert "Disable recycle_log_file_num when it is incompatible with recovery mode (#6351)" #7252

ajkr · 2020-08-13T18:10:13Z

We don't know what the reverted PR solved, but we suspect it had something to do with 2PC. However, that PR disabled WAL recycling in general scenarios, not just those specific to 2PC, which caused confusion. If we figure out the 2PC problem, it should be solved in a more targeted way.

…overy mode (facebook#6351)" This reverts commit 3316d29.

facebook-github-bot

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr · 2020-08-13T23:57:27Z

Experimented with it a bit. This behavior doesn't look good:

Create two WALs containing unflushed data for the default CF.

In [1]: import rocksdb

In [2]: db = rocksdb.DB('./test-db', rocksdb.Options(create_if_missing=True, wal_recovery_mode='kPointInTimeRecovery', recycle_log_file_num=2))

In [3]: db.create_column_family('a')
Out[3]: 'a'

In [4]: db.put(b'ok', b'ok', 'a')  # PUT("ok", "ok") in CF a

In [5]: db.put(b'key1', b'wal1')  # PUT("key1", "wal1") in CF default 

In [6]: db.flush('a')  # Flush CF a to trigger a new WAL to be cut. Now there are two unflushed WALs for CF default

In [7]: db.put(b'key2', b'wal2')  # PUT("key2", "wal2") in CF default

In [8]: del db

Manually corrupt the second entry in the first WAL (PUT("key1", "wal1")).
Reopen the DB. Observe that recovery permitted a hole, so the recovered state is not a consistent point in time.

In [3]: db = rocksdb.DB('./test-db', rocksdb.Options(create_if_missing=True, wal_recovery_mode='kPointInTimeRecovery', recycle_log_file_num=2), column_families=['default', 'a'])

In [4]: db.get(b'key1')  # DB contains nothing for key1

In [5]: db.get(b'key2')  # Meanwhile, it contains the value for "key2", which is supposed to be newer!
Out[5]: b'wal2'
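
The hole can also be reproduced in a toy model that does not depend on RocksDB. The sketch below assumes a simplified replay loop that stops reading a WAL at its first corrupted record but still moves on to the next WAL. `Record`, `replay_per_wal`, and `is_prefix_consistent` are hypothetical names for illustration only, not RocksDB APIs, and the record layout is not the real WAL format.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Record:
    """One toy WAL entry: sequence number, key/value pair, and a CRC32."""
    seq: int
    key: bytes
    value: bytes
    crc: int

    @staticmethod
    def make(seq, key, value):
        return Record(seq, key, value, zlib.crc32(key + value))

    def is_valid(self):
        return zlib.crc32(self.key + self.value) == self.crc


def replay_per_wal(wals):
    """Assumed behavior: a bad record ends the current WAL only."""
    recovered = []
    for wal in wals:
        for rec in wal:
            if not rec.is_valid():
                break  # stop this WAL, but continue with the next one
            recovered.append(rec)
    return recovered


def is_prefix_consistent(recovered):
    """True if the recovered sequence numbers are exactly 1..n with no gap."""
    seqs = [r.seq for r in recovered]
    return seqs == list(range(1, len(seqs) + 1))


# Mirror the session above: WAL 1 holds "ok" then "key1"; WAL 2 holds "key2".
wal1 = [Record.make(1, b"ok", b"ok"), Record.make(2, b"key1", b"wal1")]
wal2 = [Record.make(3, b"key2", b"wal2")]
wal1[1].value = b"walX"  # corrupt the second entry of the first WAL

recovered = replay_per_wal([wal1, wal2])
recovered_keys = [r.key for r in recovered]
```

In this model `recovered_keys` contains `b"key2"` but not `b"key1"`, and `is_prefix_consistent(recovered)` is false, which is the same hole the session shows.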

ajkr · 2020-08-14T00:13:53Z

Well, the exact same problem still happens with kTolerateCorruptedTailRecords, which we have yet to mark incompatible with WAL recycling. Arguably that one is worse, as it's supposed to give a stronger recovery guarantee. Anyway, the plan is now to enable WAL recycling in kPointInTimeRecovery and disable it in kTolerateCorruptedTailRecords. To do this, we first need to make a checksum mismatch error actually stop the WAL replay, even if newer WALs are present.
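
A minimal sketch of the intended replay behavior, in a toy model rather than RocksDB's actual recovery code: the first checksum mismatch ends replay entirely, so nothing from newer WALs is applied. The helper names and the `(key, value, crc)` record layout are assumptions made for illustration.

```python
import zlib


def make_record(key, value):
    """Build a toy WAL record as (key, value, CRC32 of the payload)."""
    return (key, value, zlib.crc32(key + value))


def replay_stop_at_first_mismatch(wals):
    """Replay WALs oldest-first; the first checksum mismatch ends replay.

    Hypothetical helper for illustration; returns the recovered key/value dict.
    """
    recovered = {}
    for wal in wals:
        for key, value, crc in wal:
            if zlib.crc32(key + value) != crc:
                # Skip the rest of this WAL and every newer WAL.
                return recovered
            recovered[key] = value
    return recovered


wal1 = [make_record(b"ok", b"ok"), make_record(b"key1", b"wal1")]
wal2 = [make_record(b"key2", b"wal2")]

# Corrupt the payload of the second record in the first WAL, keeping its CRC.
key, _, crc = wal1[1]
wal1[1] = (key, b"walX", crc)

state = replay_stop_at_first_mismatch([wal1, wal2])
```

Here `state` holds only the entry written before the corrupted record, so the recovered data is a prefix of the write history and `key2` is not visible without `key1`.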

ajkr · 2020-10-16T20:05:08Z

No longer WIP (at least not by me). As a short-term fix for the problem above, #7271 disabled the feature more broadly.

ajkr added 2 commits August 13, 2020 11:03

Revert "Disable recycle_log_file_num when it is incompatible with rec…

7cfc3e8

…overy mode (facebook#6351)" This reverts commit 3316d29.

update history

0019fb4

facebook-github-bot added the CLA Signed label Aug 13, 2020

ajkr requested review from riversand963 and siying August 13, 2020 18:10

facebook-github-bot reviewed Aug 13, 2020

ajkr changed the title ~~Revert "Disable recycle_log_file_num when it is incompatible with recovery mode (#6351)"~~ [wip] Revert "Disable recycle_log_file_num when it is incompatible with recovery mode (#6351)" Aug 14, 2020

This was referenced Aug 14, 2020

Disable recycle_log_file_num when it is incompatible with recovery mode #6351

Closed

Corruption: missing start of fragmented record(2) when opening Pebble generated db in Rocksdb cockroachdb/pebble#566

Closed

petermattis mentioned this pull request Aug 14, 2020

db: problematic point-in-time handling of WAL checksum failures cockroachdb/pebble#864

Closed

ajkr closed this Oct 16, 2020
