Corruption: missing start of fragmented record(2) when opening a Pebble-generated DB in RocksDB #566

Closed
lni opened this issue Mar 11, 2020 · 13 comments · Fixed by #571

@lni (Contributor) commented Mar 11, 2020

Hi @petermattis

I got this error when trying to open a Pebble-generated DB (Pebble v0.0.0-20200309175429-4cd6b7883221) in RocksDB 6.4/6.6. The zipped DB is attached below. Could you please take a quick look? Thanks!

data.zip

@petermattis (Collaborator)

Thanks for the report. I'll take a look.

@petermattis (Collaborator)

@lni How are you reproducing the error? I've tried rocksdb_ldb scan --db=logdb-0, which runs without error. Similarly, rocksdb_ldb checkconsistency --db=logdb-0 reports OK.

@petermattis (Collaborator)

I'm using RocksDB 6.6.4 for the above commands.

@lni (Contributor, Author) commented Mar 12, 2020

@petermattis

I just realized that the error is reported when RocksDB's wal_recovery_mode is set to kTolerateCorruptedTailRecords. It can be reproduced with the C++ code below.

Any chance the generated WAL has a format compatibility issue?

#include <iostream>
#include "rocksdb/db.h"

int main() {
  rocksdb::DB* db;
  rocksdb::Options options;
  // The strict tail-recovery mode is what triggers the corruption report.
  options.wal_recovery_mode =
      rocksdb::WALRecoveryMode::kTolerateCorruptedTailRecords;
  rocksdb::Status s = rocksdb::DB::Open(options, "logdb-0", &db);
  if (!s.ok()) {
    std::cerr << "failed to open the db: " << s.ToString() << std::endl;
    return 1;
  }
  delete db;
  return 0;
}

@lni (Contributor, Author) commented Mar 12, 2020

Setting wal_recovery_mode to kAbsoluteConsistency produces the same error. It is also probably worth mentioning that the attached DB was generated without any crash.

@petermattis (Collaborator)

Thanks for the code. I've reproduced this and am looking into it further.

@petermattis (Collaborator)

I believe what we're seeing here is an incompatibility between WAL recycling and certain RocksDB WAL recovery modes. See facebook/rocksdb#6351, though note that I think that PR is slightly flawed: kPointInTimeRecovery seems to be the only WAL recovery mode that is compatible with WAL recycling. @ajkr, you commented on facebook/rocksdb#6351 (comment) stating something similar. Did you ever change your mind?
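
For readers unfamiliar with the format, a rough sketch of the mechanism, assuming the recyclable record layout both stores use when recycling is on; chunkHeader and staleChunk below are illustrative names, not actual Pebble or RocksDB code. Each chunk header carries the number of the WAL that wrote it, so a reader hitting leftover chunks from the file's previous life can either treat them as a clean end-of-log (roughly what kPointInTimeRecovery does) or flag them as corruption (what the stricter modes do).

package main

import "fmt"

// chunkHeader is an illustrative stand-in for the recyclable WAL chunk
// header, whose on-disk layout is roughly:
//   checksum (4B) | length (2B) | type (1B) | log number (4B) | payload
type chunkHeader struct {
  checksum uint32
  length   uint16
  kind     byte   // full, first, middle, or last fragment
  logNum   uint32 // number of the WAL that wrote this chunk
}

// staleChunk reports whether a chunk is leftover data from the recycled
// file's previous incarnation rather than part of the current log.
func staleChunk(h chunkHeader, currentLogNum uint32) bool {
  return h.logNum != currentLogNum
}

func main() {
  old := chunkHeader{kind: 2 /* middle fragment */, logNum: 7}
  fmt.Println(staleChunk(old, 9)) // true: an old tail, not a torn write
}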

@lni Pebble currently doesn't provide an option to disable WAL recycling. I'm slightly disinclined to add one as I have an aversion to the proliferation of options (Pebble already has too many). My preference is to document the compatibility requirement here. It would be nice to verify that this is, in fact, the problem. In open.go, you can change the logRecycler.limit to 0. That should prevent any log recycling from taking place.
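
A minimal sketch of what that experiment does, assuming a simplified stand-in for Pebble's internal type; logRecycler here is the editor's illustration, not the real definition in open.go. With the limit at 0, every obsolete WAL is deleted rather than kept for reuse, so a later reader never encounters a stale tail.

package main

import "fmt"

// logRecycler is a hypothetical simplification: it keeps up to `limit`
// obsolete WAL file numbers around for reuse as future WALs.
type logRecycler struct {
  limit int      // max obsolete WALs to keep for reuse
  logs  []uint64 // file numbers currently available for reuse
}

// add offers an obsolete WAL for recycling. With limit == 0 it always
// declines, which is the "disable recycling" experiment described above.
func (r *logRecycler) add(fileNum uint64) bool {
  if len(r.logs) >= r.limit {
    return false // not kept; the file would just be deleted
  }
  r.logs = append(r.logs, fileNum)
  return true
}

func main() {
  r := logRecycler{limit: 0}
  fmt.Println(r.add(12)) // false: nothing is ever recycled
}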

@ajkr (Contributor) commented Mar 12, 2020

Hi @petermattis, I haven't changed my mind about that; it still feels like kPointInTimeRecovery should be the one mode that's compatible with recycling. I will ask around.

@ajkr (Contributor) commented Mar 12, 2020

That change came from a 2PC incompatibility with WAL recycling + kPointInTimeRecovery. It is unclear whether there was ever a problem for non-2PC use cases. So we'll rethink the fix.

@petermattis (Collaborator)

Thanks, @ajkr. I did confirm that @lni's program plus his DB fails to open with kTolerateCorruptedTailRecords and kAbsoluteConsistency, but it does open successfully with kPointInTimeRecovery.

@lni (Contributor, Author) commented Mar 13, 2020

@petermattis Thanks for looking into this. I can confirm that the corruption reported above is gone once I changed logRecycler.limit to 1.

@lni (Contributor, Author) commented Mar 15, 2020

Sorry for the typo above: logRecycler.limit was set to 0, not 1.

@ajkr (Contributor) commented Aug 14, 2020

> That change came from a 2PC incompatibility with WAL recycling + kPointInTimeRecovery. It is unclear whether there was ever a problem for non-2PC use cases. So we'll rethink the fix.

Finally got around to rethinking the fix. After experimenting a bit, it turns out there probably was a bug in RocksDB's implementation of kPointInTimeRecovery + WAL recycling. I still think the two features should be fundamentally compatible, but it looks like we currently mishandle this case; see facebook/rocksdb#7252 (comment).
