ZOOKEEPER-2994 Tool required to recover log and snapshot entries with CRC errors #487
Looks promising, but it doesn't seem very useful (and is potentially dangerous) without docs. Perhaps add a troubleshooting or recovery section here?
Useful addition, +1. As @phunt pointed out, the docs in zookeeperAdmin.xml could be updated.
I used your updated documentation, and managed to recover a corrupted log file:

    bin/zkTxnLogToolkit.sh -d ~/workspace/zookeeper/standalone/version-2/log.1

Corrupted log.1 file

    bin/zkTxnLogToolkit.sh -d ~/workspace/zookeeper/standalone/version-2/log.1
    bin/zkTxnLogToolkit.sh -r ~/workspace/zookeeper/standalone/version-2/log.1
    bin/zkTxnLogToolkit.sh -d ~/workspace/zookeeper/standalone/version-2/log.1.fixed

LGTM!
LGTM, tested, works fine!
…h CRC errors

https://issues.apache.org/jira/browse/ZOOKEEPER-2994

In the event of ZooKeeper transaction log becomes corrupted and fail CRC checks (preventing startup) we should have a mechanism to get the cluster running again.

Previously we achieved this by loading the broken transaction log with a modified version of ZK with disabled CRC check and forced it to write new txn log files.

It has proven that once you end up with the corrupt txn log there is no way to recover except manually modifying the crc check. That's basically why the tool is needed.

It's called TxnLogToolkit, a new console application similar to LogFormatter and SnapshotFormatter, but it's intentionally separated to keep backward compatibility in the existing tools.

This PR contains TXN log tool only.

You probably also notice a refactoring to extract file padding logic from FileTxnLog to reuse in the new tool. Related code changes can be reviewed alone in a separate commit if preferred.

Author: Andor Molnar <[email protected]>
Reviewers: [email protected]

Closes #487 from anmolnar/ZOOKEEPER-2994 and squashes the following commits:

221760c [Andor Molnar] ZOOKEEPER-2994. Added documentation and startup scripts
a69d729 [Andor Molnar] ZOOKEEPER-2994. Fix findbugs warning
0b95efe [Andor Molnar] ZOOKEEPER-2994. Fix for unit test
15fa45c [Andor Molnar] ZOOKEEPER-2994. Added padding, tool renamed to TxnLogToolkit, interactive mode, etc.
6a1ad0e [Andor Molnar] ZOOKEEPER-2994. Refactor FileTxnLog's padding logic to separate class for reusability
0d089cc [Andor Molnar] ZOOKEEPER-2994. Added new tool TxnLogTool for txn log file recovery

Change-Id: I7560362633a7bc919ae6d3ca7e3588e196a1919c
(cherry picked from commit 154f9c5)
Signed-off-by: Patrick Hunt <[email protected]>
+1. Thanks @anmolnar, this looks good. Please consider backporting to 3.4 (separate jira). Also, in future please don't include any changed files from the top-level docs directory (html/pdf files), as these are regenerated during commit.
…h CRC errors (3.4)

This is the 3.4 version of #487

phunt I've just realized that the patch must introduce a new dependency: commons-cli. Not sure if you're willing to merge it in this case.

Author: Andor Molnar <[email protected]>
Reviewers: [email protected]

Closes #508 from anmolnar/ZOOKEEPER-2994_34 and squashes the following commits:

357ab2b [Andor Molnar] ZOOKEEPER-2994. Removed dependency of commons.cli. Use custom impl instead.
3bc2e5f [Andor Molnar] ZOOKEEPER-2994: Tool required to recover log and snapshot entries with CRC errors

Change-Id: I7def29dc338726c3eccb0a4fd4530a1ffb0f3932
https://issues.apache.org/jira/browse/ZOOKEEPER-2994
If a ZooKeeper transaction log becomes corrupted and fails CRC checks (preventing startup), we should have a mechanism to get the cluster running again.
Previously we achieved this by loading the broken transaction log with a modified version of ZK that had the CRC check disabled, and forcing it to write new txn log files.
Experience has shown that once you end up with a corrupt txn log, there is no way to recover except by manually modifying the CRC check. That's basically why the tool is needed.
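To make the failure mode concrete, here is a minimal sketch of a per-record checksum check and the kind of repair a recovery pass could apply. It is not the code from this PR: the class and method names are hypothetical, the record is reduced to a bare payload plus a stored checksum, and `java.util.zip.Adler32` is used as an assumed stand-in for whatever checksum the transaction log actually stores.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.Checksum;

// Hypothetical illustration only; not the TxnLogToolkit implementation.
class TxnChecksumSketch {

    // Computes a checksum over a record payload (Adler32 is an assumption).
    static long checksum(byte[] payload) {
        Checksum c = new Adler32();
        c.update(payload, 0, payload.length);
        return c.getValue();
    }

    // A record passes the check when the stored value matches the recomputed one.
    static boolean isValid(long stored, byte[] payload) {
        return stored == checksum(payload);
    }

    // Repair step: keep the payload as-is and return a checksum that matches it.
    static long repair(long stored, byte[] payload) {
        return isValid(stored, payload) ? stored : checksum(payload);
    }

    public static void main(String[] args) {
        byte[] payload = "create /demo".getBytes(StandardCharsets.UTF_8);
        long good = checksum(payload);
        long bad = good ^ 0x1L; // simulate a damaged stored checksum

        System.out.println("intact record valid:   " + isValid(good, payload));
        System.out.println("damaged record valid:  " + isValid(bad, payload));
        System.out.println("repaired record valid: " + isValid(repair(bad, payload), payload));
    }
}
```

Note that a repair of this kind only makes the stored checksum agree with the bytes on disk. It cannot tell whether the payload itself was damaged, which is one reason such a tool is risky to use without documentation.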
It's called TxnLogToolkit, a new console application similar to LogFormatter and SnapshotFormatter, but it's intentionally kept separate to preserve backward compatibility in the existing tools.
This PR contains the txn log tool only.
You will probably also notice a refactoring that extracts the file padding logic from FileTxnLog for reuse in the new tool. The related code changes can be reviewed on their own in a separate commit if preferred.
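For readers unfamiliar with log file padding, the following is a rough sketch of what such extracted logic might compute: the size a preallocated log file should grow to once the write position gets close to the end. The class name, method name, and the 4096-byte margin are assumptions made for illustration and are not taken from this PR.

```java
// Hypothetical illustration of preallocation padding; not the class extracted in this PR.
class PaddingSketch {

    // Assumed safety margin: grow the file when fewer than this many bytes remain.
    static final long MARGIN_BYTES = 4096;

    // Returns the file size to use after padding, or the current size if no growth is needed.
    static long paddedSize(long position, long fileSize, long preAllocSize) {
        if (preAllocSize <= 0) {
            return fileSize; // preallocation disabled
        }
        if (position + MARGIN_BYTES < fileSize) {
            return fileSize; // still enough preallocated room
        }
        if (position > fileSize) {
            // Writer is already past the end: round up to a preallocation boundary.
            long target = position + preAllocSize;
            return target - (target % preAllocSize);
        }
        return fileSize + preAllocSize; // extend by one preallocation block
    }

    public static void main(String[] args) {
        long block = 65536;
        System.out.println(paddedSize(0, 0, block));         // empty file gets one block
        System.out.println(paddedSize(100, block, block));   // plenty of room, unchanged
        System.out.println(paddedSize(70000, block, block)); // past the end, rounded up
    }
}
```

Keeping this calculation in its own class means both the log writer and a recovery tool can produce files padded the same way.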