Commit

ZOOKEEPER-2994: Tool required to recover log and snapshot entries with CRC errors (3.4)

This is the 3.4 version of #487
phunt I've just realized that the patch must introduce a new dependency: commons-cli.
Not sure if you're willing to merge it in this case.

Author: Andor Molnar <[email protected]>

Reviewers: [email protected]

Closes #508 from anmolnar/ZOOKEEPER-2994_34 and squashes the following commits:

357ab2b [Andor Molnar] ZOOKEEPER-2994. Removed dependency of commons.cli. Use custom impl instead.
3bc2e5f [Andor Molnar] ZOOKEEPER-2994: Tool required to recover log and snapshot entries with CRC errors

Change-Id: I7def29dc338726c3eccb0a4fd4530a1ffb0f3932
anmolnar authored and phunt committed Apr 25, 2018
1 parent eacb4e4 commit 126fb0f
Showing 33 changed files with 967 additions and 85 deletions.
24 changes: 24 additions & 0 deletions bin/zkTxnLogToolkit.cmd
@@ -0,0 +1,24 @@
@echo off
REM Licensed to the Apache Software Foundation (ASF) under one or more
REM contributor license agreements. See the NOTICE file distributed with
REM this work for additional information regarding copyright ownership.
REM The ASF licenses this file to You under the Apache License, Version 2.0
REM (the "License"); you may not use this file except in compliance with
REM the License. You may obtain a copy of the License at
REM
REM http://www.apache.org/licenses/LICENSE-2.0
REM
REM Unless required by applicable law or agreed to in writing, software
REM distributed under the License is distributed on an "AS IS" BASIS,
REM WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
REM See the License for the specific language governing permissions and
REM limitations under the License.

setlocal
call "%~dp0zkEnv.cmd"

set ZOOMAIN=org.apache.zookeeper.server.persistence.TxnLogToolkit
call %JAVA% -cp "%CLASSPATH%" %ZOOMAIN% %*

endlocal

38 changes: 38 additions & 0 deletions bin/zkTxnLogToolkit.sh
@@ -0,0 +1,38 @@
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# If this script is run out of /usr/bin or some other system bin directory
# it should be linked to and not copied. Things like java jar files are found
# relative to the canonical path of this script.
#

# use POSIX interface, symlink is followed automatically
ZOOBIN="${BASH_SOURCE-$0}"
ZOOBIN="$(dirname "${ZOOBIN}")"
ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"

if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR"/../libexec/zkEnv.sh
else
. "$ZOOBINDIR"/zkEnv.sh
fi

"$JAVA" -cp "$CLASSPATH" $JVMFLAGS \
org.apache.zookeeper.server.persistence.TxnLogToolkit "$@"


Binary file modified docs/bookkeeperConfig.pdf
Binary file not shown.
Binary file modified docs/bookkeeperOverview.pdf
Binary file not shown.
Binary file modified docs/bookkeeperProgrammer.pdf
Binary file not shown.
Binary file modified docs/bookkeeperStarted.pdf
Binary file not shown.
Binary file modified docs/bookkeeperStream.pdf
Binary file not shown.
Binary file modified docs/index.pdf
Binary file not shown.
Binary file modified docs/javaExample.pdf
Binary file not shown.
Binary file modified docs/linkmap.pdf
Binary file not shown.
Binary file modified docs/recipes.pdf
Binary file not shown.
62 changes: 62 additions & 0 deletions docs/zookeeperAdmin.html
@@ -316,6 +316,9 @@ <h3>A Guide to Deployment and Administration</h3>
<li>
<a href="#sc_filemanagement">File Management</a>
</li>
<li>
<a href="#Recovery+-+TxnLogToolkit">Recovery - TxnLogToolkit</a>
</li>
</ul>
</li>
<li>
@@ -2063,6 +2066,65 @@ <h4>File Management</h4>

</div>
</div>
<a name="Recovery+-+TxnLogToolkit"></a>
<h4>Recovery - TxnLogToolkit</h4>
<p>TxnLogToolkit is a command line tool shipped with ZooKeeper which
is capable of recovering transaction log entries with a broken CRC.</p>
<p>When run without any command line parameters or with the "-h,--help"
argument, it prints the following help page:</p>
<pre class="code">
$ bin/zkTxnLogToolkit.sh

usage: TxnLogToolkit [-dhrv] txn_log_file_name
-d,--dump Dump mode. Dump all entries of the log file. (this is the default)
-h,--help Print help message
-r,--recover Recovery mode. Re-calculate CRC for broken entries.
-v,--verbose Be verbose in recovery mode: print all entries, not just fixed ones.
-y,--yes Non-interactive mode: repair all CRC errors without asking
</pre>
<p>The default behaviour is safe: it dumps the entries of the given
transaction log file to the screen (the same as using the '-d,--dump' parameter):</p>
<pre class="code">
$ bin/zkTxnLogToolkit.sh log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
4/5/18 2:15:58 PM CEST session 0x16295bafcc40000 cxid 0x0 zxid 0x100000001 createSession 30000
<strong>CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null</strong>
4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
4/5/18 2:16:12 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x100000003 createSession 30000
4/5/18 2:17:34 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x200000001 closeSession null
4/5/18 2:17:34 PM CEST session 0x16295bd23720000 cxid 0x0 zxid 0x200000002 createSession 30000
4/5/18 2:18:02 PM CEST session 0x16295bd23720000 cxid 0x2 zxid 0x200000003 create '/andor,#626262,v{s{31,s{'world,'anyone}}},F,1
EOF reached after 6 txns.
</pre>
<p>There's a CRC error in the 2nd entry of the above transaction log file. In <strong>dump</strong>
mode, the toolkit only prints this information to the screen without touching the original file. In
<strong>recovery</strong> mode (-r,--recover flag) the original file also remains
untouched, and all transactions are copied over to a new txn log file with the ".fixed" suffix. The tool recalculates
the CRC value of each entry and writes the calculated value if it doesn't match the one stored in the original txn entry.
By default, the tool works interactively: it asks for confirmation whenever a CRC error is encountered.</p>
<pre class="code">
$ bin/zkTxnLogToolkit.sh -r log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ?
</pre>
<p>Answering <strong>Yes</strong> means the newly calculated CRC value will be written
to the new file. <strong>No</strong> means that the original CRC value will be copied over.
<strong>Abort</strong> aborts the entire operation and exits.
(In this case the ".fixed" file is not deleted and is left in a half-complete state: it contains only the entries which
have already been processed, or only the header if the operation was aborted at the first entry.)</p>
<pre class="code">
$ bin/zkTxnLogToolkit.sh -r log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ? y
EOF reached after 6 txns.
Recovery file log.100000001.fixed has been written with 1 fixed CRC error(s)
</pre>
<p>The default behaviour of recovery is to be silent: only entries with a CRC error are printed to the screen.
Verbose mode can be turned on with the -v,--verbose parameter to see all records.
Interactive mode can be turned off with the -y,--yes parameter. In this case all CRC errors will be fixed
in the new transaction file.</p>
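The recalculate-and-compare step described in this section can be sketched in Java as follows. This is an illustration only, not the TxnLogToolkit source: the class and method names (`ChecksumSketch`, `checksumOf`, `valueToWrite`) are hypothetical, and the use of `java.util.zip.Adler32` is an assumption about the checksum algorithm behind what the documentation calls the CRC.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.Checksum;

// Hypothetical sketch; none of these names come from the ZooKeeper code base.
class ChecksumSketch {

    // Recalculates the checksum over the serialized bytes of one entry.
    // Assumption: Adler32 stands in for the algorithm used by the txn log.
    static long checksumOf(byte[] entry) {
        Checksum checksum = new Adler32();
        checksum.update(entry, 0, entry.length);
        return checksum.getValue();
    }

    // Picks the checksum to write to the ".fixed" file for one entry.
    // A matching entry keeps its stored value; a mismatching entry gets the
    // recalculated value only when the caller chose to fix it.
    static long valueToWrite(long stored, byte[] entry, boolean fix) {
        long calculated = checksumOf(entry);
        if (stored == calculated) {
            return stored;
        }
        return fix ? calculated : stored;
    }

    public static void main(String[] args) {
        byte[] entry = "closeSession".getBytes(StandardCharsets.UTF_8);
        long good = checksumOf(entry);
        long broken = good ^ 0xFFL; // simulate a corrupted stored checksum

        System.out.println(valueToWrite(good, entry, true) == good);      // intact entry
        System.out.println(valueToWrite(broken, entry, true) == good);    // answered Yes
        System.out.println(valueToWrite(broken, entry, false) == broken); // answered No
    }
}
```

In every case the entry bytes themselves are copied unchanged; only the checksum stored next to them can differ between the original file and the ".fixed" file.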
<a name="sc_commonProblems"></a>
<h3 class="h4">Things to Avoid</h3>
<p>Here are some common problems you can avoid by configuring
Binary file modified docs/zookeeperAdmin.pdf
Binary file not shown.
Binary file modified docs/zookeeperHierarchicalQuorums.pdf
Binary file not shown.
Binary file modified docs/zookeeperInternals.pdf
Binary file not shown.
Binary file modified docs/zookeeperJMX.pdf
Binary file not shown.
Binary file modified docs/zookeeperObservers.pdf
Binary file not shown.
Binary file modified docs/zookeeperOver.pdf
Binary file not shown.
Binary file modified docs/zookeeperProgrammers.pdf
Binary file not shown.
Binary file modified docs/zookeeperQuotas.pdf
Binary file not shown.
Binary file modified docs/zookeeperStarted.pdf
Binary file not shown.
Binary file modified docs/zookeeperTutorial.pdf
Binary file not shown.
70 changes: 70 additions & 0 deletions src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
@@ -1702,6 +1702,76 @@ imok
individual settings in which it is being deployed. </para>
</note>
</section>

<section>
<title>Recovery - TxnLogToolkit</title>

<para>TxnLogToolkit is a command line tool shipped with ZooKeeper which
is capable of recovering transaction log entries with a broken CRC.</para>
<para>When run without any command line parameters or with the "-h,--help"
argument, it prints the following help page:</para>

<programlisting>
$ bin/zkTxnLogToolkit.sh

usage: TxnLogToolkit [-dhrv] txn_log_file_name
-d,--dump Dump mode. Dump all entries of the log file. (this is the default)
-h,--help Print help message
-r,--recover Recovery mode. Re-calculate CRC for broken entries.
-v,--verbose Be verbose in recovery mode: print all entries, not just fixed ones.
-y,--yes Non-interactive mode: repair all CRC errors without asking
</programlisting>

<para>The default behaviour is safe: it dumps the entries of the given
transaction log file to the screen (the same as using the '-d,--dump' parameter):</para>

<programlisting>
$ bin/zkTxnLogToolkit.sh log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
4/5/18 2:15:58 PM CEST session 0x16295bafcc40000 cxid 0x0 zxid 0x100000001 createSession 30000
<emphasis role="bold">CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null</emphasis>
4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
4/5/18 2:16:12 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x100000003 createSession 30000
4/5/18 2:17:34 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x200000001 closeSession null
4/5/18 2:17:34 PM CEST session 0x16295bd23720000 cxid 0x0 zxid 0x200000002 createSession 30000
4/5/18 2:18:02 PM CEST session 0x16295bd23720000 cxid 0x2 zxid 0x200000003 create '/andor,#626262,v{s{31,s{'world,'anyone}}},F,1
EOF reached after 6 txns.
</programlisting>

<para>There's a CRC error in the 2nd entry of the above transaction log file. In <emphasis role="bold">dump</emphasis>
mode, the toolkit only prints this information to the screen without touching the original file. In
<emphasis role="bold">recovery</emphasis> mode (-r,--recover flag) the original file also remains
untouched, and all transactions are copied over to a new txn log file with the ".fixed" suffix. The tool recalculates
the CRC value of each entry and writes the calculated value if it doesn't match the one stored in the original txn entry.
By default, the tool works interactively: it asks for confirmation whenever a CRC error is encountered.</para>

<programlisting>
$ bin/zkTxnLogToolkit.sh -r log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ?
</programlisting>

<para>Answering <emphasis role="bold">Yes</emphasis> means the newly calculated CRC value will be written
to the new file. <emphasis role="bold">No</emphasis> means that the original CRC value will be copied over.
<emphasis role="bold">Abort</emphasis> aborts the entire operation and exits.
(In this case the ".fixed" file is not deleted and is left in a half-complete state: it contains only the entries which
have already been processed, or only the header if the operation was aborted at the first entry.)</para>

<programlisting>
$ bin/zkTxnLogToolkit.sh -r log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ? y
EOF reached after 6 txns.
Recovery file log.100000001.fixed has been written with 1 fixed CRC error(s)
</programlisting>

<para>The default behaviour of recovery is to be silent: only entries with a CRC error are printed to the screen.
Verbose mode can be turned on with the -v,--verbose parameter to see all records.
Interactive mode can be turned off with the -y,--yes parameter. In this case all CRC errors will be fixed
in the new transaction file.</para>
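The interaction between the Yes/No/Abort prompt and the -y,--yes flag can be sketched as below. This is a hypothetical illustration, not the tool's implementation: the names `PromptSketch`, `Answer`, `parse` and `decide` are made up, and the set of accepted spellings ("y", "yes", "n", "no", "a", "abort", case-insensitive) is an assumption; the documentation only shows the answer "y".

```java
import java.util.Locale;

// Hypothetical sketch; names and accepted spellings are assumptions.
class PromptSketch {

    enum Answer { YES, NO, ABORT, UNKNOWN }

    // Maps one line of user input to an answer.
    static Answer parse(String line) {
        if (line == null) {
            return Answer.UNKNOWN; // no input available
        }
        switch (line.trim().toLowerCase(Locale.ROOT)) {
            case "y":
            case "yes":
                return Answer.YES;
            case "n":
            case "no":
                return Answer.NO;
            case "a":
            case "abort":
                return Answer.ABORT;
            default:
                return Answer.UNKNOWN; // a caller would ask again
        }
    }

    // With the -y,--yes flag set, every CRC error is fixed without asking.
    static Answer decide(boolean yesFlag, String line) {
        return yesFlag ? Answer.YES : parse(line);
    }

    public static void main(String[] args) {
        System.out.println(decide(false, "y"));       // YES
        System.out.println(decide(false, " Abort ")); // ABORT
        System.out.println(decide(true, null));       // YES
        System.out.println(decide(false, "maybe"));   // UNKNOWN
    }
}
```

Keeping the decision separate from the file copying makes the non-interactive mode a one-line special case rather than a second code path.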
</section>
</section>

<section id="sc_commonProblems">
@@ -29,7 +29,7 @@

public class TraceFormatter {

- static String op2String(int op) {
+ public static String op2String(int op) {
switch (op) {
case OpCode.notification:
return "notification";
105 changes: 105 additions & 0 deletions src/java/main/org/apache/zookeeper/server/persistence/FilePadding.java
@@ -0,0 +1,105 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.zookeeper.server.persistence;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FilePadding {
private static final Logger LOG;
private static long preAllocSize = 65536 * 1024;
private static final ByteBuffer fill = ByteBuffer.allocateDirect(1);

static {
LOG = LoggerFactory.getLogger(FileTxnLog.class);

String size = System.getProperty("zookeeper.preAllocSize");
if (size != null) {
try {
preAllocSize = Long.parseLong(size) * 1024;
} catch (NumberFormatException e) {
LOG.warn(size + " is not a valid value for preAllocSize");
}
}
}

private long currentSize;

/**
* Sets the preallocation size used to pad the log file.
*
* @param size the preallocation size in bytes
*/
public static void setPreallocSize(long size) {
preAllocSize = size;
}

public void setCurrentSize(long currentSize) {
this.currentSize = currentSize;
}

/**
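* Pads the current file to increase its size to the next multiple of preAllocSize
* greater than the current size and position.
*
* @param fileChannel the fileChannel of the file to be padded
* @return the file size after padding (unchanged if no padding was needed)
* @throws IOException if writing to the file channel fails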
*/
long padFile(FileChannel fileChannel) throws IOException {
long newFileSize = calculateFileSizeWithPadding(fileChannel.position(), currentSize, preAllocSize);
if (currentSize != newFileSize) {
fileChannel.write((ByteBuffer) fill.position(0), newFileSize - fill.remaining());
currentSize = newFileSize;
}
return currentSize;
}

/**
* Calculates a new file size with padding. We only return a new size if
* the current file position is sufficiently close (less than 4K) to end of
* file and preAllocSize is > 0.
*
* @param position the point in the file we have written to
* @param fileSize application keeps track of the current file size
* @param preAllocSize how many bytes to pad
* @return the new file size. It can be the same as fileSize if no
* padding was done.
*/
// VisibleForTesting
public static long calculateFileSizeWithPadding(long position, long fileSize, long preAllocSize) {
// If preAllocSize is positive and we are within 4KB of the known end of the file calculate a new file size
if (preAllocSize > 0 && position + 4096 >= fileSize) {
// If we have written more than we have previously preallocated we need to make sure the new
// file size is larger than what we already have
if (position > fileSize) {
fileSize = position + preAllocSize;
fileSize -= fileSize % preAllocSize;
} else {
fileSize += preAllocSize;
}
}

return fileSize;
}
}
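To make the padding arithmetic concrete, the sketch below copies the body of `calculateFileSizeWithPadding` from the class above into a standalone class and runs it on a few inputs. The class name `PaddingSketch` and the 64 KiB preallocation size are assumptions chosen to keep the numbers readable; the default in `FilePadding` is 65536 * 1024 bytes.

```java
// Standalone copy of the padding calculation, for illustration only.
class PaddingSketch {

    static long calculateFileSizeWithPadding(long position, long fileSize, long preAllocSize) {
        // Only grow when padding is enabled and the write position is
        // within 4 KiB of the known end of the file.
        if (preAllocSize > 0 && position + 4096 >= fileSize) {
            if (position > fileSize) {
                // Written past the known size: round up to the next
                // multiple of preAllocSize above the position.
                fileSize = position + preAllocSize;
                fileSize -= fileSize % preAllocSize;
            } else {
                fileSize += preAllocSize;
            }
        }
        return fileSize;
    }

    public static void main(String[] args) {
        long pre = 65536; // example value, not the ZooKeeper default

        // Empty file: grows by one preallocation block.
        System.out.println(calculateFileSizeWithPadding(0, 0, pre));         // 65536
        // Far from the end of the file: no change.
        System.out.println(calculateFileSizeWithPadding(1000, 65536, pre));  // 65536
        // Exactly 4096 bytes before the end: grows by one block.
        System.out.println(calculateFileSizeWithPadding(61440, 65536, pre)); // 131072
        // Position beyond the known size: rounds to the next multiple.
        System.out.println(calculateFileSizeWithPadding(70000, 65536, pre)); // 131072
        // preAllocSize of 0 disables padding.
        System.out.println(calculateFileSizeWithPadding(70000, 65536, 0));   // 65536
    }
}
```

In the fourth case the intermediate value is 70000 + 65536 = 135536, and subtracting 135536 % 65536 = 4464 gives 131072, the first multiple of the block size above the write position.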