HBASE-23899 [Flakey Test] Stabilizations and Debug
A miscellany. Add extra logging to help w/ debug to a bunch of tests.
Fix some issues, particularly where we ran into a mismatched-filesystem
complaint. Some modernizations, removal of unnecessary deletes
(especially after seeing tests fail in table delete), and cleanup.
Recategorized one test from medium to large because it starts four
clusters in the one JVM. Finally, the zk standalone server won't come
up on occasion; added debug and thread dumping to help figure out why
(manifests as the test failing in startup saying the master didn't launch).

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshot.java
  Fixes occasional mismatched filesystems where the difference is file:// vs file:///
  or we pick up the hdfs scheme when it is a local fs test. Had to vet how we
  make a Path qualified in a few places, not just here, as a few tests failed
  with this same issue. Code in here is used by a lot of tests that each in
  turn suffered this mismatch.

  Refactor for clarity
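The shape of the fix, here and in the other touched tests, is to qualify every Path against the filesystem it actually belongs to (in Hadoop terms, `path.getFileSystem(conf)` plus `makeQualified(fs.getUri(), fs.getWorkingDirectory())`). A plain-JDK sketch of what qualifying buys you; the helper and names here are illustrative, not the Hadoop API:

```java
import java.net.URI;

public class QualifySketch {
    // Plain-JDK stand-in for Hadoop's
    // path.makeQualified(fs.getUri(), fs.getWorkingDirectory()):
    // stamp the target filesystem's scheme onto the path (resolving relative
    // paths against that filesystem's working directory) so that
    // "file:/", "file:///" and "hdfs://..." strings can no longer disagree
    // when paths are later compared.
    static URI qualify(URI fsUri, String workingDir, String path) {
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        return fsUri.resolve(absolute);
    }

    public static void main(String[] args) {
        URI localFs = URI.create("file:///");
        // A relative path gets anchored under the working directory.
        System.out.println(qualify(localFs, "/home/user", "snapshot-dir"));
        // An absolute path just gains the filesystem's scheme.
        System.out.println(qualify(localFs, "/home/user", "/tmp/export"));
    }
}
```

The key point is that the scheme/authority comes from the filesystem the path lives on, never from the default filesystem in the Configuration.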

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotV1NoCluster.java
  Unused import.

hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/store/wal/TestWALProcedureStore.java
  This test fails if the tmp dir is not where it expects, because it tries to
  make its rootdir there. Give it a rootdir under the test data dir.

hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
  This change is probably useless. I think the issue is actually
  a problem addressed later, where our check for the zk server being
  up gets stuck and never times out.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestSplitOrMergeStatus.java
 Move off deprecated APIs.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
 Log when we fail the balance check, to help w/ DEBUG; currently it just says 'false'.
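The kind of change meant here is replacing a bare boolean assertion, whose failure reports only "false", with one whose message carries the interesting numbers. A minimal sketch; the field names (minLoad/maxLoad/tolerance) are illustrative, not the actual BalancerTestBase members:

```java
public class BalanceCheckLogging {
    // Instead of assertTrue(balanced), throw with the actual state so a
    // failure in CI is diagnosable from the message alone.
    static void assertBalanced(int minLoad, int maxLoad, int tolerance) {
        if (maxLoad - minLoad > tolerance) {
            throw new AssertionError("Cluster not balanced: maxLoad=" + maxLoad
                + ", minLoad=" + minLoad + ", tolerance=" + tolerance);
        }
    }

    public static void main(String[] args) {
        try {
            assertBalanced(2, 9, 3);
        } catch (AssertionError e) {
            // Now the failure says what was wrong, not just 'false'.
            System.out.println(e.getMessage());
        }
    }
}
```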

hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSplitWALProcedure.java
 Guard against NPEs on the way out if setup failed.
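The usual fix for teardown NPEs is null-guarding each field that a failed setup may have left uninitialized; a generic sketch with a stand-in AutoCloseable rather than the test's actual cluster members:

```java
public class TeardownGuard {
    // If setUp() throws partway through, JUnit still runs tearDown() against
    // half-initialized fields. Null-guarding each one avoids the secondary
    // NPE that would otherwise hide the real setup failure.
    static String shutdown(AutoCloseable cluster) {
        if (cluster == null) {
            return "skipped"; // setup never got this far; nothing to close
        }
        try {
            cluster.close();
            return "closed";
        } catch (Exception e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        // Simulate the setup-failed case: the field was never assigned.
        System.out.println(shutdown(null));
    }
}
```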

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
 Add logging when the assert fails, to help w/ DEBUG.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbortTimeout.java
 Don't bother removing stuff on teardown. It all gets thrown away anyways.
 Saw a few hangs in here in the teardown where hdfs was down before
 expected, messing up shutdown.

hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
 Add a timeout on the socket; was seeing the check for the zk server getting
 stuck and never timing out (test timed out in startup).

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotWithTemporaryDirectory.java
 Write to test data dir instead.
 Be careful about how we make qualified paths.

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
 Remove snowflake configs.

hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationStatus.java
 Add a hacky pause. Tried adding barriers but they didn't work. Needs a
 deep dive.

hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
 Remove code copied from zk and use zk methods directly instead.
 A general problem is that the zk cluster occasionally doesn't come up, but
 we have no clue why. Add thread dumping and a state check.
saintstack committed Feb 28, 2020
1 parent 00ef6c6 commit 6777e2c
Showing 29 changed files with 291 additions and 245 deletions.
Original file line number Diff line number Diff line change
@@ -70,13 +70,6 @@ public abstract class TestTableInputFormatScanBase {

@BeforeClass
public static void setUpBeforeClass() throws Exception {
// test intermittently fails under hadoop2 (2.0.2-alpha) if shortcircuit-read (scr) is on.
// this turns it off for this test. TODO: Figure out why scr breaks recovery.
System.setProperty("hbase.tests.use.shortcircuit.reads", "false");

// switch TIF to log at DEBUG level
TEST_UTIL.enableDebug(TableInputFormat.class);
TEST_UTIL.enableDebug(TableInputFormatBase.class);
// start mini hbase cluster
TEST_UTIL.startMiniCluster(3);
// create and fill table
@@ -1,4 +1,4 @@
/**
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
@@ -21,9 +21,7 @@
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
@@ -55,7 +53,6 @@
import org.junit.rules.TestName;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription;
import org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotRegionManifest;

@@ -90,12 +87,6 @@ public static void setUpBaseConf(Configuration conf) {
// If a single node has enough failures (default 3), resource manager will blacklist it.
// With only 2 nodes and tests injecting faults, we don't want that.
conf.setInt("mapreduce.job.maxtaskfailures.per.tracker", 100);
/*
conf.setInt("hbase.client.pause", 250);
conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 6);
conf.setBoolean("hbase.master.enabletable.roundrobin", true);
conf.setInt("mapreduce.map.maxattempts", 10);
*/
}

@BeforeClass
@@ -211,36 +202,38 @@ protected void testExportFileSystemState(final TableName tableName,
*/
protected static void testExportFileSystemState(final Configuration conf, final TableName tableName,
final String snapshotName, final String targetName, final int filesExpected,
final Path sourceDir, Path copyDir, final boolean overwrite,
final Path srcDir, Path rawTgtDir, final boolean overwrite,
final RegionPredicate bypassregionPredicate, boolean success) throws Exception {
URI hdfsUri = FileSystem.get(conf).getUri();
FileSystem fs = FileSystem.get(copyDir.toUri(), conf);
LOG.info("DEBUG FS {} {} {}, hdfsUri={}", fs, copyDir, copyDir.toUri(), hdfsUri);
copyDir = copyDir.makeQualified(fs.getUri(), fs.getWorkingDirectory());
FileSystem tgtFs = rawTgtDir.getFileSystem(conf);
FileSystem srcFs = srcDir.getFileSystem(conf);
Path tgtDir = rawTgtDir.makeQualified(tgtFs.getUri(), tgtFs.getWorkingDirectory());
LOG.info("tgtFsUri={}, tgtDir={}, rawTgtDir={}, srcFsUri={}, srcDir={}",
tgtFs.getUri(), tgtDir, rawTgtDir, srcFs.getUri(), srcDir);
List<String> opts = new ArrayList<>();
opts.add("--snapshot");
opts.add(snapshotName);
opts.add("--copy-to");
opts.add(copyDir.toString());
if (!targetName.equals(snapshotName)) {
opts.add("--target");
opts.add(targetName);
}
if (overwrite) opts.add("--overwrite");
if (overwrite) {
opts.add("--overwrite");
}

// Export Snapshot
int res = run(conf, new ExportSnapshot(), opts.toArray(new String[opts.size()]));
assertEquals("success " + success + ", res=" + res, success ? 0 : 1, res);
if (!success) {
final Path targetDir = new Path(HConstants.SNAPSHOT_DIR_NAME, targetName);
assertFalse(copyDir.toString() + " " + targetDir.toString(),
fs.exists(new Path(copyDir, targetDir)));
assertFalse(tgtDir.toString() + " " + targetDir.toString(),
tgtFs.exists(new Path(tgtDir, targetDir)));
return;
}
LOG.info("Exported snapshot");

// Verify File-System state
FileStatus[] rootFiles = fs.listStatus(copyDir);
FileStatus[] rootFiles = tgtFs.listStatus(tgtDir);
assertEquals(filesExpected > 0 ? 2 : 1, rootFiles.length);
for (FileStatus fileStatus: rootFiles) {
String name = fileStatus.getPath().getName();
@@ -251,11 +244,10 @@ protected static void testExportFileSystemState(final Configuration conf, final
LOG.info("Verified filesystem state");

// Compare the snapshot metadata and verify the hfiles
final FileSystem hdfs = FileSystem.get(hdfsUri, conf);
final Path snapshotDir = new Path(HConstants.SNAPSHOT_DIR_NAME, snapshotName);
final Path targetDir = new Path(HConstants.SNAPSHOT_DIR_NAME, targetName);
verifySnapshotDir(hdfs, new Path(sourceDir, snapshotDir), fs, new Path(copyDir, targetDir));
Set<String> snapshotFiles = verifySnapshot(conf, fs, copyDir, tableName,
verifySnapshotDir(srcFs, new Path(srcDir, snapshotDir), tgtFs, new Path(tgtDir, targetDir));
Set<String> snapshotFiles = verifySnapshot(conf, tgtFs, tgtDir, tableName,
targetName, bypassregionPredicate);
assertEquals(filesExpected, snapshotFiles.size());
}
@@ -266,8 +258,6 @@ protected static void testExportFileSystemState(final Configuration conf, final
@Test
public void testExportRetry() throws Exception {
Path copyDir = getLocalDestinationDir();
FileSystem fs = FileSystem.get(copyDir.toUri(), new Configuration());
copyDir = copyDir.makeQualified(fs);
Configuration conf = new Configuration(TEST_UTIL.getConfiguration());
conf.setBoolean(ExportSnapshot.Testing.CONF_TEST_FAILURE, true);
conf.setInt(ExportSnapshot.Testing.CONF_TEST_FAILURE_COUNT, 2);
@@ -321,14 +311,13 @@ protected static Set<String> verifySnapshot(final Configuration conf, final File
@Override
public void storeFile(final RegionInfo regionInfo, final String family,
final SnapshotRegionManifest.StoreFile storeFile) throws IOException {
if (bypassregionPredicate != null && bypassregionPredicate.evaluate(regionInfo))
if (bypassregionPredicate != null && bypassregionPredicate.evaluate(regionInfo)) {
return;
}

String hfile = storeFile.getName();
snapshotFiles.add(hfile);
if (storeFile.hasReference()) {
// Nothing to do here, we have already the reference embedded
} else {
if (!storeFile.hasReference()) {
verifyNonEmptyFile(new Path(exportedArchive,
new Path(FSUtils.getTableDir(new Path("./"), tableName),
new Path(regionInfo.getEncodedName(), new Path(family, hfile)))));
@@ -339,7 +328,7 @@ private void verifyNonEmptyFile(final Path path) throws IOException {
assertTrue(path + " should exists", fs.exists(path));
assertTrue(path + " should not be empty", fs.getFileStatus(path).getLen() > 0);
}
});
});

// Verify Snapshot description
SnapshotDescription desc = SnapshotDescriptionUtils.readSnapshotInfo(fs, exportedSnapshot);
@@ -352,7 +341,7 @@ private static Set<String> listFiles(final FileSystem fs, final Path root, final
throws IOException {
Set<String> files = new HashSet<>();
LOG.debug("List files in {} in root {} at {}", fs, root, dir);
int rootPrefix = root.makeQualified(fs.getUri(), root).toString().length();
int rootPrefix = root.makeQualified(fs.getUri(), fs.getWorkingDirectory()).toString().length();
FileStatus[] list = FSUtils.listStatus(fs, dir);
if (list != null) {
for (FileStatus fstat: list) {
@@ -376,8 +365,13 @@ private Path getHdfsDestinationDir() {

private Path getLocalDestinationDir() {
Path path = TEST_UTIL.getDataTestDir("local-export-" + System.currentTimeMillis());
LOG.info("Local export destination path: " + path);
return path;
try {
FileSystem fs = FileSystem.getLocal(TEST_UTIL.getConfiguration());
LOG.info("Local export destination path: " + path);
return path.makeQualified(fs.getUri(), fs.getWorkingDirectory());
} catch (IOException ioe) {
throw new RuntimeException(ioe);
}
}

private static void removeExportDir(final Path path) throws IOException {
@@ -19,9 +19,7 @@

import static org.junit.Assert.assertTrue;

import java.io.File;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
@@ -55,29 +53,32 @@ public class TestExportSnapshotV1NoCluster {

private HBaseCommonTestingUtility testUtil = new HBaseCommonTestingUtility();
private Path testDir;
private FileSystem fs;

@Before
public void setUpBefore() throws Exception {
this.testDir = setup(this.testUtil);
// Make sure testDir is on LocalFileSystem
this.fs = FileSystem.getLocal(this.testUtil.getConfiguration());
this.testDir = setup(fs, this.testUtil);
LOG.info("fs={}, fsuri={}, fswd={}, testDir={}", this.fs, this.fs.getUri(),
this.fs.getWorkingDirectory(), this.testDir);
assertTrue("FileSystem '" + fs + "' is not local", fs instanceof LocalFileSystem);
}

/**
* Setup for test. Returns path to test data dir.
* Setup for test. Returns path to test data dir. Sets configuration into the passed
* hctu.getConfiguration.
*/
static Path setup(HBaseCommonTestingUtility hctu) throws IOException {
// Make sure testDir is on LocalFileSystem
Path testDir =
hctu.getDataTestDir().makeQualified(URI.create("file:///"), new Path("/"));
FileSystem fs = testDir.getFileSystem(hctu.getConfiguration());
assertTrue("FileSystem '" + fs + "' is not local", fs instanceof LocalFileSystem);
static Path setup(FileSystem fs, HBaseCommonTestingUtility hctu) throws IOException {
Path testDir = hctu.getDataTestDir().makeQualified(fs.getUri(), fs.getWorkingDirectory());
hctu.getConfiguration().setBoolean(SnapshotManager.HBASE_SNAPSHOT_ENABLED, true);
hctu.getConfiguration().setInt("hbase.regionserver.msginterval", 100);
hctu.getConfiguration().setInt("hbase.client.pause", 250);
hctu.getConfiguration().setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 6);
hctu.getConfiguration().setBoolean("hbase.master.enabletable.roundrobin", true);
hctu.getConfiguration().setInt("mapreduce.map.maxattempts", 10);
hctu.getConfiguration().set(HConstants.HBASE_DIR, testDir.toString());
return testDir;
return testDir.makeQualified(fs.getUri(), fs.getWorkingDirectory());
}

/**
@@ -86,18 +87,19 @@ static Path setup(HBaseCommonTestingUtility hctu) throws IOException {
@Test
public void testSnapshotWithRefsExportFileSystemState() throws Exception {
final SnapshotMock snapshotMock = new SnapshotMock(testUtil.getConfiguration(),
testDir.getFileSystem(testUtil.getConfiguration()), testDir);
this.fs, testDir);
final SnapshotMock.SnapshotBuilder builder = snapshotMock.createSnapshotV1("tableWithRefsV1",
"tableWithRefsV1");
testSnapshotWithRefsExportFileSystemState(builder, testUtil, testDir);
testSnapshotWithRefsExportFileSystemState(this.fs, builder, testUtil, testDir);
}

/**
* Generates a couple of regions for the specified SnapshotMock,
* and then it will run the export and verification.
*/
static void testSnapshotWithRefsExportFileSystemState(SnapshotMock.SnapshotBuilder builder,
HBaseCommonTestingUtility testUtil, Path testDir) throws Exception {
static void testSnapshotWithRefsExportFileSystemState(FileSystem fs,
SnapshotMock.SnapshotBuilder builder, HBaseCommonTestingUtility testUtil, Path testDir)
throws Exception {
Path[] r1Files = builder.addRegion();
Path[] r2Files = builder.addRegion();
builder.commit();
@@ -106,14 +108,16 @@ static void testSnapshotWithRefsExportFileSystemState(SnapshotMock.SnapshotBuild
TableName tableName = builder.getTableDescriptor().getTableName();
TestExportSnapshot.testExportFileSystemState(testUtil.getConfiguration(),
tableName, snapshotName, snapshotName, snapshotFilesCount,
testDir, getDestinationDir(testUtil, testDir), false, null, true);
testDir, getDestinationDir(fs, testUtil, testDir), false, null, true);
}

static Path getDestinationDir(HBaseCommonTestingUtility hctu, Path testDir) throws IOException {
FileSystem fs = FileSystem.get(hctu.getConfiguration());
static Path getDestinationDir(FileSystem fs, HBaseCommonTestingUtility hctu, Path testDir)
throws IOException {
Path path = new Path(new Path(testDir, "export-test"),
"export-" + System.currentTimeMillis()).makeQualified(fs.getUri(), fs.getWorkingDirectory());
LOG.info("HDFS export destination path: " + path);
"export-" + System.currentTimeMillis()).makeQualified(fs.getUri(),
fs.getWorkingDirectory());
LOG.info("Export destination={}, fs={}, fsurl={}, fswd={}, testDir={}", path, fs, fs.getUri(),
fs.getWorkingDirectory(), testDir);
return path;
}
}
@@ -16,7 +16,9 @@
* limitations under the License.
*/
package org.apache.hadoop.hbase.snapshot;

import static org.junit.Assert.assertTrue;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseClassTestRule;
import org.apache.hadoop.hbase.HBaseCommonTestingUtility;
@@ -44,10 +46,15 @@ public class TestExportSnapshotV2NoCluster {

private HBaseCommonTestingUtility testUtil = new HBaseCommonTestingUtility();
private Path testDir;
private FileSystem fs;

@Before
public void before() throws Exception {
this.testDir = TestExportSnapshotV1NoCluster.setup(this.testUtil);
// Make sure testDir is on LocalFileSystem
this.fs = FileSystem.getLocal(this.testUtil.getConfiguration());
this.testDir = TestExportSnapshotV1NoCluster.setup(this.fs, this.testUtil);
LOG.info("fs={}, testDir={}", this.fs, this.testDir);
assertTrue("FileSystem '" + fs + "' is not local", fs instanceof LocalFileSystem);
}

@Test
@@ -56,7 +63,7 @@ public void testSnapshotWithRefsExportFileSystemState() throws Exception {
testDir.getFileSystem(testUtil.getConfiguration()), testDir);
final SnapshotMock.SnapshotBuilder builder = snapshotMock.createSnapshotV2("tableWithRefsV2",
"tableWithRefsV2");
TestExportSnapshotV1NoCluster.testSnapshotWithRefsExportFileSystemState(builder, this.testUtil,
this.testDir);
TestExportSnapshotV1NoCluster.testSnapshotWithRefsExportFileSystemState(this.fs, builder,
this.testUtil, this.testDir);
}
}
@@ -1,4 +1,4 @@
/**
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
@@ -17,11 +17,10 @@
*/
package org.apache.hadoop.hbase.snapshot;

import java.io.File;
import java.nio.file.Paths;
import java.io.IOException;
import java.util.UUID;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseClassTestRule;
import org.apache.hadoop.hbase.testclassification.MediumTests;
@@ -37,9 +36,6 @@ public class TestExportSnapshotWithTemporaryDirectory extends TestExportSnapshot
public static final HBaseClassTestRule CLASS_RULE =
HBaseClassTestRule.forClass(TestExportSnapshotWithTemporaryDirectory.class);

protected static String TEMP_DIR = Paths.get("").toAbsolutePath().toString() + Path.SEPARATOR
+ UUID.randomUUID().toString();

@BeforeClass
public static void setUpBeforeClass() throws Exception {
setUpBaseConf(TEST_UTIL.getConfiguration());
@@ -50,11 +46,18 @@ public static void setUpBeforeClass() throws Exception {
@AfterClass
public static void tearDownAfterClass() throws Exception {
TestExportSnapshot.tearDownAfterClass();
FileUtils.deleteDirectory(new File(TEMP_DIR));
}

public static void setUpBaseConf(Configuration conf) {
Path tmpDir = null;
try {
FileSystem localFs = FileSystem.getLocal(conf);
tmpDir = TEST_UTIL.getDataTestDir(UUID.randomUUID().toString()).
makeQualified(localFs.getUri(), localFs.getWorkingDirectory());
} catch (IOException ioe) {
throw new RuntimeException(ioe);
}
TestExportSnapshot.setUpBaseConf(conf);
conf.set(SnapshotDescriptionUtils.SNAPSHOT_WORKING_DIR, "file://" + new Path(TEMP_DIR, ".tmpdir").toUri());
conf.set(SnapshotDescriptionUtils.SNAPSHOT_WORKING_DIR, tmpDir.toUri().toString());
}
}
}
@@ -36,6 +36,7 @@
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseClassTestRule;
import org.apache.hadoop.hbase.HBaseCommonTestingUtility;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.procedure2.Procedure;
import org.apache.hadoop.hbase.procedure2.ProcedureStateSerializer;
import org.apache.hadoop.hbase.procedure2.ProcedureTestingUtility;
@@ -85,7 +86,9 @@ private void setupConfig(final Configuration conf) {
public void setUp() throws IOException {
htu = new HBaseCommonTestingUtility();
testDir = htu.getDataTestDir();
htu.getConfiguration().set(HConstants.HBASE_DIR, testDir.toString());
fs = testDir.getFileSystem(htu.getConfiguration());
htu.getConfiguration().set(HConstants.HBASE_DIR, testDir.toString());
assertTrue(testDir.depth() > 1);

setupConfig(htu.getConfiguration());
Expand Up @@ -216,6 +216,7 @@ public void startReplicationService() throws IOException {
this.scheduleThreadPool.scheduleAtFixedRate(
new ReplicationStatisticsTask(this.replicationSink, this.replicationManager),
statsThreadPeriod, statsThreadPeriod, TimeUnit.SECONDS);
LOG.info("{} started", this.server.toString());
}

/**