HBASE-23202 ExportSnapshot (import) will fail if copying files to root directory takes longer than cleaner TTL #769

guangxuCheng · 2019-10-28T12:39:59Z

Detailed description: https://issues.apache.org/jira/browse/HBASE-23202

Apache-HBase · 2019-10-28T16:31:19Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
💙	reexec	0m 36s	Docker mode activated.
		_ Prechecks _
💚	dupname	0m 0s	No case conflicting files found.
💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
💚	@author	0m 0s	The patch does not contain any @author tags.
💚	test4tests	0m 0s	The patch appears to include 2 new or modified test files.
		_ master Compile Tests _
💚	mvninstall	6m 0s	master passed
💚	compile	1m 1s	master passed
💚	checkstyle	1m 37s	master passed
💚	shadedjars	5m 35s	branch has no errors when building our shaded downstream artifacts.
💚	javadoc	0m 40s	master passed
💙	spotbugs	4m 36s	Used deprecated FindBugs config; considering switching to SpotBugs.
💚	findbugs	4m 34s	master passed
		_ Patch Compile Tests _
💚	mvninstall	5m 45s	the patch passed
💚	compile	0m 57s	the patch passed
💚	javac	0m 57s	the patch passed
💔	checkstyle	1m 18s	hbase-server: The patch generated 2 new + 5 unchanged - 0 fixed = 7 total (was 5)
💚	whitespace	0m 0s	The patch has no whitespace issues.
💚	shadedjars	4m 49s	patch has no errors when building our shaded downstream artifacts.
💚	hadoopcheck	15m 52s	Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
💚	javadoc	0m 36s	the patch passed
💚	findbugs	4m 12s	the patch passed
		_ Other Tests _
💚	unit	160m 36s	hbase-server in the patch passed.
💚	asflicense	0m 35s	The patch does not generate ASF License warnings.
		220m 59s

Subsystem	Report/Notes
Docker	Client=19.03.4 Server=19.03.4 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/artifact/out/Dockerfile
GITHUB PR	#769
Optional Tests	dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname	Linux 228abb5f3da1 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-769/out/precommit/personality/provided.sh
git revision	master / `4c75485`
Default Java	1.8.0_181
checkstyle	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/artifact/out/diff-checkstyle-hbase-server.txt
Test Results	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/testReport/
Max. process+thread count	4611 (vs. ulimit of 10000)
modules	C: hbase-server U: hbase-server
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/console
versions	git=2.11.0 maven=2018-06-17T18:33:14Z) findbugs=3.1.11
Powered by	Apache Yetus 0.11.0 https://yetus.apache.org

This message was automatically generated.

z-york · 2019-10-28T18:29:49Z

hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotFileCache.java

+        try {
+          snapshotInProgress.addAll(fileInspector.filesUnderSnapshot(run.getPath()));
+        } catch (CorruptedSnapshotException e) {
+          // See HBASE-16464
+          if (e.getCause() instanceof FileNotFoundException) {
+            // If the snapshot is corrupt, we will delete it
+            fs.delete(run.getPath(), true);
+            LOG.warn("delete the " + run.getPath() + " due to exception:", e.getCause());


Will this actually work for the ExportSnapshot case? The snapshot manifest is added to tmp before all the files are present on the cluster, so it looks like this will delete the snapshot manifest, which would mess up the import job.

Hmmm, there may be a race condition between ExportSnapshot and SnapshotCleaner.
Copying the snapshot manifest is a fast operation. Maybe we can add a time threshold: when we catch CorruptedSnapshotException, we delete the snapshot folder only if its modification time is older than a certain threshold; otherwise we skip this cleanup operation. WDYT?
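A minimal sketch of the time-threshold idea proposed above, for illustration only. It uses `java.nio.file` rather than Hadoop's `FileSystem` so that it compiles on its own; the class name `StaleSnapshotCheck`, the method `isStale`, and the threshold value are hypothetical and do not come from the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

class StaleSnapshotCheck {
  /**
   * Returns true when the directory was last modified more than
   * `threshold` before `now`. Under the proposal, only a stale
   * directory would be deleted after a CorruptedSnapshotException.
   */
  static boolean isStale(Path snapshotDir, Duration threshold, Instant now)
      throws IOException {
    Instant lastModified = Files.getLastModifiedTime(snapshotDir).toInstant();
    return Duration.between(lastModified, now).compareTo(threshold) > 0;
  }
}
```

A caller would pass `Instant.now()` and a configured threshold; the clock is a parameter here only to keep the check easy to test.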

Copying the snapshot manifest is not always fast, since it can be hundreds of MB and the link between clusters can be poor.

And when the snapshot contains a large number of files, copying the snapshot can take a long time even when there isn't a lot of data. Also, copying the actual data for a large export can take tens of days.

In fact, when CorruptedSnapshotException is thrown, we can ignore the exception and continue cleaning up HFiles instead of skipping the cleanup.

If CorruptedSnapshotException is thrown, it means that ExportSnapshot has not yet finished copying the snapshot manifest and has not yet started copying the snapshot's data files, so letting the SnapshotCleaner continue has no effect on the snapshot.

The main purpose of the added logic that deletes the snapshot manifest is to clean up abnormal snapshot manifests. Of course, it is OK to remove that logic.
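The "ignore and continue" behaviour described in this comment could look roughly like the sketch below. To keep it self-contained, `CorruptedSnapshotException` and `FileInspector` are local stand-ins for the HBase types that appear in the diff, and snapshot directories are plain strings; rethrowing when the cause is not a `FileNotFoundException` is an assumption, since the diff excerpt does not show that branch.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InProgressSnapshotScan {
  /** Local stand-in for HBase's CorruptedSnapshotException. */
  static class CorruptedSnapshotException extends IOException {
    CorruptedSnapshotException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  /** Hypothetical stand-in for the file inspector used in the diff. */
  interface FileInspector {
    Collection<String> filesUnderSnapshot(String snapshotDir) throws IOException;
  }

  /** Collects files referenced by in-progress snapshots, skipping incomplete ones. */
  static Set<String> collectInProgressFiles(List<String> snapshotDirs,
      FileInspector inspector) throws IOException {
    Set<String> inProgress = new HashSet<>();
    for (String dir : snapshotDirs) {
      try {
        inProgress.addAll(inspector.filesUnderSnapshot(dir));
      } catch (CorruptedSnapshotException e) {
        if (e.getCause() instanceof FileNotFoundException) {
          // Manifest is still being copied: ignore this snapshot and keep going.
          continue;
        }
        throw e; // Assumption: any other corruption is still surfaced.
      }
    }
    return inProgress;
  }
}
```

Nothing is deleted in this variant; an incomplete snapshot directory is simply left out of the in-progress set for this pass.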

Any progress on the review of this issue? I faced exactly the same problem and hope it gets resolved.

Yeah, if the cleaner reads the snapshot while the manifest files are still being copied, it is OK to remove this snapshot, because copying HFiles has not started yet. So there is no impact on the logic in the SnapshotCleaner.

The logic of getUnreferencedFiles() is that, for an HFile that is not in the cache, it calls refreshCache() to get the latest snapshot HFiles. If an HFile from this ExportSnapshot job is in the list, the manifest files have already been copied over, so refreshCache() will pick up the latest snapshot file list.
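As a simplified illustration of the refresh-on-miss logic described above, the sketch below models the cache as a set of file names. Only the names `getUnreferencedFiles` and `refreshCache` come from the discussion; the class `SnapshotCacheSketch`, the `loader` supplier, and the once-per-call refresh are assumptions made for this example, and this is not the actual SnapshotFileCache code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

class SnapshotCacheSketch {
  // Hypothetical loader: returns the HFile names referenced by current snapshot manifests.
  private final Supplier<Set<String>> loader;
  private Set<String> cache = new HashSet<>();
  int refreshCount = 0;

  SnapshotCacheSketch(Supplier<Set<String>> loader) {
    this.loader = loader;
  }

  /** Reloads the set of referenced files from the manifests. */
  void refreshCache() {
    cache = new HashSet<>(loader.get());
    refreshCount++;
  }

  /** Returns the candidates that no snapshot references, refreshing on the first miss. */
  List<String> getUnreferencedFiles(List<String> candidates) {
    List<String> unreferenced = new ArrayList<>();
    boolean refreshed = false;
    for (String file : candidates) {
      if (!refreshed && !cache.contains(file)) {
        // Cache miss: reload once so newly copied manifests are seen.
        refreshCache();
        refreshed = true;
      }
      if (!cache.contains(file)) {
        unreferenced.add(file);
      }
    }
    return unreferenced;
  }
}
```

If a manifest has been fully copied by the time the cleaner sees one of its HFiles, the refresh brings that HFile into the cache and it is not reported as unreferenced.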

@busbey @z-york Unless you see something missing, I think this one is good to go, thanks.

I rebased the patch and posted a new pull request, #1791.

It is the same as the original one, except for some minor changes (for example, some of the utilities were moved, so it now uses the new utility class).

binlijin · 2019-10-29T07:46:31Z

LGTM

Apache-HBase · 2019-10-31T04:03:28Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
💙	reexec	1m 6s	Docker mode activated.
		_ Prechecks _
💚	dupname	0m 0s	No case conflicting files found.
💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
💚	@author	0m 0s	The patch does not contain any @author tags.
💚	test4tests	0m 0s	The patch appears to include 2 new or modified test files.
		_ master Compile Tests _
💚	mvninstall	5m 53s	master passed
💚	compile	0m 57s	master passed
💚	checkstyle	1m 29s	master passed
💚	shadedjars	5m 3s	branch has no errors when building our shaded downstream artifacts.
💚	javadoc	0m 37s	master passed
💙	spotbugs	4m 32s	Used deprecated FindBugs config; considering switching to SpotBugs.
💚	findbugs	4m 28s	master passed
		_ Patch Compile Tests _
💚	mvninstall	5m 25s	the patch passed
💚	compile	0m 59s	the patch passed
💚	javac	0m 59s	the patch passed
💔	checkstyle	1m 27s	hbase-server: The patch generated 3 new + 5 unchanged - 0 fixed = 8 total (was 5)
💚	whitespace	0m 0s	The patch has no whitespace issues.
💚	shadedjars	4m 58s	patch has no errors when building our shaded downstream artifacts.
💚	hadoopcheck	17m 18s	Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
💚	javadoc	0m 36s	the patch passed
💚	findbugs	4m 36s	the patch passed
		_ Other Tests _
💔	unit	31m 18s	hbase-server in the patch failed.
💚	asflicense	0m 15s	The patch does not generate ASF License warnings.
		93m 15s

Reason	Tests
Failed junit tests	hadoop.hbase.master.snapshot.TestSnapshotHFileCleaner

Subsystem	Report/Notes
Docker	Client=19.03.4 Server=19.03.4 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/2/artifact/out/Dockerfile
GITHUB PR	#769
Optional Tests	dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname	Linux a0bf09f39b92 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-769/out/precommit/personality/provided.sh
git revision	master / `2451c2c`
Default Java	1.8.0_181
checkstyle	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/2/artifact/out/diff-checkstyle-hbase-server.txt
unit	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/2/artifact/out/patch-unit-hbase-server.txt
Test Results	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/2/testReport/
Max. process+thread count	672 (vs. ulimit of 10000)
modules	C: hbase-server U: hbase-server
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/2/console
versions	git=2.11.0 maven=2018-06-17T18:33:14Z) findbugs=3.1.11
Powered by	Apache Yetus 0.11.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2019-10-31T13:17:38Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
💙	reexec	1m 13s	Docker mode activated.
		_ Prechecks _
💚	dupname	0m 0s	No case conflicting files found.
💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
💚	@author	0m 0s	The patch does not contain any @author tags.
💚	test4tests	0m 0s	The patch appears to include 2 new or modified test files.
		_ master Compile Tests _
💚	mvninstall	5m 55s	master passed
💚	compile	0m 58s	master passed
💚	checkstyle	1m 28s	master passed
💚	shadedjars	5m 0s	branch has no errors when building our shaded downstream artifacts.
💚	javadoc	0m 40s	master passed
💙	spotbugs	5m 7s	Used deprecated FindBugs config; considering switching to SpotBugs.
💚	findbugs	5m 4s	master passed
		_ Patch Compile Tests _
💚	mvninstall	6m 22s	the patch passed
💚	compile	1m 7s	the patch passed
💚	javac	1m 7s	the patch passed
💚	checkstyle	1m 30s	the patch passed
💚	whitespace	0m 0s	The patch has no whitespace issues.
💚	shadedjars	5m 14s	patch has no errors when building our shaded downstream artifacts.
💚	hadoopcheck	17m 37s	Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
💚	javadoc	0m 34s	the patch passed
💚	findbugs	4m 36s	the patch passed
		_ Other Tests _
💚	unit	227m 43s	hbase-server in the patch passed.
💚	asflicense	0m 26s	The patch does not generate ASF License warnings.
		292m 20s

Subsystem	Report/Notes
Docker	Client=19.03.4 Server=19.03.4 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/3/artifact/out/Dockerfile
GITHUB PR	#769
Optional Tests	dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname	Linux b545485db603 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-769/out/precommit/personality/provided.sh
git revision	master / `2451c2c`
Default Java	1.8.0_181
Test Results	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/3/testReport/
Max. process+thread count	4407 (vs. ulimit of 10000)
modules	C: hbase-server U: hbase-server
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/3/console
versions	git=2.11.0 maven=2018-06-17T18:33:14Z) findbugs=3.1.11
Powered by	Apache Yetus 0.11.0 https://yetus.apache.org

This message was automatically generated.

ferhui · 2020-05-11T09:31:37Z

Facing the same problem. Any progress on this issue? @guangxuCheng @binlijin

Apache-HBase · 2020-05-22T21:15:16Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 0s	Docker mode activated.
-1 ❌	patch	0m 4s	#769 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.

Subsystem	Report/Notes
GITHUB PR	#769
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/console
versions	git=2.17.1
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2020-05-22T21:15:40Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 0s	Docker mode activated.
-1 ❌	patch	0m 3s	#769 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.

Subsystem	Report/Notes
GITHUB PR	#769
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/console
versions	git=2.17.1
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2020-05-22T21:16:40Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 0s	Docker mode activated.
-1 ❌	patch	0m 2s	#769 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.

Subsystem	Report/Notes
GITHUB PR	#769
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/console
versions	git=2.17.1
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2020-05-22T22:11:56Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 0s	Docker mode activated.
-1 ❌	patch	0m 3s	#769 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.

Subsystem	Report/Notes
GITHUB PR	#769
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/console
versions	git=2.17.1
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2020-05-22T22:12:05Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 0s	Docker mode activated.
-1 ❌	patch	0m 2s	#769 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.

Subsystem	Report/Notes
GITHUB PR	#769
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/console
versions	git=2.17.1
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2020-05-22T22:12:11Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 0s	Docker mode activated.
-1 ❌	patch	0m 3s	#769 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.

Subsystem	Report/Notes
GITHUB PR	#769
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-769/1/console
versions	git=2.17.1
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

huaxiangsun · 2020-05-25T00:25:50Z

We ran into this issue when running ExportSnapshot with large HFiles; I will spend some time reviewing.

huaxiangsun

Looks good to me; I will try to rebase and run the tests locally.

z-york requested changes Oct 28, 2019

guangxuCheng force-pushed the HBASE-23202 branch from 263f2f1 to ae40ed4 Compare October 31, 2019 02:28

HBASE-23202 ExportSnapshot (import) will fail if copying files to roo…

37d1e5b

…t directory takes longer than cleaner TTL

guangxuCheng force-pushed the HBASE-23202 branch from ae40ed4 to 37d1e5b Compare October 31, 2019 08:21

guangxuCheng requested review from busbey and z-york November 7, 2019 02:44

huaxiangsun approved these changes May 26, 2020

huaxiangsun mentioned this pull request May 27, 2020

HBASE-23202 ExportSnapshot (import) will fail if copying files to roo… #1791

Merged

Apache9 closed this Jun 21, 2020

z-york Oct 28, 2019

guangxuCheng Oct 29, 2019

busbey Oct 29, 2019

busbey Oct 29, 2019

guangxuCheng Oct 30, 2019 (edited)

eomiks Mar 27, 2020 (edited)

huaxiangsun May 26, 2020

huaxiangsun May 26, 2020

huaxiangsun May 26, 2020

huaxiangsun May 27, 2020
