HBASE-27621 Also clear the Dictionary when resetting when reading compressed WAL file #5016
Conversation
This seems like an ingenious idea, but I want to confirm one thing: given the eviction mechanism of LRUMap, even if findEntry is used instead of addEntry, is there still a theoretical possibility of the read and write paths behaving inconsistently?
  if (status == Dictionary.NOT_IN_DICTIONARY) {
    int tagLen = StreamUtils.readRawVarint32(src);
    offset = Bytes.putAsShort(dest, offset, tagLen);
    IOUtils.readFully(src, dest, offset, tagLen);
-   tagDict.addEntry(dest, offset, tagLen);
+   tagDict.findEntry(dest, offset, tagLen);
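For context, a minimal, self-contained sketch of the semantics this change relies on. This is toy Java, not HBase's actual LRUDictionary, and the names are illustrative: addEntry unconditionally appends a new index, while findEntry looks the word up first and only adds it when missing, so replaying the same bytes after a reset does not insert duplicate entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dictionary with no eviction, only to show the addEntry/findEntry difference.
class ToyDictionary {
  static final short NOT_IN_DICTIONARY = -1;

  private final List<String> byIndex = new ArrayList<>();
  private final Map<String, Short> byWord = new HashMap<>();

  short addEntry(String word) {
    byIndex.add(word);                      // always appends a fresh index
    short idx = (short) (byIndex.size() - 1);
    byWord.put(word, idx);
    return idx;
  }

  short findEntry(String word) {
    Short existing = byWord.get(word);
    if (existing != null) {
      return existing;                      // already present: reuse the index, do not grow
    }
    addEntry(word);
    return NOT_IN_DICTIONARY;               // tell the caller the word was newly added
  }
}
```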
Do you think this change could be expensive? In the normal case the entry will not exist in the dict, but now we're adding an extra map lookup for every call. Granted it's O(1), but it still costs CPU for the hashcode, allocating the lookup key, etc.
I wonder if we could trigger findEntry only if the context has been reset, and otherwise use addEntry on the first pass?
May not be a big issue, just checking.
It is slower than before, but I always think correctness comes first, and then we consider performance. For log splitting and replication, reading is usually not the bottleneck.
We can file a follow-on issue to do the optimization; maybe we could also add a reset flag in CompressionContext to indicate whether we need to do a lookup first.
Thanks.
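To make the reset-flag idea concrete, here is a rough sketch, reusing the ToyDictionary above. The class and method names (ResetAwareRecorder, onReset, recordWord) are hypothetical and not part of HBase's CompressionContext; this is just a sketch of the proposed optimization, not the actual implementation.

```java
// Hypothetical sketch: only pay for the extra lookup (findEntry) after the reader
// has reset; on the common first pass keep the cheaper unconditional addEntry.
class ResetAwareRecorder {
  private boolean wasReset = false;

  // Called when the reading context is reset (e.g. the reader seeks back), so
  // some of the upcoming words may already be in the dictionary.
  void onReset() {
    wasReset = true;
  }

  // Record a word read from a NOT_IN_DICTIONARY slot in the WAL.
  void recordWord(ToyDictionary dict, String word) {
    if (wasReset) {
      dict.findEntry(word);   // look up first; only add when genuinely missing
    } else {
      dict.addEntry(word);    // first pass: no extra hash lookup or key allocation
    }
  }
}
```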
Sounds good, agree on correctness first.
Also agree on the bottleneck for splitting/replication. However, this uncompressTags method is in the hot path of normal reads when DataBlockEncoding is used: here.
The most important thing here is to read WAL entries in order, and not skip any entries. If these two rules are guaranteed, it is OK to restart as many times as you want. And I think for replication we must follow these two rules, otherwise there will be data loss...
Agree with that, but I think I didn't express my question clearly. For WAL compression, the core logic is to build an index (an LRUMap) in memory while writing/reading the WAL. There is another key point here: when operating on a WAL file, the behavior of the read and write paths needs to be exactly the same. Using findEntry instead of addEntry in this patch solves part of the problem. However, suppose we never called resetPosition while writing the WAL, but a certain position was reset many times while reading it. The implicit effect is that the corresponding node has been moved to the head of the LRUMap many times on the read side only. So is it possible that the node evicted on the write path (writing the WAL) is different from the one evicted on the read path (replication)?
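A small, self-contained illustration of this concern, using an access-ordered LinkedHashMap as a stand-in LRU (not HBase's LRUDictionary; the words and capacity are made up): one extra lookup on the read side after a reset changes the access order, so the next eviction differs between the two sides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruDivergenceDemo {
  // An access-ordered "LRU dictionary" capped at three entries.
  static LinkedHashMap<String, Integer> newLru(int capacity) {
    return new LinkedHashMap<String, Integer>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > capacity;
      }
    };
  }

  public static void main(String[] args) {
    LinkedHashMap<String, Integer> writerDict = newLru(3);
    LinkedHashMap<String, Integer> readerDict = newLru(3);

    // Both sides see the same three words while writing / reading the WAL.
    String[] words = { "w1", "w2", "w3" };
    for (int i = 0; i < words.length; i++) {
      writerDict.put(words[i], i);
      readerDict.put(words[i], i);
    }

    // The reader re-reads a stretch of the file after a reset and touches w1
    // again, moving it to the head of its LRU -- the writer never did this.
    readerDict.get("w1");

    // The next new word now evicts a different entry on each side.
    writerDict.put("w4", 3);
    readerDict.put("w4", 3);

    System.out.println("writer dict: " + writerDict.keySet()); // [w2, w3, w4]
    System.out.println("reader dict: " + readerDict.keySet()); // [w3, w1, w4]
  }
}
```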
After deeper consideration I think you are right. The solution here can only work perfectly when the dict is infinite, i.e., no eviction. If we also consider eviction: when we go back a long distance, the word at a given index will have changed due to eviction, so when reading, if we use an index to look up the word (a qualifier or a row, for example), we may get an incorrect word for that index. So it seems that rebuilding the dict is necessary when resetting. But anyway, I could try to refactor the readNext method in ProtobufLogReader to have more fine-grained control over whether we need to reconstruct the dict. For example, if we just return before reading the actual WAL entry, i.e., we quit early after checking the available bytes, we do not need to reconstruct the dict. Thanks for pointing this out!
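To spell out why a stale index becomes dangerous, here is a toy bounded dictionary that reuses the slot of an evicted word. This is illustrative only, with a simplified eviction policy and made-up names, not HBase's LRUDictionary: once a word has been evicted, its index can be handed to a different word, so a compressed cell that refers to that index decodes to the wrong bytes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;

// Toy bounded dictionary: evicts the oldest word and reuses its slot.
class SlotReusingDict {
  private final int capacity;
  private final ArrayDeque<Integer> evictionOrder = new ArrayDeque<>(); // oldest first
  private final HashMap<Integer, String> slotToWord = new HashMap<>();

  SlotReusingDict(int capacity) {
    this.capacity = capacity;
  }

  int add(String word) {
    int slot;
    if (slotToWord.size() < capacity) {
      slot = slotToWord.size();            // fill empty slots first
    } else {
      slot = evictionOrder.pollFirst();    // evict the oldest word and reuse its slot
    }
    slotToWord.put(slot, word);
    evictionOrder.addLast(slot);
    return slot;
  }

  String get(int slot) {
    return slotToWord.get(slot);
  }

  public static void main(String[] args) {
    SlotReusingDict dict = new SlotReusingDict(2);
    int slotOfRowA = dict.add("rowA");     // slot 0
    dict.add("rowB");                      // slot 1
    dict.add("rowC");                      // evicts rowA, reuses slot 0
    // A compressed cell written while "rowA" owned slot 0 now decodes differently.
    System.out.println(dict.get(slotOfRowA)); // prints rowC, not rowA
  }
}
```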
Agree. Although this solution has a performance cost, it is the best way I can think of to completely solve this problem.
This PR cannot solve the whole problem, so I am converting it to a draft to avoid someone merging it accidentally. Thanks all for helping review and test, especially @thangTang for pointing out the problem. I will change the title and provide a new PR soon.
To ensure that compression and decompression construct the same dictionary, should we only use LRUDictionary#findEntry() to add entries, but keep LRUDictionary#getEntry() from moving the entry to the head?
This is still not enough... As said above, if we go back a long distance, the word at a given index could be completely different, which then leads to an incorrect result when a field is found to be 'in dictionary'...
I tried refactoring a bit, but the implementation of ProtobufLogReader is too complicated. I think we'd better abstract two types of WAL.Reader for reading WAL files. Thanks.
I understand that this is a complicated and dirty job; I am ashamed that I didn't solve it thoroughly before...
@apurtell FYI, I think you may also be interested in this patch~
Anyway, a manual +1 from me :)
I think this can be done step by step. WDYT? Thanks.
Makes sense.
@sunhelly Could you please try to see if this PR can also solve your problem? And is it possible to contribute your replication test case to hbase-it? Thanks.
I tested this morning; sadly there is still something wrong... The problem is focused on one scenario: replicating mostly whole-row deletes. It seems like it should not be related to the kind of operation, but I can't find any more relevant changes.
If WALPrettyPrinter cannot output a correct result, then I think the problem is not in the replication implementation; something must be going wrong when writing the WAL file. And I believe it will also make WAL splitting incorrect? Did you also enable WAL value compression, or just the dictionary-based compression... Thanks.
Yes, I also enabled WAL value compression. I'll check whether the stuck replication recurs after disabling it.
Maybe the problem is that, in replication, we check whether we have parsed all the bytes, but in WAL splitting we just return after getting EOF...
Oh, cluster B really does have a data loss issue...
Is it OK for your company to upload the WAL file somewhere? Then we could look at the content of the WAL file and check what the problem is...
OK. I'll prepare one.
It works well on our cluster after disabling WAL value compression, with this fix PR applied. We can reproduce the replication getting stuck by enabling WAL value compression, and WALPrettyPrinter also stops at a middle position without any exceptions. So the remaining stuck issue is not related to the dictionary.
Thanks @sunhelly for providing the useful feedback. Let me merge this PR first to solve the dictionary problem. For replication with WAL value compression, it seems there are still other bugs, and @apurtell also pointed out that there are some tricky aspects of the buffer reuse mechanism; I will dig more and file other issues to try to fix them. Thanks.
…pressed WAL file (#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e)
…pressed WAL file (#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e)
…pressed WAL file (#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e)
…g when reading compressed WAL file (apache#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e)
…pressed WAL file (apache#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e) (cherry picked from commit 8df3212) Change-Id: I469fa5b5a7ba6a41c3b8b28acb57a60f33c27fe9