
HBASE-28643 An unbounded backup failure message can cause an irrecoverable state for the given backup #6088

Merged: 3 commits merged on Sep 2, 2024


rmdmattingly (Contributor)

https://issues.apache.org/jira/browse/HBASE-28643

The BackupInfo class has a failedMsg field, which is a string of unbounded length. When a DistCp job fails, its failure message lists all of the job's source paths, and that message is propagated to the failedMsg field of the corresponding BackupInfo.

If a DistCp job has enough source paths, the failure message grows large enough that backup status updates are rejected:

```
java.lang.IllegalArgumentException: KeyValue size too large
        at org.apache.hadoop.hbase.client.ConnectionUtils.validatePut(ConnectionUtils.java:513)
        at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1095)
        at org.apache.hadoop.hbase.client.HTable.lambda$put$3(HTable.java:564)
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:563)
        at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.updateBackupInfo(BackupSystemTable.java:292)
        at org.apache.hadoop.hbase.backup.impl.BackupManager.updateBackupInfo(BackupManager.java:376)
        at org.apache.hadoop.hbase.backup.impl.TableBackupClient.failBackup(TableBackupClient.java:243)
        at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:317)
        at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:603)
```
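A rough way to see how a path-listing message reaches this limit is to measure its UTF-8 size. The sketch below assumes the default `hbase.client.keyvalue.maxsize` of 10 MiB and a made-up path layout (`buildFailureMessage` and the `hdfs://namenode/...` paths are hypothetical). In practice the rejected cell holds the whole serialized BackupInfo, so failedMsg is only part of its size and the real threshold is somewhat lower.

```java
import java.nio.charset.StandardCharsets;

public final class FailedMsgSizeEstimate {
  // Default value of hbase.client.keyvalue.maxsize (10 MiB); a cluster may override it.
  static final long DEFAULT_MAX_KEYVALUE_SIZE = 10L * 1024 * 1024;

  // Hypothetical stand-in for a DistCp failure message that echoes every source path.
  static String buildFailureMessage(int pathCount) {
    StringBuilder sb = new StringBuilder("DistCp job failed, sources: ");
    for (int i = 0; i < pathCount; i++) {
      sb.append("hdfs://namenode/hbase/WALs/host-").append(i).append("/wal.").append(i).append(' ');
    }
    return sb.toString();
  }

  // Compares the UTF-8 encoded size of the message against a byte limit.
  static boolean exceedsLimit(String msg, long limitBytes) {
    return msg.getBytes(StandardCharsets.UTF_8).length > limitBytes;
  }

  public static void main(String[] args) {
    for (int n : new int[] {1_000, 100_000, 300_000}) {
      String msg = buildFailureMessage(n);
      int bytes = msg.getBytes(StandardCharsets.UTF_8).length;
      System.out.printf("%,d paths -> %,d bytes, exceeds default limit: %b%n",
          n, bytes, exceedsLimit(msg, DEFAULT_MAX_KEYVALUE_SIZE));
    }
  }
}
```

With paths of this shape, each entry costs at least 40 bytes, so a few hundred thousand source paths are enough to push the message alone past 10 MiB.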

Without the ability to update the backup's state, it will never be returned as a failed backup by the client. This means that any mechanisms designed for repairing or cleaning failed backups won't work properly.

I think a simple fix is sufficient here: truncate the failedMsg field to a reasonable maximum size.
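A minimal sketch of such a truncation, assuming a character-based cap. `MAX_FAILED_MSG_LENGTH` and its value of 1024 are hypothetical; this conversation does not state the limit the patch actually uses. The helper backs off by one char when the cut would split a UTF-16 surrogate pair.

```java
public final class FailedMsgTruncation {
  // Hypothetical cap; the patch's actual limit is not stated in this conversation.
  static final int MAX_FAILED_MSG_LENGTH = 1024;

  // Returns msg unchanged if it fits, otherwise its first maxLength chars,
  // backing off one char so a UTF-16 surrogate pair is never split.
  static String truncate(String msg, int maxLength) {
    if (maxLength < 0) {
      throw new IllegalArgumentException("maxLength must be non-negative: " + maxLength);
    }
    if (msg == null || msg.length() <= maxLength) {
      return msg;
    }
    int end = maxLength;
    if (end > 0 && Character.isHighSurrogate(msg.charAt(end - 1))) {
      end--;
    }
    return msg.substring(0, end);
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder("DistCp job failed, sources:");
    for (int i = 0; i < 100_000; i++) {
      sb.append(" hdfs://namenode/path/").append(i);
    }
    String failedMsg = truncate(sb.toString(), MAX_FAILED_MSG_LENGTH);
    System.out.println(failedMsg.length()); // prints 1024
  }
}
```

Capping by chars rather than bytes keeps the helper simple. UTF-8 uses at most 3 bytes per Java char, so a 1024-char message stays under roughly 3 KB, far below the KeyValue size limit.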

I've also tried to ensure that the failure is propagated if we ever fail to update the BackupInfo, for whatever reason.
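The propagation idea can be sketched as follows. This is an illustration under assumptions, not the patch itself: `BackupInfoStore`, its `updateBackupInfo(String, String, String)` signature, and the `"FAILED"` state string are hypothetical simplifications of the BackupManager/BackupSystemTable path seen in the stack trace. If recording the failure throws, the sketch rethrows with the update error as the cause and the original backup error attached as suppressed, instead of swallowing it.

```java
import java.io.IOException;
import java.util.Objects;

public final class FailBackupSketch {
  // Hypothetical stand-in for whatever persists BackupInfo
  // (BackupManager.updateBackupInfo -> BackupSystemTable.updateBackupInfo in HBase).
  interface BackupInfoStore {
    void updateBackupInfo(String backupId, String state, String failedMsg) throws IOException;
  }

  // Records the failure; if that write itself fails, rethrows rather than swallowing,
  // keeping the update error as the cause and the original backup error as suppressed.
  static void failBackup(BackupInfoStore store, String backupId, Exception backupError)
      throws IOException {
    Objects.requireNonNull(backupError, "backupError");
    try {
      store.updateBackupInfo(backupId, "FAILED", backupError.getMessage());
    } catch (IOException | RuntimeException updateError) {
      IOException propagated =
          new IOException("Unable to record failure of backup " + backupId, updateError);
      propagated.addSuppressed(backupError);
      throw propagated;
    }
  }

  public static void main(String[] args) {
    // Simulates the rejection from the description: the put fails validation.
    BackupInfoStore rejectingStore = (id, state, msg) -> {
      throw new IllegalArgumentException("KeyValue size too large");
    };
    try {
      failBackup(rejectingStore, "backup_1", new IOException("DistCp failed"));
    } catch (IOException e) {
      System.out.println(e.getMessage() + " / cause: " + e.getCause().getMessage());
    }
  }
}
```

Catching `RuntimeException` alongside `IOException` matters here because `validatePut` signals the oversized cell with an `IllegalArgumentException`, which a checked-only catch would let escape without context.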

cc @hgromer @ndimiduk @DieterDP-ng


@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 15s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | _ master Compile Tests _ | | | |
| +1 💚 | mvninstall | 3m 4s | | master passed |
| +1 💚 | compile | 0m 32s | | master passed |
| +1 💚 | checkstyle | 0m 12s | | master passed |
| +1 💚 | spotbugs | 0m 32s | | master passed |
| +1 💚 | spotless | 0m 45s | | branch has no errors when running spotless:check. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 2m 55s | | the patch passed |
| +1 💚 | compile | 0m 30s | | the patch passed |
| +1 💚 | javac | 0m 30s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 11s | | the patch passed |
| +1 💚 | spotbugs | 0m 39s | | the patch passed |
| +1 💚 | hadoopcheck | 10m 29s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 43s | | patch has no errors when running spotless:check. |
| | _ Other Tests _ | | | |
| +1 💚 | asflicense | 0m 11s | | The patch does not generate ASF License warnings. |
| | | 27m 33s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6088/2/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6088 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 550e0e2c0a19 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 2cd3a22 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-backup U: hbase-backup |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6088/2/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 31s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | _ Prechecks _ | | | |
| | _ master Compile Tests _ | | | |
| +1 💚 | mvninstall | 3m 41s | | master passed |
| +1 💚 | compile | 0m 26s | | master passed |
| +1 💚 | javadoc | 0m 23s | | master passed |
| +1 💚 | shadedjars | 6m 21s | | branch has no errors when building our shaded downstream artifacts. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 3m 22s | | the patch passed |
| +1 💚 | compile | 0m 19s | | the patch passed |
| +1 💚 | javac | 0m 19s | | the patch passed |
| +1 💚 | javadoc | 0m 13s | | the patch passed |
| +1 💚 | shadedjars | 5m 36s | | patch has no errors when building our shaded downstream artifacts. |
| | _ Other Tests _ | | | |
| +1 💚 | unit | 10m 27s | | hbase-backup in the patch passed. |
| | | 32m 22s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6088/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6088 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 1400486ce2b8 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 2cd3a22 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6088/2/testReport/ |
| Max. process+thread count | 3851 (vs. ulimit of 30000) |
| modules | C: hbase-backup U: hbase-backup |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6088/2/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

ndimiduk merged commit dcffc4a into apache:master on Sep 2, 2024
1 check passed
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Sep 2, 2024
…rable state for the given backup (apache#6088)

Co-authored-by: Ray Mattingly <[email protected]>
Signed-off-by: Nick Dimiduk <[email protected]>
ndimiduk deleted the HBASE-28643 branch on September 2, 2024 at 08:30
ndimiduk pushed a commit that referenced this pull request Sep 2, 2024
…rable state for the given backup (#6088)

Co-authored-by: Ray Mattingly <[email protected]>
Signed-off-by: Nick Dimiduk <[email protected]>
4 participants