[DocDB] Re-packing rows during compaction, after alter table add column with default value can result in a crash #24050
Labels
2.20 Backport Required
2.20.7_blocker
2.20.7.1_blocker
2024.1 Backport Required
2024.2 Backport Required
area/docdb: YugabyteDB core features
kind/bug: This issue is a bug
priority/high: High Priority
Comments
yugabyte-ci added the kind/bug and priority/medium labels on Sep 19, 2024
rthallamko3 changed the title from "[DocDB] Fix re-packing rows after alter table add column with default value" to "[DocDB] Re-packing rows during compaction, after alter table add column with default value can result in a crash" on Sep 20, 2024
yugabyte-ci added the priority/high label and removed the priority/medium label on Sep 20, 2024
Huqicheng added a commit that referenced this issue on Sep 26, 2024
… default value
Summary:
D17904 fixed a re-packing failure caused by packed row size overflow. There is a similar issue when compaction follows an `alter table` statement that adds a column with a default value, e.g.
`ALTER TABLE t ADD COLUMN v3 TIMESTAMP DEFAULT CURRENT_TIMESTAMP NULL`
Compaction fails with an error like `Unable to pack old value for 10`.
Fix this by allowing the packed row size to exceed the size limit during re-packing.
Modified PgPackedRowTest.PackOverflow to also test a column with a default value.
Without this fix, compaction fails with
```
E0919 21:12:51.649142 920988 db_impl.cc:3417] T 0685c068ecf34c9b8252aa37a12f7de7 P 5e3d933d73d2450180c36a3cf5b3a74b [R]: Waiting after background compaction error: Corruption (yb/docdb/docdb_compaction_context.cc:403): Unable to pack old value for 2, Accumulated background error counts: 1
../../src/yb/yql/pgwrapper/pg_packed_row-test.cc:712: Failure
Failed
Bad status: Corruption (yb/docdb/docdb_compaction_context.cc:403): Compact range failed: Unable to pack old value for 2
```
Jira: DB-12940
Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"
Reviewers: sergei, arybochkin, rthallam
Reviewed By: sergei
Subscribers: ybase, yql
Differential Revision: https://phorge.dev.yugabyte.com/D38242
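The fix described in the commit message above can be illustrated with a toy model. The sketch below is not YugabyteDB code: `pack_row`, `repack_for_compaction`, `PackOverflow`, and the byte-length size accounting are hypothetical names and simplifications. They are chosen only to show why enforcing the limit during re-packing fails once a default value is added, and how relaxing the check for the re-pack path avoids the error.

```python
# Toy model of packed-row re-packing; every name here is hypothetical.
# It is not the YugabyteDB implementation, only an illustration of the idea.


class PackOverflow(Exception):
    """Raised when a packed row would exceed the size limit."""


def pack_row(values, size_limit, enforce_limit=True):
    """Concatenate column values into one packed row (sizes in bytes)."""
    packed = b"".join(values)
    if enforce_limit and len(packed) > size_limit:
        raise PackOverflow(f"packed size {len(packed)} exceeds limit {size_limit}")
    return packed


def repack_for_compaction(old_values, default_value, size_limit, enforce_limit):
    """Re-pack an existing row under a schema that adds a column with a default."""
    new_values = list(old_values) + [default_value]
    return pack_row(new_values, size_limit, enforce_limit=enforce_limit)


limit = 16
old_row = [b"12345678", b"abcdefg"]  # 15 bytes: fits under the limit
default = b"2024-09-19"              # value for the newly added column

pack_row(old_row, limit)             # the original write succeeds

try:
    repack_for_compaction(old_row, default, limit, enforce_limit=True)
    failed_before_fix = False
except PackOverflow:
    failed_before_fix = True         # models the "Unable to pack old value" failure

# Modeled fix: re-packing may exceed the limit instead of failing.
repacked = repack_for_compaction(old_row, default, limit, enforce_limit=False)
```

In this model the limit still applies to fresh writes; only the re-pack path is allowed to overflow, which mirrors the wording of the commit summary rather than its actual code.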
fizaaluthra pushed a commit that referenced this issue on Sep 27, 2024
Summary:
12b2c40 [#23999] DocDB: Big shared memory segments
b1e6329 [PLAT-15279] Add gzip compression to core dumps from DB.
06472d5 [#24050] docdb: Fix re-packing rows after alter table add column with default value
9009d11 [#23837] YSQL: Temporarily disable some tests with Connection Manager enabled
11acca7 [#23325][#23326] yugabyted: Support for adding new databases for xCluster replication (Phase 2)
96703da [PLAT-15465][PLAT-15466] Minor fixes in YNP
c5aca3b [PLAT-14924][PLAT-12829][PLAT-15446] - ui bugs and improvements
6e82692 [#23770] [#23797] YSQL: Stabilise some test failures with Connection Manager enabled
b50bd1b [PLAT-15279] Adjusting the core pattern to create the cores with the core_ prefix for collect cores to catch it
f692a60 [PLAT-14045] UBI-8 images don't have hostname
d6a19da [PLAT-15377] Adding a global uncaught exception handler to yugaware
acbb1ba [PLAT-15225] Verify there is no running master on nodes selected for master replacement
Excluded: 3e93354 [#23686] YSQL: Build relcache foreign key list from YB catcache
Test Plan: Jenkins: rebase: pg15-cherrypicks
Reviewers: tfoucher, fizaa, telgersma
Differential Revision: https://phorge.dev.yugabyte.com/D38503
Huqicheng added a commit that referenced this issue on Oct 2, 2024
…e add column with default value
Summary: Original commit: 06472d5 / D38242
Same change as the original commit: allow the packed row size to exceed the size limit during re-packing, so that compaction after an `alter table` that adds a column with a default value no longer fails with an `Unable to pack old value` error.
Jira: DB-12940
Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"
Reviewers: sergei, arybochkin, rthallam
Reviewed By: rthallam
Subscribers: yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38462
Huqicheng added a commit that referenced this issue on Oct 2, 2024
…e add column with default value
Summary: Original commit: 06472d5 / D38242
Same change as the original commit: allow the packed row size to exceed the size limit during re-packing, so that compaction after an `alter table` that adds a column with a default value no longer fails with an `Unable to pack old value` error.
Jira: DB-12940
Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"
Reviewers: sergei, arybochkin, rthallam
Reviewed By: rthallam
Subscribers: yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38463
Huqicheng added a commit that referenced this issue on Oct 7, 2024
…e add column with default value
Summary: Original commit: 06472d5 / D38242
Same change as the original commit: allow the packed row size to exceed the size limit during re-packing, so that compaction after an `alter table` that adds a column with a default value no longer fails with an `Unable to pack old value` error.
Jira: DB-12940
Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"
Reviewers: sergei, arybochkin, rthallam
Reviewed By: rthallam
Subscribers: yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38607
Huqicheng added a commit that referenced this issue on Oct 7, 2024
…add column with default value
Summary: Original commit: 06472d5 / D38242
Same change as the original commit: allow the packed row size to exceed the size limit during re-packing, so that compaction after an `alter table` that adds a column with a default value no longer fails with an `Unable to pack old value` error.
Jira: DB-12940
Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"
Reviewers: sergei, arybochkin, rthallam
Reviewed By: rthallam
Subscribers: yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38606
Huqicheng added a commit that referenced this issue on Nov 10, 2024
…ble add column with default value
Summary: Same change as the original commit: allow the packed row size to exceed the size limit during re-packing, so that compaction after an `alter table` that adds a column with a default value no longer fails with an `Unable to pack old value` error.
Jira: DB-12940
Original commit: 06472d5 / D38242
Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"
Reviewers: sergei, arybochkin, rthallam
Reviewed By: rthallam
Subscribers: yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D39856
Jira Link: DB-12940
Description
#13037 fixed a re-packing failure caused by packed row size overflow.
However, there is a similar issue: after an alter table statement that adds a column with a default value, such as
ALTER TABLE t ADD COLUMN v3 TIMESTAMP DEFAULT CURRENT_TIMESTAMP NULL
subsequent compactions can fail with the error
Bad status: Corruption (yb/docdb/docdb_compaction_context.cc:403): Compact range failed: Unable to pack old value for
This happens because compaction can repack the row, pushing it over the limit set by the gflag `ysql_packed_row_size_limit`. The code does not handle this case and crashes the tserver process. After the crash, the tserver comes back online, compactions run again and hit the same issue, resulting in a crash loop.
Workaround: Increasing the tserver gflag `ysql_packed_row_size_limit` to a value higher than the expected row size can help avoid the crash loop until the tserver is on a build with this fix.
This issue tracks the code fix to prevent repacking during compactions from causing a crash.
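To make the crash loop and the workaround concrete, here is a small simulation. It is a hedged sketch, not tserver code: `compact`, `restart_until_stable`, and the row sizes are made up for illustration, and only the flag name `ysql_packed_row_size_limit` comes from this issue.

```python
# Hypothetical simulation of the crash loop; sizes and function names are made up.


def compact(repacked_row_sizes, ysql_packed_row_size_limit):
    """Return True if compaction succeeds, False if it fails on an oversized row."""
    return all(size <= ysql_packed_row_size_limit for size in repacked_row_sizes)


def restart_until_stable(repacked_row_sizes, limit, max_restarts=3):
    """Count the restarts used before compaction succeeds (capped at max_restarts)."""
    restarts = 0
    while not compact(repacked_row_sizes, limit):
        if restarts == max_restarts:
            return restarts  # still failing: a crash loop
        restarts += 1        # the process restarts with the same data and limit
    return restarts


# Row sizes after re-packing with the new default-valued column (illustrative numbers).
repacked_sizes = [900, 1500, 2100]

# With a limit below the largest repacked row, every restart hits the same failure.
loop_restarts = restart_until_stable(repacked_sizes, limit=2048)

# Workaround from the issue: choose a limit above the expected row size.
safe_limit = max(repacked_sizes) + 1
stable_restarts = restart_until_stable(repacked_sizes, limit=safe_limit)
```

The point of the model is that restarting changes neither the data nor the limit, so the failure repeats until the limit is raised or the build contains the fix.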
Issue Type
kind/bug