[DocDB] Re-packing rows during compaction, after alter table add column with default value can result in a crash #24050

Closed
1 task done
Huqicheng opened this issue Sep 19, 2024 · 0 comments

Huqicheng commented Sep 19, 2024

Jira Link: DB-12940

Description

#13037 fixed a re-packing failure caused by packed row size overflow.

However, a similar issue occurs after an ALTER TABLE statement that adds a column with a default value, such as:

ALTER TABLE t ADD COLUMN v3 TIMESTAMP DEFAULT CURRENT_TIMESTAMP NULL

Subsequent compactions can fail with the error:

Bad status: Corruption (yb/docdb/docdb_compaction_context.cc:403): Compact range failed: Unable to pack old value for

This happens because compaction can repack the row, which can push it over the limit set by the gflag ysql_packed_row_size_limit. The code does not handle this case and crashes the tserver process. After the crash, the tserver comes back online, compactions run again and hit the same issue, resulting in a crash loop.
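The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not YugabyteDB code: `pack_row`, `repack_for_schema`, `Column`, `PackError`, the length-prefixed encoding, and the 32-byte limit are all hypothetical names and values chosen for illustration. It shows a row that fits under the limit with the old schema and no longer fits once a default value for a newly added column is appended during re-packing.

```python
from dataclasses import dataclass


class PackError(Exception):
    """Raised when a row cannot be packed within the size limit (hypothetical)."""


@dataclass
class Column:
    name: str
    default: bytes = b""  # value used for rows written before the column existed


def pack_row(values, size_limit):
    """Concatenate length-prefixed values, refusing rows larger than size_limit."""
    packed = b"".join(len(v).to_bytes(4, "big") + v for v in values)
    if len(packed) > size_limit:
        raise PackError(f"packed row is {len(packed)} bytes, limit is {size_limit}")
    return packed


def repack_for_schema(old_values, schema, size_limit):
    """Repack an old row under a newer schema, appending defaults for added columns."""
    added = schema[len(old_values):]
    return pack_row(list(old_values) + [c.default for c in added], size_limit)


# With the original two-column schema the row fits exactly:
# 2 length prefixes of 4 bytes plus 8 + 16 bytes of data = 32 bytes.
limit = 32
old_row = [b"k" * 8, b"v" * 16]
assert len(pack_row(old_row, limit)) == 32

# Toy equivalent of ALTER TABLE ... ADD COLUMN v3 ... DEFAULT ...
schema = [Column("k"), Column("v"), Column("v3", default=b"\x00" * 8)]

try:
    repack_for_schema(old_row, schema, limit)
    outcome = "repacked"
except PackError as e:
    # In this model, the compaction-time repack is where the limit is exceeded.
    outcome = f"compaction failed: {e}"

print(outcome)
```

In this model the row was valid when written and only becomes too large when it is rewritten, which is why retrying the compaction keeps hitting the same error.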

Workaround: Increasing the tserver gflag ysql_packed_row_size_limit to a value higher than the expected row size can help avoid the crash loop until the tserver is on a build with this fix.

This issue tracks the code fix that prevents re-packing during compaction from causing a crash.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
Huqicheng added the area/docdb (YugabyteDB core features) and 2.20 Backport Required labels on Sep 19, 2024
Huqicheng self-assigned this on Sep 19, 2024
yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels on Sep 19, 2024
rthallamko3 changed the title from "[DocDB] Fix re-packing rows after alter table add column with default value" to "[DocDB] Re-packing rows during compaction, after alter table add column with default value can result in a crash" on Sep 20, 2024
yugabyte-ci added the priority/high (High Priority) label and removed the priority/medium (Medium priority issue) label on Sep 20, 2024
Huqicheng added a commit that referenced this issue Sep 26, 2024
… default value

Summary:
D17904 fixed a re-packing failure caused by packed row size overflow.
A similar issue occurs when compaction follows an `ALTER TABLE` statement that adds a column with a default value,
e.g. `ALTER TABLE t ADD COLUMN v3 TIMESTAMP DEFAULT CURRENT_TIMESTAMP NULL`.
Compaction then fails with an error like `Unable to pack old value for 10`.

Fix this by allowing the packed row size to exceed the size limit during re-packing.

Modified PgPackedRowTest.PackOverflow to also test a column with a default value. Without this fix, compaction fails with:
```
E0919 21:12:51.649142 920988 db_impl.cc:3417] T 0685c068ecf34c9b8252aa37a12f7de7 P 5e3d933d73d2450180c36a3cf5b3a74b [R]: Waiting after background compaction error: Corruption (yb/docdb/docdb_compaction_context.cc:403): Unable to pack old value for 2, Accumulated background error counts: 1
../../src/yb/yql/pgwrapper/pg_packed_row-test.cc:712: Failure
Failed
Bad status: Corruption (yb/docdb/docdb_compaction_context.cc:403): Compact range failed: Unable to pack old value for 2
```

Jira: DB-12940

Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"

Reviewers: sergei, arybochkin, rthallam

Reviewed By: sergei

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D38242
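The approach named in the commit summary above (letting the packed row exceed the size limit during re-packing) can be sketched with a toy model. This is an assumption-laden illustration, not the actual change in D38242: `pack_row`, `repack_during_compaction`, the `enforce_limit` switch, and the 32-byte limit are hypothetical and exist only to contrast the behaviour before and after the fix.

```python
class PackError(Exception):
    """Raised when a packed row would exceed the configured size limit (hypothetical)."""


def pack_row(values, size_limit=None):
    """Pack values with 4-byte length prefixes.

    size_limit=None skips the size check entirely.
    """
    packed = b"".join(len(v).to_bytes(4, "big") + v for v in values)
    if size_limit is not None and len(packed) > size_limit:
        raise PackError(f"packed row is {len(packed)} bytes, limit is {size_limit}")
    return packed


def repack_during_compaction(old_values, added_defaults, size_limit, enforce_limit):
    """Rewrite an old row under a newer schema.

    enforce_limit=True models the behaviour before the fix; False models
    re-packing that is allowed to exceed the limit.
    """
    values = list(old_values) + list(added_defaults)
    return pack_row(values, size_limit if enforce_limit else None)


limit = 32
old_row = [b"k" * 8, b"v" * 16]   # packs to exactly 32 bytes
added_defaults = [b"\x00" * 8]    # default for the newly added column

try:
    repack_during_compaction(old_row, added_defaults, limit, enforce_limit=True)
    before_fix = "ok"
except PackError:
    before_fix = "failed"

repacked = repack_during_compaction(old_row, added_defaults, limit, enforce_limit=False)
print(before_fix, len(repacked))
```

The design point in this sketch is that a row being rewritten by compaction already exists and has to be written out in some form, so rejecting it for size leaves compaction with nothing valid to do.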
fizaaluthra pushed a commit that referenced this issue Sep 27, 2024
Summary:
 12b2c40 [#23999] DocDB: Big shared memory segments
 b1e6329 [PLAT-15279] Add gzip compression to core dumps from DB.
 06472d5 [#24050] docdb: Fix re-packing rows after alter table add column with default value
 9009d11 [#23837] YSQL: Temporarily disable some tests with Connection Manager enabled
 11acca7 [#23325][#23326] yugabyted: Support for adding new databases for xCluster replication (Phase 2)
 96703da [PLAT-15465][PLAT-15466] Minor fixes in YNP
 c5aca3b [PLAT-14924][PLAT-12829][PLAT-15446] - ui bugs and improvements
 6e82692 [#23770] [#23797] YSQL: Stabilise some test failures with Connection Manager enabled
 b50bd1b [PLAT-15279] Adjusting the core pattern to create the cores with the core_ prefix for collect cores to catch it
 f692a60 [PLAT-14045] UBI-8 images don't have hostname
 d6a19da [PLAT-15377] Adding a global uncaught exception handler to yugaware
 acbb1ba [PLAT-15225] Verify there is no running master on nodes selected for master replacement
 Excluded: 3e93354 [#23686] YSQL: Build relcache foreign key list from YB catcache

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: tfoucher, fizaa, telgersma

Differential Revision: https://phorge.dev.yugabyte.com/D38503
Huqicheng added a commit that referenced this issue Oct 2, 2024
…e add column with default value

Summary:
Original commit: 06472d5 / D38242

Jira: DB-12940

Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"

Reviewers: sergei, arybochkin, rthallam

Reviewed By: rthallam

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38462
Huqicheng added a commit that referenced this issue Oct 2, 2024
…e add column with default value

Summary:
Original commit: 06472d5 / D38242

Jira: DB-12940

Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"

Reviewers: sergei, arybochkin, rthallam

Reviewed By: rthallam

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38463
Huqicheng added a commit that referenced this issue Oct 7, 2024
…e add column with default value

Summary:
Original commit: 06472d5 / D38242

Jira: DB-12940

Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"

Reviewers: sergei, arybochkin, rthallam

Reviewed By: rthallam

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38607
Huqicheng added a commit that referenced this issue Oct 7, 2024
…add column with default value

Summary:
Original commit: 06472d5 / D38242

Jira: DB-12940

Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"

Reviewers: sergei, arybochkin, rthallam

Reviewed By: rthallam

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38606
Huqicheng added a commit that referenced this issue Nov 10, 2024
…ble add column with default value

Summary:

Jira: DB-12940

Original commit: 06472d5 / D38242

Test Plan: ./yb_build.sh release --cxx-test pg_packed_row-test --gtest_filter "PackingVersion/PgPackedRowTest.PackOverflow/*"

Reviewers: sergei, arybochkin, rthallam

Reviewed By: rthallam

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D39856
3 participants