Add dump/restore support for Hypercore TAM #7356

erimatnor · 2024-10-17T07:32:15Z

Add support for dumping and restoring hypertables that have chunks that use the Hypercore TAM.

Dumping a Hypercore table requires special consideration because its data is internally stored in two separate relations: one for compressed data and one for non-compressed data. The TAM returns data from both relations, but they may be dumped as separate tables. This risks dumping the compressed data twice: once via the TAM and once via the compressed table in compressed format.

The pg_dump tool uses COPY TO to create dumps of each table, and, to avoid data duplication when used on Hypercore tables, this change introduces a GUC that allows selecting one of these two behaviors:

1. A COPY TO on a Hypercore table returns all data via the TAM, including data stored in the compressed relation. A COPY TO on the internal compressed relation returns no data.
2. A COPY TO on a Hypercore table returns only non-compressed data, while a COPY TO on the compressed relation returns compressed data. A SELECT still returns all the data as normal.
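
The two behaviors can be illustrated with a small model. This is only a sketch, not the extension's code: the class, function, and setting names are hypothetical, rows are represented by plain integers, and the compressed relation is modeled as a list of logical rows rather than compressed batches. It shows that, under either setting, the COPY TO outputs of the two relations together cover every row exactly once.

```python
from dataclasses import dataclass
from enum import Enum


class CopyToBehavior(Enum):
    # Hypothetical names for the two settings described above.
    ALL_DATA = 1            # behavior (1): everything goes through the TAM
    NO_COMPRESSED_DATA = 2  # behavior (2): each relation dumps its own data


@dataclass
class HypercoreChunk:
    non_compressed_rows: list  # rows stored in the non-compressed relation
    compressed_rows: list      # logical rows stored in the compressed relation


def copy_to_hypercore(chunk, behavior):
    """Rows emitted by COPY TO on the Hypercore table itself."""
    if behavior is CopyToBehavior.ALL_DATA:
        return chunk.non_compressed_rows + chunk.compressed_rows
    return list(chunk.non_compressed_rows)


def copy_to_compressed_relation(chunk, behavior):
    """Rows emitted by COPY TO on the internal compressed relation."""
    if behavior is CopyToBehavior.ALL_DATA:
        return []
    return list(chunk.compressed_rows)


def dumped_rows(chunk, behavior):
    """Everything a dump would contain when both relations are dumped."""
    return (copy_to_hypercore(chunk, behavior)
            + copy_to_compressed_relation(chunk, behavior))
```

Without such a setting, the TAM would return all rows and the compressed relation would return its rows again, which is the duplication this change avoids.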

The second approach is the default because it is consistent with how compression behaves when the Hypercore TAM is not used. It produces a pg_dump archive that includes data in compressed form (if the data was compressed when dumped). Conversely, option (1) produces an archive that looks identical to a dump from a non-compressed table.

There are pros and cons of each dump format. A non-compressed archive is a platform-agnostic logical dump that can be restored to any platform and architecture, while a compressed archive includes data that is compressed in a platform-dependent way and needs to be restored to a compatible system.

A test is added that covers both settings and the corresponding dump and restore.

Disable-check: force-changelog-file

codecov · 2024-10-17T11:28:30Z

Codecov Report

Attention: Patch coverage is 91.04478% with 6 lines in your changes missing coverage. Please review.

Project coverage is 82.10%. Comparing base (59f50f2) to head (7e2d5fd).
Report is 601 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tsl/src/process_utility.c | 84.84% | 3 Missing and 2 partials ⚠️ |
| tsl/src/hypercore/hypercore_handler.c | 95.23% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7356      +/-   ##
==========================================
+ Coverage   80.06%   82.10%   +2.03%     
==========================================
  Files         190      230      +40     
  Lines       37181    43128    +5947     
  Branches     9450    10835    +1385     
==========================================
+ Hits        29770    35409    +5639     
- Misses       2997     3401     +404     
+ Partials     4414     4318      -96


Replace the scankey flag used to skip compressed data when starting a Hypercore scan with a function that sets this option on the scan descriptor. Internally, use the scan flags instead of scankey flags to convey this setting. Overloading scankey flags was not ideal, since they are meant to hold per-column settings rather than settings for the scan as a whole. Note that it is possible to set the scan flags when calling the TAM's beginscan callback, but the table_beginscan() wrapper does not expose flags; instead, there is a separate function for each flag setting. Hypercore could define its own beginscan function to do the same, but this is left for the future.
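
The difference between the two approaches can be sketched with a toy model. All names here are hypothetical and none of this is the PostgreSQL or Hypercore API: scan keys carry per-column conditions, while the "skip compressed data" option lives in a flags field on the scan descriptor and is set through a dedicated function.

```python
from dataclasses import dataclass, field

# Hypothetical scan-level flag; it applies to the whole scan, not to a column.
SCAN_SKIP_COMPRESSED = 1 << 0


@dataclass
class ScanKey:
    column: int    # index of the column this key applies to
    value: object  # value the column must equal


@dataclass
class ScanDesc:
    keys: list = field(default_factory=list)
    flags: int = 0


def scan_set_skip_compressed(scan, skip):
    """Set or clear the scan-level option on the scan descriptor."""
    if skip:
        scan.flags |= SCAN_SKIP_COMPRESSED
    else:
        scan.flags &= ~SCAN_SKIP_COMPRESSED


def scan_rows(scan, non_compressed, compressed):
    """Return matching rows, honoring the scan-level flag and the scan keys."""
    rows = list(non_compressed)
    if not (scan.flags & SCAN_SKIP_COMPRESSED):
        rows += compressed
    return [r for r in rows
            if all(r[k.column] == k.value for k in scan.keys)]
```

Keeping the option on the descriptor means the scan keys stay purely per-column, which is the separation the commit message argues for.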

erimatnor · 2024-11-19T14:10:36Z

Currently depends on #7462 and #7454


PG17 introduced a new Bump allocator, which is used for per-tuple memory contexts. Bump doesn't support pfree(), which caused an error when detoasting compressed data on the per-tuple memory context, since the detoasting needs to scan some catalog tables. Make sure the Hypercore TAM always detoasts in a temporary memory context to fix this issue, and add a test for it.

erimatnor added the hypercore label Oct 17, 2024

erimatnor requested review from fabriziomello, mkindahl and antekresic October 17, 2024 07:32

erimatnor force-pushed the hyperstore-pgdump branch from 669a90e to d7fedcb Compare October 17, 2024 07:33

erimatnor changed the title ~~Add dump/restore support for Hypercore~~ Add dump/restore support for Hypercore TAM Oct 17, 2024

erimatnor force-pushed the hyperstore-pgdump branch 3 times, most recently from 098490f to ab64ed9 Compare October 17, 2024 11:17

erimatnor force-pushed the hyperstore-pgdump branch from ab64ed9 to 304334f Compare October 18, 2024 12:21

fabriziomello assigned erimatnor Oct 22, 2024

erimatnor force-pushed the hyperstore-pgdump branch from 304334f to 80fcb9f Compare November 19, 2024 14:09

erimatnor force-pushed the hyperstore-pgdump branch from 80fcb9f to e4ed218 Compare November 19, 2024 14:14

erimatnor added 2 commits November 19, 2024 15:17

erimatnor force-pushed the hyperstore-pgdump branch from e4ed218 to 7e2d5fd Compare November 19, 2024 14:18
