Add dump/restore support for Hypercore TAM #7356

Draft
wants to merge 3 commits into base: main

erimatnor
Contributor

@erimatnor erimatnor commented Oct 17, 2024

Add support for dumping and restoring hypertables that have chunks that use the Hypercore TAM.

Dumping a Hypercore table requires special consideration because its data is internally stored in two separate relations: one for compressed data and one for non-compressed data. The TAM returns data from both relations, but they may be dumped as separate tables. This risks dumping the compressed data twice: once via the TAM and once via the compressed table in compressed format.

The pg_dump tool uses COPY TO to create dumps of each table, and, to avoid data duplication when used on Hypercore tables, this change introduces a GUC that allows selecting one of these two behaviors:

  1. A COPY TO on a Hypercore table returns all data via the TAM, including data stored in the compressed relation. A COPY TO on the internal compressed relation returns no data.

  2. A COPY TO on a Hypercore table returns only non-compressed data, while a COPY TO on the compressed relation returns compressed data. A SELECT still returns all the data as normal.

The second approach is the default because it is consistent with how compression behaves when the Hypercore TAM is not used. It will produce a pg_dump archive that includes data in compressed form (if data was compressed when dumped). Conversely, option (1) will produce an archive that looks identical to a dump from a non-compressed table.

There are pros and cons of each dump format. A non-compressed archive is a platform-agnostic logical dump that can be restored to any platform and architecture, while a compressed archive includes data that is compressed in a platform-dependent way and needs to be restored to a compatible system.
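
The two behaviors above can be modeled with a small, self-contained Python sketch. It is only a toy model, not the TimescaleDB implementation: the class, enum, and function names are all hypothetical, and the "compressed relation" is represented as a list of row batches. It shows that a naive dump duplicates the compressed rows, while either setting yields every logical row exactly once.

```python
from enum import Enum


class CopyToBehavior(Enum):
    # Hypothetical names for the two settings described above.
    ALL_DATA = 1            # option 1: the TAM returns everything
    NO_COMPRESSED_DATA = 2  # option 2 (default): each relation dumps its own data


class HypercoreTable:
    """Toy model: one non-compressed relation plus one compressed relation."""

    def __init__(self, non_compressed, compressed_batches):
        self.non_compressed = list(non_compressed)
        self.compressed_batches = [tuple(b) for b in compressed_batches]

    def select(self):
        # A SELECT always sees data from both relations.
        rows = list(self.non_compressed)
        for batch in self.compressed_batches:
            rows.extend(batch)
        return rows

    def copy_to(self, behavior):
        # COPY TO on the Hypercore table itself.
        if behavior is CopyToBehavior.ALL_DATA:
            return [("row", r) for r in self.select()]
        return [("row", r) for r in self.non_compressed]

    def copy_to_compressed_relation(self, behavior):
        # COPY TO on the internal compressed relation.
        if behavior is CopyToBehavior.ALL_DATA:
            return []
        return [("batch", b) for b in self.compressed_batches]


def dump(table, behavior):
    # pg_dump-like: one COPY TO per relation, both under the same setting.
    return table.copy_to(behavior) + table.copy_to_compressed_relation(behavior)


def naive_dump(table):
    # What happens without the setting: the TAM returns everything AND the
    # compressed relation is dumped too, so compressed rows appear twice.
    return (table.copy_to(CopyToBehavior.ALL_DATA)
            + table.copy_to_compressed_relation(CopyToBehavior.NO_COMPRESSED_DATA))


def logical_rows(archive):
    # Expand batches back into rows, as a restore followed by a SELECT would.
    rows = []
    for kind, payload in archive:
        if kind == "row":
            rows.append(payload)
        else:
            rows.extend(payload)
    return sorted(rows)
```

In this model, `dump(table, CopyToBehavior.ALL_DATA)` contains only plain rows, while `dump(table, CopyToBehavior.NO_COMPRESSED_DATA)` keeps the batches intact, mirroring the platform-agnostic versus compressed archive trade-off.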

A test is added that exercises both settings, including dumping and restoring under each.

Disable-check: force-changelog-file

@erimatnor erimatnor changed the title Add dump/restore support for Hypercore Add dump/restore support for Hypercore TAM Oct 17, 2024
@erimatnor erimatnor force-pushed the hyperstore-pgdump branch 3 times, most recently from 098490f to ab64ed9 Compare October 17, 2024 11:17
codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 91.04478% with 6 lines in your changes missing coverage. Please review.

Project coverage is 82.10%. Comparing base (59f50f2) to head (7e2d5fd).
Report is 601 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tsl/src/process_utility.c | 84.84% | 3 Missing and 2 partials ⚠️ |
| tsl/src/hypercore/hypercore_handler.c | 95.23% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7356      +/-   ##
==========================================
+ Coverage   80.06%   82.10%   +2.03%     
==========================================
  Files         190      230      +40     
  Lines       37181    43128    +5947     
  Branches     9450    10835    +1385     
==========================================
+ Hits        29770    35409    +5639     
- Misses       2997     3401     +404     
+ Partials     4414     4318      -96     


Replace the scankey flag used to skip compressed data when starting a
Hypercore scan with a function that sets this option on the scan
descriptor. Internally, use the scan flags instead of scankey flags to
convey this setting.

Overloading scankey flags was not ideal since they are supposed to be
per-column settings and not overall scan settings.

Note that it is possible to set the scan flags when calling the TAM's
beginscan callback, but the table_beginscan() wrapper does not expose
flags and instead there's a separate function for each flag
setting. Hypercore could define its own beginscan function to do the
same, but this is left for the future.
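
A rough Python sketch of the distinction described in this commit message: scan keys describe per-column conditions, while a scan-level option lives on the scan descriptor and is set through a dedicated function. Every name here (`ScanFlags`, `SKIP_COMPRESSED`, `scan_set_skip_compressed`, and so on) is hypothetical and chosen for illustration only; the actual change is in C inside the Hypercore TAM.

```python
from dataclasses import dataclass
from enum import IntFlag


class ScanFlags(IntFlag):
    # Scan-wide options (hypothetical names).
    NONE = 0
    SEQSCAN = 1
    SKIP_COMPRESSED = 2


@dataclass
class ScanKey:
    # Per-column condition: this is what scan keys are meant to express.
    column: str
    value: object


@dataclass
class ScanDesc:
    keys: list
    flags: ScanFlags = ScanFlags.SEQSCAN


def begin_scan(keys=()):
    # Like a beginscan wrapper that does not expose flags to the caller.
    return ScanDesc(keys=list(keys))


def scan_set_skip_compressed(scan, skip):
    # Dedicated setter: toggles the scan-wide option on the descriptor.
    if skip:
        scan.flags |= ScanFlags.SKIP_COMPRESSED
    else:
        scan.flags &= ~ScanFlags.SKIP_COMPRESSED


def scan_rows(scan, non_compressed, compressed):
    # The scan-wide flag decides which relations are read; the keys then
    # filter rows column by column.
    rows = list(non_compressed)
    if not (scan.flags & ScanFlags.SKIP_COMPRESSED):
        rows += list(compressed)
    return [r for r in rows if all(r[k.column] == k.value for k in scan.keys)]
```

Keeping the option on the descriptor means the key list stays purely about column predicates, which is the separation the commit message argues for.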
@erimatnor
Contributor Author

erimatnor commented Nov 19, 2024

Currently depends on #7462 and #7454

Add support for dumping and restoring hypertables that have chunks
that use the Hypercore TAM.

Dumping a Hypercore table requires special consideration because its
data is internally stored in two separate relations: one for
compressed data and one for non-compressed data. The TAM returns data
from both relations, but they may be dumped as separate tables. This
risks dumping the compressed data twice: once via the TAM and once via
the compressed table in compressed format.

The `pg_dump` tool uses `COPY TO` to create dumps of each table, and,
to avoid data duplication when used on Hypercore tables, this change
introduces a GUC that allows selecting one of these two behaviors:

1. A `COPY TO` on a Hypercore table returns all data via the TAM,
   including data stored in the compressed relation. A `COPY TO` on
   the internal compressed relation returns no data.

2. A `COPY TO` on a Hypercore table returns only non-compressed data,
   while a `COPY TO` on the compressed relation returns compressed
   data. A `SELECT` still returns all the data as normal.

The second approach is the default because it is consistent with
compression when Hypercore TAM is not used. It will produce a
`pg_dump` archive that includes data in compressed form (if data was
compressed when dumped). Conversely, option (1) will produce an
archive that looks identical to a dump from a non-compressed table.

There are pros and cons of each dump format. A non-compressed archive
is a platform-agnostic logical dump that can be restored to any
platform and architecture, while a compressed archive includes data
that is compressed in a platform-dependent way and needs to be
restored to a compatible system.

A test is added that tests both these settings and corresponding
dumping and restoring.

PG17 introduced a new Bump allocator, which is used for per-tuple
memory contexts. Bump doesn't support pfree(), which caused an error
when detoasting compressed data on the per-tuple memory context since
the detoasting needs to scan some catalog tables.

Make sure Hypercore TAM always detoasts in a temporary memory context
to fix this issue and add a test for it.
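
The failure mode and the fix can be illustrated with a toy Python model. This is a sketch under loud assumptions: the classes below only imitate the relevant property of the allocators (a bump-style context rejects individual frees), and `detoast` and `detoast_safely` are hypothetical stand-ins, not PostgreSQL or TimescaleDB functions.

```python
class BumpContext:
    """Toy bump-style context: memory is only released by reset()."""

    def __init__(self, name):
        self.name = name
        self.chunks = []

    def alloc(self, value):
        self.chunks.append(value)
        return value

    def free(self, value):
        # Mirrors the property described above: no per-chunk free.
        raise RuntimeError(f"pfree is not supported by bump context {self.name!r}")

    def reset(self):
        self.chunks.clear()


class GeneralContext(BumpContext):
    """Toy general-purpose context: individual chunks can be freed."""

    def free(self, value):
        self.chunks = [c for c in self.chunks if c is not value]


def detoast(ctx, compressed):
    # Stand-in for detoasting: it needs scratch memory (for example for a
    # catalog scan) that it frees again before returning the result.
    scratch = ctx.alloc(bytearray(b"catalog lookup"))
    result = ctx.alloc(bytes(reversed(compressed)))  # placeholder transform
    ctx.free(scratch)
    return result


def detoast_safely(per_tuple_ctx, compressed):
    # The fix, in spirit: do the work in a temporary context that supports
    # free, then copy only the final value into the per-tuple context.
    tmp = GeneralContext("detoast tmp")
    value = detoast(tmp, compressed)
    copy = per_tuple_ctx.alloc(bytes(value))
    tmp.reset()
    return copy
```

Calling `detoast` directly on a `BumpContext` raises, while `detoast_safely` leaves exactly one allocation (the result) in the per-tuple context.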