Safekeeper peer recovery preparatory patches #5118

arssher · 2023-08-28T04:39:59Z

Implements #4875

arssher · 2023-08-28T04:42:49Z

@petuhovskiy To make this easier to digest, I'm splitting it into several smaller commits; the ones pushed here are ready for review.

github-actions · 2023-08-28T05:02:17Z

1624 tests run: 1550 passed, 0 failed, 74 skipped (full report)

Flaky tests (1)

Postgres 14

test_crafted_wal_end[last_wal_record_xlog_switch_ends_on_page_boundary]: release

The comment gets automatically updated with the latest test results.
8f0ae23 at 2023-08-29T19:38:46.837Z ♻️

knizhnik

I wonder whether this recovery could somehow be combined with fetching, at the compute, the WAL needed for logical replication. In principle the approach is similar: we need to send WAL up to some boundary (in this case determined by the logical replication slot) to walproposer.

Or is it better not to mix these two things?

safekeeper/src/send_wal.rs

arssher · 2023-08-28T08:44:47Z

> Or is it better not to mix these two things?

These are not very related. The interface for fetching WAL from safekeepers over the Postgres protocol has existed for a long time and can be used for logical replication as well; in fact, we already have fetching code in walproposer (which I plan to remove soon, but it is trivial anyway). This patchset extends it so that the uncommitted part can also be fetched dynamically, but that is not really needed for replication: the uncommitted part is most often still on the compute, since the compute generated it.

I still think the hardest part of logical replication is the persistence of replication slots and historical snapshots...

petuhovskiy

I forgot that this PR is split into commits and reviewed all the changes as usual (all together in the Files changed tab).

Overall LGTM, let's merge and deploy.

Add safekeeper peers to TimelineInfo.

704519d

Now available under GET /tenant/xxx/timeline/yyy for inspection.

arssher requested review from a team as code owners August 28, 2023 04:40

arssher requested review from petuhovskiy, knizhnik and koivunej and removed request for a team August 28, 2023 04:40

arssher added 3 commits August 28, 2023 08:20

Start and stop per timeline recovery task.

12310d9

Slightly refactors init: load_tenant_timelines is now also async so that it can properly init the timeline, but to keep the global map lock sync we just acquire it anew for each timeline. The recovery task itself is just a stub here. Part of #4875.
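
The init change above can be illustrated with a minimal sketch. Assumptions: `Timeline`, `init_timeline`, and `GlobalTimelines` are simplified, hypothetical stand-ins for the real safekeeper types, and the tiny `block_on` executor exists only so the example runs without an async runtime. What it shows is the locking pattern: a synchronous `std::sync::Mutex` is never held across an `.await`; it is taken anew, briefly, for each timeline insertion.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Simplified, hypothetical stand-in for a safekeeper timeline.
#[derive(Debug)]
struct Timeline {
    id: u32,
    initialized: bool,
}

/// Global map guarded by a synchronous lock (hypothetical type alias).
type GlobalTimelines = Mutex<HashMap<u32, Arc<Timeline>>>;

/// Hypothetical async per-timeline init (in reality: reading state from disk, etc.).
async fn init_timeline(id: u32) -> Timeline {
    Timeline { id, initialized: true }
}

/// Load all timelines of a tenant. The sync lock is never held across `.await`;
/// it is acquired anew, and released, for each insertion.
async fn load_tenant_timelines(global: &GlobalTimelines, ids: &[u32]) {
    for &id in ids {
        let tli = Arc::new(init_timeline(id).await); // no lock held while awaiting
        global.lock().unwrap().insert(id, tli); // guard dropped at end of statement
    }
}

/// Tiny single-future executor so the sketch runs without tokio.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let global: GlobalTimelines = Mutex::new(HashMap::new());
    block_on(load_tenant_timelines(&global, &[1, 2, 3]));
    println!("loaded {} timelines", global.lock().unwrap().len()); // loaded 3 timelines
}
```

Holding a `std::sync::MutexGuard` across `.await` would make the future non-`Send` and could stall the executor thread while the lock is held; reacquiring the lock per timeline avoids that at the cost of a few extra lock operations.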

Rename TermSwitchEntry to TermLsn.

494d9fb

Derive Ord for easy comparison of <term, lsn> pairs. Part of #4875.

Add term and http endpoint to broker message SkTimelineInfo.

5634c1c

We need them for safekeeper peer recovery #4875
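
To sketch how such fields could feed peer recovery (a hypothetical illustration, not the actual recovery code): a safekeeper could keep only peers advertised through the broker that are in its own current term, pick the one furthest ahead in <last_log_term, flush_lsn> order, and reach it through the advertised HTTP endpoint. `PeerInfo`, `choose_donor`, and the equal-current-term rule are assumptions made for this example.

```rust
/// Hypothetical, simplified view of what a safekeeper might learn about a peer
/// from the broker. Field names are illustrative, not the real SkTimelineInfo.
#[derive(Debug, Clone)]
struct PeerInfo {
    sk_id: u64,
    term: u64,            // peer's current term
    last_log_term: u64,   // term of the peer's last WAL record
    flush_lsn: u64,       // how far the peer has flushed WAL
    http_connstr: String, // where to reach the peer
}

/// Pick the peer furthest ahead of us in <last_log_term, flush_lsn> order,
/// considering only peers in our current term (an assumption of this sketch).
/// Tuples compare lexicographically, like a derived Ord on a <term, lsn> struct.
fn choose_donor(my_term: u64, my_pos: (u64, u64), peers: &[PeerInfo]) -> Option<&PeerInfo> {
    peers
        .iter()
        .filter(|p| p.term == my_term)
        .filter(|p| (p.last_log_term, p.flush_lsn) > my_pos)
        .max_by_key(|p| (p.last_log_term, p.flush_lsn))
}

fn main() {
    let peers = vec![
        PeerInfo { sk_id: 1, term: 5, last_log_term: 5, flush_lsn: 900, http_connstr: "http://sk-1".into() },
        PeerInfo { sk_id: 2, term: 5, last_log_term: 4, flush_lsn: 2000, http_connstr: "http://sk-2".into() },
        PeerInfo { sk_id: 3, term: 4, last_log_term: 5, flush_lsn: 5000, http_connstr: "http://sk-3".into() },
    ];
    // We are in term 5, with last_log_term 4 and flush_lsn 1000.
    match choose_donor(5, (4, 1000), &peers) {
        Some(p) => println!("recover from sk {} at {}", p.sk_id, p.http_connstr), // recover from sk 1 at http://sk-1
        None => println!("no recovery needed"),
    }
}
```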

arssher force-pushed the sk-peer-recovery-2 branch from 1686ba1 to 4f5d5b4 August 28, 2023 05:20

knizhnik approved these changes Aug 28, 2023

koivunej reviewed Aug 28, 2023

safekeeper/src/send_wal.rs (resolved)

koivunej reviewed Aug 28, 2023

safekeeper/src/send_wal.rs (resolved)

petuhovskiy approved these changes Aug 28, 2023

safekeeper/src/recovery.rs (resolved)

safekeeper/src/recovery.rs (resolved)

safekeeper/src/timelines_global_map.rs (resolved)

safekeeper/src/send_wal.rs (resolved)

arssher added 2 commits August 29, 2023 22:14

Move wal_stream_connection_config to utils.

e6e68a2

It will be used by the safekeeper as well.

Allow walsender in recovery mode to give WAL up to a dynamic flush_lsn.

8f0ae23

Instead of a flush_lsn fixed at the start of replication. To this end, create a term_flush_lsn watch channel similar to the commit_lsn one. This allows recovery streaming to continue if new data appears.
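
The dynamic flush_lsn idea can be sketched with standard-library primitives. The commit uses a watch channel; since this example avoids external crates, `Watch` below is a hypothetical Mutex + Condvar stand-in that keeps only the latest <term, flush_lsn> value. `stream_recovery` is likewise illustrative: it keeps streaming while flush_lsn advances and, as an assumption of this sketch, stops once the term changes.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// <term, flush_lsn> pair as published by the WAL writer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TermLsn {
    term: u64,
    lsn: u64,
}

/// Minimal std-only stand-in for a watch channel: it keeps only the latest
/// value and wakes all waiters whenever the value is replaced.
struct Watch<T> {
    value: Mutex<T>,
    changed: Condvar,
}

impl<T: Copy> Watch<T> {
    fn new(init: T) -> Self {
        Watch { value: Mutex::new(init), changed: Condvar::new() }
    }

    /// Writer side: publish a new value (e.g. after flushing WAL).
    fn send(&self, v: T) {
        *self.value.lock().unwrap() = v;
        self.changed.notify_all();
    }

    /// Reader side: block until `ready` holds for the latest value, then return it.
    fn wait_until(&self, mut ready: impl FnMut(&T) -> bool) -> T {
        let guard = self.value.lock().unwrap();
        let guard = self.changed.wait_while(guard, |v| !ready(&*v)).unwrap();
        *guard
    }
}

/// Hypothetical recovery-mode sender loop: rather than stopping at a flush_lsn
/// fixed when streaming began, it keeps going while flush_lsn advances in `term`.
/// Returns the LSN it has streamed up to.
fn stream_recovery(watch: &Watch<TermLsn>, term: u64, start_lsn: u64, stop_lsn: u64) -> u64 {
    let mut sent_up_to = start_lsn;
    while sent_up_to < stop_lsn {
        let cur = watch.wait_until(|tl| tl.term != term || tl.lsn > sent_up_to);
        if cur.term != term {
            break; // assumption: a term change ends this recovery stream
        }
        // A real walsender would read and send WAL in [sent_up_to, cur.lsn) here.
        sent_up_to = cur.lsn.min(stop_lsn);
    }
    sent_up_to
}

fn main() {
    let watch = Arc::new(Watch::new(TermLsn { term: 1, lsn: 0 }));
    let writer = {
        let watch = Arc::clone(&watch);
        thread::spawn(move || {
            for lsn in [10, 20, 30] {
                thread::sleep(Duration::from_millis(10));
                watch.send(TermLsn { term: 1, lsn });
            }
        })
    };
    let sent = stream_recovery(&watch, 1, 0, 30);
    writer.join().unwrap();
    println!("streamed up to {sent}"); // streamed up to 30
}
```

Because a watch channel retains only the newest value, a reader that falls behind skips intermediate updates and simply catches up to the latest flush_lsn, which is all a WAL streamer needs.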

arssher force-pushed the sk-peer-recovery-2 branch from 4f5d5b4 to 8f0ae23 August 29, 2023 19:14

arssher changed the title ~~Safekeeper peer recovery~~ Safekeeper peer recovery prep patches Aug 29, 2023

arssher changed the title ~~Safekeeper peer recovery prep patches~~ Safekeeper peer recovery preparatory patches Aug 29, 2023

arssher enabled auto-merge (rebase) August 29, 2023 19:34

arssher merged commit 81b6578 into main Aug 29, 2023
28 checks passed

arssher deleted the sk-peer-recovery-2 branch August 29, 2023 20:19
