
[persist] Restore Blob state based on the contents of Consensus #20482

Merged (15 commits) on Oct 19, 2023

Conversation


@bkirwi bkirwi commented Jul 12, 2023

Persist state is a mix of stuff in CRDB and stuff in S3. This complicates restoring from backup: if we restore CRDB to an old state, the blobs it references might be deleted.

It's possible to "undelete" deleted blobs in S3 by removing the delete marker from a versioned bucket. This PR implements that logic as a CLI tool, and adds a basic test that restores a CRDB state and checks that the database is in the expected state after the CLI runs.
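In a version-enabled bucket, a delete doesn't erase data; it stacks a delete marker on top of the newest version, and removing that marker makes the prior version current again. A minimal in-memory model of that mechanic (illustrative types, not the S3 API or the PR's code):

```rust
use std::collections::HashMap;

// A versioned "bucket": each key maps to a stack of versions, newest
// last. A delete appends a marker instead of erasing anything.
#[derive(Clone, Debug, PartialEq)]
enum Version {
    Data(Vec<u8>),
    DeleteMarker,
}

#[derive(Default)]
struct Bucket {
    objects: HashMap<String, Vec<Version>>,
}

impl Bucket {
    fn put(&mut self, key: &str, data: &[u8]) {
        self.objects
            .entry(key.to_string())
            .or_default()
            .push(Version::Data(data.to_vec()));
    }

    // "Soft" delete: push a delete marker on top of the stack.
    fn delete(&mut self, key: &str) {
        self.objects
            .entry(key.to_string())
            .or_default()
            .push(Version::DeleteMarker);
    }

    // Current value: the newest version, unless it's a delete marker.
    fn get(&self, key: &str) -> Option<&[u8]> {
        match self.objects.get(key)?.last()? {
            Version::Data(d) => Some(d.as_slice()),
            Version::DeleteMarker => None,
        }
    }

    // Undelete: pop the newest delete marker, resurfacing the version
    // underneath. Returns true if a marker was actually removed.
    fn undelete(&mut self, key: &str) -> bool {
        let versions = match self.objects.get_mut(key) {
            Some(v) => v,
            None => return false,
        };
        if matches!(versions.last(), Some(Version::DeleteMarker)) {
            versions.pop();
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = Bucket::default();
    bucket.put("blob/0001", b"batch data");
    bucket.delete("blob/0001");
    assert_eq!(bucket.get("blob/0001"), None);
    assert!(bucket.undelete("blob/0001"));
    assert_eq!(bucket.get("blob/0001"), Some(&b"batch data"[..]));
    println!("undelete resurfaced blob/0001");
}
```

This is also why the restore only works if versioning was enabled before the deletes happened: without the version stack, a delete is irreversible.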

Motivation

See #18157 for the design.

Tips for reviewer

One subtlety here is that we can't rely on the usual state iterators, since those rely on the existence of rollups, which may not exist yet. The current draft just iterates over the diffs manually.
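Skipping the rollups means reconstructing state as a left fold over the raw diffs: start empty and apply each diff in sequence. A toy sketch of that replay, with a stand-in diff type (the real persist diffs are much richer than a pair of key lists):

```rust
use std::collections::BTreeSet;

// Stand-in for one consensus diff: blob keys added and removed at a
// single sequence number.
struct StateDiff {
    inserts: Vec<String>,
    deletes: Vec<String>,
}

// With no rollup to start from, reconstruct the live blob set by
// replaying every diff in order from the very beginning.
fn replay_diffs(diffs: &[StateDiff]) -> BTreeSet<String> {
    let mut live = BTreeSet::new();
    for diff in diffs {
        for key in &diff.inserts {
            live.insert(key.clone());
        }
        for key in &diff.deletes {
            live.remove(key);
        }
    }
    live
}

fn main() {
    let diffs = vec![
        StateDiff { inserts: vec!["b0".into(), "b1".into()], deletes: vec![] },
        StateDiff { inserts: vec!["b2".into()], deletes: vec!["b0".into()] },
    ];
    let live = replay_diffs(&diffs);
    assert_eq!(live.into_iter().collect::<Vec<_>>(), vec!["b1", "b2"]);
}
```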

I have not yet tested this tool against a real environment. I will probably wait for at least a first pass of code review first!

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@bkirwi bkirwi force-pushed the undelete branch 2 times, most recently from 25ee5bb to 630a94e on October 6, 2023
src/persist/src/location.rs (review thread, resolved)
@philip-stoev (Contributor):

@bkirwi as discussed in Slack, we would like to add backup/restore to various other CI workloads. Can you please add me as a reviewer to this PR once it reaches the state where this can be done? Thanks!

@bkirwi bkirwi changed the title [sketch] Restore Blob state based on the contents of Consensus [persist] Restore Blob state based on the contents of Consensus Oct 12, 2023
@bkirwi bkirwi (Contributor, Author) left a comment:

@philip-stoev - I think the test is now in decent shape. I've left a few comments on the test file in case they're useful as you're thinking about extracting code.

(FWIW, I'm a bit nervous about this part: a correct restore process involves some environmental assumptions (i.e. no MZs running) and a specific sequence of calls to unrelated tools, which seems pretty brittle. I'm not sure if there's a good way to add guardrails to this logic that would make it harder to mess that process up, or if we'll just need to be very careful as we're adding tests...)

Service(
    name="persistcli",
    config={
        # TODO: depends!
bkirwi (Contributor, Author):

Dependencies definitely don't matter for this script, but I haven't thought about whether we'd want to make that explicit if we extract this out.

"minioadmin",
)
c.run("mc", "version", "enable", "persist/persist")
blob_uri = "s3://minioadmin:minioadmin@persist/persist?endpoint=http://minio:9000/&region=minio"
bkirwi (Contributor, Author):

AFAICT this would always be worthwhile for minio-based tests, but it does involve yet another container in the mix, which the existing minio tool goes out of its way to avoid.

Contributor:

I am going to do this in Platform checks and other frameworks, so most minio-based tests will start using versioning.

"materialized"
) # Very important that MZ is not running during the restore process!
c.exec(
"cockroach", "cockroach", "sql", "--insecure", "-e", "DROP DATABASE defaultdb;"
bkirwi (Contributor, Author):

This immediately puts the database in an invalid state. It does not return to a valid state until the successful restore-blob call below. This is sort of a "critical section"; missing one of these steps, or having a materialize instance running while either of these things is happening, is likely to cause crashes or other breakage... but that breakage is expected.


bkirwi commented Oct 12, 2023

Oh, and the backup/restore test seems to be "passing" but fails when uploading results... I'm not sure if that goes away once it's merged or whether we need some manual intervention. (Or if @philip-stoev & co would prefer to fold it into an existing suite instead?)

@bkirwi bkirwi marked this pull request as ready for review October 12, 2023 20:27
@bkirwi bkirwi requested review from a team as code owners October 12, 2023 20:27
@danhhz danhhz (Contributor) left a comment:

cool, looks great!

ran out of time to completely finish the review, so I might trickle in a few more nits on the next pass, but I think this should be most of it

the python/mzcompose stuff all seems sane to me, but it's probably best to get another reviewer for that

src/persist-client/src/cli/admin.rs (4 review threads, resolved)
src/persist-client/src/internal/metrics.rs (review thread, resolved)
.await?;

let temp_dir = tempfile::tempdir().map_err(Error::from)?;
blob_impl_test(move |path| {
Contributor:

let's give MemBlob the same treatment (a flag to enable tombstoning and a test)
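One way to read this suggestion: give the in-memory blob a flag so deletes leave tombstones (bytes kept, but marked deleted) that a later restore can flip back. A hypothetical sketch, not the actual MemBlob API:

```rust
use std::collections::HashMap;

// Hypothetical in-memory blob store. When `tombstone` is set, delete
// marks an entry as deleted instead of erasing its bytes, so a later
// restore can bring it back.
struct MemBlob {
    tombstone: bool,
    // key -> (bytes, deleted?)
    entries: HashMap<String, (Vec<u8>, bool)>,
}

impl MemBlob {
    fn new(tombstone: bool) -> Self {
        MemBlob { tombstone, entries: HashMap::new() }
    }

    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), (value, false));
    }

    // Reads never see tombstoned entries.
    fn get(&self, key: &str) -> Option<&[u8]> {
        match self.entries.get(key) {
            Some((bytes, deleted)) if !*deleted => Some(bytes.as_slice()),
            _ => None,
        }
    }

    fn delete(&mut self, key: &str) {
        if self.tombstone {
            if let Some(entry) = self.entries.get_mut(key) {
                entry.1 = true; // keep the bytes, mark deleted
            }
        } else {
            self.entries.remove(key); // hard delete
        }
    }

    // Un-tombstone a blob; returns false if it's truly gone.
    fn restore(&mut self, key: &str) -> bool {
        match self.entries.get_mut(key) {
            Some(entry) => {
                entry.1 = false;
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut blob = MemBlob::new(true);
    blob.set("shard/part0", b"data".to_vec());
    blob.delete("shard/part0");
    assert_eq!(blob.get("shard/part0"), None);
    assert!(blob.restore("shard/part0"));
    assert_eq!(blob.get("shard/part0"), Some(&b"data"[..]));
}
```

With this shape, a unit test of the restore path can delete blobs "out from under" persist and then resurrect them, as the next comment suggests.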

}
Ok(())
}

/// Attempt to restore all the blobs referenced by the current state in consensus.
/// Returns a list of blobs that were not possible to restore.
async fn restore_blob(
Contributor:

Would be nice to have some basic unit test coverage of this (the MemBlob comment I left will make this easier). Kinda just thinking: start up persist, write some data, delete blobs out from under it, restore them, and then read the data. I think that won't tickle any of the correctness issues that we'd expect in an online restore?

src/persist/src/mem.rs (review thread, resolved)
src/persist-client/src/cli/admin.rs (review thread)

let mut shards = consensus.list_keys();
let mut not_restored = vec![];
while let Some(shard) = shards.next().await {
Contributor:

is it crazy to run restore_blob concurrently? feels like we might want some amount of concurrency in the tool for big envs and this might be an easy place to do it. otoh, I'm 100% okay leaving this as a TODO

bkirwi (Contributor, Author):

Not crazy but not trivial... I'll leave a TODO for now.

(If we do allow concurrency, that should probably be a CLI arg?)

Contributor:

👍 I think what I'm worried about here is that we end up in an incident and find out that the restore tool is gonna take many hours because we don't have any sort of concurrency. If we had some data that running this on a huge env still finished in O(minutes), then I definitely don't care. Fine with making it a cli arg, but I'd lean toward making the default of that arg to be some amount of concurrency, unless we have evidence that things tend to be fast enough even without it
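The shard loop above is the natural place to bound fan-out. A crude sketch of batched, thread-based concurrency with a hypothetical per-shard restore function (the real tool is async and talks to Consensus and Blob; this only illustrates the shape of a bounded "not restored" collection):

```rust
use std::thread;

// Stand-in for restoring one shard; the real restore_blob does real
// work against Consensus and Blob. Err marks a shard we couldn't fix.
fn restore_shard(shard: &str) -> Result<(), String> {
    if shard.is_empty() {
        Err("empty shard id".to_string())
    } else {
        Ok(())
    }
}

// Restore shards in batches of at most `concurrency` worker threads,
// collecting the failures (the tool's "not restored" list). Batching
// is a crude bound; an async tool would more likely use something
// like buffered concurrent futures.
fn restore_all(shards: Vec<String>, concurrency: usize) -> Vec<String> {
    let mut not_restored = Vec::new();
    for batch in shards.chunks(concurrency.max(1)) {
        let handles: Vec<_> = batch
            .iter()
            .cloned()
            .map(|shard| {
                thread::spawn(move || {
                    let ok = restore_shard(&shard).is_ok();
                    (shard, ok)
                })
            })
            .collect();
        for handle in handles {
            let (shard, ok) = handle.join().expect("worker panicked");
            if !ok {
                not_restored.push(shard);
            }
        }
    }
    not_restored
}

fn main() {
    let shards = vec!["s1".to_string(), "".to_string(), "s2".to_string()];
    let failed = restore_all(shards, 2);
    assert_eq!(failed, vec!["".to_string()]);
}
```

Making `concurrency` a CLI argument with a nonzero default matches the direction the reviewers land on below this thread.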

@philip-stoev philip-stoev requested review from philip-stoev and removed request for a team October 16, 2023 07:10
@philip-stoev (Contributor):

@bkirwi my plan on this is as follows:

  • I am going to look into the CI failure
  • I am going to add backup/restore to a few more testing frameworks. Depending on how straightforward it is, I will push the required changes to this PR or a follow-up one.

@philip-stoev philip-stoev self-requested a review October 16, 2023 11:33
@philip-stoev (Contributor):

I am getting the following on stdout when doing a restore:

2023-10-16T11:36:52.396497Z  WARN mz_persist_client::cli::admin: unhandled mz_persist_external_blob_sizes metric type: HISTOGRAM

is that expected?


bkirwi commented Oct 16, 2023

I am getting the following on stdout when doing a restore [...]

That's not too concerning -- we're just dumping out a bunch of internal metrics as a convenience here. I'll see if that warning is easy to avoid, though if not I may take it as a followup.


shepherdlybot bot commented Oct 16, 2023

This PR has higher risk. In addition to having a knowledgeable reviewer, it may be useful to add observability and/or a feature flag.

Risk Score: 🔴 80 / 100. Probability Buggy: 60%. File Hotspots: 0.

@bkirwi bkirwi force-pushed the undelete branch 2 times, most recently from e3c4767 to 2ed130d on October 16, 2023
@bkirwi bkirwi requested a review from danhhz October 16, 2023 18:44
@philip-stoev (Contributor):

@bkirwi I was able to get everything in Platform Checks backed up and then restored, but unfortunately persistcli admin --commit restore-blob seems to hang indefinitely, with all threads of the process blocked in tokio. I will continue investigating tomorrow.

@danhhz danhhz (Contributor) left a comment:

LGTM once QA is happy!

I'll see if that warning is easy to avoid, though if not I may take it as a followup.

I think just downgrade it to an info



src/persist-client/src/cli/admin.rs (review thread)
src/persist-client/src/internal/restore.rs (review thread, resolved)
src/persist-client/tests/machine/restore_blob (review thread)
@philip-stoev (Contributor):

@bkirwi The platform checks test that fails can be found at:

To github.com:philip-stoev/materialize.git
 * [new branch]            undelete-platform-checks-test -> undelete-platform-checks-test

To run:

cd test/platform-checks
./mzcompose down -v ; ./mzcompose run default --scenario=BackupAndRestoreAfterManipulate 

It will then hang as follows:

$ docker compose run persistcli admin --commit restore-blob --blob-uri='s3://minioadmin:minioadmin@persist/persist?endpoint=http://minio:9000/&region=minio' --consensus-uri='postgres://root@cockroach:26257?options=--search_path=consensus'
2023-10-17T06:39:36.391558Z  INFO tokio_postgres::connection: NOTICE: relation "consensus" already exists, skipping    
2023-10-17T06:39:36.402296Z  INFO lazy_load_credentials: aws_credential_types::cache::lazy_caching: credentials cache miss occurred; added new AWS credentials (took 32.472µs)
....

and I cannot Ctrl-C or SIGTERM it; SIGKILL is required to terminate persistcli.


bkirwi commented Oct 18, 2023

I spent a couple days looking into the issue Philip mentioned.

It turns out that the restore CLI is hanging during a query to CRDB. (In a fairly odd way: always the second query, without triggering connection or user timeouts, and the connection itself is obtained successfully.)

I'm at a bit of a loss as to why this is the case at the moment, but a timeout and retry seems to work around it effectively for now. I'll continue to investigate -- or write an issue, if I remain at a loss -- but for now hopefully this unblocks the PR.

@philip-stoev - The final commit is enough to get your platform check passing -- though if you don't mind, I'd rather do further CI tests as a followup PR.
@danhhz - I put the retry loop in the restore code itself, to avoid sullying the normal implementation. (Though it may eventually make sense to move it if it turns out to be a more general problem.) Let me know if you'd like to see any changes!
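The timeout-and-retry workaround can be sketched generically: put each attempt under a deadline and reissue it when the deadline fires. A simplified, thread-based illustration (the actual fix lives in the restore code and would use async timeouts; here a truly hung attempt leaks its worker thread, which is fine only for a short-lived CLI):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Run `op` under a deadline; if an attempt doesn't answer in time,
// abandon it and reissue, up to `max_attempts` tries total.
fn with_timeout_and_retry<T, F>(op: F, timeout: Duration, max_attempts: usize) -> Option<T>
where
    T: Send + 'static,
    F: Fn() -> T + Send + Clone + 'static,
{
    for _ in 0..max_attempts {
        let (tx, rx) = mpsc::channel();
        let op = op.clone();
        thread::spawn(move || {
            // The receiver may already have timed out; ignore send errors.
            let _ = tx.send(op());
        });
        if let Ok(value) = rx.recv_timeout(timeout) {
            return Some(value);
        }
        // Deadline hit: fall through and retry with a fresh attempt.
    }
    None
}

fn main() {
    // A fast "query" succeeds on the first attempt.
    let fast = with_timeout_and_retry(|| 42, Duration::from_secs(1), 3);
    assert_eq!(fast, Some(42));

    // A hung "query" exhausts its attempts and returns None.
    let hung = with_timeout_and_retry(
        || thread::sleep(Duration::from_secs(5)),
        Duration::from_millis(50),
        2,
    );
    assert_eq!(hung, None);
}
```

This matches the observed behavior: the second CRDB query hangs without tripping connection timeouts, so an application-level deadline plus a fresh retry is what actually unblocks the tool.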


danhhz commented Oct 19, 2023

Weird! Agreed that it would be good to figure this out at some point, but what you have looks fine to unblock things


bkirwi commented Oct 19, 2023

Thanks for the reviews!

@bkirwi bkirwi merged commit 8608b5f into MaterializeInc:main Oct 19, 2023
@philip-stoev (Contributor):

The retry loop takes 5 minutes to exit, so a separate ticket has been opened: https://github.com/MaterializeInc/database-issues/issues/6812

@bkirwi bkirwi mentioned this pull request Oct 20, 2023
@joacoc joacoc mentioned this pull request Oct 29, 2023
@joacoc joacoc mentioned this pull request Nov 10, 2023
3 participants