-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement swingStore data export/import in support of state sync #7026
Conversation
bf2eeb8
to
d34aa3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First-blush comments, will continue to review properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for take this work on!
First, a nit on the PR itself. I would have appreciated it being broken down in smaller, easier to review commits. At the very least, having the streamStore -> transcriptStore rename be an independent commit so that the modifications introduced by the export are easier to diff / observe.
The blocking changes are the following:
- The snapshot metadata is inconsistent whether it's pushed (callback) or pulled (getKVData): it contains
size
in the former, and bothcompressedSize
anduncompressedSize
in the latter. - I believe the transcript artifact does not round trip because of extraneous new lines.
- Host and local keys should not be included in the exported records.
- No completeness check is done on import to validate all required artifacts were imported.
- I believe a non-historical transcript artifact is generated for a terminated vat when arguably it should not.
I would like us to have a discussion about the following:
- I believe we should try to have artifact names have a 1:1 mapping to their content. It's already the case with snapshots, but I'd like it to be the same for transcript spans. That's why I had suggested that the transcript artifact's name include the end position.
- Whether it's useful to have the "size" of the artifacts in the recorded metadata. For transcript spans it doesn't correspond to the byte size of the exported data, so it's not useful for external consumers in the first place. Size seem to be used as a consistency check, but it's redundant with the hashing.
Regarding the SwingStoreExporter interface, I'm sorry for not bringing this up before, but while recently dealing with the consumer side of this, I realized we may need to make a few small adjustments:
- If possible use
undefined
instead ofnull
as the deletion "value" in an export entry. - Change the type of stream producing methods to
AsyncGenerator
instead ofAsyncIterable
. It doesn't impact the implementation which already uses generators, but is more explicit regarding how the return value is meant to be consumed.
And to finish, a nit on the inconsistent assertion style. I believe the latest code style if for expressionCheck || Fail`error message`
which is used in some places mixed with assert
style checks.
|
||
const endPos = getRequired(`${vatID}.t.endPosition`); | ||
kvStore.set(`${vatID}.t.startPosition`, endPos); | ||
transcriptStore.rolloverSpan(vatID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tracked the call sites, and it seems this is only used during upgrades, but not during vat termination. I've always assumed the extract-transcript-from-kerneldb.js
tool wasn't exporting terminated vats because their transcripts were fully pruned from the DB, but it seems to be that they're instead removed from the dynamicIDs
list?
If my understanding above is correct, is there a need to explicitly "close" the last span after such vat termination? At the very least, the span should no longer be "active", right? And its last transcript and snapshot should not be exported unless historical artifacts are included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vat termination should be allowed to fully delete the vat data: all transcripts, all heap snapshots, plus of course all the current-moment things like c-lists. Their vatIDs should also be removed from the dynamicIDs
list.
I think we're not fully clear on retaining/not-retaining historical spans for terminated vats (and we haven't yet added an openSwingStore()
option to retain/not-retain old data). One coherent stance would be for a "vat is terminated" transcriptStore
API to close off the current span, mark it as not-current in the same way that this rolloverSpan
API would have done (thus leaving all spans marked as not-current), and make sure that whatever DB record is used to decide that certain spans are required artifacts knows to not require one for the terminated vat.
Another coherent stance would be for this "vat is terminated" API to just delete everything immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by @mhofman's comment. rolloverSpan
is used every time there's a snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no snapshot when terminating a vat. The last span should be marked inactive on vat termination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'm a little nervous about deleting everything on vat termination because it seems like the transcript leading up to a failure would be critical to diagnosing the problem that caused the failure (or is that already adequately captured by the slog?). So I guess the question is: mark the span as inactive or remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say mark the span as inactive, which makes it eligible for pruning based on (future) endpoint configuration. An archive node would keep the spans and could be used for diagnosis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-opening this conversation as I don't believe my blocking concern has been addressed: on vat termination, aka when cleanupAfterTerminatedVat
is executed, we do not seem to close the last transcript span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment thread was attached to a line number which may have drifted, but lemme see if @mhofman 's concerns are addressed now:
removeSnapshotAndTranscript
(called during vat termination, via kernelKeeper.cleanupAfterTerminatedVat) will:- delete all metadata and data for all heap snapshots
- delete all metadata and data for all transcript items
removeSnapshotAndResetTranscript
(called during vat upgrade, via vat-warehouse.jsresetWorker
) will:- delete all metadata and data for all heap snapshots (see other comment about whether this is ideal or not)
- rollover the transcript span, which means:
- retain the span hashes / metadata, always
- retain the transcript items if the SwingStore was built with
keepTranscripts = true
(the default) - delete the old transcript items if
false
@mhofman: I still want to act upon your opinon about whether upgrade should delete snapshots (in the adjacent comment thread), but for this comment thread, does the above behavior address your concerns about vat termination and "closing the last transcript span"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the transcript is now removed, the original issue can be considered addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I am still unconvinced that forcibly removing transcripts and snapshots on termination is the right approach at this time, but this is not a a blocker. I'd still prefer to drop snapshots and "close" the transcript span instead (aka rollover without setting a new current span).
I understand this is cruft we're leaving behind on vat termination, but that's something we can address later with a migration once we figure out how to deal with accumulated historical metadata in general. I don't believe we're in a position to recycle vatID
anyway, which would be the only other benefit.
To clear up my thinking about API names/etc, I finally wrote the developer-facing docs for this feature (which I should have done at the very beginning, or pushed you two to write :). I'll review the API with that writeup as a guideline, and (if we agree on the contents) I'll make a PR that adds these docs to Chip's branch. |
I concur with all of @mhofman 's points in the headline comment of #7026 (review) . A few thoughts:
I'm on the fence for this. I'm a fan of content-addressed data, but I also like having the exported artifacts names be human-readable. It would be a drag to see "b5af9e8a29de8b9e72fb0694ddb88a46.bin" in the list of artifacts and then need to manually start from the IAVL-shadowed data to figure out what it's for. Naming it "transcript-v14-d200-d600" can be easily unpacked into "a transcript span for vat 'v14' from deliveryNum 200 (inclusive) to deliveryNum 600 (exclusive)" without additional work. I'm ok with "transcript-v14-d200-d600-HASHHASH", except:
The importer is going to start with the export data (IAVL), compute both the name of the expected artifact and the hashes necessary to validate the contents, ask for the artifact by name, stash the contents somewhere while hashing, compare the computed hash against the expected one, and then either throw or continue forwards. |
I am not advocating for including the hash in the transcript artifact name, just for including the That way we effectively have content addressed artifacts, but terse and human readable. |
d34aa3e
to
879bde1
Compare
Note to self: diff from originally reviewed commit with current state: d34aa3e..879bde1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly naming and organizational nits. Wondering about what bypassHash
is.
We should discuss the async-ness of some of the exporter methods.
879bde1
to
078b5b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some general comments, I'm still reviewing
Reviewing process feedback: the force-push from 879bde1 to 078b5b7 was not a clean rebase but included changes in scope of this PR. For reference, the diff with the clean rebase: 0580b107..078b5b7 In the future, please do not mix rebase and squashing/amending. |
I think I'd like to see a test where the export records are streamed through the callback then compared to a |
291941e
to
843b6be
Compare
@FUDCo pointed out that the kernel is currently quite involved in deleting a vat's data, because the The reason I want that deletion part ahead of time is that it affects consensus. When we terminate a vat, we need to be consistent about whether the metadata keys are retained forever, or deleted (and I think they should be deleted). If we don't delete them, they must continue to be included in the exportData, which means they'll continue to be included in state-sync snapshots, and somehow the importer will need to know to not expect the backing artifacts (because those certainly should be deleted). |
Today's meeting with @FUDCo led to:
|
This should make it clear, if it wasn't already, that people going in and futzing with the database via the sqlite3 command line utility is dangerous. |
843b6be
to
a297102
Compare
Latest revision:
Fingers crossed we are finally final, CI willing and the river don't rise. |
Datadog ReportBranch report: ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phew! Small changes to make:
- adding the ORDER BY clauses is important
- maybe retain the old snapshots during vat upgrade, let's see what @mhofman thinks
|
||
const endPos = getRequired(`${vatID}.t.endPosition`); | ||
kvStore.set(`${vatID}.t.startPosition`, endPos); | ||
transcriptStore.rolloverSpan(vatID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment thread was attached to a line number which may have drifted, but lemme see if @mhofman 's concerns are addressed now:
removeSnapshotAndTranscript
(called during vat termination, via kernelKeeper.cleanupAfterTerminatedVat) will:- delete all metadata and data for all heap snapshots
- delete all metadata and data for all transcript items
removeSnapshotAndResetTranscript
(called during vat upgrade, via vat-warehouse.jsresetWorker
) will:- delete all metadata and data for all heap snapshots (see other comment about whether this is ideal or not)
- rollover the transcript span, which means:
- retain the span hashes / metadata, always
- retain the transcript items if the SwingStore was built with
keepTranscripts = true
(the default) - delete the old transcript items if
false
@mhofman: I still want to act upon your opinon about whether upgrade should delete snapshots (in the adjacent comment thread), but for this comment thread, does the above behavior address your concerns about vat termination and "closing the last transcript span"?
c7448e2
to
45df6b2
Compare
OK, then, how about now? |
Looks good to me! @mhofman what say you? |
I started integrating with my branch to see if all is peachy. Will finish that and the review in the morning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I'm assuming you might squash/rewrite this branch before landing, but if not, the "trip over your shoelaces" commit comment doesn't really explain why the ensureTxn
aren't necessary, or what was wrong with the previous DB-iteration-overlaps-with-DB-mutations approach, or why an IMMEDIATE txn is bad for export, all of which would be useful if someone in the future sees the individual commits.
e6689d2
to
f7f3347
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're almost there! Mostly some nits, but I'd like to see addressed the consistency issues, in particular when deleting a vat (the extra noteExport for an "historical" artifact for the latest snapshot/span). I would love if adding SQL constraint checks on the snapStore and transcriptStore tables is an easy change that would enforce consistency of the data.
|
||
const endPos = getRequired(`${vatID}.t.endPosition`); | ||
kvStore.set(`${vatID}.t.startPosition`, endPos); | ||
transcriptStore.rolloverSpan(vatID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I am still unconvinced that forcibly removing transcripts and snapshots on termination is the right approach at this time, but this is not a a blocker. I'd still prefer to drop snapshots and "close" the transcript span instead (aka rollover without setting a new current span).
I understand this is cruft we're leaving behind on vat termination, but that's something we can address later with a migration once we figure out how to deal with accumulated historical metadata in general. I don't believe we're in a position to recycle vatID
anyway, which would be the only other benefit.
f7f3347
to
cb0f1bf
Compare
Datadog ReportBranch report: ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved on the basis of the transcript export data correctness issue being fixed for the vat termination case.
As mentioned separately I have verified that this PR works against the state-sync branch I've been working on in parallel!
cb0f1bf
to
268e62f
Compare
Closes #6773