-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support state-sync #7225
feat: Support state-sync #7225
Conversation
I created this PR as draft since I'm still working on tests, but please start reviewing as the implementation is fully ready. I will not rebase this PR until approved. |
Datadog ReportBranch report: ❌ ❌ Failed Tests (2)
|
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #7225 +/- ##
==========================================
- Coverage 71.01% 70.61% -0.40%
==========================================
Files 450 461 +11
Lines 86398 89298 +2900
Branches 3 3
==========================================
+ Hits 61358 63062 +1704
- Misses 24974 26169 +1195
- Partials 66 67 +1
|
@michaelfig I'd like you to be the primary reviewer |
Gladly! Can you please rebase your |
The fixups are pretty minimal, but if you prefer, I'll do a clean rebase. FYI @warner @JimLarson |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a first pass. Looking good so far, but I'd like a few random things addressed, and will wait for other reviewers before approving.
import fsPromisesPower from 'fs/promises'; | ||
import pathPower from 'path'; | ||
|
||
import { Fail } from '@agoric/assert'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import { Fail } from '@agoric/assert'; | |
const { Fail } = assert; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the recommend style? Why not use the import form? I'm personally not a major fan of relying on globals that are not part of the JS standard library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recommended style has vacillated over time. globalThis.assert
is supplied by the SES shim, in the same way that globalThis.harden
is. @agoric/assert
is just a thin wrapper around that global.
Personally, I think the module import is more legible, but the direct manipulation of the global costs less overhead. @erights may have more of an opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this issue isn't decided:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we never did settle this. The fact that @agoric/assert
, as indicated by the @agoric/
part, is in agoric-sdk only, and so not available in the endo repository, is for me the main reason why I continue to prefer the assignment form. It works everywhere we use Hardened JS.
If this package were migrated to be @endo/assert
such that everything above SES could use it, then the points @turadg makes in that thread on the advantages of the import style would, for me, rise to the remaining important difference, causing me to switch to the import style.
No need to on my account. I was able to see what I needed to do a review. |
e68df6d
to
b5745e2
Compare
I just force pushed the branch to remove the 5 commits related to debug/testing features that were sitting atop the initial PR, but below the initial fixups. Here is what was removed (and were never planned to be merged anyway): e68df6d..b5745e2 |
I think I've addressed all actionable review feedback so far, or left comments where I didn't make any changes. FYI, given the extent of fixup commits, I'm maintaining a parallel branch where all those are squashed into their respective target: 5542-export-state-sync-fixedup PTAL @michaelfig |
import fsPromisesPower from 'fs/promises'; | ||
import pathPower from 'path'; | ||
|
||
import { Fail } from '@agoric/assert'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recommended style has vacillated over time. globalThis.assert
is supplied by the SES shim, in the same way that globalThis.harden
is. @agoric/assert
is just a thin wrapper around that global.
Personally, I think the module import is more legible, but the direct manipulation of the global costs less overhead. @erights may have more of an opinion.
I'm pretty happy about where this is going, and I'm fine with you autosquashing before you mark the PR ready for review. I would encourage @JimLarson and @gibson042 not to begin review until the PR is out of draft. |
Will extract a few tangential commits out in a separate PR, and re-request review once ready. |
c692911
to
a2587ba
Compare
@gibson042 @JimLarson I've just rebased this on master after extracting a couple housekeeping changes in separate PRs. While it's still missing the unit tests for the state-sync logic, I believe this is ready for review at this point, and will take out of draft to make it explicit. |
Reminder, please review commit-by-commit! |
The activityhash is now part of the swingstore export
This reverts commit fedb533.
3c3a9d3
to
283fded
Compare
@Mergifyio requeue |
☑️ This pull request is already queued |
@Mergifyio refresh |
✅ Pull request refreshed |
Review commit-by-commit
closes: #5542
closes: #3769
closes: #6563
closes: #5934
Description
This PR builds on #7115, #7026 and agoric-labs/cosmos-sdk#297 to implement state-sync in the cosmos and cosmic-swingset layers.
It takes over the cosmos state-sync process which is normally started in a go-routine without synchronization. Because the swingstore DB is not capable of doing a read at arbitrary historical savepoints, we need to introduce a synchronization with block commits.
When we reach a snapshot height, we first initiate a swingstore export before proceeding with the normal cosmos export process. We also add a synchronization step to stall the subsequent block commit until the swingstore export has been reported as having started. Once the swingstore export is in progress we can rely on its read transaction to proceed processing blocks, even if the export process spans multiple blocks.
On the cosmic-swingset side, we add a tool that leverages the new swingstore export logic to save the exported artifacts to disk. We also use the streaming export of swingstore data during block execution to save the main export data in vstorage under a new path. Conversely on the import side, we add a tool that reads a disk based export to produce an exporter that complies with the new swingstore import logic. These tools are usable independent of the state-sync logic. For state-sync they are driven by "blocking messages" sent by the golang side.
On the cosmos side, we implement the
ExtensionSnapshotter
interface. During snapshot taking, the extension running in the snapshot goroutine will request the path to the exported data from javascript and block until it is available. Then it will create a proto message for each artifact present in the exported data. On the restore side, we recreate a disk-based swingstore export by exporting the swingstore "export data" from vstorage (which was previously restored by state-sync), and the artifacts from the extension state-sync payloads. We then send a message to JavaScript to import this data into the expected swingstore location.Security Considerations
The artifact verification on restore will be performed by the swingstore import logic. The "export data" is the root of trust and must be from a trusted source. Since it is extracted from the verified cosmos DB, it is considered trusted.
Scaling Considerations
The export process is performed as much as possible in parallel of normal execution. The artifact export is spawned in a sub process, and the streaming export data is buffered and sent to golang after the kernel has finished processing it current I/O, which should put it at idle time while it awaits for a syscall or delivery result.
Documentation Considerations
Nothing special besides normal cosmos state-sync instructions.
Testing Considerations
Tested manually extensively, including introducing synthetic delays in the snapshot process to test synchronization, or edge conditions such as overlapping snapshots.
Updated the deployment integration test to have the validators produce state-sync artifacts and have the loadgen follower start from a state-sync snapshot.
We should increase coverage of the edge cases through unit tests. See #7349