delete historical transcript spans (unless config switch says to retain) #9174
Comments
A few more ideas:
I only skimmed through for now, but wanted to capture a couple thoughts:
Agreed, although one benefit of storing the old spans in a (separate) SQLite DB is that commit means commit: SQLite ensures the data will be properly flushed to disk, and an ill-timed power failure won't threaten it. If we use discrete files, we ought to do our own flush/fsync handling.
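For comparison, here is a minimal sketch of what "doing our own" durability for discrete files would involve (Node.js; the helper name and file layout are hypothetical, not anything that exists in the swing-store):

```js
// Hypothetical helper: write a historical-span artifact as a discrete file,
// then fsync both the file and its directory, so an ill-timed power failure
// cannot leave a half-written or unlinked artifact behind.
const fs = require('fs');
const path = require('path');

function writeSpanArtifactDurably(dir, name, data) {
  const tmpPath = path.join(dir, `${name}.tmp`);
  const finalPath = path.join(dir, name);

  // Write to a temp file and flush its contents to disk.
  const fd = fs.openSync(tmpPath, 'w');
  fs.writeSync(fd, data);
  fs.fsyncSync(fd);
  fs.closeSync(fd);

  // Atomically rename into place, then fsync the directory so the rename
  // itself survives a crash.
  fs.renameSync(tmpPath, finalPath);
  const dirFd = fs.openSync(dir, 'r');
  fs.fsyncSync(dirFd);
  fs.closeSync(dirFd);
}
```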
One note: the … The tricky part is that it uses a single SQL statement, …

As part of #8928 I'm adding a delete-a-little-at-a-time API to the swing-store, but it's aimed at vat deletion: there's not an obvious way to incorporate it into …

One option is to change … Then we'd need to decide what to do about … A deeper fix would be to change swingstore to have a …

But the population status of transcript items is not part of consensus, partially to allow different validators to make different space-vs-replay-hassle decisions. To rate-limit the deletion of items for terminated vats, I'm having the kernel delete a budget-limited number of spans each block, and then the swingstore deletes both those span records (which will always already be present) and their transcript items (which may or may not be populated). Perhaps the way to go is for …
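For a sense of what a budgeted approach could look like for historical transcript items (as opposed to terminated vats), here is a rough sketch assuming better-sqlite3 and a schema shaped like the swing-store's `transcriptSpans`/`transcriptItems` tables; the table and column names are assumptions, not the real API:

```js
// Sketch: each block, delete the items of at most `budget` historical
// (non-current) spans, leaving the span records and their hashes in place
// so the data could be re-installed and verified later.
const sqlite3 = require('better-sqlite3');

function pruneSomeHistoricalItems(db, budget) {
  const pickSpans = db.prepare(
    `SELECT ts.vatID AS vatID, ts.startPos AS startPos, ts.endPos AS endPos
       FROM transcriptSpans ts
      WHERE ts.isCurrent IS NULL
        AND EXISTS (SELECT 1 FROM transcriptItems ti
                     WHERE ti.vatID = ts.vatID
                       AND ti.position >= ts.startPos
                       AND ti.position < ts.endPos)
      LIMIT ?`,
  );
  const deleteItems = db.prepare(
    `DELETE FROM transcriptItems
      WHERE vatID = ? AND position >= ? AND position < ?`,
  );
  const pruneTxn = db.transaction(() => {
    let spansPruned = 0;
    for (const { vatID, startPos, endPos } of pickSpans.all(budget)) {
      deleteItems.run(vatID, startPos, endPos);
      spansPruned += 1;
    }
    return spansPruned;
  });
  return pruneTxn();
}

// usage (illustrative):
//   const db = sqlite3('swingstore.sqlite');
//   pruneSomeHistoricalItems(db, 5); // prune up to 5 spans' worth per block
```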
@mhofman and I decided:
- Nodes which never change their …
- Nodes which transition from one mode to another (existing nodes that change their …) …

Note that state-sync prune will get easier in more recent cosmos-sdk versions (maybe 0.47??), which introduces the ability to state-sync export to a local directory, and to import from the same, instead of only using the P2P network protocol (and thus depending upon some other node to publish their snapshot).
We also sketched out the rest of the tools that we can build later to support the creation/consumption of historical spans:
The consumers of this data are going to be validators / RPC nodes / followers who have seen a forum post that says we'll be doing a whole-incarnation replay of vatID …
Ref #9174
Fixes #9387
Fixes #9386

TODO:
- [ ] #9389

## Description

Adds consensus-independent `vat-snapshot-retention` ("debug" vs. "operational") and `vat-transcript-retention` ("archival" vs. "operational" vs. "default") cosmos-sdk swingset configuration (values chosen to correspond with [`artifactMode`](https://github.com/Agoric/agoric-sdk/blob/master/packages/swing-store/docs/data-export.md#optional--historical-data)) for propagation in AG_COSMOS_INIT. The former defaults to "operational" and the latter defaults to "default", which infers a value from cosmos-sdk `pruning` to allow simple configuration of archiving nodes.

It also updates the semantics of the TranscriptStore `keepTranscripts: false` configuration to remove items from only the previously-current span rather than from all previous spans when rolling over (to avoid expensive database churn). Removal of older items can be accomplished by reloading from an export that does not include them.

### Security Considerations

I don't think this changes any relevant security posture.

### Scaling Considerations

This will reduce the SQLite disk usage for any node that is not explicitly configured to retain snapshots and/or transcripts. The latter in particular is expected to have significant benefits for mainnet (as noted in #9174, about 116 GB ÷ 147 GB ≈ 79% of the database on 2024-03-29 was vat transcript items).

### Documentation Considerations

The new fields are documented in our default TOML template, and captured in a JSDoc type on the JavaScript side.

### Testing Considerations

This PR extends TranscriptStore coverage to include `keepTranscripts` true vs. false, but I don't see a good way to cover Go→JS propagation other than manually (which I have done). It should be possible to add testing for the use and validation of `resolvedConfig` in AG_COSMOS_INIT handling, but IMO that is best saved for after completion of split-brain (to avoid issues with same-process Go–JS entanglement).

### Upgrade Considerations

This is all kernel code that can be used at any node restart (i.e., because the configuration is consensus-independent, it doesn't even need to wait for a chain software upgrade). But we should mention the new cosmos-sdk configuration in release notes, because it won't be added to existing app.toml files already in use.
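To make the "default" inference concrete, here is an illustrative sketch of the kind of mapping described above; the function and the exact mapping from cosmos-sdk `pruning` are guesses, not the PR's actual code:

```js
// Sketch: map the consensus-independent retention settings (plus cosmos-sdk
// `pruning`) onto swing-store transcript retention. "default" lets an
// archiving node (pruning = "nothing") keep everything with no extra config.
function resolveTranscriptRetention(vatTranscriptRetention, pruning) {
  switch (vatTranscriptRetention) {
    case 'archival':
    case 'operational':
      return vatTranscriptRetention;
    case 'default':
      // Assumed mapping: archiving nodes disable cosmos-sdk pruning, so treat
      // pruning="nothing" as a request to also retain full transcripts.
      return pruning === 'nothing' ? 'archival' : 'operational';
    default:
      throw Error(`unknown vat-transcript-retention: ${vatTranscriptRetention}`);
  }
}

// e.g. resolveTranscriptRetention('default', 'nothing') === 'archival'
//      resolveTranscriptRetention('default', 'default') === 'operational'
```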
Transforming issue into an Epic with the following list of issues comprising it. Historical description and comments below.
1. Switch to keep only operational data for regular nodes
2. Prune old data
   `agd snapshots export` #9100
3. Store historical items as compressed files
What is the Problem Being Solved?
Our mainnet chain state is growing pretty fast, and we'd like to make it smaller.
The state is stored in two places: cosmos databases (`~/.agoric/data/`, in LevelDBs like `application.db` and `blockstore.db`), and the Agoric-specific "Swing-Store" (`~/.agoric/data/agoric/swingstore.sqlite`). This ticket is focused on the swing-store. The largest component of the swing-store, in bytes, is the historical transcript spans, because they contain information about every delivery made to every vat since the beginning of the chain. These records, plus their SQL overhead, are an order of magnitude larger than anything else in the swing-store, and comprise about 97% of the total space.

As of today (29-Mar-2024), the fully-VACUUM'ed SQLite DB is 147 GB, growing at about 1.1 GB/day. There are 38M transcript items, whose total size (`sum(length(item))`) is 116 GB.

For normal chain operations, we only need the "current transcript span" for each vat: enough history to replay the deliveries since the last heap snapshot. That is enough to bring a worker online, to the same state it was in at the last commit point. Our current `snapshotInterval = 200` configuration means there will never be more than 200 deliveries in the current span (one span per vat), so they will be fairly small. The total size of all six-thousand-ish current transcript items is a paltry 16 MB. A pruned version of today's swingstore DB would be about 4.6 GB in size.

However, when we first launched the chain, we were concerned that we might find ourselves needing to replay the entire incarnation, from the very beginning, perhaps as a last-ditch way to upgrade XS. We decided to have cosmic-swingset configure the swing-store to retain all transcript spans, not just the current one.
I think it's no longer feasible to retain that much data. We carefully designed the swing-store to keep hashes of all historical spans, even if we delete the span data itself, so we retain the ability to safely re-install the historical data (i.e. with integrity, not relying upon the data provider for correctness, merely availability). So in a pinch, we could find a way to distribute the dataset to all validators and build a tool to restore their databases (perhaps one vat at a time).
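For reference, figures like the 38M items / 116 GB above can be reproduced with an ad-hoc query against the swingstore; a sketch using better-sqlite3 follows (the `transcriptItems` table name reflects my understanding of the current schema, so treat it as an assumption):

```js
// Sketch: measure how much of swingstore.sqlite is vat transcript items,
// matching the sum(length(item)) figure quoted above.
const db = require('better-sqlite3')(
  `${process.env.HOME}/.agoric/data/agoric/swingstore.sqlite`,
  { readonly: true },
);
const { count, bytes } = db
  .prepare('SELECT COUNT(*) AS count, SUM(LENGTH(item)) AS bytes FROM transcriptItems')
  .get();
console.log(`${count} transcript items, ${(bytes / 1e9).toFixed(1)} GB of item data`);
```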
Description of the Design
We need a user-accessible switch to control whether a node retains historical transcripts or not. The swingstore constructor call takes an option to control this, but it isn't plumbed into e.g. `app.toml`.
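A minimal sketch of that plumbing, assuming the existing `keepTranscripts` swing-store option and a hypothetical operator-facing retention setting arriving from `app.toml`:

```js
// Minimal sketch of the plumbing: translate an operator-facing setting
// (e.g. something read from app.toml and passed through AG_COSMOS_INIT)
// into the existing swing-store constructor option. The config field name
// and wrapper are hypothetical; `keepTranscripts` is the option this
// ticket wants to expose.
import { openSwingStore } from '@agoric/swing-store';

export function openConfiguredSwingStore(dbDir, swingsetConfig = {}) {
  const keepTranscripts =
    swingsetConfig['vat-transcript-retention'] === 'archival';
  return openSwingStore(dbDir, { keepTranscripts });
}
```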
Then, we need to configure two or more archive nodes to retain their historical transcripts, so we'll have data to recover from should we ever need it. All existing nodes have that data (and it is currently included in state-sync snapshots), so mainly we need to have at least two reliable existing nodes not change their configuration to prune the old spans.
Then, really, we should build and test some tooling to: …
Then, either we change the default setting to prune the historical transcripts, or we tell all validators that they can save 90% of their disk space by changing the setting and let them make the decision.
This is closely related to what artifacts we put in state-sync snapshots. Currently we put all historical artifacts in those snapshots, and require them to all be present when restoring from a snapshot. We would need to change both sides: omit the historical spans during export, and stop requiring them during import (each of which is probably a one-line change to the swingstore API options bag). As a side-effect, state-sync snapshots would become a lot smaller, and would take less time to export and to import.
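A hedged sketch of what those two one-line changes might look like, using the `artifactMode` vocabulary from the swing-store data-export docs (the exact call sites and option plumbing here are assumptions):

```js
// Sketch: state-sync export/import that omits historical transcript spans.
// 'operational' includes only what is needed to run a node; 'archival'
// would also include the historical spans.
import {
  makeSwingStoreExporter,
  importSwingStore,
} from '@agoric/swing-store';

const oldDbDir = '/path/to/.agoric/data/agoric'; // illustrative paths
const newDbDir = '/path/to/restored/agoric';

// Export side: only operational artifacts go into the state-sync snapshot.
const exporter = makeSwingStoreExporter(oldDbDir, { artifactMode: 'operational' });

// Import side: accept a snapshot that lacks the historical artifacts.
await importSwingStore(exporter, newDbDir, { artifactMode: 'operational' });
```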
Security Considerations
The hashes we use on the transcript spans mean there are no integrity considerations. However, this represents a significant availability change.
We don't know that we'll ever need to replay-since-incarnation-start, and we don't know that we could afford to do so anyways: …
We think, but have not implemented or tested, that we can restore this data later, given the (incomplete) plan above. We don't know how hard it will be to implement that plan, or to practically deliver the replacement artifacts. How large will they be? Will we need to deliver all of them, or just for a few vats? How can anyone get sufficiently up-to-date? We might have a situation where the chain halts for upgrade, and all validators must fetch the last few spans from an archive node before they can restart their nodes, introducing extra delays into an upgrade process that's already complicated (if we're resorting to such a big replay).
But we do know that the space this consumes is considerable, and growing fast. I'm really starting to think that we can't afford to have all nodes keep all that data anymore, and to hope/rely-upon the work we've done being sufficient to restore the data in the (unlikely?) event that we ever need it.
cc @mhofman @ivanlei for decisions
Scaling Considerations
Once deployed, this will remove about 1.0 GB per day from the disk-space growth of a mainnet validator (more, if transaction volume increases, e.g. because new price-feed oracles are deployed). If/when a validator does a "state-sync prune" (where they restore from a state-sync snapshot), they'll get a one-time reduction of 152 GB from their disk usage. The resulting swingstore should be about 4.6 GB, and will grow at about 34 MB per day. (The cosmos DB will be unaffected, and is currently about 182 GB, depending upon pruning and snapshot settings).
Test Plan
I believe @mhofman's integration tests will exercise the state-sync export and import parts. I think we should have manual tests confirming that an archive node retains the data we care about.
Upgrade Considerations
We must decide whether the prune-historical-spans behavior is the default for all nodes (and have archive nodes configure themselves to retain those spans), or if retain-historical-spans is the default (and make sure validators know how to prune if desired). If we choose the former, then upgrade automatically makes things somewhat better (reduces the growth rate). A state-sync refresh/reload/pruning would still be necessary to shed the bulk of the data.