History pruning (fixes #4419) #4445
Conversation
draft until status-im/nim-eth#574 is merged
Force-pushed from 15fbe9b to a7747b4
Force-pushed from ef556d4 to 98cffd2
# rolled back automatically by the error response, then the ROLLBACK command
# will fail with an error, but no harm is caused by this.
#
if isInsideTransaction(db.db): # calls `sqlite3_get_autocommit`
Is this nested transaction code newly useful specifically with pruning?
yes (at least it was, at some stage of the pruning development 😄) - it's a bit of a mess, but the function is available at different abstraction levels. Since it's a tool for performance (and not for atomicity of operations), it kind of makes sense to apply it liberally whenever there are multiple writes to group, so it's slightly easier to allow nesting: a "super-composed" operation like wiping lots of states will be made up of "composed" operations like wiping a state and its corresponding state root.
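For illustration, here's a minimal sketch of that grouping idea as standalone Nim (using `std/db_sqlite` and a plain in-transaction flag as a stand-in for the `sqlite3_get_autocommit` check - none of this is the project's actual helper code): only the outermost caller opens and commits the transaction, and nested calls simply join it.

```nim
import std/db_sqlite

# Group multiple writes into one SQLite transaction; nested uses join the
# transaction opened by the outermost caller.
template withTransaction(db, inTx, body: untyped) =
  if inTx:
    body
  else:
    db.exec(sql"BEGIN TRANSACTION")
    inTx = true
    try:
      body
      db.exec(sql"COMMIT")
    except CatchableError:
      try:
        # if the error response already rolled back the transaction, this
        # ROLLBACK fails too - harmless, as the diff comment notes
        db.exec(sql"ROLLBACK")
      except CatchableError:
        discard
      raise
    finally:
      inTx = false

when isMainModule:
  let db = open(":memory:", "", "", "")
  var inTx = false
  db.exec(sql"CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
  withTransaction(db, inTx):
    db.exec(sql"INSERT INTO kv VALUES (1, x'00')")
    # nested call: grouped into the already-open transaction
    withTransaction(db, inTx):
      db.exec(sql"INSERT INTO kv VALUES (2, x'01')")
  db.close()
```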
legacy: bool = true): bool =
  if db.statesNoVal[fork].contains(key.data).expectDb(): return true

  (legacy and fork == BeaconStateFork.Phase0 and db.v0.containsState(key))
Is support for this v0 database schema still relevant? At the moment, it could still be backfilled, since block availability on the network still stretches back to genesis. Another approach might be to cut off this data and let it be recreated, while that's still feasible.
the v0 data appears in any database that's been with us since genesis and covers the initial months - there's no simple way to "recreate" it that doesn't introduce complexity or performance issues, since it's a non-trivial amount of data whose performance profile is marred by the without_rowid mistake (i.e. deletes in particular are slow because they cause btree rebalances that rewrite all the data).
We have generally never "rewritten" any tables in the database, for the simple reason that there's never really a good time to do so - recreating states, for example (as in this particular case, since it is a state function), would require reindexing / replay to reach parity, which is slow.
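To make the without_rowid point concrete, here's a small, purely illustrative sketch (hypothetical table names, `std/db_sqlite`; not the actual nimbus schema): in a `WITHOUT ROWID` table the rows live directly in the primary-key b-tree, so deleting large state blobs rebalances pages that hold full values, whereas an ordinary rowid table keeps the values in a b-tree keyed by a small integer rowid, with the key in a separate index, making deletes much cheaper.

```nim
import std/db_sqlite

let db = open(":memory:", "", "", "")

# v0-style layout: clustered on the 32-byte key - values are stored inside
# the key b-tree itself, so deletes shuffle pages full of state data
db.exec(sql"CREATE TABLE state_v0 (key BLOB PRIMARY KEY, value BLOB) WITHOUT ROWID")

# rowid layout: values live in the rowid b-tree, the key only in a small
# unique index, so deletes touch far less data
db.exec(sql"CREATE TABLE state (key BLOB PRIMARY KEY, value BLOB)")

db.close()
```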
I'm not that strongly opposed to keeping it, but it's this special case which cuts across much of `beacon_chain_db`, and there's a relatively finite, albeit not that short, time window during which it can quite naturally be reconstructed. Doing so is a one-time cost borne in the background to homogenize the database schema.
The way forward for "keeping" the data in a pruned world, I think, is to rely on era files - as such, changing to the current table format is of dubious value. Also, we've gone through other changes to the tables, such as switching to framed snappy in bellatrix and enabling rowids - if we did a migration for each of those, we'd be looking at hour-long restart times, which certainly would not be popular - or days of background processing for data that ultimately is read only rarely.
Regarding reconstruction, we actually have `--reindex` - it is now able to read era files and create state table entries. I'm seeing this as the way forward for "upgrading" a database to state diffs as well. Part of the "pruning" series of PRs was to ensure that this works with era files and can start at any point in time (as opposed to genesis-only) - this allows a user to move between a pruned and a full node with relative ease.
I'm somewhat hesitant, however, to require that "defaults-everything" users who have used nimbus for a long while do this kind of manual work to keep using nimbus, so I find it easier to maintain minimal read-only support in chaindb.
Force-pushed from 98cffd2 to 0b901bb
Introduce (optional) pruning of historical data - a pruned node will continue to answer queries for historical data up to `MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping typical database usage at around 60-70 GB.

To enable pruning, add `--history=prune` to the command line - on the first start, old data will be cleared (which may take a while) - after that, data is pruned continuously.

When pruning an existing database, the database will not shrink - instead, the freed space is recycled as the node continues to run - to free up space, perform a trusted node sync with a fresh database.

When switching on archive mode in a pruned node, history is retained from that point onwards.

History pruning is scheduled to be enabled by default in a future release.

In this PR, `minimal` mode from #4419 is not implemented, meaning retention periods for states and blocks are always the same - depending on user demand, a future PR may implement `minimal` as well.
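As a rough sanity check of the "roughly 5 months" figure, the retention window works out as follows (the constants below are the mainnet preset values, assumed here rather than quoted from the PR):

```nim
const
  MIN_EPOCHS_FOR_BLOCK_REQUESTS = 33024 # mainnet spec value (assumption)
  SLOTS_PER_EPOCH = 32
  SECONDS_PER_SLOT = 12

let retentionDays =
  MIN_EPOCHS_FOR_BLOCK_REQUESTS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT / 86400
echo retentionDays  # ~146.8 days, i.e. a little under 5 months
```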