
Perf/HalfPath state db key #6331

Merged · 227 commits · Mar 28, 2024
Conversation

@asdacap asdacap commented Dec 6, 2023

Prefix state db entry with part of path

  • Includes Perf/Live Pruning #6439, Allow configuring last state to be kept #6809, Fix/long running iterator #6840
  • Inspired by a question from someone I interviewed recently: what if we keep the whole hash db/pruning, but improve its data locality by ordering the keys similarly to the flat db layout?
  • It turns out you can, by prefixing the hash with part of the path (you don't need the whole path).
  • This is the default for a fresh state db. If the old hash key is needed, --Init.StateDbKeyScheme Hash can be specified. An explicit --Init.StateDbKeyScheme HalfPath is needed for full pruning to migrate to the new key layout.
  • The key structure looks like this. For state (42 bytes total):
+--------------+------------------+------------------+--------------+
| section byte | 8 byte from path | path length byte | 32 byte hash |
+--------------+------------------+------------------+--------------+
  • For storage (74 bytes total):
+--------------+----------------------+------------------+------------------+--------------+
| section byte | 32 byte from address | 8 byte from path | path length byte | 32 byte hash |
+--------------+----------------------+------------------+------------------+--------------+
  • The section byte is:
    • 0 if state and path length is <= 5.
    • 1 if state and path length is > 5.
    • 2 if storage.

The keys are separated by the section byte because of the different characteristics of these nodes. The idea is that a top-level
node can be up to 5 times bigger than a lower node, and grows a lot due to pruning. Mixing them makes the lower
nodes sparser, giving poorer cache hit rates and making leaf traversal for snap serving slower.
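As an illustrative sketch of the layout described above (not Nethermind's actual C# code; the helper names are hypothetical and the path-byte packing is simplified), the key construction could look like this:

```python
def make_state_key(path_nibbles: bytes, node_hash: bytes) -> bytes:
    """Build a 42-byte state key: section byte, 8 path bytes, path length, 32-byte hash.

    Hypothetical sketch of the layout in the PR description, not the real implementation.
    """
    assert len(node_hash) == 32
    # Section byte: 0 for shallow state nodes (path length <= 5), 1 for deeper ones.
    section = 0 if len(path_nibbles) <= 5 else 1
    # Up to 8 bytes derived from the path, zero-padded on the right (packing simplified).
    path_part = bytes(path_nibbles[:8]).ljust(8, b"\x00")
    return bytes([section]) + path_part + bytes([len(path_nibbles)]) + node_hash


def make_storage_key(address_hash: bytes, path_nibbles: bytes, node_hash: bytes) -> bytes:
    """Build a 74-byte storage key: section byte 2, 32-byte address hash,
    8 path bytes, path length, 32-byte hash."""
    assert len(address_hash) == 32 and len(node_hash) == 32
    path_part = bytes(path_nibbles[:8]).ljust(8, b"\x00")
    return bytes([2]) + address_hash + path_part + bytes([len(path_nibbles)]) + node_hash


state_key = make_state_key(b"\x01\x02\x03", b"\xaa" * 32)
assert len(state_key) == 42 and state_key[0] == 0  # shallow node -> section 0

storage_key = make_storage_key(b"\xbb" * 32, b"\x01" * 7, b"\xcc" * 32)
assert len(storage_key) == 74 and storage_key[0] == 2  # storage -> section 2
```

Because keys sort lexicographically in RocksDB, nodes that are close in the trie now end up close on disk, which is the data-locality win the description refers to.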

Performance

  • As it now needs to keep track of the path, some extra compute and allocation is required.
  • However, the improved IO efficiency reduces the overall block processing time by up to 50%.
  • In-memory pruning now needs to keep track of the path. This adds extra memory use and causes nodes previously deduplicated by hash to no longer be deduplicated. This adds about 25% to per-node memory usage.
  • Because of this, when it detects that the current state uses hash keys and the HalfPath layout is not explicitly specified, in-memory pruning only keeps track of the hash, so it behaves as before.
  • In practice, this does not make much of a difference in block processing time.
  • The graph shows (master, after with existing hash db, after with --Init.StateDbKeyScheme HalfPath to also track the path, after with HalfPath) x4. The last 2 sets have an 8GB memory hint.
    Screenshot_2023-12-23_17-00-52
  • If you squint really hard, the CPU time seems to reduce and flatten a little.

Sync performance

  • Reduced compaction. Bytes written per node dropped from around 1KB to 0.5KB.
  • Disk space reduced from 196GB to 168GB.
  • Sync time tends to reduce a little, but inconsistently.
  • Memory allocation per node increased by 25%.
  • The graph shows master, (after, but with hash), after.
    Screenshot_2023-12-22_16-33-20

Migration

  • Existing users can run with --Init.StateDbKeyScheme HalfPath and then run full pruning to migrate.
  • This is not enabled by default because in-memory pruning would need to keep track of the path from the start. That said, there isn't much overhead (in terms of block processing) if it is enabled by default.
  • This also causes the copied database to track paths, which significantly increases the memory requirement for the batched trie visitor's job pool. Without increasing the memory budget, the IOPS requirement increases significantly. Because of this, migration via full pruning is not recommended.

Database growth

  • Note: See Perf/Live Pruning #6439
  • Because of the need to track the path, on the same memory budget it will persist more nodes.
  • However, due to an improved compression ratio, the size of the database grows slower.
  • In this test, the starting block is 18310000 (9 Oct) and it was stopped at block 18400000 (21 Oct).
             Persisted Nodes (M)  Size before (GB)  Size after (GB)  Increase (GB)  Per-node increase (B)  Increase (%)
    Hash                     193             173.3            205.6           32.3                  167.4        18.64%
    HalfPath                 271             143.7            157.8           14.1                   52.0         9.81%
  • When in-memory pruning is disabled, a similar reduction in database growth is also seen:
             Persisted Nodes (M)  Size before (GB)  Size after (GB)  Increase (GB)  Per-node increase (B)  Increase (%)
    Hash                    1270             173.3            284.9          111.6                   87.9        64.40%
    HalfPath                1240             143.7            199.9           56.2                   45.3        39.11%
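The per-node increase column follows from dividing the total increase by the persisted node count. A quick sanity check of the tables' figures (assuming GB here means 10^9 bytes, which is what makes the numbers line up):

```python
# Verify per-node increase = total increase (bytes) / persisted nodes,
# using the figures from the two tables above.
rows = [
    # (scheme, persisted nodes in millions, increase in GB, stated per-node bytes)
    ("Hash (pruning)", 193, 32.3, 167.4),
    ("HalfPath (pruning)", 271, 14.1, 52.0),
    ("Hash (no pruning)", 1270, 111.6, 87.9),
    ("HalfPath (no pruning)", 1240, 56.2, 45.3),
]
for scheme, nodes_million, increase_gb, expected in rows:
    per_node = increase_gb * 1e9 / (nodes_million * 1e6)
    assert abs(per_node - expected) < 0.5, (scheme, per_node)
```

All four rows check out to within rounding, so the stated per-node figures are internally consistent with the totals.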

Full pruning

  • Uses far less IO and memory.
  • Unthrottled throughput remains about the same.
  • Default setting (throttled) throughput increased by about 50%.
  • The graph shows before, after, before, after.
    Screenshot_2023-12-27_14-35-52

Changes

  • Includes Fix double write during full pruning #6415 (needed for full pruning to migrate keys).
  • Includes Replace caching store with row cache #6405 (makes the code slightly simpler).
  • Created a TreePath class for representing paths. A ref TreePath parameter is added everywhere in PatriciaTree and TrieNode. On traversal to a child, the tree path is mutated to append the key needed to become the correct path for that child, and it is mutated back to the parent's path after returning from the child. It's messy, but it significantly reduces memory allocation overhead.
  • For storage, PatriciaTree and friends accept an IScopedTreeStore, which is a subinterface of INodeResolver. The implementation, ScopedTreeStore, keeps the hash of the storage, which is then proxied to the ITreeStore that we know and love.
  • ITreeStore has been modified to also accept the storage and path in addition to the standard hash.
  • TreeStore has been modified to be keyed by a (Hash256?, TreePath, Keccak) tuple instead of just Keccak.
  • An abstraction, INodeStorage, replaces IKeyValueStore as the storage layer for nodes, also accepting the storage root and the path. This is the class that determines the key and allows fallback reads, as well as switching between hash and halfpath.
  • Sync code has been modified to interact with INodeStorage instead of IKeyValueStore.
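The ref TreePath pattern described above (append on descent, truncate on return, so one path instance is reused instead of allocating a new path per child) can be sketched like this. This is a hypothetical Python analogue, not Nethermind's actual API:

```python
class TreePath:
    """Illustrative mutable nibble path, mimicking the ref-struct pattern:
    mutate in place on descent, restore on return."""

    def __init__(self):
        self.nibbles = []

    def append(self, nibble: int) -> None:
        self.nibbles.append(nibble)

    def truncate(self, length: int) -> None:
        del self.nibbles[length:]


def traverse(node: dict, path: TreePath, visit) -> None:
    """Depth-first traversal that reuses a single TreePath instead of
    allocating a fresh path object for every node."""
    visit(path, node)
    length_before = len(path.nibbles)
    for nibble, child in node.get("children", {}).items():
        path.append(nibble)           # mutate the path down to the child
        traverse(child, path, visit)
        path.truncate(length_before)  # restore it before visiting the next sibling


# Tiny example trie: a root with children at nibbles 0x1 and 0x2.
trie = {"children": {1: {"children": {}}, 2: {"children": {}}}}
seen = []
traverse(trie, TreePath(), lambda p, n: seen.append(tuple(p.nibbles)))
assert seen == [(), (1,), (2,)]
```

The trade-off is the "messy" part the description mentions: every traversal site must remember to restore the path, but in exchange no per-node path allocation is needed.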

Types of changes

What types of changes does your code introduce?

  • Optimization
  • Refactoring

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

  • Seems to work.

Documentation

Requires documentation update

  • Maybe?

Requires explanation in Release Notes

  • Yes

The new state db key structure reduces block processing time by up to 50%. Existing users are unaffected.

@kamilchodola (Contributor)

@asdacap Does this change require a resync of the DB?

asdacap commented Dec 6, 2023

It requires a resync to take effect. But we can add a fallback read, so it can be made to work without a resync. However, we would effectively be doing two reads at each node, which is not great. I guess the first read would get bloom-filtered, but still not great.
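The fallback read mentioned here could be sketched as follows. This is a hypothetical illustration (the names are invented; in the PR the real logic lives behind INodeStorage):

```python
def read_node(db: dict, halfpath_key: bytes, hash_key: bytes):
    """Try the new HalfPath key first, then fall back to the old hash key.

    Hypothetical sketch: in practice the first miss would usually be absorbed
    by RocksDB's bloom filter, but it is still an extra lookup per node.
    """
    value = db.get(halfpath_key)
    if value is not None:
        return value
    return db.get(hash_key)


# A node written before the migration only exists under its hash key.
db = {b"hash-key": b"node-rlp"}
assert read_node(db, b"halfpath-key", b"hash-key") == b"node-rlp"
```

This is why the comment calls it "not great": every read of an unmigrated node pays for one guaranteed miss before the hit.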

@asdacap asdacap force-pushed the perf/somepath-hashdb branch 4 times, most recently from 1aca153 to 9ff1fbc Compare December 11, 2023 12:02
@asdacap asdacap force-pushed the perf/somepath-hashdb branch 3 times, most recently from af2cee0 to 5457d76 Compare December 18, 2023 07:26
@asdacap asdacap mentioned this pull request Dec 18, 2023
5 tasks
@asdacap asdacap force-pushed the perf/somepath-hashdb branch 4 times, most recently from 18ffa26 to e769795 Compare December 21, 2023 18:19
@asdacap asdacap marked this pull request as ready for review December 23, 2023 09:21
@asdacap asdacap requested a review from rubo as a code owner December 23, 2023 09:21
@asdacap asdacap changed the title Perf/somepath hashdb Perf/HalfPath state db key Dec 23, 2023
@asdacap asdacap marked this pull request as draft December 23, 2023 11:46
@asdacap asdacap force-pushed the perf/somepath-hashdb branch 4 times, most recently from 2fdb41c to f183e2b Compare December 24, 2023 12:00
@asdacap asdacap marked this pull request as ready for review December 24, 2023 12:51
@@ -237,26 +236,8 @@ void AddAgainAllItems()
NodeDataType nodeDataType = currentStateSyncItem.NodeDataType;
if (nodeDataType == NodeDataType.Code)
{
asdacap (Contributor Author):

The complication was removed. State items always need a path now.

@benaadams (Member) left a comment

Have worked with the code base and made changes, so would put more weight on other reviewers.

src/Nethermind/Nethermind.Trie/PatriciaTree.cs
src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs
src/Nethermind/Nethermind.Trie/TrieNode.cs
@Scooletz Scooletz self-requested a review March 21, 2024 16:25
@LukaszRozmej (Member) left a comment

Seeing a lot of Hash256, would it be better to move to ValueHash256 in those places?

Most of my comments are just nitpicks.

src/Nethermind/Nethermind.Core/IKeyValueStore.cs
src/Nethermind/Nethermind.Db/MemDb.cs
src/Nethermind/Nethermind.Trie/TrieNode.cs
src/Nethermind/Nethermind.Trie/Utils/WriteBatcher.cs
asdacap commented Mar 25, 2024

So, there are two places where Hash256 is kept:

  • TrieNode's Keccak -> Not gonna touch that here.
  • The storage address -> The storage address is likely shared between multiple nodes, so it makes sense to use a reference type so that each node only uses 8 bytes.

@Scooletz (Contributor) left a comment

A few initial comments, for better understanding of the decisions taken here.

src/Nethermind/Nethermind.Core/IKeyValueStore.cs
src/Nethermind/Nethermind.Trie/Utils/WriteBatcher.cs
src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs
ReadFlags flags = visitor.ExtraReadFlag;
if (visitor.IsFullDbScan)
{
if (TrieStore.Scheme == INodeStorage.KeyScheme.HalfPath)
Contributor:

I'd move this code elsewhere. It's dependent on the TrieStore.Scheme, so maybe TrieStore could provide FullDbScanFlags?

Contributor Author:

Fair point. NodeStorage already changes the readahead flag anyway.

@Scooletz (Contributor) left a comment

One minor and some comments.

@asdacap asdacap merged commit 3b70f9d into master Mar 28, 2024
68 checks passed
@asdacap asdacap deleted the perf/somepath-hashdb branch March 28, 2024 14:41