
feat: more compact exported snapshot #703

Merged
merged 6 commits on Apr 6, 2023

Conversation

yihuang
Collaborator

@yihuang yihuang commented Mar 13, 2023

Closes: #690 #688

  • Purely additive changes to the iavl library: new APIs are added to support a more compact snapshot format.
  • It still uses the existing ExportNode type as the exchange format with the SDK; the SDK can keep using the protobuf encoding as before, which skips empty fields and therefore works well with the new optimizations.
  • Format negotiation and migration are left to the SDK.

Where to put the code

With the approach taken in this PR, it would also be fine to move the logic to the SDK side, if that makes more sense.

Test Result

Dumped from cronos mainnet data, evm module, version 2000000:

| Snapshot Type | Size |
| --- | --- |
| Normal snapshot | 469.4M |
| Compact snapshot | 402M |
| memiavl snapshot | 2.3G |
  • The new format further reduces the zlib-compressed size by 14%, not as much as expected; I guess mainly because zlib compression already does a good job.
  • Both the normal and compact snapshots are dumped with the SDK's snapshot manager, with and without the compact format introduced in this PR; the size is the sum of all the chunks.
  • The memiavl snapshot is included just for reference; it's a different snapshot format that contains all the node hashes, with zero compression applied.

EDIT:

Just to confirm my suspicion, I recreated the snapshot with zlib compression disabled:

| Snapshot Type | Size |
| --- | --- |
| Normal snapshot | 1.82G |
| Compact snapshot | 856M |

The compression rate is 54%.

@yihuang yihuang changed the title more compact exported snapshot feat: more compact exported snapshot Mar 13, 2023
Member

@tac0turtle tac0turtle left a comment


let's leave this here for now; in store v2 this could be moved higher, into the SDK

```go
n.Key = nil

// delta encode the version
maxVersion := maxInt64(e.versionStack[len(e.versionStack)-1], e.versionStack[len(e.versionStack)-2])
```
Collaborator

Just curious about the efficiency of the version delta encoding?


@yihuang
Collaborator Author

yihuang commented Mar 17, 2023

Just curious about the efficiency of the version delta encoding?

Just did some more detailed comparisons. In the "without zlib compression" table, the version delta only contributes 2% more compression rate, but that contribution carries over to the with-zlib scenario, where it has a more significant impact; I guess it's a part that zlib compression doesn't handle well:

With Zlib Compression (level 7)

| | Size (bytes) | Compression Rate | Time |
| --- | --- | --- | --- |
| Original | 492170475 | - | 54s |
| Skip branch key | 455839686 | 7% | 52s |
| Skip branch key + key delta | 444661628 | 10% | 50s |
| Skip branch key + key delta + version delta | 421542633 | 14% | 43s |

Without Zlib Compression

| | Size (bytes) | Compression Rate | Time |
| --- | --- | --- | --- |
| Original | 1955535429 | - | 8.3s |
| Skip branch key | 1308448614 | 33% | 7.8s |
| Skip branch key + key delta | 936722517 | 52% | 8.8s |
| Skip branch key + key delta + version delta | 897597115 | 54% | 8.4s |
  • Compression Rate is computed as (original - new) / original.
  • Time is the elapsed wall time of the whole command, recorded with `time` in the shell; the command reads an existing memiavl snapshot and writes the state sync snapshot.
  • It looks strange that in some cases the run time decreases while the compression rate increases, but it is indeed reproducible across a few repeated runs here.

Zlib Level 3

I did another round with zlib level 3 compression; it seems to provide a better trade-off in general, losing some marginal compression rate while being much faster.
If you compare the new compact format plus zlib level 3 against the original format with zlib level 7, it reduces the size by 11% while being much faster.
But of course the bottleneck of snapshot exporting right now is probably the iavl tree traversal in the db, so the compression speed improvement probably matters less in the real world.

| | Size (bytes) | Compression Rate | Time |
| --- | --- | --- | --- |
| Original | 508743095 | - | 23s |
| Skip branch key | 474433915 | 6.7% | 22s |
| Skip branch key + key delta | 461333965 | 9.3% | 22s |
| Skip branch key + key delta + version delta | 436761440 | 14% | 21s |

@cool-develope
Collaborator

thx for the detailed report

@tac0turtle tac0turtle merged commit b544dc0 into cosmos:master Apr 6, 2023
@yihuang yihuang deleted the compress-snapshot branch April 6, 2023 14:23
@tac0turtle
Member

@Mergifyio backport release/v1.x.x

@mergify
Contributor

mergify bot commented Apr 13, 2023

backport release/v1.x.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Apr 13, 2023
Co-authored-by: cool-developer <[email protected]>
Co-authored-by: Marko <[email protected]>
(cherry picked from commit b544dc0)

# Conflicts:
#	CHANGELOG.md
tac0turtle added a commit that referenced this pull request Apr 13, 2023
@yihuang yihuang mentioned this pull request Nov 15, 2023
Linked issue: export: apply delta encoding for better compression rate