This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Use memory map to speedup the load and untar of the snapshot archives #24798

Closed
HaoranYi opened this issue Apr 28, 2022 · 2 comments

@HaoranYi
Contributor

Problem

When reading large files, memory-mapped files can give much better I/O
performance. They rely on the OS virtual memory manager and avoid copying
data between kernel-space and user-space buffers. Full snapshot archive
files are around 80 GB and incremental snapshot archives are around 10 GB.
Both should easily fit into memory on the machine. These files are also read
sequentially during decompression and untar, which should benefit greatly
from disk prefetching. By memory-mapping the snapshot archives, we expect
much better read performance when unpacking the snapshot files at validator
startup.

Proposed Solution

Use the memmap2 crate to map the snapshot file into memory and read its
contents from the mapping. memmap2 is the Rust counterpart of
boost::iostreams::mapped_file in C++.
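
To make the idea concrete, here is a minimal sketch of reading a file sequentially through a memory map versus ordinary buffered reads. It uses Python's standard-library `mmap` module purely for illustration; the proposal itself targets the Rust memmap2 crate. The function names `hash_via_read` and `hash_via_mmap`, and the 1 MiB chunk size, are hypothetical choices and not part of the validator code.

```python
import hashlib
import mmap

CHUNK_SIZE = 1 << 20  # 1 MiB per step; an arbitrary choice for this sketch


def hash_via_read(path, chunk_size=CHUNK_SIZE):
    """Hash a file with ordinary read() calls (each call copies into a new bytes object)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def hash_via_mmap(path, chunk_size=CHUNK_SIZE):
    """Hash the same file through a read-only memory map.

    Slices of the memoryview refer to the mapped pages directly, so no
    intermediate bytes object is created per chunk. Note that mapping an
    empty file raises ValueError, so callers should handle that case.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                for offset in range(0, len(view), chunk_size):
                    digest.update(view[offset:offset + chunk_size])
    return digest.hexdigest()
```

Both functions return the same digest for the same file; only the path the bytes take differs. Whether the mapped version is faster depends on the workload, so it should be measured rather than assumed.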

@HaoranYi
Contributor Author

#25259

It turns out that memory mapping only shows an improvement for uncompressed snapshot files. For compressed snapshot files, it doesn't yield any meaningful improvement.

@HaoranYi
Contributor Author

HaoranYi commented May 17, 2022


https://gist.github.com/HaoranYi/7613f005d9f14a47772fdeeeca0b7d19

The breakdown timing for untar is as follows:

  • decompress: 16%, of which 8% is spent on page faults
  • writing file to disk: 49%
  • opening file: 6%
  • copying: 23%

A memory map might help with the remaining 8% of decompress time that is not spent on page faults, but the gain would not be significant.

This run is from a GCE cluster. It looks like the CPU only supports AVX (256-bit). With a better CPU that supports wider vector instructions such as AVX-512 (512-bit), we may see an improvement in the 23% copying time.
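
As a rough check on these numbers, Amdahl's law bounds the overall gain from speeding up a single phase. The sketch below assumes the percentages are shares of total untar time. The 2x factor used for the copying phase is a made-up illustration, not a measurement, and `max_speedup` is a hypothetical helper name.

```python
def max_speedup(fraction, factor=float("inf")):
    """Amdahl's law: overall speedup when `fraction` of the run gets `factor` times faster.

    With the default factor of infinity the phase is treated as free,
    which gives the upper bound for optimizing that phase alone.
    """
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Upper bound if the 8% non-page-fault decompress time vanished entirely.
decompress_bound = max_speedup(0.08)

# Hypothetical case: the 23% copying phase becomes twice as fast.
copy_2x = max_speedup(0.23, factor=2.0)

print(f"decompress bound: {decompress_bound:.3f}x")
print(f"copying at 2x:    {copy_2x:.3f}x")
```

Under these assumptions, removing the 8% entirely yields at most about a 1.09x overall speedup, while doubling the speed of the copying phase would give about 1.13x. That matches the observation that the memory map alone is not significant here.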
