perf: Implement segmented heap #53

slowli · 2024-08-09T09:55:25Z

What ❔

Another stab to segmented heap, similar to how it's implemented for old VMs (+ recycling heap pages).

Why ❔

Leads to performance improvement compared to the current linear heap design, esp. for single-transaction execution.

Checklist

PR title corresponds to the body of PR (we generate changelog entries from PRs).
Tests for the changes have been added / updated.
Documentation comments have been added / updated.
Code has been formatted via cargo fmt and cargo clippy.

slowli · 2024-08-09T10:07:22Z

Benchmarking results are reported on Linear. TL;DR is that this approach looks consistently faster than the current one (from ~20% on single-transaction benches to ~5% on large batches), and is almost always marginally faster than the previous segmented heap implementation.

joonazan

I added some suggestions, which may not be very impactful but they should at least improve performance with no downsides.

src/heap.rs

It makes performance worse in some cases and doesn't improve it.

joonazan · 2024-08-14T15:55:16Z

I like "recycle pages via linked list to save on allocations" but the memory use benefit is limited and it uses unsafe. I would keep PagePool anyway but we could make it safe instead of optimizing allocations.

Also, I'd like to free inaccessible pages in heaps that are kept alive by a fat pointer. But we don't have a benchmark that would exercise that functionality.

joonazan · 2024-08-14T15:57:06Z

read_u256_partially would be better with start and length like in @slowli's change that never got merged. Maybe we should switch to that even though it requires coordination with core.

slowli

So, to summarize: U256s are still stored in the heap in big-endian form, right?

Maybe we should switch to that even though it requires coordination with core.

Unless I'm missing something, coordination is well-contained in this case and doesn't require multi-step process, so I'm for it.

src/heap.rs

Making it correct would be tricky because the alignment of allocation and deallocation have to match.

slowli

Nice job! Can't approve this PR since I'm the author 🙃 Feel free to approve it yourself and merge.

src/heap.rs

🤖 I have created a release *beep* *boop* --- ## [0.2.0](v0.1.0...v0.2.0) (2024-09-23) ### Features * Brush up repo for publishing ([#58](#58)) ([69ac5ed](69ac5ed)) * exposes necessary methods on heap newtype ([#43](#43)) ([9342db7](9342db7)) * implement kernel mode ([#42](#42)) ([2407d39](2407d39)) * Stable tracer interface ([#46](#46)) ([dc73bb4](dc73bb4)) * Track `storage_refunds` and `pubdata_costs` stats ([#48](#48)) ([2882a12](2882a12)) ### Bug Fixes * base being in the kernel on address, not code address ([#31](#31)) ([d9cb911](d9cb911)) * bugs in initial writes change ([#36](#36)) ([8defb4a](8defb4a)) * don't repeatedly get initial values ([#35](#35)) ([50fdbfa](50fdbfa)) * filter out initial writes of zero ([#39](#39)) ([a291c24](a291c24)) * Fix `Heap` equality comparison ([#51](#51)) ([a0cf04b](a0cf04b)) * Fix decommit opcode semantics ([2882a12](2882a12)) * Fix decommitment cost divergence ([#57](#57)) ([d385127](d385127)) * Fix decommitment logic on out-of-gas ([#56](#56)) ([2276b7b](2276b7b)) * fuzz test ([#30](#30)) ([d516967](d516967)) * fuzz.sh ([#64](#64)) ([e8e72b5](e8e72b5)) * fuzzer now makes short programs; fix crash in near call ([#52](#52)) ([985a778](985a778)) * infinite test ([#34](#34)) ([81185a5](81185a5)) * invalid instruction unsoundness ([#61](#61)) ([74577d9](74577d9)) * record history for aux heap as well ([#49](#49)) ([2877059](2877059)) * record pubdata used by precompiles ([#27](#27)) ([a7de066](a7de066)) * report correct heap sizes in ContextMeta ([#26](#26)) ([493fcec](493fcec)) * revert pubdata on failed near call, too ([#28](#28)) ([fabc553](fabc553)) * StateInterface::current_frame did not work with near calls ([#65](#65)) ([53f8f88](53f8f88)) * track transaction number in changes ([#33](#33)) ([e683ae8](e683ae8)) ### Performance Improvements * Implement segmented heap ([#53](#53)) ([d2405bc](d2405bc)) * optimize external snapshots ([#47](#47)) ([952ecd4](952ecd4)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: zksync-era-bot <[email protected]>

slowli added 4 commits August 9, 2024 10:12

Implement segmented heap

1f70295

Use Vec<Option<_>> for pages

7eb3729

Recycle heap pages

89ae522

Fix compilation for instruction test

dc155a3

slowli requested a review from montekki August 9, 2024 10:09

slowli marked this pull request as ready for review August 9, 2024 10:09

slowli requested a review from joonazan as a code owner August 9, 2024 10:09

joonazan reviewed Aug 9, 2024

View reviewed changes

src/heap.rs Outdated Show resolved Hide resolved

src/heap.rs Outdated Show resolved Hide resolved

src/heap.rs Show resolved Hide resolved

slowli and others added 6 commits August 9, 2024 14:35

Use boxed array in HeapPage

3507757

optimize happy path of read_u256

8e8811c

use iterators over memcpy

1862c8c

optimize write

083e590

remove with_capacity

b851782

It makes performance worse in some cases and doesn't improve it.

recycle pages via linked list to save on allocations

42394d2

joonazan force-pushed the aov-pla-1006-benchmark-vm-performance-optimizations-segmented-heap branch from e4d071b to 42394d2 Compare August 14, 2024 15:48

slowli commented Aug 16, 2024

View reviewed changes

src/heap.rs Outdated Show resolved Hide resolved

src/heap.rs Show resolved Hide resolved

src/heap.rs Show resolved Hide resolved

joonazan added 3 commits August 23, 2024 11:02

optimize address comparison

5c79fb0

remove unsafe recycling

e6a017a

Making it correct would be tricky because the alignment of allocation and deallocation have to match.

add comment

7bc6edb

slowli commented Aug 23, 2024

View reviewed changes

src/heap.rs Outdated Show resolved Hide resolved

joonazan added 2 commits August 23, 2024 14:33

use address_to_page_offset more

a517674

clippy fix

5db50d6

joonazan approved these changes Aug 23, 2024

View reviewed changes

joonazan enabled auto-merge (squash) August 23, 2024 12:40

joonazan merged commit d2405bc into master Aug 23, 2024
7 checks passed

joonazan deleted the aov-pla-1006-benchmark-vm-performance-optimizations-segmented-heap branch August 23, 2024 12:40

github-actions bot mentioned this pull request Sep 20, 2024

chore(master): release 0.2.0 #63

Closed

zksync-era-bot mentioned this pull request Sep 23, 2024

chore(master): release 0.2.0 #67

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Implement segmented heap #53

perf: Implement segmented heap #53

slowli commented Aug 9, 2024

slowli commented Aug 9, 2024

joonazan left a comment

joonazan commented Aug 14, 2024

joonazan commented Aug 14, 2024

slowli left a comment

slowli left a comment

perf: Implement segmented heap #53

perf: Implement segmented heap #53

Conversation

slowli commented Aug 9, 2024

What ❔

Why ❔

Checklist

slowli commented Aug 9, 2024

joonazan left a comment

Choose a reason for hiding this comment

joonazan commented Aug 14, 2024

joonazan commented Aug 14, 2024

slowli left a comment

Choose a reason for hiding this comment

slowli left a comment

Choose a reason for hiding this comment