Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth: make traceChain avoid OOM on long-running tracing #23736

Merged
merged 2 commits into from
Nov 4, 2021

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Oct 14, 2021

This PR changes long-running chain tracing, so that it at some points releases the memory trie db, and switch over to a fresh disk-backed trie.

// Prefer disk if the trie db memory grows over 10Mb
s1, s2 := trieDb.Size()
if !preferDisk && (s1+s2) > 10*1024*1024 {
log.Info("Switching to prefer-disk mode for tracing", "size", s1+s2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the accumulated in memory nodes exceed the limit, shouldn't we flush them into disk and then retrieve a disk-based statedb?
Although I'm not sure if we really want to push junks. Maybe we just want to check if there is a complete state in disk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I'm not sure if we really want to push junks. Maybe we just want to check if there is a complete state in disk?

That's basically what it does. We don't want to push junk, but we start telling the backend "hey, if you happen to find the state on disk, can you please use that instead of using this accumulated junk?"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. It can be helpful in the current codebase since we periodically push the state.

@holiman
Copy link
Contributor Author

holiman commented Nov 2, 2021

Finally got around to testing this on a node..

> debug.getAccessibleState(0,"latest")
13401259
> debug.getAccessibleState(13401260,"latest")
13414706

So if tracing starts at 13401259, then it will have to chug through 13414706-13401259=13447 blocks before the memory is cleared. Should be doable.

INFO [11-02|17:40:22.208] Tracing chain segment                    start=13,401,259 end=13,537,532 current=13,414,693 transactions=2,545,920 elapsed=4h3m22.508541811s
INFO [11-02|17:40:30.543] Tracing chain segment                    start=13,401,259 end=13,537,532 current=13,414,700 transactions=2,547,143 elapsed=4h3m30.843300651s
INFO [11-02|17:40:35.864] Deep froze chain segment                 blocks=3  elapsed=36.277ms           number=13,448,639 hash=c57e2d..edd170
INFO [11-02|17:40:39.146] Tracing chain segment                    start=13,401,259 end=13,537,532 current=13,414,706 transactions=2,548,254 elapsed=4h3m39.446252969s
INFO [11-02|17:40:39.578] Found disk backend for state trie 

Yay!

INFO [11-02|17:40:41.366] Downloader queue stats                   receiptTasks=0 blockTasks=0 itemSize=69.51KiB throttle=3772
WARN [11-02|17:40:43.371] Tracing failed                           hash=dff138..a8826e block=13,414,708 err="tracing failed: nonce too high: address 0xAe45a8240147E6179ec7c9f92c5A18F9a97B3fCA, tx: 680930 state: 680929"
WARN [11-02|17:40:46.235] Chain tracing failed                     start=13,401,259 end=13,537,532 transactions=2,548,566 elapsed=4h3m46.535467752s  err="processing block 13414708 failed: could not apply tx 71 [0xdff1382a7bd005a5e54425f65e937b8d477dfd31c21a78e71a5781f873a8826e]: nonce too high: address 0xAe45a8240147E6179ec7c9f92c5A18F9a97B3fCA, tx: 680930 state: 680929"

Nooo!


Memory increase during those 13k block tracing:
Screenshot 2021-11-02 at 18-24-30 Single Geth - Grafana

@holiman
Copy link
Contributor Author

holiman commented Nov 2, 2021

Yup, now this works.
Screenshot 2021-11-02 at 22-50-35 Single Geth - Grafana
The drop there is when it finds a disk trie:

INFO [11-02|17:40:46.488] Imported new chain segment               blocks=1  txs=481  mgas=29.983  elapsed=464.753ms          mgasps=64.514  number=13,538,640 hash=f11200..e24c65 dirty=689.33MiB
INFO [11-02|17:40:46.490] Unindexed transactions                   blocks=1  txs=97   tail=13,538,541 elapsed=1.890ms
INFO [11-02|17:41:06.013] Imported new chain segment               blocks=1  txs=498  mgas=29.541  elapsed=231.347ms          mgasps=127.693 number=13,538,641 hash=489220..d81a9f dirty=690.82MiB
INFO [11-02|17:41:06.017] Unindexed transactions                   blocks=1  txs=172  tail=13,538,542 elapsed=3.511ms
...
INFO [11-02|22:47:21.454] Tracing chain segment                    start=13,401,259 end=13,538,940 current=13,414,701 transactions=2,547,632 elapsed=3h59m24.619355258s
INFO [11-02|22:47:32.152] Tracing chain segment                    start=13,401,259 end=13,538,940 current=13,414,706 transactions=2,548,254 elapsed=3h59m35.316871285s
INFO [11-02|22:47:32.155] Found disk backend for state trie 
INFO [11-02|22:47:40.320] Tracing chain segment                    start=13,401,259 end=13,538,940 current=13,414,710 transactions=2,548,816 elapsed=3h59m43.485377022s
INFO [11-02|22:47:44.500] Switching to prefer-disk mode for tracing size=11.78MiB
INFO [11-02|22:47:49.864] Tracing chain segment                    start=13,401,259 end=13,538,940 current=13,414,717 transactions=2,550,574 elapsed=3h59m53.028526974s

After the mem drop, it continued fine

Screenshot 2021-11-02 at 23-12-16 Single Geth - Grafana

@holiman
Copy link
Contributor Author

holiman commented Nov 3, 2021

Rebased, lightly squashed, fixed up. PTAL

@holiman
Copy link
Contributor Author

holiman commented Nov 3, 2021

Has been chugging along the entire night (started a few minutes after midnight ), and switches over from time to time:
Screenshot 2021-11-03 at 10-11-52 Single Geth - Grafana

@holiman holiman added this to the 1.10.12 milestone Nov 3, 2021
Copy link
Contributor

@s1na s1na left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Also tested on a node

@holiman holiman merged commit 3bbeb94 into ethereum:master Nov 4, 2021
sidhujag pushed a commit to syscoin/go-ethereum that referenced this pull request Nov 4, 2021
This PR changes long-running chain tracing, so that it at some points releases the memory trie db, and switch over to a fresh disk-backed trie.
maoueh pushed a commit to streamingfast/go-ethereum that referenced this pull request Jan 31, 2022
* eth/tracers: implement debug.intermediateRoots (ethereum#23594)

This PR implements a new debug method, which I've talked briefly about to some other client developers. It allows the caller to obtain the intermediate state roots for a block (which might be either a canon block or a 'bad' block).
Signed-off-by: wenbiao <[email protected]>

* core, rpc: disable memory output by default in traces (ethereum#23558)

* core: cmd: invert disableMemory

* core: fix missed inversion

* cmd/evm: preserve Flags but change default value

* Apply suggestions from code review

Co-authored-by: Martin Holst Swende <[email protected]>

Co-authored-by: Martin Holst Swende <[email protected]>
Signed-off-by: wenbiao <[email protected]>

* eth/tracers: abort evm execution when trace is aborted (ethereum#23580)

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: avoid unsyncronized mutations on trie database (ethereum#23632)

This PR fixes an issue in traceChain, where the statedb Commit operation was performed asynchronously with dereference-operations agains the underlying trie.Database instance. Due to how the reference counting works within the trie database (where parent count is recursively updated when new parents are added), doing dereferencing in the middle of Commit can cause the refcount to become wrong, leading to an inconsistent state. 

This was fixed by doing Commit/Deref from the same routine.  
Signed-off-by: wenbiao <[email protected]>

* core,eth: call frame tracing (ethereum#23087)

This change introduces 2 new optional methods; `enter()` and `exit()` for js tracers, and makes `step()` optiona. The two new methods are invoked when entering and exiting a call frame (but not invoked for the outermost scope, which has it's own methods). Currently these are the data fields passed to each of them:

    enter: type (opcode), from, to, input, gas, value
    exit: output, gasUsed, error

The PR also comes with a re-write of the callTracer. As a backup we keep the previous tracing script under the name `callTracerLegacy`. Behaviour of both tracers are equivalent for the most part, although there are some small differences (improvements), where the new tracer is more correct / has more information.

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: re-write of 4byte tracer using enter/exit (ethereum#23622)

* eth/tracers: add re-write of 4byte tracer using enter/exit

* eth/tracers: fix 4byte indent
Signed-off-by: wenbiao <[email protected]>

* eth/tracers: tx.BaseFee not implemented

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: do the JSON serialization via .js to capture C faults

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: fix callTracer fault handling (ethereum#23667)

* eth/tracers: fix calltracer fault handling

* eth/tracers: fix calltracer indentation
Signed-off-by: wenbiao <[email protected]>

* eth/tracers: invoke enter/exit on 0-value calls to inex accounts (ethereum#23828)

Signed-off-by: wenbiao <[email protected]>

* eth: make traceChain avoid OOM on long-running tracing (ethereum#23736)

This PR changes long-running chain tracing, so that it at some points releases the memory trie db, and switch over to a fresh disk-backed trie.
Signed-off-by: wenbiao <[email protected]>

* eth/tracers: expose contextual infos (block hash, tx hash, tx index)

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: redefine Context

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: support for golang tracers + add golang callTracer (ethereum#23708)

* eth/tracers: add basic native loader

* eth/tracers: add GetResult to tracer interface

* eth/tracers: add native call tracer

* eth/tracers: fix call tracer json result

* eth/tracers: minor fix

* eth/tracers: fix

* eth/tracers: fix benchTracer

* eth/tracers: test native call tracer

* eth/tracers: fix

* eth/tracers: rm extra make

Co-authored-by: Martin Holst Swende <[email protected]>

* eth/tracers: rm extra make

* eth/tracers: make callFrame private

* eth/tracers: clean-up and comments

* eth/tracers: add license

* eth/tracers: rework the model a bit

* eth/tracers: move tracecall tests to subpackage

* cmd/geth: load native tracers

* eth/tracers: minor fix

* eth/tracers: impl stop

* eth/tracers: add native noop tracer

* renamings

Co-authored-by: Martin Holst Swende <[email protected]>

* eth/tracers: more renamings

* eth/tracers: make jstracer non-exported, avoid cast

* eth/tracers, core/vm: rename vm.Tracer to vm.EVMLogger for clarity

* eth/tracers: minor comment fix

* eth/tracers/testing: lint nitpicks

* core,eth: cancel evm on nativecalltracer stop

* Revert "core,eth: cancel evm on nativecalltracer stop"

This reverts commit 01bb908.

* eth/tracers: linter nits

* eth/tracers: fix output on err

Co-authored-by: Martin Holst Swende <[email protected]>
Signed-off-by: wenbiao <[email protected]>

* eth/tracers: make native calltracer default (ethereum#23867)

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: package restructuring (ethereum#23857)

* eth/tracers: restructure tracer package

* core/vm/runtime: load js tracers

* eth/tracers: mv bigint js code to own file

* eth/tracers: add method docs for native tracers

* eth/tracers: minor doc fix

* core,eth: cancel evm on nativecalltracer stop

* core/vm: fix failing test

Co-authored-by: Sina Mahmoodi <[email protected]>
Signed-off-by: wenbiao <[email protected]>

* eth/tracers: ethapi.TransactionArgs was not merged

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: fix the api_test with ErrInsufficientFunds to ErrInsufficientFundsForTransfer

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: check posa before statedb.Prepare in IntermiateRoots api

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: make js calltracer default, compatible with old version

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: fix the default callTrace name of callTracerJs

Signed-off-by: wenbiao <[email protected]>

* Revert "eth/tracers: fix the default callTrace name of callTracerJs"

This reverts commit 62a3bc215d9f07e422a4c659289bb3ba4f9ed2fa.

Signed-off-by: wenbiao <[email protected]>

* Revert "eth/tracers: make js calltracer default, compatible with old version"

This reverts commit 85ef42c0ea651f0b228d4209b1b2598b24e12f1f.

Signed-off-by: wenbiao <[email protected]>

* eth/tracers: fix the variable race condition

Signed-off-by: wenbiao <[email protected]>

Co-authored-by: Martin Holst Swende <[email protected]>
Co-authored-by: Marius van der Wijden <[email protected]>
Co-authored-by: Sina Mahmoodi <[email protected]>
Co-authored-by: Péter Szilágyi <[email protected]>
Co-authored-by: Sina Mahmoodi <[email protected]>
yongjun925 pushed a commit to DODOEX/go-ethereum that referenced this pull request Dec 3, 2022
This PR changes long-running chain tracing, so that it at some points releases the memory trie db, and switch over to a fresh disk-backed trie.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants