[Node Operator Question] Uneven archive node performance under load #45

jakobilobi · 2023-11-03T13:18:27Z


Issue Description

Hello everyone,

I was referred here from the Optimism Discord after asking this question, so I'll pose it here as well:

We're running two Optimism archive nodes (legacy node + Bedrock/OP node) and have run into an elusive issue. One of our nodes fares much worse under load than the other: it becomes slower to respond and even falls behind in block height. The underlying hardware/infra is exactly the same for both nodes, and it is well above the minimum requirements listed in the docs. Both nodes are also currently running the same versions, op-geth v1.101304.0 and op-node v1.3.0, though the issue started to appear before they were upgraded to the latest version.
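One way to put numbers on "slower to respond" and "falling behind" is to poll both nodes side by side. Below is a minimal sketch in Python (standard library only), assuming both nodes expose op-geth's HTTP JSON-RPC endpoint; it calls the standard eth_blockNumber method, times each round trip, and reports how many blocks each node trails the highest height seen. The helper names and the endpoint URLs in the usage note are hypothetical placeholders, not part of the setup described in this thread.

```python
import json
import time
import urllib.request


def parse_quantity(hex_str: str) -> int:
    """Decode a JSON-RPC hex quantity such as "0x1b4" into an int."""
    return int(hex_str, 16)


def block_number(rpc_url: str, timeout: float = 5.0) -> tuple[int, float]:
    """Call eth_blockNumber on one node; return (height, round-trip seconds)."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    ).encode()
    req = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes a successful reply; an error reply has "error" instead of "result".
    return parse_quantity(body["result"]), time.monotonic() - start


def blocks_behind(heights: dict[str, int]) -> dict[str, int]:
    """How many blocks each node trails the highest height observed."""
    tip = max(heights.values())
    return {name: tip - h for name, h in heights.items()}


def report(nodes: dict[str, str]) -> None:
    """Query every node once and print its height, latency, and lag."""
    heights = {}
    for name, url in nodes.items():
        height, latency = block_number(url)
        heights[name] = height
        print(f"{name}: block {height}, eth_blockNumber took {latency * 1000:.1f} ms")
    print("blocks behind:", blocks_behind(heights))
```

Usage would look like report({"node-a": "http://10.0.0.1:8545", "node-b": "http://10.0.0.2:8545"}) with your own (hypothetical here) endpoints, run periodically while the nodes are under load. eth_blockNumber is a cheap call, so if it alone becomes slow on one node, that hints at the node being starved for resources rather than at an expensive query.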

I've tried experimenting with the op-geth settings --snapshot=..., --cache=... and --maxpeers=..., but they don't seem to have any effect. I especially expected the --snapshot setting to have at least some impact, given what I understand it does (I'm just a node runner though; my actual blockchain client knowledge isn't very deep). When the issue appeared, --snapshot was true on the affected node, with the state snapshot generated, while it was false on the unaffected node.

Does this ring a bell for anyone? Any advice on what other configurations I might try out?

Cheers,
Jakob

Node Logs

No response

ghost · 2023-11-03T14:15:47Z


They're archive nodes, so this actually has nothing to do with Optimism; it's an op-geth issue.

You could try some alternative EL clients, but note that they are unstable and unproven for mission-critical tasks.

https://github.com/testinprod-io/op-erigon
https://github.com/anton-rs/op-reth/tree/clabby/op-reth


jakobilobi Nov 3, 2023
Author

They're archive nodes and have nothing to do with optimism actually, it is a geth issue.

So where would you advise looking for a solution, the official geth repo? And is there any way I can tell whether this is a geth issue or an op-geth issue?

Thanks for listing alternatives but we'll have to stick to what's proven for now.

sbvegan Nov 3, 2023
Maintainer

op-geth is very close to geth, so generally speaking their behaviors and fixes will be the same. You can check out the diff between them here: https://op-geth.optimism.io/

Those alternative clients are awesome, but unstable at the moment.

ghost Nov 4, 2023

Ah, sorry about that. It could be an op-geth issue, but any RPC problem you encounter is likely inherited from the upstream geth codebase.

jakobilobi Nov 6, 2023
Author

Those alternative clients are awesome, but unstable at the moment.

Totally agree. We've opted to use Erigon over Geth for some of our Ethereum nodes, so we'd totally be down to use op-erigon once it's more proven.

sbvegan · 2023-11-03T14:36:04Z

Maintainer

Hey @jakobilobi, can you provide the following:

- system specs
- op-node configuration
- op-geth configuration


jakobilobi Nov 6, 2023
Author

Trying out these settings, will update here if I see changes in performance.

RAM usage has not been out of the ordinary, i.e. very similar to our other, healthy node.

sbvegan Nov 7, 2023
Maintainer

Hey @jakobilobi just checking back in, have you seen any improvements or any logs that might give us a clue to why your underperforming node is behaving that way?

sbvegan Nov 14, 2023
Maintainer

Hey @jakobilobi, just following up one more time. Anything new to report?

jakobilobi Nov 15, 2023
Author

Hey, sorry for not keeping up with my notifications. Lots going on!

After some more debugging and a look at our infrastructure, we're quite confident at this point that the issue had nothing to do with the client software, at least not as the main cause. I'll be closing this discussion. Thanks for all the help and answers!

sbvegan Nov 15, 2023
Maintainer

No worries, we're happy to help. Thanks for the follow up!

opfocus · 2023-11-06T11:29:09Z


Btw, how do you monitor these status metrics? Do you run a dedicated monitoring program? (This is a separate topic and essentially irrelevant to the current problem.)

jakobilobi Nov 6, 2023
Author

Irrelevant or not, happy to share :)

We export node client metrics via the supported settings (--metrics.influxdb etc.) and system metrics for the container running the client via our own setup, which uses Prometheus. The client metrics themselves don't include low-level info such as disk I/O usage, so that's sourced from the node monitoring.
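Since the client metrics don't cover disk I/O, that has to come from the host. As an illustration only (not the Prometheus-based setup described above), here is a minimal Python sketch that reads the Linux kernel's /proc/diskstats counters, the same counters tools like iostat read, and turns two samples into per-second rates. It assumes a Linux host; the function and field names are hypothetical, and the device name in the usage note is a placeholder.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


@dataclass
class DiskSample:
    reads: int            # reads completed
    sectors_read: int
    writes: int           # writes completed
    sectors_written: int
    io_ms: int            # milliseconds the device spent doing I/O


def parse_diskstats(text: str, device: str) -> DiskSample:
    """Extract the counters for one block device from /proc/diskstats content."""
    for line in text.splitlines():
        f = line.split()
        # Fields: major, minor, name, then the kernel's per-device counters.
        if len(f) >= 14 and f[2] == device:
            return DiskSample(int(f[3]), int(f[5]), int(f[7]), int(f[9]), int(f[12]))
    raise ValueError(f"device {device!r} not found")


def sample(device: str) -> DiskSample:
    """Take one snapshot of the live counters (Linux only)."""
    with open("/proc/diskstats") as fh:
        return parse_diskstats(fh.read(), device)


def rates(a: DiskSample, b: DiskSample, seconds: float) -> dict[str, float]:
    """Turn two samples taken `seconds` apart into per-second rates."""
    return {
        "read_iops": (b.reads - a.reads) / seconds,
        "write_iops": (b.writes - a.writes) / seconds,
        "read_MBps": (b.sectors_read - a.sectors_read) * SECTOR_BYTES / seconds / 1e6,
        "write_MBps": (b.sectors_written - a.sectors_written) * SECTOR_BYTES / seconds / 1e6,
        # Share of wall time with I/O in flight, comparable to iostat's %util.
        "util_pct": (b.io_ms - a.io_ms) / (seconds * 1000) * 100,
    }
```

A usage example would be a = sample("nvme0n1"), then time.sleep(5), then b = sample("nvme0n1") and rates(a, b, 5.0), comparing the output between the two nodes under the same load. A util_pct that stays near 100 is a common sign of a saturated disk, though it is less conclusive for devices that serve many requests in parallel, such as NVMe SSDs.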

opfocus Nov 6, 2023

Thanks for sharing!

jakobilobi · 2023-11-15T15:23:21Z

Author

Found that there was a more probable external reason for the issues we saw with the node, thanks for your help though!

