This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Massive increase in CPU usage with 1.11.1 #8696

Closed
folsen opened this issue May 23, 2018 · 11 comments
Labels
F7-footprint 🐾 An enhancement to provide a smaller (system load, memory, network or disk) footprint. M4-core ⛓ Core client code / Rust. P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible.
Comments

@folsen
Contributor

folsen commented May 23, 2018

  • Which Parity version?: 1.11.1
  • Which operating system?: Linux
  • How installed?: via binaries
  • Are you fully synchronized?: yes
  • Which network are you connected to?: ethereum
  • Did you try to restart the node?: yes

After upgrading from the latest 1.10 to 1.11.1, I'm using way more CPU, while network usage is down.

[screenshot: Stats]

@folsen folsen added P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. F7-footprint 🐾 An enhancement to provide a smaller (system load, memory, network or disk) footprint. M4-core ⛓ Core client code / Rust. labels May 23, 2018
@SwaroopH

Same with us. We even did a full db reset a couple of days later to ensure it wasn't some background db upgrade causing the load, as you can see in the graph.

[screenshot 2018-05-25 13:02:41]

@AyushyaChitransh

I'm running:

  • Which Parity version?: v1.10.0-unstable-66755be8f-20180206/x86_64-linux-gnu/rustc1.23.0 and v1.12.0-unstable-8057e8df4-20180604/x86_64-linux-gnu/rustc1.26.1
  • Which operating system?: Amazon Linux
  • How installed?: from source
  • Are you fully synchronized?: yes! Fat DB and trace enabled.
  • Which network are you connected to?: Private network, PoA chain.
  • Did you try to restart the node?: so many times

I made further attempts to solve this issue, trying the following:

  • increased the CPU from 2 cores to 4 cores
  • --no-hardware-wallets did not help.
  • --jsonrpc-threads 1 --jsonrpc-server-threads 1
  • --no-ancient-blocks also did not resolve the issue.

We are using Amazon Linux with a 2-core CPU. This has been a gradual issue on our v1.10.0 nodes: we first received warnings for CPU usage up to 60%, and those gradually turned into critical alerts with CPU usage up to 104%.

I am not able to pin this issue down to any of these specific causes:

  • a problem with Parity
  • the increasing blockchain size
  • the server hardware configuration

Is there any other setting or configuration that may help resolve these CPU usage problems?

@folsen
Contributor Author

folsen commented Jun 4, 2018

@AyushyaChitransh We're still investigating internally as well. We have some fixes in progress but nothing massive. A big part of the increase in CPU is simply that Ethereum has way more transactions (both spam and not), tons of bots trying to outbid each other on DEXes, etc., and processing and relaying all of these transactions takes a lot of CPU, even though most of them don't make it into blocks. This is not the reason 1.11 uses more CPU than 1.10, but it's a big part of why CPU usage has gone up over the last 6 months.

Once we have any solid recommendations, I'll post back to this issue.

@AyushyaChitransh

In my case, I was monitoring the node using https://github.com/cubedro/eth-net-intelligence-api, and as soon as I stopped the monitoring process, CPU usage returned to normal within about 10 seconds. This happened three or four times.

I believe the RPC connections are causing the CPU issue in my case. If anyone else is facing this issue, can you verify whether switching off RPC resolves the problem?

@tomusdrw
Collaborator

@AyushyaChitransh what CLI flags are you running with? Do you mind providing logs from running with -lrpc=trace?

@5chdn 5chdn added this to the 1.12 milestone Jun 13, 2018
@jamespic

jamespic commented Jun 14, 2018

Investigating the high CPU usage on my node, I found that the culprit looks to be snappy compression, although I might be misreading the profiling data:

profile-data.zip

(frustratingly, GitHub won't let me upload SVGs, so it's zipped - the SVG is interactive if you open it in a browser).

If you want to measure your own workload: I got this profile by building a release build with debug info enabled (RUSTFLAGS=-g cargo build --release), downloading Brendan Gregg's FlameGraph tools, and then running:

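# sample on-CPU stack traces of the running parity process at 99 Hz (stop with Ctrl-C)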
sudo perf record -F 99 --call-graph dwarf -p $(pidof parity)
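# fold the recorded stacks and render them as an interactive flamegraph SVG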
sudo perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > parity.svg

@ordian
Collaborator

ordian commented Jun 14, 2018

Looks like rocksdb's snappy compression may also be responsible for increased memory usage (it takes 28.8% + 16.9% = 45.7% of the total allocated memory when warp syncing):
[screenshot: allocations]
cc #8618

@5chdn
Contributor

5chdn commented Jun 23, 2018

Any update on this @tomusdrw @folsen? Anything I can help with?

@folsen
Contributor Author

folsen commented Jun 25, 2018

It seems like it's a mix of a lot of different issues; @tomusdrw has fixed some but is still working on more. I think in general we could use some more eyes on profiling and figuring out where the cycles are spent.

@tomusdrw
Collaborator

In general, the increased CPU usage could be caused by a couple of things, depending on the settings:

  1. An increased peer count causing way more communication (AES encryption/decryption).
  2. Due to the higher peer count, transactions were received multiple times and hence verified multiple times. With small queues this led to a lot of CPU spent on ecrecover, nonce checking, etc., even though the transaction was eventually rejected. This is partially addressed in Minimal effective gas price in the queue #8934; the second part will be addressed with a rejection cache (soon; a sketch of the idea follows this list).
  3. With large pools, creating the pending set for propagation was also extremely expensive, even though we eventually needed only hundreds or thousands of transactions to propagate (due to the packet size limit and an internal Parity limit on transactions per packet). This is addressed in Limit the number of transactions in pending set #8777 (second sketch below).
  4. The last issues are the complexity of the pending-transaction filter and NonceCache clearing. We'll have more PRs for both (cc @andresilva)
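A minimal sketch of the rejection-cache idea from point 2, assuming a bounded FIFO set of recently rejected transaction hashes that is checked before the expensive ecrecover and nonce verification. The names and types here are hypothetical, not Parity's actual implementation:

use std::collections::{HashSet, VecDeque};

// Sketch only: remembers hashes of recently rejected transactions so that
// re-broadcast copies can be dropped before signature recovery runs again.
struct RejectionCache {
    seen: HashSet<[u8; 32]>,
    order: VecDeque<[u8; 32]>,
    capacity: usize,
}

impl RejectionCache {
    fn new(capacity: usize) -> Self {
        RejectionCache { seen: HashSet::new(), order: VecDeque::new(), capacity }
    }

    // Returns true if this hash was recently rejected and can be skipped.
    fn recently_rejected(&self, hash: &[u8; 32]) -> bool {
        self.seen.contains(hash)
    }

    // Record a rejection, evicting the oldest entry once the cache is full.
    fn note_rejected(&mut self, hash: [u8; 32]) {
        if self.seen.insert(hash) {
            self.order.push_back(hash);
            if self.order.len() > self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.seen.remove(&oldest);
                }
            }
        }
    }
}

fn main() {
    let mut cache = RejectionCache::new(1024);
    cache.note_rejected([1u8; 32]);
    assert!(cache.recently_rejected(&[1u8; 32]));
}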
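And for point 3, a minimal sketch of capping the pending set before propagation: sort the candidates by gas price and keep only the first `limit`, instead of materializing the whole pool. The (gas_price, hash) tuple layout and the function name are made up for illustration; #8777 is the real change:

// Sketch only: pick at most `limit` transactions for propagation,
// preferring the highest gas price.
fn select_for_propagation(mut candidates: Vec<(u64, [u8; 32])>, limit: usize) -> Vec<[u8; 32]> {
    candidates.sort_by(|a, b| b.0.cmp(&a.0)); // highest gas price first
    candidates.into_iter().take(limit).map(|(_, hash)| hash).collect()
}

fn main() {
    let txs = vec![(10u64, [1u8; 32]), (50, [2u8; 32]), (30, [3u8; 32])];
    assert_eq!(select_for_propagation(txs, 2), vec![[2u8; 32], [3u8; 32]]);
}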

@5chdn 5chdn modified the milestones: 2.0, 2.1 Jul 17, 2018
@5chdn 5chdn modified the milestones: 2.1, 2.2 Sep 11, 2018
@5chdn 5chdn removed this from the 2.2 milestone Oct 29, 2018
@5chdn 5chdn added this to the 2.3 milestone Oct 29, 2018
@tomusdrw
Collaborator

tomusdrw commented Nov 8, 2018

I believe this can be closed now. Let's reopen if new data becomes available for a newer version of Parity.

@tomusdrw tomusdrw closed this as completed Nov 8, 2018