
Investigate higher thread usage since Polkadot 0.9.36 #12988

Closed · bkchr opened this issue Dec 21, 2022 · 12 comments
Labels: I8-footprint (An enhancement to provide a smaller (system load, memory, network or disk) footprint.)

bkchr (Member) commented Dec 21, 2022

People report that since switching from Polkadot 0.9.33 (the last node release) to 0.9.36 they see much higher thread usage, while CPU usage didn't increase.

I think this may be related to libp2p switching to tokio. If that is confirmed, this issue can be closed. In general, this is just an investigation to find the underlying change responsible for this behavior.
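
For context: a tokio multi-thread runtime keeps a fixed pool of worker threads plus an on-demand pool of blocking threads that mostly sleep when idle, which is one way a switch to tokio can raise the visible thread count without raising CPU usage. Below is a minimal, illustrative sketch of the standard tokio builder knobs involved; it is not the actual Substrate or libp2p configuration, and the numbers are made up.

    use tokio::runtime::Builder;

    fn main() {
        // Illustration only (assumes the `tokio` crate with the "full" feature):
        // a multi-thread runtime owns `worker_threads` executor threads and lazily
        // spawns up to `max_blocking_threads` extra threads for blocking work.
        // Idle threads sleep, so more threads does not imply more CPU usage.
        let runtime = Builder::new_multi_thread()
            .worker_threads(4)
            .max_blocking_threads(16)
            .thread_name("net-worker")
            .enable_all()
            .build()
            .expect("failed to build tokio runtime");

        runtime.block_on(async {
            // Blocking work goes onto the blocking pool, which may spawn a new
            // thread that later just sits there sleeping until it is reused.
            let answer = tokio::task::spawn_blocking(|| 40 + 2).await.unwrap();
            println!("answer: {answer}");
        });
    }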

bkchr added the I8-footprint label on Dec 21, 2022
Nexus2k commented Dec 21, 2022

I'm the reporter of the issue; please find the two graphs attached:
[graph: system thread count]
CPU load itself is roughly the same:
[graph: CPU load]

Thread usage seems to spike by about 1k threads when the polkadot node is in the active set and producing blocks.

dmitry-markin (Contributor) commented Dec 21, 2022

@Nexus2k Thanks for the graphs! Sorry for the dumb question, but what tool do you use to collect the statistics?

Do you have the same thread count graph for v0.9.33 to better understand the difference?

bkchr (Member, Author) commented Dec 21, 2022

@dmitry-markin that should be Grafana.

Look at Tuesday on the graph, when it started to grow. We released on Tuesday, so that should be the point when he upgraded.

Nexus2k commented Dec 21, 2022

@dmitry-markin I use https://checkmk.com/ to monitor the few dozen servers I run for Substrate blockchains.

Here's the graph for the last 8 days; to my knowledge, the only thing that changed was the polkadot release upgrade:
[graph: system thread count over the last 8 days]

The node that produces this graph is more or less permanently active in Kusama and became active in Polkadot on Tuesday around 16:30 CET. The release upgrade happened at 14:30 CET, because the node upgrades automatically when a new Docker image is published.

dmitry-markin (Contributor) commented Dec 21, 2022

> The node that produces this graph is somewhat permanently active in Kusama and became active in Polkadot ...

Do you mean you run two nodes on the same server, and the number of threads is the total number of threads in the system?

I don't have access to any validator nodes, but would expect the number of polkadot threads to be on the order of 50-100 (I just measured 42 during syncing on v0.9.36 and 54 on v0.9.33).

Nexus2k commented Dec 21, 2022

Correct, it's a rather beefy machine and can handle multiple chains in parallel. And yes, it's the overall system thread count, which was rather flat before the upgrade, as you can see. I've rebooted the system for good measure, just to make sure it's not a problem with any other component.

Nexus2k commented Dec 21, 2022

Here's a thread graph of another system that also shows anomalies after the upgrade:
[graph: system thread count]

That one was not active during the upgrade but became active in Kusama at 19:28 CET.

dmitry-markin (Contributor) commented Dec 21, 2022

@Nexus2k Could you get the number of polkadot threads during the anomaly, with something like ps -e -O nlwp | grep polkadot? (nlwp is the number of threads per process.)

Nexus2k commented Dec 21, 2022

# ps -e -O nlwp | grep polkadot
   3442   84 S ?        01:36:12 /usr/bin/polkadot --name=🍁 HIGH/STAKE 🥩 | HEL1-KSM -lsync=warn,afg=warn,babe=warn --chain=kusama --validator --rpc-methods=Unsafe --rpc-external --rpc-cors=all --listen-addr=/ip4/0.0.0.0/tcp/30333 --public-addr=/ip4/<REDACTED>/tcp/30334 --prometheus-external --no-mdns --pruning=1000 --telemetry-url=wss://telemetry.polkadot.io/submit/ 1 --telemetry-url=wss://telemetry-backend.w3f.community/submit/ 1
   3767   85 S ?        01:18:52 /usr/bin/polkadot --name=🍁 HIGH/STAKE 🥩 | HEL1-DOT -lsync=warn,afg=warn,babe=warn --chain=polkadot --validator --rpc-methods=Unsafe --rpc-external --rpc-cors=all --listen-addr=/ip4/0.0.0.0/tcp/30333 --public-addr=/ip4/<REDACTED>/tcp/30333 --prometheus-external --no-mdns --pruning=1000 --telemetry-url=wss://telemetry.polkadot.io/submit/ 1 --telemetry-url=wss://telemetry-backend.w3f.community/submit/ 1
   4326   44 S ?        00:02:11 /usr/bin/polkadot prepare-worker /tmp/pvf-host-prepareK3WuNH4EJm
   4351  362 S ?        00:00:29 /usr/bin/polkadot execute-worker /tmp/pvf-host-executeBFOEmXjdbI
   4352  562 S ?        00:02:26 /usr/bin/polkadot execute-worker /tmp/pvf-host-executetYaDNMqDdJ

dmitry-markin (Contributor) commented Dec 21, 2022

It looks like the high thread count is caused not by the networking code but by the PVF subsystem, which I'm unfamiliar with. @bkchr, do you know who would be best to assign this issue to?

bkchr (Member, Author) commented Dec 21, 2022

Ah, it just aggregates the threads of all running polkadot processes.

There is currently some rework going on around the PVF worker which should solve this. I'm going to close this issue, as it is nothing problematic.

However, thank you @dmitry-markin for looking into it so quickly, and thank you @Nexus2k for your help here :)

Pinging @m-cat as you are the one currently rewriting the PVF worker. As I said above, I don't think we need to take any special action here.

bkchr closed this as completed on Dec 21, 2022
mrcnski commented Dec 21, 2022

Thanks for the ping, @bkchr. I wasn't aware that the thread count was so high, so it's good that we will be closing these threads properly in paritytech/polkadot#6419.

For background, most of these threads are sleeping, which is why the CPU usage doesn't change. We don't currently kill them, because killing threads is unsafe. The PR above instead changes things so that we signal each thread to finish on its own when the PVF job finishes.
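
For readers unfamiliar with the pattern being referenced: killing a thread outright is unsafe because it can leave locks and shared state in an inconsistent condition, so the alternative is to hand the thread a signal that it checks (or waits on), let it exit on its own, and then join it. The sketch below shows that general cooperative-shutdown idea in plain Rust with an atomic flag; it is only an illustration of the concept, not the actual PVF worker code from paritytech/polkadot#6419, which uses its own job and worker plumbing.

    use std::sync::{
        atomic::{AtomicBool, Ordering},
        Arc,
    };
    use std::thread;
    use std::time::Duration;

    fn main() {
        // Shared flag that tells the worker to finish instead of being killed.
        let finish = Arc::new(AtomicBool::new(false));
        let finish_for_worker = Arc::clone(&finish);

        let worker = thread::spawn(move || {
            // The worker checks the flag between slices of work.
            while !finish_for_worker.load(Ordering::Relaxed) {
                // ... do one slice of the job here ...
                thread::sleep(Duration::from_millis(10));
            }
            // Destructors and clean-up run normally because the thread
            // returns on its own rather than being terminated externally.
        });

        // Once the job is done, signal the worker and wait for it to exit,
        // so idle threads are reaped instead of accumulating as sleepers.
        finish.store(true, Ordering::Relaxed);
        worker.join().expect("worker thread panicked");
    }

The point is only that the thread is asked to finish and then joined once its job is done, which is why the fix reduces the sleeping-thread count without changing CPU usage.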
