Investigate higher thread usage since Polkadot 0.9.36 #12988
@Nexus2k Thanks for the graphs! Sorry for the dumb question, what tool do you use to collect the statistics? Do you have the same thread count graph for v0.9.33 to better understand the difference?
@dmitry-markin that should be Grafana. See Tuesday, before it started to grow. We released on Tuesday, so that should be the point when he upgraded.
@dmitry-markin I use https://checkmk.com/ to monitor the few dozen servers I run for Substrate blockchains. Here's the graph for the last 8 days; the only thing that changed, to my knowledge, was the Polkadot release upgrade. The node that produces this graph is more or less permanently active on Kusama and became active on Polkadot on Tuesday around 16:30 CET. The release upgrade happened at 14:30 CET, due to automatic upgrading when a new Docker image is published.
Do you mean you run two nodes on the same server, and the number of threads is the total number of threads in the system? I don't have access to any validator nodes, but would expect the number of polkadot threads to be on the order of 50-100 (I just measured 42 during syncing on v0.9.36 and 54 on v0.9.33).
Correct, it's a rather beefy machine and can handle multiple chains in parallel. And yes, it's the overall system thread count, which was rather flat before the upgrade, as you can see. I've rebooted the system for good measure, just to make sure it's not a problem with any other component.
@Nexus2k Could you get the number of polkadot threads during the anomaly, something like …
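(The exact command suggested here got cut off in the export. Purely as an illustration, and not the original suggestion, one way to count the threads of all running `polkadot` processes is to sum the `Threads:` field from `/proc/<pid>/status`; a minimal Rust helper for that could look like this:)

```rust
// Hypothetical helper: sum the "Threads:" value from /proc/<pid>/status for
// every process whose command name contains "polkadot".
use std::fs;

fn main() -> std::io::Result<()> {
    let mut total = 0u64;
    for entry in fs::read_dir("/proc")? {
        let entry = entry?;
        let pid = entry.file_name().to_string_lossy().to_string();
        // Only numeric entries in /proc are processes.
        if !pid.chars().all(|c| c.is_ascii_digit()) {
            continue;
        }
        let comm = fs::read_to_string(format!("/proc/{pid}/comm")).unwrap_or_default();
        if !comm.trim().contains("polkadot") {
            continue;
        }
        if let Ok(status) = fs::read_to_string(format!("/proc/{pid}/status")) {
            if let Some(line) = status.lines().find(|l| l.starts_with("Threads:")) {
                if let Ok(n) = line.trim_start_matches("Threads:").trim().parse::<u64>() {
                    println!("{pid} ({}): {n} threads", comm.trim());
                    total += n;
                }
            }
        }
    }
    println!("total polkadot threads: {total}");
    Ok(())
}
```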
It looks like the high thread count is caused not by the networking code, but by the PVF subsystem, which I'm unfamiliar with. @bkchr do you know who would be better to assign the issue to?
Ahh, it just aggregates all the threads of all running polkadot processes. There is currently some rework going on around the PVF worker which should solve this. I'm going to close this, as it is nothing problematic. However, thank you @dmitry-markin for looking into it so quickly, and thank you @Nexus2k for your help here :) Ping @m-cat, as you are the one currently rewriting the PVF worker. As I said above, I don't think we need to take any special action here.
Thanks for the ping @bkchr. I wasn't aware that the thread count was so high, so it's good that we will be closing them properly in paritytech/polkadot#6419. For background, most of these threads are sleeping, which is why the CPU usage doesn't change. We don't currently kill them because killing threads is unsafe. The PR above changes this so that we signal the thread to finish on its own when the PVF job finishes.
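(As a minimal sketch of that "signal instead of kill" pattern, under the assumption of a simple shared flag rather than the actual PVF worker code from paritytech/polkadot#6419:)

```rust
// Minimal sketch: the worker thread periodically checks a shared flag and
// returns cleanly when asked to, instead of being forcibly killed from outside.
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};
use std::thread;
use std::time::Duration;

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let stop_flag = Arc::clone(&stop);

    let worker = thread::spawn(move || {
        while !stop_flag.load(Ordering::Relaxed) {
            // Simulate one slice of work, then re-check the flag.
            thread::sleep(Duration::from_millis(50));
        }
        // The thread exits on its own here; nothing is killed.
    });

    // When the job finishes, signal the thread and join it so it doesn't linger.
    stop.store(true, Ordering::Relaxed);
    worker.join().expect("worker thread panicked");
}
```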
People report that since switching from Polkadot 0.9.33 (the last node release) to 0.9.36 they see much higher thread usage. They report that CPU usage didn't increase.
I think this is maybe related to the switch to using tokio in libp2p. If that is confirmed, this issue can be closed. In general, this is just an investigation to find the underlying change causing this behavior.