networking/litep2p: Node running litep2p seems to be leaking memory #6149
Comments
Thanks Alex for raising this! 🙏 We also had this issue with addresses, although the memory consumed here is in the order of GiB (#5998). There are a few places that come to mind where to look at next:
I would start by looking at litep2p and then move on to the substrate code.
We need metrics to filter out potential leaks (i.e., state tracking that grows monotonically is a concern). We have around 3 separate leaks:
1. Transport Manager State Leak
2. TCP/WebSocket Pending Dials Leak
3. TCP/WebSocket Cancellation Logic Leak
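To illustrate the kind of metric meant above, here is a minimal sketch that samples the size of named internal collections and flags those that only ever grow. `LeakMonitor`, `record`, and `suspects` are hypothetical names invented for this illustration; they are not part of litep2p or substrate, and a real node would more likely export these sizes as gauges to its existing metrics stack.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not a litep2p API): stores the length of named
/// internal collections at each sampling tick.
#[derive(Default)]
pub struct LeakMonitor {
    samples: HashMap<&'static str, Vec<usize>>,
}

impl LeakMonitor {
    /// Record the current length of the collection called `name`.
    pub fn record(&mut self, name: &'static str, len: usize) {
        self.samples.entry(name).or_default().push(len);
    }

    /// Names whose samples never decreased and ended higher than they
    /// started. These are candidates for a leak, not proof of one.
    pub fn suspects(&self) -> Vec<&'static str> {
        let mut names: Vec<&'static str> = self
            .samples
            .iter()
            .filter(|(_, s)| {
                s.len() >= 2
                    && s.windows(2).all(|w| w[0] <= w[1])
                    && s.last() > s.first()
            })
            .map(|(name, _)| *name)
            .collect();
        names.sort_unstable();
        names
    }
}
```

A collection that legitimately grows during warm-up would also be flagged, so a longer sampling window is probably needed before acting on a suspect.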
Litep2p PRs
For more details and an explanation of the edge cases in which the leaks happen, see:
Lower severity memory leaks in the ping and identify protocols:
The identify protocol implementation leaked `SubstreamId`s and `PeerId`s via the `pending_opens` `HashMap`. Entries were only ever inserted into `pending_opens`; they were never removed.

The only possible purpose of `pending_opens` is to double-check the events coming from the service layer: `TransportEvent::SubstreamOpened`. However, this is not needed, as illustrated by the current implementation.

Part of endeavors to fix memory leaks: paritytech/polkadot-sdk#6149

### Testing Done
- custom patched litep2p to dump the internal state of identify protocol running in kusama

cc @paritytech/networking

Signed-off-by: Alexandru Vasile <[email protected]>
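The insert-only pattern described in this commit message can be sketched as follows. This is a simplified model, not the litep2p code: `IdentifyModel`, its methods, and the tuple-struct `SubstreamId` and `PeerId` are stand-ins invented for illustration. Note that the actual fix, per the message above, was to drop `pending_opens` entirely; the removing handler is shown only to contrast with the leaky one.

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration; the real litep2p types are richer.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct SubstreamId(pub usize);
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct PeerId(pub u64);

#[derive(Default)]
pub struct IdentifyModel {
    pending_opens: HashMap<SubstreamId, PeerId>,
    next_id: usize,
}

impl IdentifyModel {
    /// Models requesting an outbound substream: an entry is inserted.
    pub fn open_substream(&mut self, peer: PeerId) -> SubstreamId {
        let id = SubstreamId(self.next_id);
        self.next_id += 1;
        self.pending_opens.insert(id, peer);
        id
    }

    /// Leaky handler: checks the entry but leaves it in the map.
    pub fn on_substream_opened_leaky(&self, id: SubstreamId, peer: PeerId) -> bool {
        self.pending_opens.get(&id) == Some(&peer)
    }

    /// Non-leaky variant: the check consumes the entry.
    pub fn on_substream_opened_removing(&mut self, id: SubstreamId, peer: PeerId) -> bool {
        self.pending_opens.remove(&id) == Some(peer)
    }

    /// Number of entries still held, the quantity that grows without bound.
    pub fn pending_len(&self) -> usize {
        self.pending_opens.len()
    }
}
```

With the leaky handler, every opened substream leaves one entry behind, so the map size tracks the total number of substreams ever opened rather than the number currently pending.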
The purpose of the `pending_opens` field is to double-check outbound substream opens. This was used to ensure that the substream ID was indeed opened to a given peer ID. However, this is not needed considering the `identify` implementation.

Further, `pending_opens` was leaking `(SubstreamId, PeerId)` tuples in cases where the substream opening would later fail. In other words, the implementation did not remove the tuples on the `TransportEvent::SubstreamOpenFailure` event.

Part of endeavors to fix memory leaks: paritytech/polkadot-sdk#6149

### Testing Done
- custom patched litep2p to dump the internal state of identify protocol running in kusama

The code is similar to the identify protocol. However, this leak was more subtle and not of the same magnitude as in the `identify` protocol, since substream open failures are not that frequent:
- #273

cc @paritytech/networking

Signed-off-by: Alexandru Vasile <[email protected]>
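The failure-path leak described here can be sketched with a small model in which both outcomes of an open attempt release the tracked entry. `Event` and `PingModel` are hypothetical simplifications loosely modeled on the `TransportEvent::SubstreamOpened` and `TransportEvent::SubstreamOpenFailure` variants named above; the real event payloads and the real ping implementation differ, and the actual change may simply have removed the field.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified event type for illustration only.
pub enum Event {
    SubstreamOpened { substream: usize, peer: u64 },
    SubstreamOpenFailure { substream: usize },
}

#[derive(Default)]
pub struct PingModel {
    pending_opens: HashMap<usize, u64>,
}

impl PingModel {
    /// Track an outbound substream open towards `peer`.
    pub fn track_open(&mut self, substream: usize, peer: u64) {
        self.pending_opens.insert(substream, peer);
    }

    /// Returns true if the event matched a tracked open.
    /// Both arms remove the entry; skipping the removal in the
    /// failure arm is the leak described in the text above.
    pub fn on_event(&mut self, event: Event) -> bool {
        match event {
            Event::SubstreamOpened { substream, peer } => {
                self.pending_opens.remove(&substream) == Some(peer)
            }
            Event::SubstreamOpenFailure { substream } => {
                self.pending_opens.remove(&substream).is_some()
            }
        }
    }

    /// Number of opens still being tracked.
    pub fn pending_len(&self) -> usize {
        self.pending_opens.len()
    }
}
```

Because open failures are comparatively rare, a map leaking only on this path grows slowly, which is consistent with the message describing this leak as more subtle.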
…stable2409` for a temporary apply to the downstream project.
Looking over the dashboards for our Kusama validators, the memory of the node running litep2p seems to be constantly increasing. It is now at 12 GiB, while all other nodes are constant at around 3-4 GiB.
https://grafana.teleport.parity.io/goto/Uoh4CQmHg?orgId=1
cc: @paritytech/networking