Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identify: Fix memory leak of unused pending_opens #273

Merged
merged 1 commit into from
Oct 24, 2024

Conversation

lexnv
Copy link
Collaborator

@lexnv lexnv commented Oct 23, 2024

The identify protocol implementation leaked SubstreamIds and PeerIds via the pending_opens hashMap.
Objects were only inserted in the pending_opens, however they were never removed.

The only possible purpose of pending_opens is to double-check the events coming from the service layer: TransportEvent::SubstreamOpened. However this is not needed, as illustrated by the current implementation.

Part of endeavors to fix memory leaks: paritytech/polkadot-sdk#6149

Testing Done

  • custom patched litep2p to dump the internal state of identify protocol running in kusama

cc @paritytech/networking

Copy link
Collaborator

@dmitry-markin dmitry-markin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@lexnv lexnv merged commit 0084dc3 into master Oct 24, 2024
8 checks passed
@lexnv lexnv deleted the lexnv/fix-identify-leak branch October 24, 2024 13:55
lexnv added a commit that referenced this pull request Oct 24, 2024
The purpose of the `pending_opens` field is to double check outbound
substream opens.
This was used to ensure that the substream ID was indeed opened to a
given peer ID.
However, this is not needed considering the `identify` implementation.

Further, the `pending_opens` was leaking `(SubstramId, PeerId)` tuples
in cases where the
substream opening would later fail. In other words, the implementation
did not remove the tuples on the `TransportEvent::SubstreamOpenFailure`
event.

Part of endeavors to fix memory leaks:
paritytech/polkadot-sdk#6149

### Testing Done
- custom patched litep2p to dump the internal state of identify protocol
running in kusama

The code is similar to the identify protocol. However, this leak was
more subtle and not of the magnitude of the `identify` protocol since
substream open failures are not that frequent:
 - #273
 
 cc @paritytech/networking

Signed-off-by: Alexandru Vasile <[email protected]>
lexnv added a commit that referenced this pull request Nov 4, 2024
## [0.8.0] - 2024-11-01

This release adds support for content provider advertisement and
discovery to Kademlia protocol implementation (see libp2p
[spec](https://github.com/libp2p/specs/blob/master/kad-dht/README.md#content-provider-advertisement-and-discovery)).
Additionally, the release includes several improvements and memory leak
fixes to enhance the stability and performance of the litep2p library.

### Added

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](#246))
- kad: Providers part 6: stop providing
([#245](#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](#236))
- kad: Providers part 4: refresh local providers
([#235](#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](#234))

### Changed

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](#255))
- kad/executor: Add timeout for writting frames
([#277](#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](#233))
- peer_state: Robust state machine transitions
([#251](#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](#250))
- kad: Remove unused serde cfg
([#262](#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](#254))

### Fixed

- tcp/websocket/quic: Fix cancel memory leak
([#272](#272))
- transport: Fix pending dials memory leak
([#271](#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](#273))
- kad: Fix not retrieving local records
([#221](#221))

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 5, 2024
This PR updates litep2p to the latest release.

- `KademliaEvent::PutRecordSucess` is renamed to fix word typo
- `KademliaEvent::GetProvidersSuccess` and
`KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work
and will be utilized later


### Added

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](paritytech/litep2p#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](paritytech/litep2p#246))
- kad: Providers part 6: stop providing
([#245](paritytech/litep2p#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](paritytech/litep2p#236))
- kad: Providers part 4: refresh local providers
([#235](paritytech/litep2p#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](paritytech/litep2p#234))

### Changed

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](paritytech/litep2p#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](paritytech/litep2p#255))
- kad/executor: Add timeout for writting frames
([#277](paritytech/litep2p#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](paritytech/litep2p#233))
- peer_state: Robust state machine transitions
([#251](paritytech/litep2p#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](paritytech/litep2p#250))
- kad: Remove unused serde cfg
([#262](paritytech/litep2p#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](paritytech/litep2p#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](paritytech/litep2p#254))

### Fixed

- tcp/websocket/quic: Fix cancel memory leak
([#272](paritytech/litep2p#272))
- transport: Fix pending dials memory leak
([#271](paritytech/litep2p#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](paritytech/litep2p#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](paritytech/litep2p#273))
- kad: Fix not retrieving local records
([#221](paritytech/litep2p#221))

See release changelog for more details:
https://github.com/paritytech/litep2p/releases/tag/v0.8.0

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
lexnv added a commit to paritytech/polkadot-sdk that referenced this pull request Nov 15, 2024
This PR updates litep2p to the latest release.

- `KademliaEvent::PutRecordSucess` is renamed to fix word typo
- `KademliaEvent::GetProvidersSuccess` and
`KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work
and will be utilized later

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](paritytech/litep2p#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](paritytech/litep2p#246))
- kad: Providers part 6: stop providing
([#245](paritytech/litep2p#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](paritytech/litep2p#236))
- kad: Providers part 4: refresh local providers
([#235](paritytech/litep2p#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](paritytech/litep2p#234))

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](paritytech/litep2p#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](paritytech/litep2p#255))
- kad/executor: Add timeout for writting frames
([#277](paritytech/litep2p#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](paritytech/litep2p#233))
- peer_state: Robust state machine transitions
([#251](paritytech/litep2p#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](paritytech/litep2p#250))
- kad: Remove unused serde cfg
([#262](paritytech/litep2p#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](paritytech/litep2p#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](paritytech/litep2p#254))

- tcp/websocket/quic: Fix cancel memory leak
([#272](paritytech/litep2p#272))
- transport: Fix pending dials memory leak
([#271](paritytech/litep2p#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](paritytech/litep2p#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](paritytech/litep2p#273))
- kad: Fix not retrieving local records
([#221](paritytech/litep2p#221))

See release changelog for more details:
https://github.com/paritytech/litep2p/releases/tag/v0.8.0

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Signed-off-by: Alexandru Vasile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants