Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

req-resp: Fix panic on connection closed for substream open failure #291

Merged
merged 3 commits into from
Nov 27, 2024

Conversation

lexnv
Copy link
Collaborator

@lexnv lexnv commented Nov 26, 2024

When a new connection is reported to the request-response protocol, the protocol tracks the state of the connection if and only if it could open a substream with the remote peer.

We do not track the state of the connection in cases where the substream cannot be opened because the connection is closed.

A further ConnectionClosed event triggered a debug_assert assumption for this case. The assumption is wrong since opening substreams can fail which informs us that the connection is closed.

Connection details not tracked:

Err(error) => {
tracing::warn!(
target: LOG_TARGET,
?peer,
protocol = %self.protocol,
request_id = ?context.request_id,
?error,
"failed to open substream",
);
return self
.report_request_failure(
peer,
context.request_id,
RequestResponseError::Rejected(error.into()),
)
.await;
}

Which later leads to panics:

let Some(context) = self.peers.remove(&peer) else {
tracing::error!(
target: LOG_TARGET,
?peer,
"state mismatch: peer doesn't exist",
);
debug_assert!(false);

Related to: #290

Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Alexandru Vasile <[email protected]>
@lexnv lexnv changed the title req-resp: Investigate state mismatch req-resp: Fix panic on substream open failure Nov 27, 2024
@lexnv lexnv changed the title req-resp: Fix panic on substream open failure req-resp: Fix panic on connection closed for substream open failure Nov 27, 2024
@lexnv lexnv self-assigned this Nov 27, 2024
@lexnv lexnv merged commit eb0d6f2 into master Nov 27, 2024
8 checks passed
@lexnv lexnv deleted the lexnv/investigate-mismatch branch November 27, 2024 13:11
lexnv added a commit that referenced this pull request Nov 27, 2024
## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([#291](#291))
- crypto/noise: Verify crypto/noise signature payload
([#278](#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([#292](#292))


cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 27, 2024
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.





## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([#292](paritytech/litep2p#292))


## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: #6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)


### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.


![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)



cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: GitHub Action <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 28, 2024
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.





## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([#292](paritytech/litep2p#292))


## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: #6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)


### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.


![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)



cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: GitHub Action <[email protected]>
github-actions bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 28, 2024
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.

## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([#292](paritytech/litep2p#292))

## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: #6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)

### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.

![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)

cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: GitHub Action <[email protected]>
(cherry picked from commit 51c3e95)
Krayt78 pushed a commit to Krayt78/polkadot-sdk that referenced this pull request Dec 18, 2024
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.





## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([paritytech#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([paritytech#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([paritytech#292](paritytech/litep2p#292))


## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: paritytech#6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)


### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.


![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)



cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: GitHub Action <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants