Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Restore transient error detection for canceled hyper requests #840

Merged

Conversation

iamjpotts
Copy link
Contributor

@iamjpotts iamjpotts commented Sep 8, 2024

Description:

The crate used to, but now fails to, detect gRPC requests that failed with a tonic error wrapping a hyper error where the hyper error was in a "cancelled" state.

This detection mechanism uses downcast to inspect a source (inner) error, and broke when hyper was upgraded from 0.14.x to 1.x on 4/26 in 2445034.

cc @RickyLB / @mehcode

tonic 0.11, the current dependency, and tonic 0.9, the previous dependency before that commit, both depend on hyper 0.14, and the downcast to a hyper 1.x Error type will not return the hyper 0.14.x Error type wrapped by tonic.

This PR modifies is_hyper_canceled to check for the Error type from both versions of hyper.

Alternatives Considered

  • Simply downgrade back to hyper 0.14. However, when tonic is upgraded from 0.11 to 0.12 it would be easy to forget, and the compiler would not warn you about, that downcast being broken again.
  • Add more tests - for positive detection of the canceled error state. This is made excessively difficult by the choice of the hyper crate maintainers to not provide public constructors for error variants

Fixes #

Notes for reviewer:

This fixes the backoff/retry mechanism in the context of connection failures, such as:

Execution of hedera::ping_query::PingQuery on node at index 4 / node id 0.0.7 failed due to Permanent(GrpcStatus(Status { code: Unknown, message: "transport error", source: Some(tonic::transport::Error(Transport, hyper::Error(Canceled, "connection was not ready"))) }))

That error should be retried, but is not, because the hyper error in a canceled state is not detected.

Logs after this PR is applied (and before the broken hyper upgrade linked above was applied):

2024-09-08T20:51:48.375888Z DEBUG hedera::execute: Preparing hedera::query::Query<hedera::account::account_balance_query::AccountBalanceQueryData> on node at index 3 / node id 0.0.6    
2024-09-08T20:51:48.375938Z DEBUG hedera::execute: Executing hedera::query::Query<hedera::account::account_balance_query::AccountBalanceQueryData> on node at index 3 / node id 0.0.6    
2024-09-08T20:51:48.377356Z  WARN hedera::execute: Execution of hedera::query::Query<hedera::account::account_balance_query::AccountBalanceQueryData> on node at index 3 / node id 0.0.6 will continue due to GrpcStatus(Status { code: Unknown, message: "transport error", source: Some(tonic::transport::Error(Transport, hyper::Error(Canceled, "connection closed"))) })    
2024-09-08T20:51:48.377530Z DEBUG hedera::execute: Preparing hedera::query::Query<hedera::account::account_balance_query::AccountBalanceQueryData> on node at index 1 / node id 0.0.4    
2024-09-08T20:51:48.377571Z DEBUG hedera::execute: Executing hedera::query::Query<hedera::account::account_balance_query::AccountBalanceQueryData> on node at index 1 / node id 0.0.4    
2024-09-08T20:51:48.463582Z DEBUG hedera::execute: Execution of hedera::query::Query<hedera::account::account_balance_query::AccountBalanceQueryData> on node at index 1 / node id 0.0.4 succeeded

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

@iamjpotts iamjpotts force-pushed the 20240908-tonic-hyper-cancel-detection branch 3 times, most recently from 99686b9 to 773a076 Compare September 8, 2024 21:08
@RickyLB RickyLB merged commit facd64e into hashgraph:main Sep 19, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants