Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add h2::Error as a source for tonic::Status when converting from h2::Error #612

Merged
merged 14 commits into from
Jun 23, 2021

Conversation

tdyas
Copy link
Contributor

@tdyas tdyas commented Apr 24, 2021

Motivation

A gRPC server may send a HTTP/2 GOAWAY frame with NO_ERROR status to gracefully shutdown a connection. This appears to Tonic users as a tonic::Status with Code::Internal and the message set to h2 protocol error: protocol error: not a result of an error.

The only way to currently detect this case and differentiate it from other internal errors (e.g., an application-level internal error) is to match on the message. A client may want to differentiate these cases because it may only want to alert on the application-level internal error and not on the transient transport-level issue. (Indeed, this is the use case for which I'm envisioning using this change.)

Matching on a message is not as robust, however, as matching on an h2::Error and its reason code. (The message could change for example if a future version of Tonic decided to vary the message. This would break any users that matched on the previous version of the message.)

Solution

Store the h2::Error used when creating a tonic::Status from a h2::Error and provide it as the source for purposes of std::error::Error. This will allow users to downcast it and match on the original h2::Reason.

Note: tonic::Status must now manually implement Clone becauseh2::Error cannot implement Clone since one of the Kind variants stores a std::io::Error (which does not implement Clone). Since we know that h2_error will always have a reason since we only store it in that case, this manual implementation of Clone just extracts the reason and creates a new h2::Error from it to "clone" it.

@tdyas
Copy link
Contributor Author

tdyas commented Apr 24, 2021

This could also be simplified to just have Status carry a h2_reason field and not support any of the std::error::Error::source functionality. (There would be a h2_reason method returning an Option<h2::Reason>).

Copy link
Member

@davidpdrsn davidpdrsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think this makes sense 😊 I had a couple of suggestions. Just minor things.

This could also be simplified to just have Status carry a h2_reason field and not support any of the std::error::Error::source functionality. (There would be a h2_reason method returning an Optionh2::Reason).

I actually prefer the implementation in this PR as that we wont need special methods for each new kind of error source we might add in the future.

tonic/src/status.rs Outdated Show resolved Hide resolved
tonic/src/status.rs Outdated Show resolved Hide resolved
Copy link
Member

@davidpdrsn davidpdrsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good but should hear from @LucioFranco as well.

tonic/src/status.rs Outdated Show resolved Hide resolved
@davidpdrsn
Copy link
Member

@tdyas do you wanna drive this home? We would like to include the change to Status in the upcoming 0.5 release.

@tdyas
Copy link
Contributor Author

tdyas commented May 19, 2021

I modified the PR to add a generic source field, although my Rust knowledge is not as advanced so not sure what the right invocation to use to fix this error:

error[E0277]: the trait bound `Box<dyn std::error::Error + Sync + std::marker::Send>: Borrow<dyn std::error::Error>` is not satisfied
   --> tonic/src/status.rs:633:35
    |
633 |         self.source.map(|err| err.borrow())
    |                                   ^^^^^^ the trait `Borrow<dyn std::error::Error>` is not implemented for `Box<dyn std::error::Error + Sync + std::marker::Send>`
    |
    = help: the following implementations were found:
              <Box<T, A> as Borrow<T>>

@davidpdrsn
Copy link
Member

davidpdrsn commented May 19, 2021

This seems to do the trick

impl Error for Status {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_ref().map(|err| (&**err) as _)
    }
}
  • as_ref is needed because Option::map consumes self and you cannot move source out of a &self
  • Then map receives as &Box<dyn Error + Send + Sync + 'static>. The *s first derefs the &Box<dyn Error + Send + Sync + 'static> into a Box<dyn Error + Send + Sync + 'static> and the second derefs the box into dyn Error + Send + Sync + 'static which you need to put behind a reference with & since its unsized.
  • Finally the as _ casts the dyn Error + Send + Sync + 'static into a dyn Error + 'static which rust apparently can't do on its own.

Phew 😨

Removing Clone on Status causes a test in prost.rs to fail but that can be fixed with

let messages = std::iter::repeat_with(move || Ok::<_, Status>(msg.clone())).take(10000);

Copy link
Member

@davidpdrsn davidpdrsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing my review "comment" since I wanna give it final look when things are compiling.

Edit: Which apparently github doesn't let you do... How do you go from "approved" to "not approved yet" without "request changes" 🤷

@tdyas tdyas requested a review from davidpdrsn May 19, 2021 08:06
@tdyas
Copy link
Contributor Author

tdyas commented May 19, 2021

Edit: Which apparently github doesn't let you do... How do you go from "approved" to "not approved yet" without "request changes" 🤷

Not sure but I can hit the re-request review button to reset that state (which I just did).

Copy link
Member

@davidpdrsn davidpdrsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good! I had one question and an integration test would be cool as well.

tonic/src/status.rs Show resolved Hide resolved
.source()
.and_then(|err| err.downcast_ref::<h2::Error>())
.unwrap();
assert_eq!(orig.reason(), source.reason());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it would also be nice to have a high level integration test kinda thing that uses the APIs users would. Not sure how to write such a test though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The h2 crate has test helpers for low level testing of HTTP/2 streams. Maybe this integration test would be similar to the tests in https://github.com/hyperium/h2/tree/master/tests/h2-tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on whether an h2-level integration test would be the way to go?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need those low level helpers, we should be able to trigger most things from hyper/h2 directly, no?

@LucioFranco
Copy link
Member

@tdyas @davidpdrsn any updates on this?

@tdyas
Copy link
Contributor Author

tdyas commented Jun 22, 2021

I haven't had a chance to work on an integration test for this due to my work schedule. I see there is now a merge conflict, in trying to fix that, I am having a compile error in tonic/src/transport/server/recover_error.rs due to err's ownership passing into Status::try_from_error and not being available in the else branch:

                if let Some(status) = Status::try_from_error(&*err) {
                    let mut res = Response::new(MaybeEmptyBody::empty());
                    status.add_header(res.headers_mut()).unwrap();
                    Poll::Ready(Ok(res))
                } else {
                    Poll::Ready(Err(err))
                }

I will need to adjust the Status::try_from_error signature to pass the err back out so the caller can recover it if the conversion fails. Never mind, it already does this.

@tdyas tdyas force-pushed the add_error_source branch from cdc139b to 92aaafe Compare June 22, 2021 13:51
@tdyas
Copy link
Contributor Author

tdyas commented Jun 22, 2021

I rebased the PR and now it fails with:

---- picks_client_timeout_if_thats_sorter stdout ----
thread 'picks_client_timeout_if_thats_sorter' panicked at 'assertion failed: err.message().contains(\"Timeout expired\")', tests/integration_tests/tests/timeout.rs:60:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- cancelation_on_timeout stdout ----
thread 'cancelation_on_timeout' panicked at 'assertion failed: err.message().contains(\"Timeout expired\")', tests/integration_tests/tests/timeout.rs:22:5

Are those tests relying on TimeoutExpired being preferred?

@LucioFranco
Copy link
Member

@davidpdrsn ^

@tdyas
Copy link
Contributor Author

tdyas commented Jun 22, 2021

Also, I had to add Send + Sync bounds to the Error passed into Status::try_from_error to get it compiling. Not sure how to have Rust "pass through" Send+Sync if present onto the result type of a method.

@davidpdrsn
Copy link
Member

Are those tests relying on TimeoutExpired being preferred?

Yes we should still downcast that error correctly. I did some digging and this fixes it:

diff --git a/tonic/src/status.rs b/tonic/src/status.rs
index ef9731a..927d95b 100644
--- a/tonic/src/status.rs
+++ b/tonic/src/status.rs
@@ -320,12 +320,6 @@ impl Status {
             Err(err) => err,
         };
 
-        #[cfg(feature = "transport")]
-        let err = match err.downcast::<crate::transport::TimeoutExpired>() {
-            Ok(timeout) => return Ok(Status::cancelled(timeout.to_string())),
-            Err(err) => err,
-        };
-
         #[cfg(feature = "transport")]
         let err = match err.downcast::<h2::Error>() {
             Ok(h2) => {
@@ -555,6 +549,11 @@ fn find_status_in_source_chain(err: &(dyn Error + 'static)) -> Option<Status> {
             });
         }
 
+        #[cfg(feature = "transport")]
+        if let Some(timeout) = err.downcast_ref::<crate::transport::TimeoutExpired>() {
+            return Some(Status::cancelled(timeout.to_string()));
+        }
+
         source = err.source();
     }

Also, I had to add Send + Sync bounds to the Error passed into Status::try_from_error to get it compiling. Not sure how to have Rust "pass through" Send+Sync if present onto the result type of a method.

Nah I don't think there is another way. Its fine 👌

@tdyas
Copy link
Contributor Author

tdyas commented Jun 22, 2021

Pushed @davidpdrsn's fix for the timeout error decoding.

Copy link
Member

@davidpdrsn davidpdrsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdyas Thank you for your excellent work on this! I only had one minor comment/question. So close! 😃

Comment on lines 595 to 602
impl TryFrom<Box<dyn Error + Send + Sync + 'static>> for Status {
type Error = Box<dyn Error + Send + Sync + 'static>;

fn try_from(err: Box<dyn Error + Send + Sync + 'static>) -> Result<Self, Self::Error> {
Status::try_from_error(err)
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific use-case this impl enables, or is mainly for convenience? I dunno, feels a bit out of place for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall, and I can't find a relevant use of this in the main code bases in which I work. Shall I remove it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we should just remove it.

Copy link
Member

@davidpdrsn davidpdrsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@LucioFranco LucioFranco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just need to get CI working then we can merge! Thanks for pushing this across the finish line!

@@ -356,7 +355,13 @@ impl Status {
_ => Code::Unknown,
};

Status::new(code, format!("h2 protocol error: {}", err))
let mut status = Self::new(code, format!("h2 protocol error: {}", err));
let error: Option<Box<dyn Error + Send + Sync + 'static>> = err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this extra type hint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall. I removed it and it still compiles so pushed the removal of it.

@@ -3,6 +3,7 @@ use crate::metadata::MetadataMap;
use bytes::Bytes;
use http::header::{HeaderMap, HeaderValue};
use percent_encoding::{percent_decode, percent_encode, AsciiSet, CONTROLS};
use std::convert::TryFrom;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is failing CI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like deny(warnings) is only set in CI? When I run cargo clippy, it only warned for this and some other issues. It would probably be useful to make the crates enable #![deny(warnings)] (but in a separate PR obviously).

@davidpdrsn davidpdrsn merged commit b90bb7b into hyperium:master Jun 23, 2021
@riking
Copy link

riking commented Jul 7, 2021

GOAWAY frame with NO_ERROR status to gracefully shutdown a connection. This appears to Tonic users as a tonic::Status with Code::Internal and the message set to h2 protocol error: protocol error: not a result of an error.

I believe the official GRPC-on-h2 spec says that this should be mapped by the library to a UNAVAILABLE code; the user of the library shouldn't have to deal with recognizing it. https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#goaway-frame

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants