Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tracing] Allow to customize TracingLayer based on request #330

Closed
wants to merge 1 commit into from

Conversation

erebe
Copy link
Contributor

@erebe erebe commented Feb 5, 2023

Hello,

This is a rough/draft patch aiming to provide support for starting a discussion.

Motivation

TracingLayer is a great place to hook itself to gather information about the lifecycle of every request.
Sadly each trait OnRequest, OnResponse, OnFailure, OnEos, ... are independent of each others, and it is not possible to pass information between them.
In some cases, it is desirable to pass request information to other layers.
Take for example the case where we want to provide metrics regarding each request response's status per path

impl<A, B> OnResponse<A, B> for QoveryTracing {
    fn on_response(self, response: &Response<A>, duration: Duration, _span: &Span) {
        static OK_HEADER: HeaderValue = HeaderValue::from_static("0");

        let status = response.headers().get("grpc-status").unwrap_or(&OK_HEADER);

        let path = // How to get path of the request ?
        let status = to_str(tonic::Code::from_bytes(status.as_bytes()));
        metrics::REQUEST_COUNTER.with_label_values(&[path, status]).inc();
        metrics::REQUEST_LATENCIES
            .with_label_values(&[path, status])
            .observe(duration.as_millis() as f64);
    }

or for example, how to count the number of in-flight request per path ?

At the moment, the only context that is passed around is the span, but it is difficult to use it to attach information.
Span id, have no guarantee around uniqueness, so cannot be used as an index.
Span.record() let attach information, but it is not possible to retrieve the value attached to it.

Solution

This patch aims to support this use case by adding on each layer/Onxxx an adapt method where is passed the initial Request and the span (not sure the span is necessary), to allow them to know the context of the execution and adapt themselves to it.

For example, for in the previous example:

impl<A, B> OnResponse<A, B> for QoveryTracing {
    fn on_response(self, response: &Response<A>, duration: Duration, _span: &Span) {
        static OK_HEADER: HeaderValue = HeaderValue::from_static("0");

        let status = response.headers().get("grpc-status").unwrap_or(&OK_HEADER);

        let path = &self.request_path;
        let status = to_str(tonic::Code::from_bytes(status.as_bytes()));
        metrics::REQUEST_COUNTER.with_label_values(&[path, status]).inc();
        metrics::REQUEST_LATENCIES
            .with_label_values(&[path, status])
            .observe(duration.as_millis() as f64);
    }

    fn adapt(&mut self, request: &Request<B>, _span: &Span) {
        let path = QoveryTracing::get_path(request);
        self.request_path = CompactString::new(path);
    }
}

or for example counting the number of in-flight requests, would look like:

#[derive(Clone, Default)]
struct QoveryEos {
    path: CompactString,
}

impl Drop for QoveryTracing {
    fn drop(&mut self) {
        if !self.inflight_metric.is_empty() {
            metrics::INFLIGHT_REQUESTS
                .with_label_values(&[&self.path])
                .dec();
        }
    }
}

impl<ReqBody> OnEos<ReqBody> for QoveryTracing {
    fn on_eos(self, _trailers: Option<&HeaderMap>, _stream_duration: Duration, _span: &Span) {}

    fn adapt(&mut self, request: &Request<ReqBody>, _span: &Span) {
        let path = QoveryTracing::get_path(request);
        self.path = CompactString::new(path);
        metrics::INFLIGHT_REQUESTS.with_label_values(&[path]).inc();
    }
}

Discuss

The discussion I would like to start, is do you think this approach is the right one ?
If yes, would you be interested in a more complete patch for it to be integrated.

@erebe
Copy link
Contributor Author

erebe commented Feb 13, 2023

Hello,

Gentle bump regarding this PR :)

@davidpdrsn
Copy link
Member

I'm sorry but please don't "bump" PRs, especially after just a week. Everyone who works on tower-http are volunteers so we'll get to it when we get to it.

@erebe
Copy link
Contributor Author

erebe commented Feb 13, 2023

Hello @davidpdrsn,

No worries and sorry about that. It wasn't to be unrespectful of your time, as I wasn't knowing, it was to be sure the PR haven't been missed.

Now that I know, I am fine with the wait.

Thank you for your time.

@erebe
Copy link
Contributor Author

erebe commented Mar 7, 2023

Hi back,

I hope you are going well.
Just to let you know that I still want to move forward with this PR and would like to avoid letting it rot, as new release of the crate are being made.

I think this PR can provide values to others, internally we use it to provide metrics/trace per request { path and status } on several projects, and I am sure it can benefit others too.

image

So my question is, how can I help to move this forward.
Are you ok with the idea, or there is no chance to merge this at all ?
If yes, what would like to see/changes/add/remove in the PR for to be able to merge it.

@jplatte
Copy link
Collaborator

jplatte commented Mar 7, 2023

The main problem here is the amount of time maintainers can put into the project in their free time (since nobody is paid to work on this). I can think of one way you can help and that is to provide reviews of other PRs¹ so that once somebody with merge access reviews, there is a higher chance that it is already in a mergable state and they can move on to something else.

¹ there is one other PR that hasn't received a review yet: #333

@davidpdrsn
Copy link
Member

Are you ok with the idea, or there is no chance to merge this at all ?

I haven't had time to think about the consequences of doing this change. tower-http's tracing stuff is already complicated enough as is, so I'm not sure if we wanna add more stuff to it. Could be, not sure 🤷

@jplatte
Copy link
Collaborator

jplatte commented Mar 7, 2023

I just had a look at what's actually being proposed here and have to say I'm not a fan. This isn't just about the change itself though, I had previously only used make_span_with and none of the on_ methods of TraceLayer and this is now the first time I'm looking at those in detail.

To try to explain what I'm not feeling happy about: The API to me seems like it is over-using generics and traits. I think the obvious counter-example to this is CorsLayer, which defines no traits of its own but still allows for customizing things everywhere.

@davidpdrsn
Copy link
Member

Hehe yes it's entirely possible that I went too much into the generics on this one 😅 The main difference tracing and cors though is that tracing has lots of callbacks. We could use more Arc<dyn Fn>s which might be nicer.

@erebe
Copy link
Contributor Author

erebe commented Mar 7, 2023

Thank you for the responses :)

¹ there is one other PR that hasn't received a review yet: #333

Sure, whatever my expertise on the matter is worth, I will take time to review it Thursday morning

I haven't had time to think about the consequences of doing this change. tower-http's tracing stuff is already complicated enough as is, so I'm not sure if we wanna add more stuff to it. Could be, not sure shrug

I just had a look at what's actually being proposed here and have to say I'm not a fan. This isn't just about the change itself though, I had previously only used make_span_with and none of the on_ methods of TraceLayer and this is now the first time I'm looking at those in detail.

Yeah, I agree the implementation is complex, at least the type signature is scary.
As you mention @davidpdrsn, this is mainly due to the number of callbacks and that the implementation choose to be generic over every single one of them independently. It can be more "simple"/less scary just by merging all on_xx callbacks into a single trait and letting the end-user implements or not the callback he wants, or letting him Arc<Box<>> stuff inside its implementation.
But that's a breaking change in the API, and I may be a bit too hasty in my simplification.

Regarding the PR, the code only introduces into each callback trait an "adapter" that must know the request type, which is generic over its body. So basically most of the code is some glue to forward the type of the Body into every single callbacks type, and as there is one trait for each ¯\(ツ)
Most of the code is for passing another parameter type to allow user to see the request type in the adapter

@erebe
Copy link
Contributor Author

erebe commented Mar 9, 2023

I can try to make something simpler if you are OK with breaking changes.

Regarding the consequence of this change, I am biased, but I don't see further evolution required in the future.
With that in each on_xxx you have access to the full life cycle of the request, so I doubt there will a need for more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants