Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpClient instrumentation sets span name that is too broad #1277

Closed
andrzej-stencel opened this issue Sep 16, 2020 · 6 comments
Closed

HttpClient instrumentation sets span name that is too broad #1277

andrzej-stencel opened this issue Sep 16, 2020 · 6 comments
Labels
bug Something isn't working

Comments

@andrzej-stencel
Copy link
Member

andrzej-stencel commented Sep 16, 2020

Bug Report

List of NuGet packages and
version that you are using (e.g. OpenTelemetry 0.4.0-beta.2):

  • OpenTelemetry.Instrumentation.Http 0.5.0-beta.2

Runtime version (e.g. net461, net48, netcoreapp2.1, netcoreapp3.1, etc.
You can find this information from the *.csproj file):

  • netcoreapp3.1

Symptom

When instrumenting HttpClient calls, the created span's name (Activity.DisplayName) is set to "HTTP {method}" (e.g. "HTTP GET" or "HTTP POST"), which is too broad according to current specification.

What is the expected behavior?

The name of the span created by HttpClient instrumentation should include the HTTP path, e.g. "/api/resources" (excluding the query parameters).

What is the actual behavior?

The current span name is "HTTP {method}", e.g. "HTTP GET" or "HTTP POST" etc.

Reproduce

See this simple repo: https://github.com/jedr/SpanNameInHttpClientInstrumentation

It was created more or less with the following commands:

dotnet new console --output SpanNameInHttpClientInstrumentation
cd SpanNameInHttpClientInstrumentation
dotnet add package OpenTelemetry.Instrumentation.Http -v 0.5.0-beta.2
dotnet add package OpenTelemetry.Exporter.Console -v 0.5.0-beta.2

All this code does is make an HTTP call using an instance of HttpClient, with HttpClient instrumentation plugged in and with ConsoleExporter added to see the output, like this:

using OpenTelemetry;
using OpenTelemetry.Trace;
using System.Net.Http;
using System.Threading.Tasks;

namespace SpanNameInHttpClientInstrumentation
{
    class Program
    {
        static async Task Main(string[] args)
        {
            using var tracerProvider = Sdk
                .CreateTracerProviderBuilder()
                .AddHttpClientInstrumentation()
                .AddConsoleExporter()
                .Build();

            using var httpClient = new HttpClient();

            var response = await httpClient.GetAsync("http://api.icndb.com/jokes/random");
        }
    }
}

The current output from this little program is:

Activity.Id:          00-ef15996d3af3f245b966adeca7732a86-71f906ac48697148-01
Activity.DisplayName: HTTP GET
Activity.Kind:        Client
Activity.StartTime:   2020-09-16T12:48:20.4903374Z
Activity.Duration:    00:00:00.5642352
Activity.TagObjects:
    http.method: GET
    http.host: api.icndb.com
    http.url: http://api.icndb.com/jokes/random
    http.status_code: 200
    otel.status_code: Ok
    otel.status_description: OK

but I would expect the Activity.DisplayName to be /jokes/random or GET /jokes/random rather than HTTP GET.

Additional Context

The ASP.NET and ASP.NET Core instrumentation libraries already do this here and here, respectively. I understand though that it's a bit different situation, with those libraries instrumenting incoming HTTP calls and being able to more easily strip out parts of the path that are too specific (like resource ID etc.). We don't have this comfort when instrumenting a general purpose outgoing calls library like HttpClient.

I've looked at how some other instrumentation libraries do this and came up with this little non-comprehensive summary (I'm far from expert in those libraries so please correct me if I'm wrong):

A "+" plus means the library sets span name to request path, a "-" minus means the library sets span name to HTTP {method}. Also noted if a library is client-side (outgoing calls) or server-side (incoming calls)

I'm happy to create a pull request for this, but would rather first know if you think it's a good idea and are willing to accept such change? Any feedback is more than welcome.

@andrzej-stencel andrzej-stencel added the bug Something isn't working label Sep 16, 2020
@eddynaka
Copy link
Contributor

Hi @astencel-sumo , looking at all the information that you posted, that looks right. Can you open a PR and add the proper tests to validate it?

We are using the HttpTagHelper to prevent some allocations.

@CodeBlanch , looking at that file, I think you were the one who created/updated it. Do you know if we have a motive to not to change it based on the spec?

@CodeBlanch
Copy link
Member

Hey @astencel-sumo! Thanks for taking the time to write this up. Maybe the most thorough issue I've seen! I'm going to have to disagree with you on it though. Check out Semantic conventions for HTTP spans from the spec.

It says specifically on this topic:

HTTP spans MUST follow the overall guidelines for span names. Many REST APIs encode parameters into URI path, e.g. /api/users/123 where 123 is a user id, which creates high cardinality value space not suitable for span names. In case of HTTP servers, these endpoints are often mapped by the server frameworks to more concise HTTP routes, e.g. /api/users/{user_id}, which are recommended as the low cardinality span names. However, the same approach usually does not work for HTTP client spans, especially when instrumentation is provided by a lower-level middleware that is not aware of the specifics of how the URIs are formed. Therefore, HTTP client spans SHOULD be using conservative, low cardinality names formed from the available parameters of an HTTP request, such as "HTTP {METHOD_NAME}". Instrumentation MUST NOT default to using URI path as span name, but MAY provide hooks to allow custom logic to override the default span name.

The way I read it, if you can get the route, you can do that. Something like /users/{userId} but if all you have is the path /users/1234 we should do HTTP GET instead.

Personally, I don't necessarily agree with the spec in this case. When I first got involved we used the path as this issue is suggesting. One day a PR came along and changed it to match the spec, and some of the users at my company were annoyed by that. But the spec is the spec 😄

It does say we can make this configurable. I've been meaning forever to add an option to control that, if you want to run with it feel free.

/cc @SergeyKanzhelev (we've talked about this before on PRs)

@andrzej-stencel
Copy link
Member Author

Oh thanks @CodeBlanch for the link to the other part of spec, totally missed it and it describes exactly what my concern was - going from "too broad" with just the HTTP method to "potentially too specific" with the request path. The spec is clearly on the side of "better too broad than too specific". I've dug up the issue behind this change in the spec and it gives even more sensible rationale - open-telemetry/opentelemetry-specification#270, also mentioning that it would be nice for the client instrumentation library to make it possible to override the span name creation. I think I'll try to see how this can be achieved with the HttpClient instrumentation.

Thank you both for your responses!

I wonder if something can be done to make sure issues like this are not raised anymore - add something to the documentation? Maybe if we have a way to customise the span name creation, we could describe it in the docs so that people coming here with similar problem have a solution right there in the docs.

@andrzej-stencel
Copy link
Member Author

It seems it is actually already possible to customize the span name by adding an ActivityProcessor.

I based my solution on this nice description in the ASP.NET Core instrumentation library: https://github.com/open-telemetry/opentelemetry-dotnet/tree/master/src/OpenTelemetry.Instrumentation.AspNetCore#special-topic---enriching-automatically-collected-telemetry.
I uploaded this solution to a branch on the sample repo: https://github.com/jedr/SpanNameInHttpClientInstrumentation/tree/customize-span-name

Here's the most interesting part of the code I added:

    public class HttpClientSpanNameEnrichmentProcessor : ActivityProcessor
    {
        public override void OnEnd(Activity activity)
        {
            var httpRequest = activity.GetCustomProperty("OTel.HttpHandler.Request") as HttpRequestMessage;
            if (httpRequest != null)
            {
                activity.DisplayName = httpRequest.RequestUri.AbsolutePath;
                Console.WriteLine($"✅ Found request in OnEnd, updating span name to {httpRequest.RequestUri.AbsolutePath}");
            }
            else
            {
                Console.WriteLine($"❌ Request not found in OnEnd");
            }
        }
    }

The output of this program is:

❌ Request not found in OnStart
✅ Found request in OnEnd, updating span name to /jokes/random
Activity.Id:          00-7a8f9484c0fae244ac14487c621677a2-d38b837555355e43-01
Activity.DisplayName: /jokes/random
Activity.Kind:        Client
Activity.StartTime:   2020-09-17T11:38:58.2239030Z
Activity.Duration:    00:00:00.5240410
Activity.TagObjects:
    http.method: GET
    http.host: api.icndb.com
    http.url: http://api.icndb.com/jokes/random
    http.status_code: 200
    otel.status_code: Ok
    otel.status_description: OK

Yay! 😀

Do you think I should add something to the documentation of the HttpClient instrumentation library, like an example on how to override the span name with an ActivityProcessor (similar to the ASP.NET Core docs I was basing on)?

@andrzej-stencel andrzej-stencel changed the title HttpClient instrumentation sets span name incorrectly HttpClient instrumentation sets span name that is too broad Sep 17, 2020
@cijothomas
Copy link
Member

@astencel-sumo There is a active PR to add document for HttpClient instrumentation https://github.com/open-telemetry/opentelemetry-dotnet/pull/1244/files

@andrzej-stencel
Copy link
Member Author

Thanks @cijothomas for the link to the PR. I'm not sure how to contribute to that, so I'll just paste a piece of docs about overriding activity name as would be useful to me and let you update the docs if/how you wish, if that's OK? Or please let me know how I can contribute.

To summarize, it occurs this issue is not really a bug as it the behaviour is according to spec. Also, the extensibility is there, so it's possible to override the activity name.

Thanks everyone again for helping me with this!

Piece of doc follows:

Example: Customize span name created by HttpClient instrumentation

The activities created by the HttpClient instrumentation library have generic names that only include the HTTP method of the request made (e.g. "HTTP GET", "HTTP POST"), and not other details, like the request path.
If you want to override the default activity name, it is quite easy to do so by adding an ActivityProcessor. See below example on how to do this.

internal class ActivityNameProcessor : ActivityProcessor
{
    public override void OnEnd(Activity activity)
    {
        var request = activity.GetCustomProperty("OTel.HttpHandler.Request") as HttpRequestMessage;
        if (request != null)
        {
            activity.DisplayName = request.RequestUri.AbsolutePath;
        }
    }
}

The custom processor must be added to the provider as below. It is important to
add the enrichment processor before any exporters so that exporters see the
changes done by them.

using Sdk.CreateTracerProviderBuilder()
    .AddProcessor(new ActivityNameProcessor())
    .AddConsoleExporter()
    .Build();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants