Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AspNetCore Instrumentation] Scrubbing of sensitive data in URLs #1747

Closed
dpk83 opened this issue Oct 20, 2021 · 19 comments
Closed

[AspNetCore Instrumentation] Scrubbing of sensitive data in URLs #1747

dpk83 opened this issue Oct 20, 2021 · 19 comments
Labels
comp:instrumentation.aspnetcore Things related to OpenTelemetry.Instrumentation.AspNetCore enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@dpk83
Copy link

dpk83 commented Oct 20, 2021

Feature Request

Make the instrumentation library more flexible to emit the URL information in a much more flexible manner to allow consumers to write a processor to scrub the sensitive information with minimal impact on performance.

Is your feature request related to a problem?
AspNetCore instrumentation library emits http.path and http.url tags containing the URL. These URLs can contains sensitive information e.g. something like, https;//www.contoso.com/users/[email protected]/chat/chat-12345

In this example this URL contains the user's id which is sensitive information that could lead to privacy issues. Currently the only way to scrub this information is to write a processor which would need to run complex regex on these URLs to detect and scrub sensitive information which takes a significant hit at performance.

Note: Similar problem exists for other instrumentation libraries and we would like to have similar support for those too.

Describe the solution you'd like:
I would like to propose the following

  1. Utilize the http.route, e.g. in the above example https;//www.contoso.com/users/{userId}/chat/{chatId}, and log the parameter values as separate tags i.e. in this case userId and chatId goes as tags.
  2. Either remove http.path and http.url (as the full URL can be built using the http.route and the parameter tags) OR expose options for consumers to disable adding these tags.

Another point worth discussing here: While in this solution consumers of the library will write their processors and the processors can be customized to have own redaction library and redaction mechanism, it would be good to see if there is a way to standardize on the redaction mechanism somehow.

Describe alternatives you've considered.

  • One alternative is to write a processor that runs complex regex on the URLs to detect and then scrub sensitive information. This is expensive in terms of performance so we would certainly want to avoid it.
  • Another alternative to write a separate custom instrumentation library where we can also avoid the processor and have all the required scrubbing directly inside the library.

Additional Context

Not leaking sensitive data is critical for our enterprise services so not scrubbing isn't an option.

@dpk83 dpk83 added the enhancement New feature or request label Oct 20, 2021
@reyang
Copy link
Member

reyang commented Oct 20, 2021

Related to #1791.

@reyang reyang added good first issue Good for newcomers help wanted Extra attention is needed labels Oct 20, 2021
@cijothomas
Copy link
Member

Exceptions stacktraces are also being recorded by default and may also contain sensitive information

Which instrumentation is recording this by default? Its explicit opt-in, like below:
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Instrumentation.AspNetCore/AspNetCoreInstrumentationOptions.cs#L53

@dpk83
Copy link
Author

dpk83 commented Oct 26, 2021

Oh, my bad. Apologies for false alarm. Deleting the comment to avoid confusion from the post.

@dpk83
Copy link
Author

dpk83 commented Oct 27, 2021

@cijothomas I created a new issue with a related feature ask #1794

@tvaintrob
Copy link

@cijothomas i would like to take this issue, can you assign it to me?

@cijothomas
Copy link
Member

@cijothomas i would like to take this issue, can you assign it to me?

Assigned. Could you create sub-issues and propose the exact changes? The behavior of instrumentations are based on the semantic conventions, (which are still experimental), so we need to be in-sync with semantic conventions changes.

@CodeBlanch
Copy link
Member

Some thoughts I have on this...

Utilize the http.route, e.g. in the above example https;//www.contoso.com/users/{userId}/chat/{chatId}, and log the parameter values as separate tags i.e. in this case userId and chatId goes as tags.

The issue with this is we don't always have the route. The route data is added after ASP.NET or ASP.NET Core runs its routing logic. That isn't always guaranteed to run (static files, middleware which completes before routing) and it isn't always guaranteed to match a route (404s, etc). In my own code, I add logic in the Enrich callback which looks for the "http.route" tag, and if it isn't found, I set the display name to either "unknown" or "unknown/{statusCode}". In areas of code that end the response before routing (ex /health), I manually update the span name and add an http.route tag to prevent my scrubbing from running.

Either remove http.path and http.url (as the full URL can be built using the http.route and the parameter tags) OR expose options for consumers to disable adding these tags.

I like the idea of adding an option to not add path/url but if we don't have a route (see above), those spans would be pretty useless. Maybe the option should be a callback user can specify to control the route resolution and tagging logic completely (if they so desire)?

@dpk83
Copy link
Author

dpk83 commented Nov 23, 2021

We found an alternate solution that doesn't require changes in the instrumentation library and can still achieve minimal performance impact which goes like this

  • In Enrich we capture the HttpRequest object and add it as a custom property on the activity
  • We then have our custom processor for redaction. In the OnEnd method of the custom processor we extract the HttpRequest object.
  • Remove tags from activity for url, path and target if they exists (as they are the ones containing sensitive data)
  • At this time route exists so we fetch the route and parameters, redact the parameters which needs to be redacted and then add route and redacted parameters as tags to the acitivity.
  • remove the HttpRequest object from activity which was added as custom property earlier.

Note: Regex aren't used for any of the processing to avoid any perf overhead.

@kaspertygesen
Copy link

@cijothomas
I would like to take this issue, but I would like to discuss the solution with you first.

Sorry about the wall of text 😅
Let me know if you prefer to discuss it on Slack or video.

Redating sensitive data from url.path / http.target:

The non-sensitive parameters in the route / template is replaced with the values from ActionContext.RouteData or HttpContext.Request.RouteValues. The sensitive parameters are left as {parameter} or replaced by REDACTED.
This can be done using string.Create() and span operations to ensure that we only allocate a single string.

Then the value of the tag url.path / http.target is replaced.

Example: /api/values/1 replaced with /api/values/REDACTED or /api/values/{id}

The casing of the route / template might differ from the requets path. Is this ok?
I haven't figured out an effective way to ensure consistent casing yet. The template can be lower cased without affecting the parameter values if that helps...
Example: /api/values/1 vs /api/Values/1

I have come up with 3 possible solutions all using the approach above.

1. Override url.path / http.target in HttpInListener if scrubbing is needed.

The parameters to redact can be configured using a string[] on AspNetCoreInstrumentationOptions.

When HttpInListener.OnMvcBeforeAction is executed we can compare beforeActionEventData.ActionDescriptor.Parameters to the list of parameters to redact and determine if any scrubbing is needed.

If scrubbing is needed we can redact sensitive data as explained above.

Notes:

  1. Is it ok to perform the redaction in HttpInListener or do you prefer something else?
  2. Is it ok to make the list of parameters to redact global?
  3. Easy to use.

2. Scrubbing processor

By injecting the IHttpContextAccessor into a processor we have all the data we need to redact sensitive information in the OnEnd method.

We can pass a list of parameters to redact to the processor when registered.

When executed the processor will redact sensitive data as explained above.

Notes:

  1. This approach doesn't complicate HttpInListener and it adds zero overhead when the processor is not added. Personally I like this solution the most.
    It is a little harder to use then 1. but I think that is ok.
  2. Is it ok to make the list of parameters to redact global?
  3. Should the processor be part of OpenTelemetry.Instrumentation.AspNetCore?

3. Enrich

We could provide a helper to make it easy to do effective redaction in the enrich step using the approach explained above.
This allows the users of the library to control when to do redaction.

All the data needed is already available during EnrichWithHttpResponse.

  1. If we want full flexibility for the user of the library then I guess this is a better than solution 2.
  2. Should the helper be part of OpenTelemetry.Instrumentation.AspNetCore?


Let me know if there is a better approach to this then I will investigate it further 😃

@cijothomas
Copy link
Member

By injecting the IHttpContextAccessor into a processor

Not recommended. IHttpContextAccessor needs to made thread-safe, likely locking and reducing throughput.

Should the processor be part of OpenTelemetry.Instrumentation.AspNetCore

Processor is an SDK concept, and instrumentations should only have API dependency.

Left couple of quick comments above.

I am not actively working on this, so tagging @vishweshbankwar to further help.

@vishweshbankwar
Copy link
Member

Related: #2191

@kaspertygesen Thanks for your interest in this. We are currently working on multiple changes on the instrumentation libraries. We have not evaluated this particular issue in detail. Once we do, I will reach out to you.

@nicolasgaraza
Copy link

Hi, is there any update on this?

Thanks

@julealgon
Copy link

Have you folks considered leveraging Microsoft.Extensions.Compliance.Redaction for this work?

Would be nice to have that abstraction being used vs creating something custom just for OTEL. As a bonus, it also promotes the standard redaction library and potentially pushes for more features in there (if something needed for OTEL is misssing for instance).

@kaspertygesen
Copy link

Have you folks considered leveraging Microsoft.Extensions.Compliance.Redaction for this work?

Would be nice to have that abstraction being used vs creating something custom just for OTEL. As a bonus, it also promotes the standard redaction library and potentially pushes for more features in there (if something needed for OTEL is misssing for instance).

Interesting but I am not sure a dependency like that will be accepted.
If I end up implementing this one I will evaluate it then.

@reyang reyang transferred this issue from open-telemetry/opentelemetry-dotnet May 13, 2024
@reyang reyang added the comp:instrumentation.aspnetcore Things related to OpenTelemetry.Instrumentation.AspNetCore label May 13, 2024
@julealgon
Copy link

@dpk83

These URLs can contains sensitive information e.g. something like, https;//www.contoso.com/users/[email protected]/chat/chat-12345

If you know the data is sensitive, why would you expose it in the URL in the first place?

@martinjt
Copy link
Member

martinjt commented Jun 2, 2024

The recommended way to redact is using the OpenTelemetry Collector. We now have redaction of the QueryString parameters.

Further, it's not recommended that sensitive data is in the path anyway since that ends up in Access logs for webservers.

I don't think there's anything more to do on this issue.

@martinjt martinjt closed this as not planned Won't fix, can't repro, duplicate, stale Jun 2, 2024
@abatishchev
Copy link
Contributor

@martinjt

We now have redaction of the QueryString parameters

Is there a way to opt out?

@CodeBlanch
Copy link
Member

Is there a way to opt out?

OTEL_DOTNET_EXPERIMENTAL_ASPNETCORE_DISABLE_URL_QUERY_REDACTION

PR: https://github.com/open-telemetry/opentelemetry-dotnet/pull/5532/files#diff-eef1a6fbb9155013c4a21ba1f3da8ade5f0ac7e08d23bd761575b4ecd70ac86aR36-R42

@abatishchev
Copy link
Contributor

thanks, @CodeBlanch! I asked and then found the corresponding environment variables myself.

I asked a question about them here: open-telemetry/semantic-conventions#860 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:instrumentation.aspnetcore Things related to OpenTelemetry.Instrumentation.AspNetCore enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests