Is it ok for http.scheme to factor in forwarded "proto" field / x-forwarded-proto headers? #2338

Closed
trask opened this issue Feb 11, 2022 · 7 comments
Labels: area:semantic-conventions, semconv:HTTP, spec:trace

Comments

@trask
Member

trask commented Feb 11, 2022

What are you trying to achieve?

Reduce user confusion/panic about why we report a server request as http when it was "really" https.

This happens commonly when terminating SSL in front of the process that is capturing telemetry.

I'd like to respect the forwarded/x-forwarded-proto headers when capturing http.scheme.

But if that is not acceptable, it would be helpful to introduce a new attribute to capture the forwarded "proto" field / x-forwarded field so that backends can decide to display that instead if they prefer.
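A minimal sketch of the two options above, for illustration only. It assumes headers arrive as a plain name-to-value mapping and that, when proxies append to X-Forwarded-Proto, the first entry describes the original client. The function name `scheme_attributes` and the attribute name `example.forwarded_proto` are hypothetical; nothing in this issue defines them.

```python
from typing import Dict, Mapping

# Hypothetical attribute name, used only for illustration.
FORWARDED_PROTO_ATTR = "example.forwarded_proto"


def scheme_attributes(connection_scheme: str,
                      headers: Mapping[str, str],
                      override: bool = False) -> Dict[str, str]:
    """Return scheme-related span attributes for a server request."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-forwarded-proto", "")
    # Assumption: if the header holds a comma-separated list,
    # the first entry is the scheme used by the original client.
    forwarded = raw.split(",")[0].strip().lower()

    attrs = {"http.scheme": connection_scheme}
    if forwarded in ("http", "https"):
        if override:
            # Option 1: let the forwarded value replace http.scheme.
            attrs["http.scheme"] = forwarded
        else:
            # Option 2: keep http.scheme as observed, record the rest separately.
            attrs[FORWARDED_PROTO_ATTR] = forwarded
    return attrs
```

With `override=False`, a backend still sees the scheme the process actually observed and can choose to display the forwarded value instead.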

@blumamir
Member

Reduce user confusion/panic about why we report a server request as http when it was "really" https.

This happens commonly when terminating SSL in front of the process that is capturing telemetry.

I think other users can "panic" if they see that their servers behind the API gateway are accessed with https and not http. In my opinion, it is best to report what actually happened as it is easier to understand and implement.

If the gateway itself is instrumented, that could result in a client/server pair where one end reports http (gateway) and the other reports https (application).

But if that is not acceptable, it would be helpful to introduce a new attribute to capture the forwarded "proto" field / x-forwarded field so that backends can decide to display that instead if they prefer.

I support recording this data. Since we already have http-request-and-response-headers specced and implemented, it could make sense to capture it to http.request.header.x-forwarded-proto. To my understanding, users can already do that via instrumentation library configuration, but it also makes sense to me to record it by default to reduce friction for new users.

@trask
Member Author

trask commented Jan 31, 2023

maybe http.client_scheme similar to http.client_ip which is defined as:

The IP address of the original client behind all proxies, if known (e.g. from X-Forwarded-For).

This is not necessarily the same as net.sock.peer.addr, which would
identify the network-level peer, which may be a proxy.

This attribute should be set when a source of information different
from the one used for net.sock.peer.addr, is available even if that other
source just confirms the same value as net.sock.peer.addr.
Rationale: For net.sock.peer.addr, one typically does not know if it
comes from a proxy, reverse proxy, or the actual client. Setting
http.client_ip when it's the same as net.sock.peer.addr means that
one is at least somewhat confident that the address is not that of
the closest proxy.

@blumamir
Member

maybe http.client_scheme similar to http.client_ip which is defined as:

The IP address of the original client behind all proxies, if known (e.g. from X-Forwarded-For).
This is not necessarily the same as net.sock.peer.addr, which would
identify the network-level peer, which may be a proxy.
This attribute should be set when a source of information different
from the one used for net.sock.peer.addr, is available even if that other
source just confirms the same value as net.sock.peer.addr.
Rationale: For net.sock.peer.addr, one typically does not know if it
comes from a proxy, reverse proxy, or the actual client. Setting
http.client_ip when it's the same as net.sock.peer.addr means that
one is at least somewhat confident that the address is not that of
the closest proxy.

This makes sense as well.

I think the benefits of recording it into the http.request.header namespace are:

  • It prevents a duplicate recording of the value if the user is also recording the headers.
  • It is consistent with the source of this data, which is an HTTP request header.

Recording it to the http.client_scheme is:

  • more friendly to end users since it's shorter
  • not technical - does not require an understanding of what the HTTP headers mean (e.g. x-forwarded-proto is less understandable than client_scheme)

@trask
Member Author

trask commented Jan 31, 2023

one concern about http.request.header.x-forwarded-proto is that there's also http.request.header.forwarded, which requires further parsing to extract the "protocol" (e.g. ForwardedHeaderParser.java)

trask moved this to Blocker for stability in Spec: HTTP Semantic Conventions Feb 1, 2023
@blumamir
Member

blumamir commented Feb 1, 2023

one concern about http.request.header.x-forwarded-proto is that there's also http.request.header.forwarded, which requires further parsing to extract the "protocol" (e.g. ForwardedHeaderParser.java)

But the parsing still needs to be done, right? It's just being done at instrumentation time vs doing it at collector / backend / viewer on demand

trask assigned lmolkova and unassigned trask Feb 8, 2023
@lmolkova
Contributor

lmolkova commented Feb 10, 2023

Some thoughts here:

each header requires explicit enablement to be populated, so duplication would only happen if the user needs the contents of the forwarded header to get information other than the protocol.

parsing seems to be one pass over the string and, in the worst case, getting a substring. It can be optimized to do nothing at all when the forwarded proto matches http.scheme. Also, parsing of the for component seems necessary anyway.

So neither duplication nor parsing seems like a problem in this case.
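To give a sense of how small that parsing is, here is a rough sketch that extracts proto from a Forwarded header value. It is a simplified split-based version, not the optimized single pass described above, and it assumes quoted values never contain a comma or semicolon. The function name `forwarded_proto` is made up for this example.

```python
from typing import Optional


def forwarded_proto(header_value: str) -> Optional[str]:
    """Return the first proto found in a Forwarded header value, or None."""
    # Elements are comma-separated (one per proxy); pairs within an element use ';'.
    for element in header_value.split(","):
        for pair in element.split(";"):
            name, sep, value = pair.partition("=")
            # Parameter names are case-insensitive.
            if sep and name.strip().lower() == "proto":
                # Values may be quoted strings; drop surrounding quotes.
                return value.strip().strip('"').lower()
    return None
```

An instrumentation could compare the result with http.scheme and skip recording anything when the two match.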

Assuming we introduce an attribute for forwarded protocol, we might eventually introduce attributes for other forwarded header (or x-forwarded-*) components:

  • by - list of node identifiers (same as in the for component)
  • host
  • for - this one has different semantics than x-forwarded-for header - it's a list of node identifiers which could be IPs, IP:port, or an obfuscated name.

So I suggest removing http.client_ip entirely and coming up with a better attribute set that is aligned with the forwarded RFC terminology. ECS network.forwarded_ip seems to be invalid too.

namespace depends on results of ECS conversation, but I'd suggest

  • ?forwarded.for,
  • ?forwarded.proto,
  • ?forwarded.by,
  • ?forwarded.host

We don't need to spec all of them out right away, but probably just start with ?.forwarded.for, ?.forwarded.proto and include others later (or add them as optional attributes).
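As a rough illustration of that attribute set, the sketch below collects the for, proto, by and host components of a Forwarded header under a caller-supplied namespace, since the namespace is still an open question here. The function name `forwarded_attributes` and the choice to store every component as a list are assumptions made for the example, and the simple splitting assumes quoted values contain no comma or semicolon.

```python
from typing import Dict, List

COMPONENTS = ("for", "proto", "by", "host")


def forwarded_attributes(header_value: str, namespace: str) -> Dict[str, List[str]]:
    """Collect Forwarded header components, keyed by a namespaced attribute name."""
    attrs: Dict[str, List[str]] = {}
    # One comma-separated element per proxy, in the order they were added.
    for element in header_value.split(","):
        for pair in element.split(";"):
            name, sep, value = pair.partition("=")
            key = name.strip().lower()
            if sep and key in COMPONENTS:
                attr_name = f"{namespace}.forwarded.{key}"
                # Keep the raw node identifier; it may be an IP, IP:port, or an obfuscated name.
                attrs.setdefault(attr_name, []).append(value.strip().strip('"'))
    return attrs
```

Starting with only for and proto would just mean narrowing `COMPONENTS` to those two names.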

@trask
Member Author

trask commented Apr 25, 2023

I think we have resolution on the question asked by this issue, so I have created a new issue specifically about the remaining question: Where to capture HTTP forwarded proto (original url.scheme)?

trask closed this as completed Apr 25, 2023
github-project-automation bot moved this from Blocker for stability to Done in Spec: HTTP Semantic Conventions Apr 25, 2023
5 participants