-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit HTTP request method cardinality: original_method (option A) #17
Limit HTTP request method cardinality: original_method (option A) #17
Conversation
Option A (currently implemented in this PR)
Example:
Cons:
Possible solution: Different requirement levels for different signals
I.e. example of a valid request would be
|
Option B:
|
Option C: Single attribute
|
I picked this from A but was thinking of option C when I first read this. I agree, different signals have potentially different requirements but still the same field should be used. For a trace, only valid fields should be accepted / are part of the standard. For logs the same fields is more lenient but it is still encouraged, to be strict. With the above, we could get the best of both worlds. There is a way to express in the schema requirements for different signals but still optimise for bringing all the different signals together. It is then up to the storage engine / query engine of the different implementations to see how they deal with the mixed data if signals should be brought together. The part I'm most worried about A is:
I expect if we go with A, we will apply this solution in many different places. I'm not against having an additional field to store the original or canonical one if needed, but there must be a field where everything can be queried together.
@lmolkova I unfortunately couldn't join, so hopefully the above does not just repeat what was already discussed, otherwise please ignore. |
74f8313
to
c750799
Compare
Discussed during Semconv SIG meeting:
based on this, I updated the PR to make @ruflin do you think the following approach could work:
I understand it's still a breaking change for ECS users since the attribute is being renamed. It seems the alternatives listed above would not work in the long term once we take other signals, cardinality, and minimal duplication into consideration. |
I'm thinking of the sem-conv like an onion. In the center, we have the shared semconv across all signals. It is the most lenient version of it. Each layer of the onion can become more strict (but not more lenient). What this means, For me this adheres to the base principle mentioned above: "we should preserve attribute semantics consistency across signals". The main lines in the PR we are discussing are the following. The discussion is if it should be
If there is agreement in the group to proceed with the proposal, I don't want to block it. One of the reason is, that from my perspective the schema can become more lenient without a breaking change but not the other way around. So my proposed change could still be made later on. |
@ruflin I think there's a few points maybe not getting across from the semconv WG here. There are two MAIN uses cases in OTEL that need to be addressed here, and are different than ECS.
As such, in terms of design space, we CAN play with the following:
But the "layer of the onion" with strictness analogy doesn't really work here. I like the layer of the onion, but think about it in terms of common key-value that can be shared across signals, with each signal being ALLOWED to have more attributes if needed, and each "layer" of your o11y pipeline being able to add more attributes (e.g. otel collector enhancing telemetry with extra information). Given the need for metric cardinality concerns, I think |
I read through the original PR discussions and will try to summarize what I understood
I like the direction of option A here, except, and I quote/agree with @ruflin's comment on the original PR:
I would be OK with Then I'd like to maybe bring again the suggestion from @Oberon00 as I think it has good value and it was something that also came to mind while I was reading all. With this, we essentially introduce a new attribute and keep recording
Option D
Example:
Pros
Cons:
For the name of the "canonical" attribute, chatGPT suggested
|
I think Otel and ECS are perfectly aligned here. ECS was built to exactly allow these cross signal queries. It is more a question on how we approach it. @jsuereth Please let me know if I miss here some fundamental differences. For Elasticsearch the lenient approach works really well, because even in the data is mixed on ingest time ( If the "addition" of attributes is the path forward, I like Option D proposed by @joaopgrassi @lmolkova @jsuereth Sorry for the extra rounds on this one. But I think if we settle this it will become easy to apply the same logic to all the future changes by just referring to the discussion here. |
I think somebody else has mentioned this already but it depends on the particular HTTP implementation if Get is treated like GET or like an unknown method. In the latter case, it would be extremely confusing if instrumentations mapped it to GET. However, if the instrumentation has the knowledge whether the instrumented server maps it itself, or ideally even has access to a pre-normalized method, then it could do some sort of "smart" mapping. |
I'm strongly in favor of Option D. This addresses the concerns I raised in the previous PR about unexpected behavior for security users, but most critically I think it's also the ideal user experience when consuming this data. I hate the idea of users having to make decisions about which fields they query based on the value of other fields, so I like that if a user wants to query the original value then they use the base field, and if they want to accept some context loss in order to benefit from low cardinality, they use that field. In both cases, they just use the one field that makes sense and it just works. This does mean that we're duplicating data, but I think that should be acceptable if it's truly in the best interests of the overall user experience of consuming the data. Storage mechanisms like Elasticsearch could implement solutions to ensuring duplicate data like this does not incur twice the storage requirements overall - that many don't should be a factor to consider, but it shouldn't be the deciding factor IMO. |
As a comparison, I have implemented option B (which seems to be option D from @joaopgrassi proposal, but canonical method stays case-sensitive) in #34. PTAL and provide feedback or approve if you're onboard. |
Option E: not doing anything special - if we get a weird HTTP method "XYZ", just keep it. And here are the reasons:
|
@reyang it's probably fine for clients, but makes it easy to create some form of ddos attack on metrics when server does not sanitize it. Anyone can send unauthorized requests with random methods. Presumably, there are just a few points for an invalid method and they're not really interesting to anyone (i.e. do not skew any other data), how bad would it be to have a lot of (but quite short) time series? |
Like I explained in #17 (comment), metrics cardinality is a general problem - nothing specific to HTTP methods. I think we should tackle this issue in the metrics API/SDK space so it can lift all boats (e.g. #2960 is a great example / starting point). |
@reyang essentially what you're saying is Without it, cardinality limits would protect from DDOS attacks, but can make time series that got through less useful. |
Yes. I think this is how semantic conventions should be positioned:
The API/SDK spec should be positioned to solve the cardinality/reliability problem:
|
There was a general push-back to have different semantics for different signals. Dim-capping on metrics can be considered as such inconsistency. cc @jsuereth @arminru Still, I drafted it here: open-telemetry/opentelemetry-specification#35 with the caveat that I don't believe instrumentation should provide hints on HTTP methods since we have a standard and common 10 methods that are used in the absolute majority of requests. the drawback of this change is that span name is not quite as specific as it was before, but I don't see a huge harm even if the span name is of high cardinality for invalid requests. group-by, top-N queries should still work great |
69adeaf
to
c5ddb40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need a changelog for this now :)
ce7fa6b
to
0c77dee
Compare
@Oberon00 do you have any remaining concerns that you'd like to see addressed before merging? |
I guess we are aware that this will basically pre-determine the general "_original" naming convention and the "_OTHER" value that open-telemetry/opentelemetry-specification#117 asks for, as soon as HTTP is marked stable + a release is cut. I think that's fine from a priority perspective. |
…ditionally required
Co-authored-by: Trask Stalnaker <[email protected]>
Co-authored-by: Christian Neumüller <[email protected]>
Co-authored-by: Christian Neumüller <[email protected]>
68a7d63
to
35a44be
Compare
Moving in open-telemetry/opentelemetry-specification#3478.
Changes
http.request.method
values to well-known namesother
for unknown methods to prevent cardinality explosion if a malicious (or buggy) client sends custom and dynamic methods.http.request.original_method
There was a discussion on the original PR and at HTTP SIG meeting, the summary and options are in the comments.