Data Classifications for resources and attributes #187
This seems related to this spec issue: open-telemetry/opentelemetry-specification#2114. At a glance, classifications like sensitive and high-cardinality do seem more useful when creating attribute views than just optional and required. /cc @lmolkova
A few high-level thoughts:
- The word "resource" has a specific meaning in OTel; you are overloading it by saying "resource classification". You are classifying attributes, not resources.
- I think the classifications can be part of the schema files we already support; I don't see why standard attributes need to be classified in user code.
- I'm fairly certain there will be big pushback on making this an SDK feature. Why not do it in the collector, where it can be implemented once?
- Before an OTEP like this is accepted, there needs to be a working prototype with benchmarks.
Both are being classified here: attributes have their own classification, along with the resource sending them. The reason is that if you cannot allow resources with PD or UGC to be exported to a vendor, you can drop the resource or remove the attributes that contain PD or UGC. Classifying the resource is important because it does not require searching or validating attributes or other fields of the resource.
I don't see an easy pathway to success by doing this, and maintaining it within the schema (semantic convention) alone would create a lot of friction when trying to release changes to it.
Some organisations cannot allow PD or UGC to leave the application, so implementing this only in the collector already prevents those users from adopting it.
Using Go's benchmarking tooling, I got the following results: simply performing a range over an attribute collection of size 100 is considerably slower than resource classification filtering, which can filter out resource values in sub-nanosecond times.
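To make the shape of that comparison concrete, here is a minimal, self-contained Go benchmark sketch assuming a bitmask representation of classifications. The type names, bit assignment, and collection size are illustrative assumptions, not taken from the prototype mentioned below.

```go
// classifier_bench_test.go — sketch comparing an O(n) attribute scan
// against an O(1) mask test on a precomputed resource classification.
package classifier

import "testing"

type Attribute struct {
	Key, Value     string
	Classification uint64
}

type Resource struct {
	// Union of the classification bits of all attributes, computed once
	// when the resource is built.
	Classification uint64
	Attributes     []Attribute
}

const ClassPII uint64 = 1 << 0 // assumed bit assignment

var sink bool // prevents the compiler from eliding the benchmark bodies

func newResource(n int) *Resource {
	r := &Resource{Attributes: make([]Attribute, n)}
	r.Attributes[n/2].Classification = ClassPII
	r.Classification = ClassPII
	return r
}

// BenchmarkScanAttributes ranges over every attribute looking for PII.
func BenchmarkScanAttributes(b *testing.B) {
	r := newResource(100)
	for i := 0; i < b.N; i++ {
		found := false
		for _, a := range r.Attributes {
			if a.Classification&ClassPII != 0 {
				found = true
			}
		}
		sink = found
	}
}

// BenchmarkResourceMask performs a single mask test on the resource.
func BenchmarkResourceMask(b *testing.B) {
	r := newResource(100)
	for i := 0; i < b.N; i++ {
		sink = r.Classification&ClassPII != 0
	}
}
```

Run with `go test -bench .`; the point is simply that the per-resource mask test is a single AND/compare, versus a scan that grows with the attribute count.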
By the end of the day, I'll publish my mini prototype so others can play with it themselves.
Are you sure? Not everyone uses the collector, and even for those who do, it would still mean sending the PII data to the collector.
+1 to this. I think it is worth entertaining this alternative to see what the tradeoffs are. The SDKs already contain pre-built dictionaries of standard attributes (in the Go SDK it is in the …).
For non-standard attributes it may still be necessary to do what this OTEP suggests. I don't know how important that is, though; perhaps standard attributes cover the majority of cases where classification matters (maybe not, I may be wrong).
On the other hand, by NOT making it part of the semantic conventions there is a big risk of inconsistent behaviour across SDKs, with each instrumentation author making their own classification decisions.
Maybe there could be sensible defaults which can be overridden. Neither the semantic convention nor the instrumentation author can know which attributes will be sensitive, because it changes between applications. For example, …
As mentioned earlier, here is the prototype: https://github.com/MovieStoreGuy/data-classifier. It is rather stripped down, preserving only attributes and resources (without any of the values or nested fields).
Right, I think you're referring to the Classification Values here? To your point:
In a scenario where you have mixed-mode deployments of compute nodes, lambdas, and users running experiments or tests locally, not sending the classification would mean that the internal gateway / proxy collector forwarding the data cannot filter known classifications that are unsupported by the system being exported to. Even if application owners ignored classification filtering in their exporter, it would mean that resources or attributes cannot be classified without impacting processing performance. Reactive classification is extremely computationally expensive, with a lot of operational overhead; proactive classification removes the need to categorise the data in order to filter it.
Moreover, I mention within the OTEP that half the space requested (64 bits) can be reserved for vendors to enable advanced features like smaller retention windows or improved resource catalogs when filtering telemetry (SLO resources would be more important than high-cardinality resources, for example). The vendor can extend their exporter to perform additional actions when these classifications are set. Classifications could be limited to the application context and not included as part of the wire datagram; however, I see a lot of value in including them in order to help build a central means of ensuring resource data isn't sent to the wrong place. A sketch of the bit layout is below.
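As a hedged sketch of that 64-bit split (the specific names and bit positions here are assumptions for illustration, not part of the OTEP text):

```go
package classifier

const (
	// Lower 32 bits: classifications defined centrally, e.g. by the
	// semantic conventions.
	ClassPII             uint64 = 1 << 0
	ClassUGC             uint64 = 1 << 1
	ClassHighCardinality uint64 = 1 << 2

	// Upper 32 bits: reserved for vendor-defined behaviour, such as
	// shorter retention windows or prioritised resource catalogs.
	VendorShortRetention uint64 = 1 << 32
	VendorSLOCritical    uint64 = 1 << 33

	vendorMask uint64 = 0xFFFFFFFF00000000
)

// VendorBits extracts the vendor-reserved half of a classification value,
// letting an exporter act on vendor features without touching the
// centrally defined bits.
func VendorBits(class uint64) uint64 { return class & vendorMask }
```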
Our current adoption of OpenTelemetry at Atlassian requires heavy stream processing in order to filter out the standard attributes that are high cardinality. We also want to extend the semantic convention to support the idea of real user monitoring (RUM), which is currently not defined there, so being able to add classifications to those attributes and resources means we could experiment with the idea without having to add additional processing requirements to our internal telemetry streaming service.
I agree: the values of the classifications have to be consistent across languages, and the semantic convention is the best place to define them.
To @jamesmoessis' point, classifications are coupled to the application deployment, meaning that an application sitting on the network edge would define …
Are those organizations prepared to commit significant engineering capacity to developing and maintaining this non-trivial capability in every language supported by OTel? On the other hand, consider a deployment of the main app with the collector as a sidecar communicating over the local network (i.e., in practice, over a local unix socket): why would that be considered "leaving the application"?
Hey @yurishkuro,
I am not sure of the intent behind this comment and want to understand more :) Which part of this OTEP is non-trivial? I can see that managing the classifications within the semantic convention, and how the convention would adopt them, is non-trivial, and those points can be expanded upon further here :) As for your commitment comment: if the change is as simple as I think, then it shouldn't require a significant timeline. I am happy to take on making a collector extension to support this idea, as the collector has been the main project I contribute to.
I am not certain that communicating via unix socket is supported by the collector? Let me check that :)
Once the data has been sent by the application, the application no longer has control of it, and that is where the concern is.
I like the idea. I am, however, a little worried about metric aggregations, since attributes are part of the metric identity. Dropping them can require re-aggregation of data (e.g. open-telemetry/opentelemetry-specification#1297). I am not sure how easy or hard that is to add to existing metrics SDKs, or whether it would require a lot of additional processing on the collector side (@jmacd, @reyang, @jsuereth, do you have thoughts on this?).
@pirgeo, I think you can get away with this without the need to re-aggregate; you can follow a principle similar to MapReduce to get around it.
In short, I think this could simply be a decorator on the exporter, or considered an optional extension; a sketch of such a decorator follows.
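To make the decorator idea concrete, here is a minimal sketch against the Go SDK's span-exporter interface. The classification lookup table and drop mask are assumptions for illustration (nothing like them exists in the SDK today), and because `ReadOnlySpan` is immutable this variant drops whole spans rather than stripping individual attributes.

```go
package classifier

import (
	"context"

	"go.opentelemetry.io/otel/sdk/trace"
)

// FilteringExporter wraps another exporter and drops spans that carry
// classifications the organisation does not allow to leave the process.
type FilteringExporter struct {
	Next     trace.SpanExporter
	DropMask uint64            // e.g. ClassPII | ClassUGC
	Classes  map[string]uint64 // attribute key -> classification bits
}

func (f *FilteringExporter) ExportSpans(ctx context.Context, spans []trace.ReadOnlySpan) error {
	kept := make([]trace.ReadOnlySpan, 0, len(spans))
	for _, s := range spans {
		var mask uint64
		for _, kv := range s.Attributes() {
			mask |= f.Classes[string(kv.Key)]
		}
		// Forward the span only if none of its classifications are disallowed.
		if mask&f.DropMask == 0 {
			kept = append(kept, s)
		}
	}
	return f.Next.ExportSpans(ctx, kept)
}

func (f *FilteringExporter) Shutdown(ctx context.Context) error {
	return f.Next.Shutdown(ctx)
}
```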
It looks like a simplified version of this has been started in #100, where this proposal encapsulates all data classifications. Perhaps we should join these efforts?
- This was not considered heavily since it misses users (developers) who are using the SDK natively, who would then need to manage their own semantic convention that would also need to be understood by the downstream processors and vendors.
- It does not account for version drift when a breaking change is involved:
  - Does a processor need to know all versions?
  - Does that mean values cannot be changed in the semantic convention?
Another downside of doing it in semantic conventions goes to your point above: different organizations will potentially have different classifications. Reaching common agreement on classifications for the semantic conventions would require lengthy discussion, with a good chance that many attributes never reach a consensus classification.
Despite this being one more thing people will end up learning in order to do instrumentation, I like this proposal, especially as it's optional for current and future users.
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
@MovieStoreGuy, we are marking OTEPs which are no longer being updated as …
No worries. I don't have much bandwidth to work on this now, but I would love to pick it up at some point.
The idea behind this is that not all telemetry data is the same with regard to importance, how it is processed, and what is supported by the downstream vendors.
This approach would allow data to be checked efficiently and then have whatever configuration your organisation requires applied to it; a closing sketch below ties the pieces together.
Some sections are raw and left as dot points, pending discussion to help expand upon them.
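As a hypothetical end-to-end illustration of that idea (all names, bit values, and the policy here are assumptions, not part of the OTEP): an application tags a resource with classification bits at creation time, and the export path decides with a single mask test whether the data may leave the process.

```go
package main

import "fmt"

const (
	ClassPII uint64 = 1 << 0
	ClassUGC uint64 = 1 << 1
)

type Resource struct {
	Classification uint64
	Attributes     map[string]string
}

// allowedToExport applies the organisation's policy with one mask test,
// without inspecting any individual attribute.
func allowedToExport(r Resource, dropMask uint64) bool {
	return r.Classification&dropMask == 0
}

func main() {
	r := Resource{
		Classification: ClassPII, // set once, when the resource is built
		Attributes:     map[string]string{"user.email": "alice@example.com"},
	}
	// Organisation policy: PII and UGC must never leave the application.
	fmt.Println(allowedToExport(r, ClassPII|ClassUGC)) // prints: false
}
```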