Add exporter data classes for experimental profiling signal type. #6374

jhalliday · 2024-04-10T14:29:37Z

First small step towards supporting the new experimental signal type for profiling!

This PR introduces data interfaces corresponding to the OTLP model as defined in
open-telemetry/oteps#239
open-telemetry/opentelemetry-proto#534

Until the latter is merged the wire format is still something of a moving target, but no major changes are expected so it should be safe to start reviewing this...

The code structure corresponds pretty much 1:1 with the .proto files, modulo some tidying up of field names for better readability. The easiest way to review is probably to view the .java and .proto files split-screen, since fields appear in the same order in both. The bulk of the work will likely be getting familiar with the new data model; the code itself is not complicated.

The javadocs are more or less copies of the .proto comments, many of which are in turn inherited from Google's pprof.proto. The OTel profiling SIG has an open task to better document the data model, after which another pass on the code will update the javadocs to match. For now they should be considered placeholders.

Some elements are @deprecated. This is unusual for fresh code! The situation arises because the new OTel model builds on google's existing pprof one, adding some fields and deprecating others. In order to allow 'legacy' style data to be written for compatibility, the data classes have methods for the older fields. An alternative design choice is not to support these, which should in theory allow all necessary data to be written for sending to OTel receivers, but would not allow for older style data expected by existing pprof tools. I've opted for more flexibility since there is little cost.

The profiling format is I think the first OTLP spec in which byte[] type fields appear as first class citizens, not merely as carriers for strings. How to model this in the Java API is an open question, given the design preferences around nullability and immutability. ByteBuffer isn't ideal, but may be preferred to raw byte[]. Protobuf uses ByteString, but that's more relevant to the Marshalers than the API - we don't really want a dependency on protobuf runtime classes at this level.

This foundation code leads on to the Immutable data classes, then the Marshalers. Those are mostly ready, but I prefer smaller pieces for review, so they will follow in later PRs. The Marshalers in particular will require the new proto PR above to be merged first anyhow.

linux-foundation-easycla · 2024-04-10T14:29:42Z

The committers listed above are authorized under a signed CLA.

✅ login: jhalliday / name: Jonathan Halliday (ae94404, a7af9cb, 3044610, f84df98, 64da240, a84e2f9, ffb73b9, 112796b, bda144f, 087ce89, 2fa0060, 55b8ab6, 4d31472, d6e7951)

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/MappingData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/AggregationTemporality.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/LinkData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/LocationData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ProfileContainerData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/SampleData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ValueTypeData.java

jkwatson · 2024-04-18T23:45:08Z

I'm not sure that mirroring the protos exactly is a great way to build out our SDK classes. We have the opportunity to build something better than raw protos that will apply specifically to Java. I'd want to have a very strong reason to expose clearly mutable data (for example), and yet call it Immutable.

jhalliday · 2024-04-19T09:53:32Z

> I'm not sure that mirroring the protos exactly is a great way to build out our SDK classes. We have the opportunity to build something better than raw protos that will apply specifically to java. I'd want to have a very strong reason to expose clearly mutable data (for example), and yet call it as Immutable.

I think there are perhaps two separate issues here. I'll take them in reverse order:

I don't like byte[] in the API and don't expect it to be used in the final version, but before I can swap it for something better we'll need to agree on what that is. ByteBuffer would do, but isn't reliably immutable either. Then again nor is List, which is widely used in the API anyhow. At least it solves the Nullability problem which is distinct from the Immutability one.
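
To illustrate the ByteBuffer point, here is a minimal sketch using only the JDK. The class and method names are hypothetical and not part of this PR. It shows that a read-only view still shares the caller's array, so a defensive copy is needed if the data must not change underneath the reader.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of the PR: contrasts a read-only view with a defensive copy.
final class ByteBufferImmutabilityDemo {

  /** Read-only view: cannot be written through, but still shares the caller's array. */
  static ByteBuffer readOnlyView(byte[] source) {
    return ByteBuffer.wrap(source).asReadOnlyBuffer();
  }

  /** Defensive copy: later changes to the source array are not visible. */
  static ByteBuffer defensiveCopy(byte[] source) {
    return ByteBuffer.wrap(source.clone()).asReadOnlyBuffer();
  }

  public static void main(String[] args) {
    byte[] buildId = {1, 2, 3};
    ByteBuffer view = readOnlyView(buildId);
    ByteBuffer copy = defensiveCopy(buildId);

    buildId[0] = 42; // the producer mutates its array after handing it over

    System.out.println(view.get(0)); // 42: the view reflects the mutation
    System.out.println(copy.get(0)); // 1: the copy is unaffected
  }
}
```

Even the copied buffer carries mutable position and limit state, so code that shares one instance between readers would typically call `duplicate()` or use absolute `get(int)` reads.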

The 1:1 mapping to the proto is absolutely horrible as a user API and I am sure it's not a great way to build such an API. That said, this isn't intended to be the user facing API. It's intended for feeding the Marshalers which encode the proto, and for that 1:1 produces the cleanest, least error prone code and maximises flexibility. As it gets more usage, we'll likely start finding patterns that are more user friendly and I expect the eventual public (as distinct from internal) API will look different. Designing that, I'd be concerned with masking the lookup table mechanism, so users can e.g. feed in a List<StackFrame[]> and have all the decomposing of that handled for them. That's likely to take the form of an SDK package that maps the user facing layer onto these Marshaler-facing internal data classes. It's possible that putting these interfaces where I have in the repository structure is misleading as to their intended use, but then again all other protos have more or less 1:1 structure like this, so from a uniformity and maintenance point of view it follows principle of least surprise. Profiling is perhaps the first proto where the wire structure is so significantly different from the optimal user API, so using the same classes for both purposes, as in other signal types, won't work so well. We'll just have to feel out what that means for the package structure as we go along I think.
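
As a rough sketch of what "masking the lookup table mechanism" could mean, the following interns stack frames into a shared table and rewrites each stack as a list of indices. Everything here is an assumption for illustration: `LookupTable` and `StackDecomposer` are hypothetical names, and plain `String` values stand in for a richer `StackFrame` type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: interns values into a table and hands back their index. */
final class LookupTable<T> {
  // LinkedHashMap keeps insertion order, so key order matches the assigned indices.
  private final Map<T, Integer> indexByValue = new LinkedHashMap<>();

  int indexOf(T value) {
    Integer existing = indexByValue.get(value);
    if (existing != null) {
      return existing;
    }
    int index = indexByValue.size();
    indexByValue.put(value, index);
    return index;
  }

  List<T> asList() {
    return new ArrayList<>(indexByValue.keySet());
  }
}

/** Hypothetical user-facing layer: accepts whole stacks and emits table indices. */
final class StackDecomposer {
  static List<int[]> decompose(List<String[]> stacks, LookupTable<String> frames) {
    List<int[]> result = new ArrayList<>();
    for (String[] stack : stacks) {
      int[] indices = new int[stack.length];
      for (int i = 0; i < stack.length; i++) {
        indices[i] = frames.indexOf(stack[i]);
      }
      result.add(indices);
    }
    return result;
  }
}
```

Two stacks that share their first frames end up referencing the same table entries, which is the deduplication the wire format relies on; the user never handles an index directly.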

jkwatson · 2024-04-22T15:01:14Z

If our goal is to provide a route into marshallers, maybe we should create a write-only API that facilitates this use-case, rather than constructing a proto-mirroring set of Java classes. 🤔

Let's discuss approaches at the SIG meeting this week.

jhalliday · 2024-04-22T17:03:20Z

> If our goal is to provide a route into marshallers, maybe we should create a write-only API that facilitates this use-case

Marshallers read an object model and transform it into wire format data. This is that object model. It can't be write-only, because then the marshallers have nothing to read. It should be 1:1 with the wire format, so the marshallers don't have to handle complex transformations. This API is well designed for its intended use case. If you have a different use case, that's a different API.

jkwatson · 2024-04-22T20:15:50Z

> > If our goal is to provide a route into marshallers, maybe we should create a write-only API that facilitates this use-case
>
> Marshallers read an object model and transform it into wire format data. This is that object model. It can't be write-only, because then the marshallers have nothing to read. It should be 1:1 with the wire format, so the marshallers don't have to handle complex transformations. This API is well designed for its intended use case. If you have a different use case, that's a different API.

If our Marshaller exposed a "write only API" to the producer of the profile, then the marshaller can do with the data as it will. There doesn't need to be an intermediate representation, necessarily.

jhalliday · 2024-04-23T10:15:02Z

> If our Marshaller exposed a "write only API" to the producer of the profile, then the marshaller can do with the data as it will. There doesn't need to be an intermediate representation, necessarily.

True. However, all existing Marshallers, including the ones I already wrote for profiling, work the same way. This follows principle of least astonishment, ensuring that users/maintainers can reuse prior knowledge of the pattern when working with the new classes. It seems to me there would need to be a good reason for deviating from that. What is it?

jkwatson · 2024-04-24T02:11:06Z

> > If our Marshaller exposed a "write only API" to the producer of the profile, then the marshaller can do with the data as it will. There doesn't need to be an intermediate representation, necessarily.
>
> True. However, all existing Marshallers, including the ones I already wrote for profiling, work the same way. This follows principle of least astonishment, ensuring that users/maintainers can reuse prior knowledge of the pattern when working with the new classes. It seems to me there would need to be a good reason for deviating from that. What is it?

The primary reason is highlighted by the proposal in this PR. Having an intermediate representation as ugly and non-ergonomic as this one seems like a very good reason to do things differently. The profiling signal is several orders of magnitude (at least) more complex than either metrics, logs or spans. I think it's worth considering a radically different approach.

A big question to answer: how many exporters are there going to be, realistically? If it's just OTLP, then I think skipping the intermediate representation would allow optimizations that would not be possible if it would be required to convert back and forth.
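
To make the two positions in this exchange concrete, here is a toy sketch of both styles side by side. None of these types exist in the PR or in opentelemetry-java; `SampleModel`, `SampleSink`, `ModelMarshaler`, `StreamingMarshaler` and `Encoding` are all hypothetical, and the "wire format" is just a run of base-128 varints rather than a real OTLP message.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Style 1: a readable object model that a marshaler walks after the fact. */
interface SampleModel {
  List<Long> getValues();
}

/** Style 2: a write-only sink that the producer pushes data into. */
interface SampleSink {
  void writeValue(long value);
}

/** Toy encoder shared by both styles: base-128 varints, least significant group first. */
final class Encoding {
  static void writeVarint(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }
}

/** Reads the intermediate representation and encodes it. */
final class ModelMarshaler {
  static byte[] marshal(SampleModel model) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (long value : model.getValues()) {
      Encoding.writeVarint(out, value);
    }
    return out.toByteArray();
  }
}

/** Encodes as data arrives, so no intermediate representation is built. */
final class StreamingMarshaler implements SampleSink {
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  @Override
  public void writeValue(long value) {
    Encoding.writeVarint(out, value);
  }

  byte[] toByteArray() {
    return out.toByteArray();
  }
}
```

Both paths produce identical bytes for the same values. The trade-off under discussion is that the first keeps a reusable, inspectable model in memory, while the second avoids that allocation at the cost of tying producers to one encoder.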

jack-berg

Key question is whether we should strictly reflect the proto representation or have a more user-friendly Java representation, and leave the translation to the optimized proto representation to the serialization layer. I think we should do the latter, since it's what we've done elsewhere.

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/AggregationTemporality.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/AttributeUnitData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ProfileData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ProfileContainerData.java

jack-berg · 2024-04-24T15:01:10Z

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ProfileContainerData.java

+ * @see "profiles.proto::ProfileContainer"
+ */
+@Immutable
+public interface ProfileContainerData {


I wonder if we need to differentiate between ProfileContainer and ProfileData. The proto seems to do it purely because they're extending pprof. I'm inclined to flatten this to a single class called ProfileData.

I think perhaps this is another instance of figuring out how we want to position the SDK to users, much like the deprecated methods question.
I can appreciate the 'OpenTelemetry is the One True Way' approach of providing only functionality that does things the Otel way, but... There may also be use cases where it is desirable to e.g. generate pprof format output for feeding into pre-existing tools that expect it. For such use, being able to handle ProfileData (i.e. pprof/pprofextended) without thinking about the OTel wrapper (ProfileContainerData) makes sense. I don't really want to have to construct and pass a ProfileContainerData around when I'm only interested in the ProfileData field of it. I'm therefore slightly in favour of retaining the greater modularity and flexibility here, as I don't see it having a particularly high overhead in either runtime or maintenance and indeed the code is perhaps even simpler with it in place e.g. marshallers are more loosely coupled.

I'm inclined to leave this as is now that the revised plan is to move these classes into the serialization package for use just by marshallers. A 1:1 between data classes and marshallers helps modularity and testing.

> I don't really want to have to construct and pass a ProfileContainerData around when I'm only interested in the ProfileData field of it.

What's the difference? There should only really be one pprof serializer ever written for this. It doesn't seem too challenging for the developers of that one component to know that we don't differentiate between the ProfileContainer and ProfileData because we don't carry the same historical baggage as the proto message definitions.

> I think perhaps this is another instance of figuring out how we want to position the SDK to users, much like the deprecated methods question.

Agreed. I like this. A heuristic (or part of one anyway) could be that as long as it's possible to do a pprof encoding of the data, we are free to have some minor divergences from the proto representation.

> A 1:1 between data classes and marshallers helps modularity and testing.

That could be true, but it's not obvious to me right now whether it is or isn't.

Overall, I don't feel strongly one way or another about this. I doubt this is the last word we'll have on the matter, so no need to split hairs. 🙂

> There should only really be one pprof serializer ever written for this.

Hmm. There is an existing ecosystem of tools written to handle pprof files and retaining compatibility in the proto format is aimed at allowing us to take advantage of those. Thus I think there may be use cases where you want to export pprof to file, which means writing a ProfileData without the enclosing ProfileContainer.

So yes, technically still only one Marshaler for a ProfileData, but it has got two use cases - to be called as an intermediate node in a tree that's rooted at the ExportProfilesServiceRequest and works down via the ProfileContainer, which is how OTLP wire messages will get written, or to be called as a root node directly, which is how pprof files would get written.

I'd like to retain pprof file export capability at least through the experimental phase, as it's useful to be able to e.g. send output into a pprof flame graph rendering tool to visually sanity check the encoding. Whether it's something we want to offer in the eventual stable API is a different question, and even then the pipeline design may be such that it winds up being cleaner to create the full tree in all cases but ignore the container bits in the pprof file exporter backend.

I meant there should only be one pprof serializer; of course there is also the need for an OTLP serializer. My point was that removing the additional abstraction layer should only marginally impact the pprof serializer. And since it only needs to be implemented once, it's not the end of the world if the developer who writes it has to understand the difference between the opentelemetry-java in-memory representation and pprof.

> I'd like to retain pprof file export capability at least through the experimental phase

Sure, I don't feel strongly one way or the other.

Changes addressing code review comments.

jhalliday · 2024-05-07T13:17:12Z

In terms of relocating this to the same place the marshallers will eventually be, how about:
exporters/otlp/profiles/src/main/java/io/opentelemetry/exporter/otlp/profiles/data
or
exporters/profiles-otlp/src/main/java/io/opentelemetry/exporter/profiles/otlp/data

jack-berg · 2024-05-09T19:59:13Z

We don't have any other signals that are structured purely as an exporter, with exporter and data classes bundled together without any sdk artifact.

Looking at the opentelemetry-exporter-otlp artifact, we see packages:

io.opentelemetry.exporter.otlp
- http
  - logs - otlp/http logs
  - metrics - otlp/http metrics
  - traces - otlp/http traces
- logs - otlp/grpc logs
- metrics - otlp/grpc metrics
- traces - otlp/grpc traces

And over in the sdk packages (say opentelemetry-sdk-trace, for example), we see packages:

io.opentelemetry.sdk.trace - top level trace sdk
- data - trace data interfaces
- export - trace exporter interfaces
- samplers - sampler interfaces

(Side note - we could have done better with the SDK package design. Not sure what we gain by having dedicated packages for data, exporting, and samplers. But what's done is done.)

Can we preserve parts of both of these package structures coherently in a single OTLP profile artifact? Not exactly. The two package hierarchies diverge so we couldn't have a single root package for the java module system. Since this is an exporter artifact and not an SDK artifact, let's prioritize symmetry with the existing OTLP exporter artifact:

io.opentelemetry.exporter.otlp
- http
  - profiles - otlp/http profile
- profiles - otlp/grpc profile
- data - profile data interfaces
- export - profile exporter interfaces

I don't anticipate this being permanent, since by bundling data / exporter interfaces with the OTLP exporter, we limit any future exporter to take a dependency on the OTLP profile dependency. Minimally, I think we'd also be interested in a logging exporter at some point. But let's wait on that. Since the OTLP profile exporter will be the only thing dependent on the profile data / exporter interfaces, breaking them out into a separate sdk module will be minimally disruptive.

Changes addressing code review comments.

jack-berg

Couple additional comments.

...rters/otlp/profiles/src/main/java/io/opentelemetry/exporter/otlp/data/AttributeUnitData.java

jack-berg · 2024-05-16T18:24:54Z

exporters/otlp/profiles/src/main/java/io/opentelemetry/exporter/otlp/data/LinkData.java

+  String getTraceId();
+
+  /** Returns the trace identifier as 16-byte array. */
+  default byte[] getTraceIdBytes() {


Can we omit these and just have serializers know to reference OtelEncodingUtils.bytesFromBase16?

Sure. No reason they have to follow the same pattern as the existing SpanContext API does.
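
For reference, the conversion a serializer would need is small. The sketch below is a standalone JDK-only stand-in written for illustration (the class `TraceIdHex` and method `decodeHex` are hypothetical); it is not the OtelEncodingUtils implementation, which a real serializer would call instead.

```java
/** Hypothetical stand-in showing the hex-to-bytes step a serializer would perform. */
final class TraceIdHex {

  /** Decodes a base16 string such as a 32-character trace id into raw bytes. */
  static byte[] decodeHex(String hex) {
    if (hex.length() % 2 != 0) {
      throw new IllegalArgumentException("hex string must have an even length");
    }
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      int hi = Character.digit(hex.charAt(2 * i), 16);
      int lo = Character.digit(hex.charAt(2 * i + 1), 16);
      if (hi < 0 || lo < 0) {
        throw new IllegalArgumentException("invalid hex character near index " + (2 * i));
      }
      out[i] = (byte) ((hi << 4) | lo);
    }
    return out;
  }
}
```

With this done at serialization time, the data interface only needs the `String getTraceId()` accessor.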

jack-berg · 2024-05-16T18:28:20Z

exporters/otlp/profiles/src/main/java/io/opentelemetry/exporter/otlp/data/ProfileData.java

+  List<FunctionData> getFunctions();
+
+  /** Lookup table for attributes. */
+  Attributes getAttributes();


This breaks from the lookup table style we see elsewhere. Samples, Mappings, and Locations won't be able to reference this by index. Feels icky, but probably needs to be something like List<Map.Entry<AttributeKey<?>,?>>, or some other equivalent key/value pair holder.

Yeah, I was going to revisit that when we get to the Marshalers. Right now I think it does kind of work, because the Attributes can be iterated in order. However, the existing attributes API doesn't really guarantee that characteristic, it's essentially a side-effect of the present implementation. One option is to change Attributes to guarantee it, or to provide e.g. an asArray() type method that does, but otherwise yes it will have to be handled in the ProfileData API instead.
I think it's going to be a bigger headache for any user-facing API, where users will expect to be able to use Attributes more consistently. Hence having Attributes support ordering seems perhaps a better long term bet?

> However, the existing attributes API doesn't really guarantee that characteristic, it's essentially a side-effect of the present implementation.

Yeah that's the problem. No guarantee.

> I think it's going to be a bigger headache for any user-facing API, where users will expect to be able to use Attributes more consistently.

The access patterns for attributes currently allow you to get by key, or iterate over all. Iterating over all with Attributes.forEach reflects the common way that serializers need to interact with attributes, where order typically doesn't matter. Attributes.forEach allows for iteration without allocation. If order does matter, Attributes.asMap is an escape hatch.

> One option is to change Attributes to guarantee it, or to provide e.g. an asArray() type method that does

I don't think we can guarantee it. It's possible to provide your own implementation of the Attributes interface. Guaranteeing the ordering would be retroactively adding a requirement to those implementations and is probably a violation of compatibility guarantees. The asArray option is akin to my suggestion of adding an accessor for List<Map.Entry<AttributeKey<?>,?>>. If it's a problem unique to profiling, we might consider adding a helper AttributeKeyValuePair class only to the profiling signal. Not sure, but we can experiment with different things and see what works.
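
One possible shape for such a holder, as a sketch only: the names `KeyValuePair` and `AttributeTable` are hypothetical, and plain `String` keys stand in for `AttributeKey<?>` so the example has no dependencies. The point is that a list gives every entry a stable index that samples, mappings and locations could reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical key/value pair holder; String keys stand in for AttributeKey<?>. */
final class KeyValuePair {
  private final String key;
  private final Object value;

  KeyValuePair(String key, Object value) {
    this.key = key;
    this.value = value;
  }

  String getKey() {
    return key;
  }

  Object getValue() {
    return value;
  }
}

/** Hypothetical ordered attribute table: entries keep the index they were added at. */
final class AttributeTable {
  private final List<KeyValuePair> entries = new ArrayList<>();

  /** Appends an entry and returns the index other records can refer to it by. */
  int add(String key, Object value) {
    entries.add(new KeyValuePair(key, value));
    return entries.size() - 1;
  }

  KeyValuePair get(int index) {
    return entries.get(index);
  }

  /** Unmodifiable view for serializers that need to iterate in index order. */
  List<KeyValuePair> asList() {
    return Collections.unmodifiableList(entries);
  }
}
```

Unlike a map, this allows the same key to appear more than once with different values, which may or may not be desirable for the profiling tables.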

Changes addressing code review comments.

jack-berg

I think the lack of being able to reference attributes by index is a problem, but otherwise this seems like a good start.

jhalliday · 2024-05-21T10:36:19Z

> I think the lack of being able to reference attributes by index is a problem, but otherwise this seems like a good start.

Yes, I think we've reached the point where we should probably just accept that there are imperfections that will need some adjustment as we start to use the interfaces (I'm already having some doubts on ByteBuffer...) and deal with those as we go along and have a bit more experience with how it works in practice. Thanks for the reviews.

Add SDK data classes for experimental profiling signal type.

a7af9cb

jhalliday requested a review from a team April 10, 2024 14:29

jhalliday added 3 commits April 10, 2024 15:42

Add SDK data classes for experimental profiling signal type.

3044610

Add SDK data classes for experimental profiling signal type.

64da240

Add SDK data classes for experimental profiling signal type.

ae94404