Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract pdata into a separate Go Module #3104

Closed
9 tasks done
tigrannajaryan opened this issue May 5, 2021 · 13 comments · Fixed by #3530
Closed
9 tasks done

Extract pdata into a separate Go Module #3104

tigrannajaryan opened this issue May 5, 2021 · 13 comments · Fixed by #3530
Assignees
Labels
area:pipeline priority:p1 High release:required-for-ga Must be resolved before GA release

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented May 5, 2021

Problem Description

pdata is currently part of the Collector core. It makes difficult/impossible to use it in external repositories which need a way to work with OTLP in-memory data which is fast, is stable API and is compatible with Collector codebase and is reusable in the Collector codebase.

Examples of how this creates a problem:

Suggestion

Extract pdata into a separate Go Module. Either keep in this repository or make it a completely new repository. This should also include Gogo protobuf generation (which is internal currently and will remain internal in the new pdata module).

UPDATE (based on @jacobmarble comment below). We need to make sure the new module exports the gRPC server/listener as public symbols so that users of this new module can implement OTLP/gRPC protocol.

Proposed plan

  • Determine where the code lives. If we keep it in this repository we can make the vanity URL to be go.opentelemetry.io/collector/model/pdata but not go.opentelemetry.io/model/pdata. Otherwise a new repo is needed or share the go-proto repo.
  • Transform goldendataset for traces to not depend on protos. #3229
  • Define interfaces for encoder, translator, see Introduce model translation and encoding interfaces #3200 for WIP proposal.
  • Change otlp Binary/Text/JSON encoder to implement the new encoding APIs, use them in fileexporter.
  • Remove helpers for [To/From]ProtoBinary frompdata.Traces.
  • Avoid using generated gRPC code for Client because they depend on generated protos.
  • Avoid using generated gRPC code for Server because they depend on generated protos.
  • Avoid using generated gRPC gateway code in OTLP receiver. Use the new APIs for encoder/translator with a plain http server or utils from gRPC gateway.
  • Change OTLP tests to not use the protos directly, instead to compare pdata.
@tigrannajaryan
Copy link
Member Author

FYI @jacobmarble if we do this you can depend on the new module in your https://github.com/influxdata/influxdb-observability/tree/main/otel2influx and avoid unnecessary serialization/deserialization.

@jacobmarble
Copy link
Contributor

FYI @jacobmarble if we do this you can depend on the new module in your https://github.com/influxdata/influxdb-observability/tree/main/otel2influx and avoid unnecessary serialization/deserialization.

@tigrannajaryan spot on! I would love to depend on pdata instead of the generated protobufs directly, as long as I can also implement a gRPC listener/client (outside of otelcol/otelcol-contrib) with the same package.

@appdashishmehta
Copy link

appdashishmehta commented May 6, 2021

We, at AppDynamics, recently stumbled on this need too and it would greatly help avoid unnecessary serialization/deserialization. And in order to avoid code duplication we decided to convert OTLP > pdata > OTLP when the data cross boundaries but this introduces latencies in processing.

@jacobmarble
Copy link
Contributor

@tigrannajaryan does this issue block #3323 ? I would like to merge #3323 as-is, then refactor influxdbexporter, influxdbreceiver, and the corresponding Telegraf plugins, in a separate work unit.

@tigrannajaryan
Copy link
Member Author

@tigrannajaryan does this issue block #3323 ? I would like to merge #3323 as-is, then refactor influxdbexporter, influxdbreceiver, and the corresponding Telegraf plugins, in a separate work unit.

No, it does not. We can refactor later.

@alolita
Copy link
Member

alolita commented May 12, 2021

Is this a must-have for GA? Can this be moved to the phase 2 backlog? @tigrannajaryan please advise.

@tigrannajaryan
Copy link
Member Author

This is a must-have since it part of the public API used by components.

@bogdandrutu
Copy link
Member

@tigrannajaryan can you help me with a proposed full "package" name for this?

  • go.opentelemetry.io/model/pdata? - Allows us to have other interfaces like Marshaller" in a go.opentelemetry.io/model/pencoding or whatever name.
  • go.opentelemetry.io/pdata? - Simpler. Where would a Marshaller exists? Maybe go.opentelemetry.io/pdata/encoding

Please help me pick a winner.

@tigrannajaryan
Copy link
Member Author

@bogdandrutu I incline towards go.opentelemetry.io/model/pdata. This way we can also have go.opentelemetry.io/model/proto in the future which is an vanity url for github.com/open-telemetry/opentelemetry-proto-go. Anything else data model related can also go under go.opentelemetry.io/model/ as you say.

@Aneurysm9
Copy link
Member

@bogdandrutu I incline towards go.opentelemetry.io/model/pdata. This way we can also have go.opentelemetry.io/model/proto in the future which is an vanity url for github.com/open-telemetry/opentelemetry-proto-go. Anything else data model related can also go under go.opentelemetry.io/model/ as you say.

We've already got go.opentelemetry.io/proto/otlp for github.com/open-telemetry/opentelemetry-proto-go. I thought that using model in package names was generally discouraged, though perhaps I'm interpreting that too broadly and @rakyll only means to say that there shouldn't be a model package that contains all model types.

I'm somewhat wary of introducing gRPC client/listeners based on pdata as it feels like at that point it begins to encroach on the territory of OTLP. Do we really want two different ways to communicate the same data? Will the collector at some point expose a pdata receiver? Will telemetry sinks or SDKs eventually be expected to support both OTLP and pdata?

@bogdandrutu
Copy link
Member

bogdandrutu commented Jun 7, 2021

pdata and otlp are the same except the internal representation. On the wire they are exactly the same, the reason to have pdata is because we want internally to generate faster more optimal code than what proto generates.

So the answer is on the wire will always be OTLP, but internally you have a choice to represent things as generated protobufs or pdata which we aim to make it faster marshaling/unmarshaling

bogdandrutu added a commit that referenced this issue Jun 10, 2021
tigrannajaryan pushed a commit that referenced this issue Jun 14, 2021
tigrannajaryan pushed a commit that referenced this issue Jun 14, 2021
@tigrannajaryan
Copy link
Member Author

@jacobmarble @appdashishmehta this is now done, we have https://github.com/open-telemetry/opentelemetry-collector/tree/main/model module with pdata package.

It would be great to validate that this works for your use case before we freeze the 1.0 API as part of the planned GA release coming soon. If you have any feedback please do let us know.

@jacobmarble
Copy link
Contributor

jacobmarble commented Jul 19, 2021

@tigrannajaryan github.com/influxdata/influxdb-observability now uses github.com/open-telemetry/opentelemetry-collector/model/pdata instead of its own generated protos. The update PR for otelcontribcol is open-telemetry/opentelemetry-collector-contrib#4277

The only trouble I have is that I can't reliably compare two objects from the pdata package because there is no Sort method on the generated *Slice structs. I have a proposed fix in PR #3671

Additionally, I used package otlpgrpc to create a gRPC service in Telegraf. That went very smoothly, I don't have any feedback for you there. influxdata/telegraf@c35eb48?file-filters%5B%5D=.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:pipeline priority:p1 High release:required-for-ga Must be resolved before GA release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants