PoC: Custom SDK #1045

Closed
wants to merge 71 commits into from
@MrAlias MrAlias commented Aug 27, 2024

PoC for #954

This is a proof-of-concept for an SDK fully implemented by the auto-instrumentation. This supports all span functionality:

  • Sampling (TODO: the sample method needs to be instrumented)
  • Correct random ID generation
  • All Start options
    • WithLinks
    • WithNewRoot
    • WithSpanKind (defaults to probe SpanKind if not set)
    • WithTimestamp
    • WithAttributes
  • The AddEvent method, including all options
    • WithStackTrace
    • WithAttributes
    • WithTimestamp
  • The AddLink method
  • The IsRecording method (TODO: based on sampling support)
  • The SpanContext method
  • The SetStatus method
  • The SetAttributes method
  • The TracerProvider method
  • All End options
    • WithTimestamp

Design

auto.GetTracerProvider

Only one function is exported publicly: GetTracerProvider in go.opentelemetry.io/auto.

This function returns a singleton instance of an opentelemetry-go trace.TracerProvider that is held in the internal/sdk package.
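
The singleton behavior described above can be pictured with a minimal sketch. The `tracerProvider` type and its field are hypothetical placeholders; in the PR the returned value is an opentelemetry-go trace.TracerProvider held in internal/sdk, and the actual initialization may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// tracerProvider is a hypothetical stand-in for the provider type that
// lives in internal/sdk. The real type implements trace.TracerProvider.
type tracerProvider struct {
	name string
}

var (
	once     sync.Once
	instance *tracerProvider
)

// GetTracerProvider returns the same provider instance on every call.
// sync.Once makes the lazy initialization safe for concurrent callers.
func GetTracerProvider() *tracerProvider {
	once.Do(func() {
		instance = &tracerProvider{name: "auto"}
	})
	return instance
}

func main() {
	a, b := GetTracerProvider(), GetTracerProvider()
	fmt.Println(a == b) // both calls return the same pointer
}
```

A package-level variable initialized at declaration would work equally well; sync.Once is used here only to show lazy construction.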

internal/sdk

The go.opentelemetry.io/auto/internal/sdk package is added. This is a full-featured OTel trace SDK from the perspective of the Tracer and Span.

All data for a created Span is built in userspace and stored (mostly) in the collector's ptrace.Traces type.

When the Span is ended, the ptrace.Traces is marshaled into a proto binary encoding and passed as a buffer to the ended method of the Span. This method does nothing itself and expects a uprobe to be inserted at its call site.
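
As a rough illustration of this hand-off, the sketch below serializes span data in End and passes the bytes to an empty ended method. Everything here is an assumption for illustration: `spanData` is a simplified stand-in for ptrace.Traces, JSON replaces the proto binary encoding, and the `//go:noinline` directive is one plausible way to keep the method available as a probe target, not something the PR description states.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// spanData is a hypothetical, simplified stand-in for ptrace.Traces.
type spanData struct {
	Name  string    `json:"name"`
	Start time.Time `json:"start"`
	End   time.Time `json:"end"`
}

type span struct {
	data spanData
}

// encode serializes the span data. JSON is used only to keep the sketch
// dependency-free; the PR uses a proto binary encoding.
func encode(d spanData) ([]byte, error) {
	return json.Marshal(d)
}

// End records the end time, serializes the span, and hands the bytes to ended.
func (s *span) End() {
	s.data.End = time.Now()
	buf, err := encode(s.data)
	if err != nil {
		return // nothing useful to hand to the probe
	}
	s.ended(buf)
}

// ended is intentionally empty. It exists only so that an external probe
// can observe the buffer passed to it.
//
//go:noinline
func (*span) ended(buf []byte) {}

func main() {
	s := &span{data: spanData{Name: "outer-0", Start: time.Now()}}
	s.End()
	buf, _ := encode(s.data)
	fmt.Printf("encoded %d bytes\n", len(buf))
}
```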

auto/sdk probe

A simple probe is added to instrument the go.opentelemetry.io/auto/internal/sdk package. This probe does not rely on any offsets from the sdk types and simply routes the encoded span data from ended to the events eBPF map.

From there the ptrace.Traces data is unmarshaled and parsed into a SpanEvent that the Controller processes in the normal fashion.
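
The userspace side of this flow might look roughly like the following sketch. The `wireSpan` and `SpanEvent` shapes, the `decode` helper, and the JSON encoding are all hypothetical stand-ins; the PR unmarshals proto-encoded ptrace.Traces and its SpanEvent has its own fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// wireSpan is a hypothetical stand-in for the serialized span payload
// read from the events map.
type wireSpan struct {
	Name       string            `json:"name"`
	StartNanos int64             `json:"start_nanos"`
	EndNanos   int64             `json:"end_nanos"`
	Attrs      map[string]string `json:"attrs"`
}

// SpanEvent mirrors only the idea of the type the Controller consumes.
// The field names here are assumptions.
type SpanEvent struct {
	SpanName   string
	StartTime  time.Time
	EndTime    time.Time
	Attributes map[string]string
}

// decode unmarshals one raw record and converts it to a SpanEvent.
func decode(record []byte) (*SpanEvent, error) {
	var w wireSpan
	if err := json.Unmarshal(record, &w); err != nil {
		return nil, fmt.Errorf("decode span record: %w", err)
	}
	return &SpanEvent{
		SpanName:   w.Name,
		StartTime:  time.Unix(0, w.StartNanos),
		EndTime:    time.Unix(0, w.EndNanos),
		Attributes: w.Attrs,
	}, nil
}

func main() {
	record := []byte(`{"name":"outer-0","start_nanos":1000,"end_nanos":2500}`)
	ev, err := decode(record)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ev.SpanName, ev.EndTime.Sub(ev.StartTime))
}
```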

Demo

Run Jaeger

$ docker run --rm --name jaeger -e COLLECTOR_OTLP_ENABLED=true -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest
2024/08/28 20:38:34 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
# ...

Run the example

$ cd examples/auto-sdk && go build -o $GOPATH/bin/example && $GOPATH/bin/example
outter-0...done
outter-1...

Run the auto-instrumentation

$ cd cli && go build
$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  OTEL_GO_AUTO_TARGET_EXE=$GOPATH/bin/example \
  OTEL_SERVICE_NAME=example \
  sudo -E ./cli
{"level":"info","ts":1724885322.1967607,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:86","msg":"building OpenTelemetry Go instrumentation ...","globalImpl":false}
# ...
{"level":"info","ts":1724885324.8517134,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:115","msg":"instrumentation loaded successfully"}

You can also run with debug logging:

$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  OTEL_GO_AUTO_TARGET_EXE=$GOPATH/bin/example \
  OTEL_SERVICE_NAME=example \
  sudo -E ./cli -log-level=debug

Let this run for a bit and then stop the example. Stopping the example while a span is active produces an error. For example:

$ go build -o $GOPATH/bin/example && $GOPATH/bin/example
outter-0...done
outter-1...^Cdone

(Notice that the ^C appears before the second done.)

Review the span

Overview

[Screenshot: 20240903_091218]

Spans with recorded errors (via events)

[Screenshot: 20240903_091307]

Span links

[Screenshot: 20240828_155637]

Open Issues/Questions

  • Only span serializations of up to 412 bytes are supported
    • Ways to increase eBPF storage past the stack limit (512 bytes) need to be investigated
    • When we know the span is going to be too big, we need to drop attributes, links, and events in userspace
  • Sampling needs to be implemented.
  • Fix call to bpf_probe_read: https://github.com/open-telemetry/opentelemetry-go-instrumentation/actions/runs/10605550069/job/29394558469?pr=1045
  • Currently the SpanEvent start and end times are relative offsets to the eBPF process time. This is changed in this PR, thereby breaking all other probes.
  • Do we want to use ptrace from the collector as the serialization format? Do we want to build our own?
  • This adds more uses of bpf_probe_write_user. Can we use pinned eBPF maps to bypass this and communicate across processes?
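
One way the userspace dropping mentioned in the first item could work is sketched below. This is not what the PR implements: the `spanData` type, the `fit` helper, the JSON encoding, and the drop order (events, then links, then attributes) are all assumptions chosen for illustration. Only the 412 limit comes from the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// maxEncodedSize is the serialization limit quoted in the PR description.
const maxEncodedSize = 412

// spanData is a hypothetical, simplified span representation.
type spanData struct {
	Name   string            `json:"name"`
	Attrs  map[string]string `json:"attrs,omitempty"`
	Links  []string          `json:"links,omitempty"`
	Events []string          `json:"events,omitempty"`
}

// fit returns an encoding of s that is at most limit bytes long. It drops
// events, then links, then attributes until the encoding fits. It reports
// false if even the bare span is too large. s is passed by value, so the
// caller's span is left untouched.
func fit(s spanData, limit int) ([]byte, bool) {
	trims := []func(*spanData){
		func(*spanData) {}, // first attempt: nothing dropped
		func(d *spanData) { d.Events = nil },
		func(d *spanData) { d.Links = nil },
		func(d *spanData) { d.Attrs = nil },
	}
	for _, trim := range trims {
		trim(&s)
		buf, err := json.Marshal(s)
		if err != nil {
			return nil, false
		}
		if len(buf) <= limit {
			return buf, true
		}
	}
	return nil, false
}

func main() {
	s := spanData{
		Name:   "outer-0",
		Attrs:  map[string]string{"user": "alice"},
		Events: []string{strings.Repeat("x", 500)},
	}
	buf, ok := fit(s, maxEncodedSize)
	fmt.Println(ok, len(buf) <= maxEncodedSize)
}
```

Dropping whole categories is the bluntest policy. A finer-grained version could remove individual attributes or events one at a time and record a dropped count.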

@MrAlias MrAlias changed the title Add initial trace SDK PoC: Custom SDK Aug 27, 2024

RonFed commented Aug 29, 2024

A maximum span serialization size of 412 is only supported

We can overcome this by using a BPF_MAP_TYPE_PERCPU_ARRAY, which acts as a "heap". We already use this in multiple probes, so you can look at those for example use cases.

Regarding my other comment, I think we should have this working example include an auto span in the same trace (HTTP).

trace.go Outdated
@MrAlias MrAlias Sep 3, 2024
This needs to be in its own module. Otherwise there will be a module cycle when we import otel and it imports this.

dependabot bot and others added 23 commits September 18, 2024 14:06
…etry#1063)

Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…pen-telemetry#1057)

* Bump golang from 1.23.0 to 1.23.1 in /internal/test/e2e/databasesql

Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Bump expected test output

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
…-telemetry#1058)

* Bump golang from 1.23.0 to 1.23.1 in /internal/test/e2e/nethttp

Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Bump expected test output

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
open-telemetry#1065)

* Bump golang from 1.23.0 to 1.23.1 in /internal/test/e2e/nethttp_custom

Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Bump expected test output

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
* Update generated offsets

* Add changelog entry

---------

Co-authored-by: MrAlias <[email protected]>
Co-authored-by: Tyler Yahn <[email protected]>
Co-authored-by: Tyler Yahn <[email protected]>
…lemetry#1067)

* Bump golang from 1.23.0 to 1.23.1 in /internal/test/e2e/grpc

Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Bump expected test output

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
…pen-telemetry#1068)

* Bump golang from 1.23.0 to 1.23.1 in /internal/test/e2e/otelglobal

Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Bump expected test output

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
…emetry#1070)

* Bump golang from 1.23.0 to 1.23.1 in /internal/test/e2e/gin

Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Bump expected test output

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
…n-telemetry#1072)

* Bump golang from 1.23.0 to 1.23.1 in /internal/test/e2e/kafka-go

Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Bump expected test output

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
* Bump golang.org/x/sys from 0.24.0 to 0.25.0

Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.24.0 to 0.25.0.
- [Commits](golang/sys@v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Run go mod tidy

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
Co-authored-by: Tyler Yahn <[email protected]>
* Flatten config pkg into auto

* Update to use flattened pkg

* Unexport the NewNoopConfigProvider func

* Unexport unneeded sampling types
* Log the CLI version

To help debug issues, include information about the CLI version when
starting. Include the release version, a git hash, and information about
the Go setup.

* Add changelog entry

* Update Makefile

Build the cli pkg, not just main.go.
* Add gRPC Status object to offsets

* Add ClientConn_Invoke_Returns ebpf probe

* Add status code to span event

* make docker-generate

* Set grpc.status.code attribute

* Generate status code fixture

* Add changelog entry

* Generate error span in grpc test

* Switch to int32 and change offset logic

* Check argument 2 and use bpf_probe_read_user

* Add error struct to pointer chain

* Update verify.bats

* lint

* Fix span status

* Skip error checks if resp==nil

* Use u32

* Update changelog

* Add call to stop_tracking_span

* Make docker-offsets

* Update changelog for new offsets

* make fixture-grpc

* Update bats test
* Update generated offsets

* Add change log entry

---------

Co-authored-by: MrAlias <[email protected]>
Co-authored-by: Tyler Yahn <[email protected]>
* Add the auto-instrumentation SDK wireframe

Adds a new go.opentelemetry.io/auto/sdk module that holds the
OpenTelemetry SDK implementation used by auto-instrumentation.

Part of open-telemetry#954
Split from open-telemetry#1045

* Add dependabot entry

* Ignore the sdk module until it is finished

* Set Go mod go directive to 1.21.0

* Fix lint
…0.55.0 (open-telemetry#1096)

* Bump go.opentelemetry.io/contrib/exporters/autoexport

Bumps [go.opentelemetry.io/contrib/exporters/autoexport](https://github.com/open-telemetry/opentelemetry-go-contrib) from 0.54.0 to 0.55.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.54.0...zpages/v0.55.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/exporters/autoexport
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Run go mod tidy

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
…pen-telemetry#1085)

Bumps [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) from 1.29.0 to 1.30.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.29.0...v1.30.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…y#1086)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.66.0 to 1.66.2.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.66.0...v1.66.2)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…pen-telemetry#1087)

Bumps [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) from 1.29.0 to 1.30.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.29.0...v1.30.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…etry#1094)

Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.60.3 to 1.61.0.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](golangci/golangci-lint@v1.60.3...v1.61.0)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 27.2.0+incompatible to 27.2.1+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v27.2.0...v27.2.1)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
MrAlias added a commit that referenced this pull request Sep 18, 2024
* Switch SpanEvent times to use time.Time

Use `time.Time`, a timestamp, to identify the start and end time of the
`SpanEvent` instead of representing them as nanoseconds since boot time
in a built-in `int64`.

This is motivated by the need to support things that report timestamps
for start/end times directly. Things like the custom SDK (see #1045).

To support the conversion for probes, the `utils.BootOffsetToTime`
function is added. This converts between the measured nanoseconds since
boot-time that an eBPF program measures, and a timestamp.

* Update internal/pkg/instrumentation/utils/kernel.go

Co-authored-by: Ron Federman <[email protected]>

---------

Co-authored-by: Ron Federman <[email protected]>
@MrAlias MrAlias closed this Oct 17, 2024
4 participants