
Add ability to read trace id from goroutine labels of Go processes #2574

Merged
merged 1 commit into main on Mar 4, 2024

Conversation

@brancz (Member) commented Mar 1, 2024

Why?

It's already possible to attach trace IDs to profiling data with instrumented profilers, but it would be great if we could also connect distributed tracing and profiling data when collecting with eBPF. This is exactly that.

With the trace ID attached to profiled stacks, all profiling data for a request can be viewed in a single flame graph/icicle graph, which makes it much easier to identify a request's bottlenecks than viewing the data per process.

What?

Attempt to read the otel.traceid goroutine label from Go processes, plus a flag to enable it. By default this is disabled, as it significantly increases the amount of data produced.

How?

Go processes store the current goroutine in thread-local storage. From there, this reads the g (goroutine) struct, then the m (the actual operating system thread) of that goroutine, and finally curg (the current goroutine).

This chain is necessary because getg().m.curg points to the current user g assigned to the thread (curg == getg() when not on the system stack). curg may be nil if there is no user g, such as when running in the scheduler.
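
For illustration, the eBPF side of this pointer chase might look roughly like the sketch below. The TLS and struct offsets here are hypothetical placeholders; in reality they depend on the Go version and have to be resolved per binary, so treat this as a sketch of the technique rather than the PR's actual code.

// Sketch only. TLS_G_OFFSET, G_M_OFFSET, and M_CURG_OFFSET are made-up
// placeholder values; the real offsets vary by Go version and must be
// resolved per binary. Assumes the u64/typedef headers used elsewhere.
#define TLS_G_OFFSET  (-8)   // assumed offset of g from the thread pointer
#define G_M_OFFSET    48     // assumed offset of m inside the g struct
#define M_CURG_OFFSET 192    // assumed offset of curg inside the m struct

static __always_inline void *read_curg(void *tls_base) {
  void *g = NULL, *m = NULL, *curg = NULL;

  // The current g is stored at a fixed offset from the thread-local base.
  if (bpf_probe_read(&g, sizeof(g), (char *)tls_base + TLS_G_OFFSET) || !g)
    return NULL;
  // g -> m: the OS thread this goroutine is currently running on.
  if (bpf_probe_read(&m, sizeof(m), (char *)g + G_M_OFFSET) || !m)
    return NULL;
  // m -> curg: the current user goroutine; may be NULL while on the
  // system stack, e.g. inside the scheduler.
  if (bpf_probe_read(&curg, sizeof(curg), (char *)m + M_CURG_OFFSET))
    return NULL;
  return curg;
}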

Test Plan

Tested with a test binary.

[Screenshot from 2024-03-01 at 10:11:34]

@gnurizen (Contributor) left a comment

LGTM!

typedef __s32 s32;
typedef __u32 u32;
typedef __s64 s64;
typedef __u64 u64;
Contributor commented:

Why do we repeat all these things defined in basic_types.h?

@brancz (Member, Author) commented Mar 2, 2024

Whoops, yes! A leftover from my standalone development.

}
u64 bucket_count = 1 << log_2_bucket_count;
void *label_buckets;
res = bpf_probe_read(&label_buckets, sizeof(label_buckets), labels_map_ptr + 16);
Contributor commented:

Should we declare a map struct and collapse these 3 reads into one? Would probably turn an 8-byte read, followed by a 1-byte read, followed by an 8-byte read into a single memcpy, I'm guessing? Probably a wash performance-wise, but maybe smaller code? Feel free to file under "things for Tom to investigate".
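
For what it's worth, a struct-based version of that suggestion might look like the sketch below. The field layout mirrors the header of Go's runtime map (runtime.hmap) as of recent Go releases, which is an internal, unversioned detail, so treat the offsets as assumptions:

// Assumed mirror of the runtime.hmap header (internal to the Go runtime;
// the layout may change between releases). buckets at offset 16 matches
// the `labels_map_ptr + 16` read above.
typedef struct {
  s64 count;              // offset 0: number of live entries
  u8 flags;               // offset 8
  u8 log_2_bucket_count;  // offset 9: "B" in runtime/map.go
  u16 noverflow;          // offset 10
  u32 hash0;              // offset 12
  void *buckets;          // offset 16
} go_map_header_t;

go_map_header_t hdr;
res = bpf_probe_read(&hdr, sizeof(hdr), labels_map_ptr);
if (res == 0) {
  u64 bucket_count = 1UL << hdr.log_2_bucket_count;
  void *label_buckets = hdr.buckets;
  // ... proceed with the bucket walk as before
}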

@brancz (Member, Author) commented:

It might be possible to optimize, but the BPF stack is only 512 bytes, which already forced me to move one of these into a per-CPU BPF map instead.
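
The per-CPU map pattern mentioned here is roughly the following; the names are illustrative rather than the PR's actual map definition:

// Illustrative only: a single-slot per-CPU array used as scratch space,
// so large temporaries don't count against the 512-byte BPF stack.
typedef struct {
  u8 data[512]; // whatever is too large to keep on the stack
} scratch_t;

struct {
  __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
  __uint(max_entries, 1);
  __type(key, u32);
  __type(value, scratch_t);
} scratch_map SEC(".maps");

static __always_inline scratch_t *get_scratch(void) {
  u32 zero = 0;
  // The verifier requires checking this for NULL before use.
  return bpf_map_lookup_elem(&scratch_map, &zero);
}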

unwind_state->stack_key.trace_id[12] = 0;
unwind_state->stack_key.trace_id[13] = 0;
unwind_state->stack_key.trace_id[14] = 0;
unwind_state->stack_key.trace_id[15] = 0;
Contributor commented:

I would think a bzero or memset call would be more efficient here, but I haven't investigated it.
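
Since BPF programs have no libc, the memset variant would presumably be the compiler builtin, which clang lowers to inline stores for small constant sizes, so it is mostly a readability win:

// Same effect as the four explicit stores above, in one call.
__builtin_memset(&unwind_state->stack_key.trace_id[12], 0, 4);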

@brancz force-pushed the traceid branch 2 times, most recently from 80a2677 to 98af3a6 on March 2, 2024 at 21:03
@kakkoyun merged commit 6ce64ff into main on Mar 4, 2024
11 of 12 checks passed
@kakkoyun deleted the traceid branch on March 4, 2024 at 09:51