Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface unwind failure reasons in status page #2649

Merged
merged 9 commits into from
Mar 28, 2024

Conversation

umanwizard
Copy link
Contributor

Each time we return from the eBPF program without a stack, bump a counter describing the reason for doing so. In user space, every time through the profiling loop, slurp these counters and surface them in the status page.

My hope is that this will be a first step towards helping us debug why some processes are not being unwound in some user environments.

Why?

We are repeatedly hearing from users that some processes are not being profiled in various situations and they don't know why. It's a bit hard to figure this out from just the /metrics output, because that is not broken down per process.

Test Plan

Tested by looking at the status page locally

Each time we return from the eBPF program without a stack, bump a
counter describing the reason for doing so. In user space, every time
through the profiling loop, slurp these counters and surface them in
the status page.

My hope is that this will be a first step towards helping us debug why
some processes are not being unwound in some user environments.
@umanwizard umanwizard requested a review from a team as a code owner March 27, 2024 12:07
return nil, err
}
var reasons profiler.UnwindFailedReasons
if err := binary.Read(bytes.NewBuffer(val), m.byteOrder, &reasons.PcNotCovered); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just create one reader and use it for all calls to binary.Read. Also I'd probably use reflection in a loop to set the values instead of hand writing each one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know Go supported reflection, that is definitely way nicer than what I have here (and also the string generation stuff in main.go). Will restructure.

@@ -35,6 +35,45 @@ type StackID struct {
TID PID
}


// TODO[btv]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think Go should be the source of truth and go generate should be used to generate a header. But others have probably trodden this path, I wonder what cilium/ebpf does.

u32 chunk_not_found;
u32 null_unwind_table;
u32 table_not_found;
u32 rbp_failed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name probably shouldn't be architecture specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep things as close as possible to the log messages / comments in the unwinder. We can clean up all of them at the same time in a separate PR, I think.

Copy link
Member

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loooooooove what this will enable!

@@ -195,6 +196,12 @@ func (p *CPU) ProcessLastErrors() map[int]error {
return p.processLastErrors
}

func (p *CPU) FailedReasons() map[int]profiler.UnwindFailedReasons {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just a quick comment that the key is the PID?

@@ -1032,6 +1036,25 @@ func run(logger log.Logger, reg *prometheus.Registry, flags flags, numCPU int) e
default:
profilingStatus = profilerStatusInactive
}
failedReasons := unwindFailedReasons[prflr.Name()][pid]
failedReasonsStrs := make([]string, 0)
v := reflect.ValueOf(failedReasons)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idiomatic thing in Go here is to list all fields individually, I realize what you're trying to do here, but I think most people reading this stumble over it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, are you suggesting I do it like I had in a previous commit (see here) ? Or are you suggesting something else?

I'm a Go novice, so I'm happy to do whatever you think is the most idiomatic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that would be the typical way to write this in Go.

v := reflect.ValueOf(reasons)
for i := 0; i < v.Elem().NumField(); i++ {
fv := v.Elem().Field(i)
if err := binary.Read(buf, m.byteOrder, fv.Addr().Interface()); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just call a single binary.Read on the whole struct, no need for reflection here.

@brancz brancz merged commit b8876c4 into parca-dev:main Mar 28, 2024
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants