Support user mode stacktraces in events #2175
Welcome! Thanks for taking the time to add this. I started the CI, and some checks might be worth a look to see what's wrong. I'll try to review your PR quickly! The only thing that immediately came to mind is that we would break compatibility if we renamed the previous stack trace option to kernel stack trace, but it might make sense to do so. Let me take a look and we'll see.
I'm trying to fix the vmtests, but I lack logs. I think the problem is that there is no gcc on the VM image, so the test binary could not be compiled. @mtardy, do you have any ideas on how to handle this the right way?
Yes, so usually, the C programs we need for testing are located in You should find full logs there https://github.com/cilium/tetragon/actions/runs/8132145926?pr=2175 but I don't remember if there's more than what's displayed. |
e06107e to f34a301
Thanks a lot, tests look good now so it's mostly on me! Again I'll do a proper review soon :) |
Thanks a lot for doing this. I tried it and took a look; there is just a small issue: the BPF helper might return a lot of errors, and I need to check why and what we should do about that. Also, another thank you for adding proper testing :)!
Most questions are about backward compatibility and the choice of added dependencies. It's a nice opportunity that tonight we have the first Tetragon community call, so I would love to bring this to the discussion. If you can join, that would be lovely; otherwise I'll write the conclusion of the discussion here anyway. See https://isogo.to/tetragon-meeting-notes.
Again thank you, this does look very cool on dynamically linked binaries, especially when symbols are not stripped, it gives awesome observability!
Thank you for the review and the invitation! I'll try to take part in the meeting, where we can discuss the PR details. I will answer the comments soon.
Could you |
288c439 to 6e0d5e8
6061323 to 313c0d4
Hi, @mtardy! |
Hey, sorry, last week was KubeCon week in Paris and I was busy there. Not really. I've been chatting about the EFAULT error and there's not much we can do: from time to time, we will indeed get a page fault when retrieving the trace from the kernel side, and we can't avoid it. The best thing we could do is to warn about this in the doc (with some I think your PR is ready for me to take a last look and merge. Even if some stuff is missing, I'll take the time to add it afterwards (like the doc thing I mentioned); I don't want you to wait any longer.
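To illustrate the EFAULT point above, here is a minimal Go sketch of treating that error as expected rather than fatal. It is not Tetragon code: `stackErr`, `isTransientStackErr`, and `describe` are hypothetical names, and the sketch only assumes that the stack helper reports failure as a negative errno value.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// stackErr turns a negative errno-style return value into a Go error.
// Non-negative values are treated as success. (Hypothetical helper.)
func stackErr(ret int64) error {
	if ret >= 0 {
		return nil
	}
	return syscall.Errno(-ret)
}

// isTransientStackErr reports whether err is EFAULT, which this sketch
// treats as an expected, unavoidable failure. (Hypothetical helper.)
func isTransientStackErr(err error) bool {
	return errors.Is(err, syscall.EFAULT)
}

// describe decides how the outcome would be reported. (Hypothetical helper.)
func describe(ret int64) string {
	err := stackErr(ret)
	switch {
	case err == nil:
		return "stack trace collected"
	case isTransientStackErr(err):
		return "user stack trace unavailable (EFAULT), event kept without it"
	default:
		return fmt.Sprintf("unexpected stack trace error: %v", err)
	}
}

func main() {
	for _, ret := range []int64{3, -int64(syscall.EFAULT), -int64(syscall.EINVAL)} {
		fmt.Println(describe(ret))
	}
}
```

The design choice in this sketch is to keep the event and drop only the missing trace, which matches the idea of documenting the limitation instead of trying to eliminate it.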
One thing we could do is to keep the |
Those two comments are mostly nits, if your PR passes the tests, we can consider merging and fixing the flag and the docs after. |
the check is broken currently on forks but could you bump the version here
|
it seems that you might need to generate properly some files with |
You mean to keep |
Yes, let's do this way! |
It's a good point, but I don't understand why EFAULT happens sometimes. I only have some ideas, but they might be wrong. Anyway some |
I think it would be good to squash all commits into one and write a proper commit description. I can do it when the PR is camera-ready, or it could be done while merging.
I think you could squash some things but please keep the separation for, for example:
|
ah indeed. Maybe yes that would be a good idea to not break compatibility. |
I doubled the test duration (to 80m), but it doesn't help... If you have time to debug this problem on an ARM machine, that would be great. Unfortunately, I have no ARM machine, but I'll try to find an ARM VM somewhere.
11be7de to 76d6c19
I ran the test that I added on an ARM VM, and it works:
I use qemu without kvm. Maybe the problem is that I added one more test? I ran
But it doesn't hang... So I think the hang is related to the ARM CI rather than to the ARM tests themselves. Also, I ran only ./pkg/sensors/tracing on the ARM VM and it works fine:
|
76d6c19 to 436774e
@mtardy, maybe let's test only |
2004ecb to 64aa981
I converted this to a draft since it seems your added test is the issue on arm64. It seems you could not reproduce this on an arm64 machine; I'll try on my side as well and see if I can get any ideas. It was failing on actuated machines:
We can ssh into them if really needed, in case we don't find any way to reproduce this, since it seems to fail reliably there.
My ARM machine:
|
3ae6621 to 68857a5
314852f to d48f6e8
d48f6e8 to 2de28b1
	return fmt.Sprintf("%s (%s+0x%x)", fsym.Name, fsym.Module, fsym.Offset)
}

// GetFnSymbol -- returns the FnSym for a given address and PID
It should be documented that this function is limited to native code only (C, C++, Go and Rust). This approach will not work for interpreted languages (Java, Python, Perl, etc.), as it cannot resolve the internal state of the interpreter.
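As a rough illustration of the symbol format shown in the reviewed snippet, here is a minimal Go sketch. The `FnSym` field types, the `ToString` method name, and the sample values are assumptions made for this example, not taken from the PR; only the format string comes from the diff.

```go
package main

import "fmt"

// FnSym mirrors the fields used in the reviewed snippet. The field types
// are assumptions for this sketch.
type FnSym struct {
	Name   string // resolved function name
	Module string // binary or shared object containing the symbol
	Offset uint64 // offset of the address within the module
}

// ToString formats the symbol as "name (module+0xoffset)", using the same
// format string as the snippet. The method name is hypothetical.
func (fsym *FnSym) ToString() string {
	return fmt.Sprintf("%s (%s+0x%x)", fsym.Name, fsym.Module, fsym.Offset)
}

func main() {
	// Sample values are made up for illustration.
	sym := FnSym{Name: "main.handle", Module: "/usr/bin/app", Offset: 0x1a2b}
	fmt.Println(sym.ToString()) // prints: main.handle (/usr/bin/app+0x1a2b)
}
```

For an interpreted program, frames resolved this way would name the interpreter's own native functions rather than the script-level functions, which is the limitation the comment asks to document.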
Signed-off-by: Andrey Fedotov <[email protected]>
2de28b1 to 06c385a
Thanks for your perseverance! That's a cool addition to the stack trace feature!
Hi 👋!
I was inspired by #1429 and decided to add the ability to collect user-mode stack traces. User-mode stack traces enrich the information about events and help you understand why an event occurred. For example, they can help you understand why a seccomp policy was triggered.
Please have a look; I'm ready to make changes if needed. Thanks!