-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(event): export thread, process and parent entity id #3503
feat(event): export thread, process and parent entity id #3503
Conversation
4604f62
to
42f699c
Compare
Lets have the 3 IDs available ? This way we can always find ourselves in the proctree, and we can also find parent and leader for artifacts used and anything related. If you add it in this PR we can merge it before any other, and then I'll rebase #3498 and you will likely fix yours (datasource) as well. |
4febd87
to
88f3698
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlonZivony I believe there are some minor issues/things to be addressed (a padding, a .proto file change + make proto run). Everything else seems ok to me. Feel free to place the type change PR so we can merge that before this one.
Will push the changes after #3505 will be merged |
I have merged the type change, will upgrade the type version and adjust the grpc interface in another PR then you will be fine to rebase this one with the nit fixes. |
as the first commit of this PR, then the rest of the PR will take care of the type changes, likely. Don't forget the grpc code: $ git diff -p
diff --git a/pkg/server/grpc/tracee.go b/pkg/server/grpc/tracee.go
index 6f75cc929..958e92247 100644
--- a/pkg/server/grpc/tracee.go
+++ b/pkg/server/grpc/tracee.go
@@ -170,13 +170,14 @@ func getProcess(e trace.Event) *pb.Process {
return &pb.Process{
// Executable
- EntityId: wrapperspb.UInt32(e.EntityID),
+ EntityId: wrapperspb.UInt32(e.ProcessEntityId),
Pid: wrapperspb.UInt32(uint32(e.HostProcessID)),
NamespacedPid: wrapperspb.UInt32(uint32(e.ProcessID)),
RealUser: &pb.User{
Id: wrapperspb.UInt32(uint32(e.UserID)),
},
Thread: &pb.Thread{
+ EntityId: wrapperspb.UInt32(e.ThreadEntityId),
Start: threadStartTime,
Name: e.ProcessName,
Tid: wrapperspb.UInt32(uint32(e.HostThreadID)),
@@ -186,6 +187,7 @@ func getProcess(e trace.Event) *pb.Process {
UserStackTrace: userStackTrace,
},
Parent: &pb.Parent{
+ EntityId: wrapperspb.UInt32(e.ParentEntityId),
Pid: wrapperspb.UInt32(uint32(e.HostParentProcessID)),
NamespacedPid: wrapperspb.UInt32(uint32(e.ParentProcessID)),
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it too much to send all the hashes on every event?
I mean, from bpf to userspace. |
88f3698
to
07b36ad
Compare
You are right that you can get them from user-mode, but it is a pretty expensive procedure. |
d973f30
to
5e66116
Compare
Actually, if we missed the SignalSchedProcEvent, for example, then we would have no information about parent, and leader, and we wouldn't be able to find the relation (or to create their nodes). Reading procfs wouldn't be always possible, for mitigating this, due to fast spawning processes. Imagining the situation we do have the nodes in the process tree (for parent and leader) but not the leader and parent hashes... by the time the event is decoded, we would have to check something like "recent pids" cache (since we don't have leader and parent start time information), so it would involve keeping non unique data (recent PIDs and/or start_time and/or their hashes) for something that is meant to be unique (like the hash IDs, which is unique no matter what, as long as there are no hash collisions). With that said, considering we're talking about adding a u64 to each event (2 u32 hashes), I think we should stick with the hashes for parent, leader and task and consider this as "unique task identifiers" (considering PIDs and TIDs aren't necessarily unique during tracee lifetime). The benefits look way bigger than the disadvantages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. There is a tiny test change need (and the kernel tests run) before this can be merged, IMO.
Actually there are now 144 bytes and before there were 128, adding 16 bytes in total. Another point to be made is that now we need to calculate the hash in bpf code for every event, which might degrade performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check the performance implications of this change before merging it:
- Event header size is now 12.5% bigger
- For every event (not only those sent to userspace) we calculate 3 hashes in bpf code
This needs to be checked against a high poad of events. For example, using magic_write event and sched_switch
pkg/ebpf/c/common/context.h
Outdated
hash_task_id(get_task_host_tgid(leader), get_task_start_time(leader)); | ||
struct task_struct *parent = get_leader_task(get_parent_task(leader)); | ||
context->task.parent_hash_id = | ||
hash_task_id(get_task_host_tgid(parent), get_task_start_time(parent)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hashing happens on init_context, which means not only we do the calculation for every event sent to userspace, but for every event in general, which may have performance implications
Change the current event entity id to be the proper process event ID (was the task entity ID before), and add the thread and parent entity IDs. Now the values are given from the kernel.
5e66116
to
cbd3d0a
Compare
Move the entity ID calculation to the user space. This way the calculation per event is reduced, and we have no exceptions between different events.
cbd3d0a
to
39a782d
Compare
Fair enough. I see Alon changed from hash to start_time for both, leader and parent. That was another idea we discussed recently, LGTM as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@rafaeldtinoco I'm leaving the merge to you after your review
1. Explain what the PR does
4febd87 feat(api): add thread entity id to protobuf
eb156ab feat(event): export thread, process and parent entity ids
2. Explain how to test it
Manually checked that hashes matches the values from the process tree.
3. Other comments