Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kernel stack traces to kprobes events #1429

Merged
merged 11 commits into from
Oct 11, 2023
Merged

Add kernel stack traces to kprobes events #1429

merged 11 commits into from
Oct 11, 2023

Conversation

mtardy
Copy link
Member

@mtardy mtardy commented Sep 5, 2023

See individual commits for info.

See related documentation preview.

Preview

Looks like this with the following tracing policy:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: syscall
spec:
  kprobes:
    - call: kfree_skb_reason
      selectors:
      - matchActions:
        - action: post
          stackTrace: true

And doing

nc -vz spooky.com 443

With

./tetra getevents -o compact --processes nc

The output looks like this (with color):

image
Add a new kernel stack traces alpha feature to kprobes events.

@mtardy mtardy added kind/enhancement This improves or streamlines existing functionality release-note/minor This PR introduces a minor user-visible change labels Sep 5, 2023
@netlify
Copy link

netlify bot commented Sep 5, 2023

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 8c8c274
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/652653ebb9e9240008d19401
😎 Deploy Preview https://deploy-preview-1429--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@mtardy mtardy force-pushed the pr/mtardy/stack-traces branch from f15b64f to 643e49c Compare September 5, 2023 09:30
@mtardy mtardy linked an issue Sep 5, 2023 that may be closed by this pull request
@mtardy mtardy force-pushed the pr/mtardy/stack-traces branch 3 times, most recently from 426456e to aad5d33 Compare September 5, 2023 15:13
@mtardy mtardy marked this pull request as ready for review September 5, 2023 15:40
@mtardy mtardy requested a review from a team as a code owner September 5, 2023 15:40
@mtardy mtardy requested review from tixxdz, kkourt and jrfastab September 5, 2023 15:40
@olsajiri
Copy link
Contributor

olsajiri commented Sep 5, 2023

Looks like this with the following tracing policy:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: syscall
spec:
  kprobes:
    - call: kfree_skb_reason
      selectors:
      - matchActions:
        - action: post
          stackTrace: true

bit late to the discussion, so sorry if I missed something.. but seems like this kind of setup will trigger the filtering code even if it's not necessary, because the selector is empty and actions tail call does not need to be called

could we perhaps instead add flag/bool to config_map and call get_stackid in generic_output? with spec like:

spec:
  kprobes:
    - call: kfree_skb_reason
      stackTrace: true

@tixxdz
Copy link
Member

tixxdz commented Sep 5, 2023

Looks like this with the following tracing policy:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: syscall
spec:
  kprobes:
    - call: kfree_skb_reason
      selectors:
      - matchActions:
        - action: post
          stackTrace: true

bit late to the discussion, so sorry if I missed something.. but seems like this kind of setup will trigger the filtering code even if it's not necessary, because the selector is empty and actions tail call does not need to be called

could we perhaps instead add flag/bool to config_map and call get_stackid in generic_output? with spec like:

spec:
  kprobes:
    - call: kfree_skb_reason
      stackTrace: true

+1 for @olsajiri suggestion, as this should apply to all actions or triggered probes anyway

Another thing inside the doc please state clearly that this feature prints kernel addresses, so please ensure to use unix socket for the API when loading tracing policies....

The kernel has some protections to restrict unprivileged from printing kernel addresses... hmm we need to think about this

@mtardy
Copy link
Member Author

mtardy commented Sep 6, 2023

Another thing inside the doc please state clearly that this feature prints kernel addresses, so please ensure to use unix socket for the API when loading tracing policies....

The kernel has some protections to restrict unprivileged from printing kernel addresses... hmm we need to think about this

This is already in the doc! 😄
image

@mtardy
Copy link
Member Author

mtardy commented Sep 6, 2023

About putting this thing out of selectors, indeed it makes sense. We had discussions offline about moving that thing from another action to a parameter to post with @kkourt and @tixxdz, but indeed moving it out of selectors makes sense. I think here we have a "UI" vs "how things work" clash a little bit.

@kkourt kkourt added the needs-rebase This PR needs to be rebased because it has merge conflicts. label Sep 18, 2023
Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have some minor comments but this LGTM.

pkg/grpc/tracing/tracing.go Outdated Show resolved Hide resolved
pkg/sensors/tracing/generickprobe.go Show resolved Hide resolved
@kkourt
Copy link
Contributor

kkourt commented Sep 18, 2023

Looks like this with the following tracing policy:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: syscall
spec:
  kprobes:
    - call: kfree_skb_reason
      selectors:
      - matchActions:
        - action: post
          stackTrace: true

bit late to the discussion, so sorry if I missed something.. but seems like this kind of setup will trigger the filtering code even if it's not necessary, because the selector is empty and actions tail call does not need to be called

could we perhaps instead add flag/bool to config_map and call get_stackid in generic_output? with spec like:

spec:
  kprobes:
    - call: kfree_skb_reason
      stackTrace: true

The downside of having it as a global option is that you don't get to have a different option for different selectors. I'd also expect the selector filtering code to be lightweight if there are nothing to match, but could be wrong on that. I'd prefer having it as a part of the selector. If performance is an issue, I think we can fix it by having a global option and pass it to bpf if all selectors have the stack trace enabled.

@jrfastab
Copy link
Contributor

That warning about the addrs I find a bit alarming. I don't actually think its as serious of an issue as the warning states however. How about just blinding the addresses for now by mask and hash?

@mtardy mtardy force-pushed the pr/mtardy/stack-traces branch from aad5d33 to 2abeb51 Compare September 28, 2023 11:33
@mtardy mtardy removed the needs-rebase This PR needs to be rebased because it has merge conflicts. label Sep 28, 2023
@mtardy
Copy link
Member Author

mtardy commented Sep 28, 2023

That warning about the addrs I find a bit alarming. I don't actually think its as serious of an issue as the warning states however. How about just blinding the addresses for now by mask and hash?

do we want to use the same hash as the kernel or we don't care? I think the kernel does something like:

+/* Maps a pointer to a 32 bit unique identifier. */
+static char *ptr_to_id(char *buf, char *end, void *ptr, struct printf_spec spec)
+{
+	unsigned long hashval;
+	const int default_width = 2 * sizeof(void *);
+
+	if (unlikely(!have_filled_random_ptr_key)) {
+		spec.field_width = default_width;
+		/* string length must be less than default_width */
+		return string(buf, end, "(ptrval)", spec);
+	}
+
+#ifdef CONFIG_64BIT
+	hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);
+	/*
+	 * Mask off the first 32 bits, this makes explicit that we have
+	 * modified the address (and 32 bits is plenty for a unique ID).
+	 */
+	hashval = hashval & 0xffffffff;
+#else
+	hashval = (unsigned long)siphash_1u32((u32)ptr, &ptr_key);
+#endif
+
+	spec.flags |= SMALL;
+	if (spec.field_width == -1) {
+		spec.field_width = default_width;
+		spec.flags |= ZEROPAD;
+	}
+	spec.base = 16;
+
+	return number(buf, end, hashval, spec);
+}

@mtardy mtardy force-pushed the pr/mtardy/stack-traces branch from 2abeb51 to 8812d66 Compare September 29, 2023 10:32
Since there was only one flag value, userspace was checking for values
superior to 0 naively. Fixed this using a bitwise flag in order to
extend that flag in next commit for stack traces.

Signed-off-by: Mahe Tardy <[email protected]>
With this new field for the post action, you can add "stackTrace: true"
in your selector's matchActions like that, for "kfree_skb_reason":

kprobes:
  - call: kfree_skb_reason
    selectors:
    - matchActions:
      - action: post
        stackTrace: true

This will add a new "stack_trace" field on the "process_kprobe"
event, the output will be similar to this:

{
  "address": "18446670264256120896",
  "offset": "272",
  "symbol": "skb_release_head_state"
},
{
  "address": "18446670264257081512",
  "offset": "88",
  "symbol": "ip_protocol_deliver_rcu"
},
{
  "address": "18446670264257082152",
  "offset": "88",
  "symbol": "ip_local_deliver_finish"
},
[...]

Signed-off-by: Mahe Tardy <[email protected]>
It looks similar to this with the new stack trace encoder:
syscall  /usr/bin/curl kfree_skb_reason
  0xffffbcdee5bf4840: skb_release_head_state+0x110
  0xffffbcdee5be3678: __sys_connect_file+0x88
  0xffffbcdee5be376c: __sys_connect+0xbc
  0xffffbcdee5be37bc: __arm64_sys_connect+0x28
  0xffffbcdee4dfcd68: invoke_syscall+0x78
  0xffffbcdee4dfcf70: el0_svc_common.constprop.0+0x180
  0xffffbcdee4dfcfa4: do_el0_svc+0x30
  0xffffbcdee5e861e8: el0_svc+0x48
  0xffffbcdee5e881b4: el0t_64_sync_handler+0xa4
  0xffffbcdee4de1e38: el0t_64_sync+0x1a4

Signed-off-by: Mahe Tardy <[email protected]>
The feature was changed in 92a7652 to
only work on the Post action which was not reflected in the doc. This
commit changes that.

Signed-off-by: Mahe Tardy <[email protected]>
A non correctly initizalized ksym could lead to a panic if we try to
retrieve an offset from it.

Signed-off-by: Mahe Tardy <[email protected]>
@mtardy mtardy force-pushed the pr/mtardy/stack-traces branch from 8812d66 to 8958dbc Compare October 10, 2023 14:47
@mtardy mtardy marked this pull request as draft October 10, 2023 17:00
@mtardy
Copy link
Member Author

mtardy commented Oct 10, 2023

I converted it to a draft because we need to fix something with the multi-kprobe support with @olsajiri first. But it's 99% ready! :)

@mtardy mtardy force-pushed the pr/mtardy/stack-traces branch from 8958dbc to c769668 Compare October 10, 2023 18:22
@mtardy mtardy marked this pull request as ready for review October 10, 2023 18:22
Kernel addresses might not be exposed in stack traces by default for
security reasons and this add a flag `--expose-kernel-addresses` to
opt-in for that behavior.

Signed-off-by: Mahe Tardy <[email protected]>
This test makes sure this feature is not broken and continues working.

Signed-off-by: Mahe Tardy <[email protected]>
Here is the issue, for example:
- In the case of simple kprobe: map name is: gkp-sensor-1-gkp-0-name_map
  with gk.pinPathPrefix = gkp-sensor-1-gkp-0
- In the case of multi kprobe: map name is gkp-sensor-1-multi_kprobe-name_map
  with gk.pinPathPrefix = gkp-sensor-1 in the second.

In the case of multi kprobe, the pinPathPrefix is wrong, it should be
gk.pinPathPrefix = gkp-sensor-1-multi_kprobe.

Signed-off-by: Mahe Tardy <[email protected]>
@mtardy mtardy force-pushed the pr/mtardy/stack-traces branch from c769668 to 8c8c274 Compare October 11, 2023 07:51
@mtardy
Copy link
Member Author

mtardy commented Oct 11, 2023

Merging this, the only CI fails is because https://www.virtualbox.org/ is, as often, down. I need to add to the link checker ignore list.

@mtardy mtardy merged commit 08b79a9 into main Oct 11, 2023
33 of 34 checks passed
@mtardy mtardy deleted the pr/mtardy/stack-traces branch October 11, 2023 10:16
@kkourt kkourt added release-note/major This PR introduces major new functionality and removed release-note/minor This PR introduces a minor user-visible change labels Nov 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement This improves or streamlines existing functionality release-note/major This PR introduces major new functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report stack trace in the events
5 participants