-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate context through TCP packets #1161
Conversation
@@ -175,12 +175,13 @@ const u8 ip4ip6_prefix[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff}; | |||
|
|||
#ifdef BPF_DEBUG | |||
static __always_inline void dbg_print_http_connection_info(connection_info_t *info) { | |||
bpf_dbg_printk("[http] s_h = %llx, s_l = %llx, d_h = %llx, d_l = %llx, s_port=%d, d_port=%d", | |||
bpf_dbg_printk("[conn] s_h = %llx, s_l = %llx, s_port=%d", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split this so that we can use it on older kernels where bpf_printk allowed for at most 3 arguments.
@@ -186,6 +186,33 @@ int BPF_KPROBE(kprobe_tcp_connect, struct sock *sk) { | |||
|
|||
bpf_dbg_printk("=== tcp connect %llx ===", id); | |||
|
|||
tp_info_pid_t *tp_p = tp_buf(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connect runs before the SYN packet is sent. We use this opportunity to setup a trace context information for the connection. We'll later query the trace information in tc_egress, and serialize it on the TCP packet.
if (tcp_syn(&tcp) && !tcp_ack(&tcp)) { | ||
bpf_skb_load_bytes(skb, tcp.hdr_len, &buf, 4); | ||
if (skb->len - tcp.hdr_len < sizeof(tp_info_pid_t)) { | ||
bpf_printk("SYN packet without tp info"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these prints will have to be removed, but keeping them for now since this is disabled. bpf_dbg_printk doesnt' work because we need to fix it to not use bpf_get_pid_tid, but bpf_get_task info
make_tp_string(tp_buf, &tp.tp); | ||
bpf_printk("tp: %s", tp_buf); | ||
|
||
bpf_map_update_elem(&incoming_trace_map, &conn, &tp, BPF_ANY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we receive a traceID over the wire (TCP packet) we store it for later to be used by the trace code.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1161 +/- ##
==========================================
- Coverage 81.07% 78.35% -2.73%
==========================================
Files 136 136
Lines 11416 11416
==========================================
- Hits 9256 8945 -311
- Misses 1636 1920 +284
- Partials 524 551 +27
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -196,10 +204,16 @@ static __always_inline void get_or_create_trace_info(http_connection_metadata_t | |||
|
|||
if (meta) { | |||
if (meta->type == EVENT_HTTP_CLIENT) { | |||
tp_info_pid_t *in_tp = bpf_map_lookup_elem(&outgoing_trace_map, conn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this change the client code only looked for a server trace and if it doesn't find it it will generate the trace information later. Now we look if the TCP egress has setup trace info for us, if we find it we mark as having trace info, we must not regenerate it. If we have our own TCP egress info, but there's server incoming request that wrapped this client call, then we'll overwrite this info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually put that in a comment
tp_info_pid_t *existing_tp = trace_info_for_connection(conn); | ||
|
||
if (correlated_requests(tp_p, existing_tp)) { | ||
tp_info_pid_t *existing_tp = bpf_map_lookup_elem(&incoming_trace_map, conn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For server requests, we first look for TCP info and then we fall back to black-box info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, awesome! Very clever!
My comments are very minor (pedantic) nits, feel free to ignore.
@@ -232,7 +259,7 @@ int BPF_KRETPROBE(kretprobe_sys_connect, int fd) | |||
sort_connection_info(&info.p_conn.conn); | |||
info.p_conn.pid = pid_from_pid_tgid(id); | |||
info.orig_dport = orig_dport; | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: trailing whitespace
|
||
if (tcp_syn(&tcp) && !tcp_ack(&tcp)) { | ||
bpf_skb_load_bytes(skb, tcp.hdr_len, &buf, 4); | ||
if (skb->len - tcp.hdr_len < sizeof(tp_info_pid_t)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking into this, and stumbled upon this check in read_sk_buff()
if ((skb->len - tcp->hdr_len) < 0) { // less than 0 is a packet we can't parse
both len
and hdr_len
are unsigned, so I think, to complement this check, we should change read_sk_buff()
to
if ((skb->len - tcp->hdr_len) > skb->len) { // underflow, hdr_len is > len, packet too big
or even better
if (tcp->hdr_len > skb->len) { // package too big
bpf/k_tracer.c
Outdated
bpf_printk("tot_len = %d, new_tot_len = %d", bpf_ntohs(tcp.tot_len), bpf_ntohs(new_tot_len)); | ||
bpf_printk("h_proto = %d, skb->len = %d", tcp.h_proto, skb->len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bpf_printk("tot_len = %d, new_tot_len = %d", bpf_ntohs(tcp.tot_len), bpf_ntohs(new_tot_len)); | |
bpf_printk("h_proto = %d, skb->len = %d", tcp.h_proto, skb->len); | |
bpf_printk("tot_len = %u, new_tot_len = %u", bpf_ntohs(tcp.tot_len), bpf_ntohs(new_tot_len)); | |
bpf_printk("h_proto = %u, skb->len = %u", tcp.h_proto, skb->len); |
@@ -196,10 +204,16 @@ static __always_inline void get_or_create_trace_info(http_connection_metadata_t | |||
|
|||
if (meta) { | |||
if (meta->type == EVENT_HTTP_CLIENT) { | |||
tp_info_pid_t *in_tp = bpf_map_lookup_elem(&outgoing_trace_map, conn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually put that in a comment
bpf_printk("SYN packed len = %d, offset = %d, hdr_len %d", skb->len, len, tcp.hdr_len); | ||
tp_info_pid_t tp = {0}; | ||
|
||
bpf_skb_load_bytes(skb, tcp.hdr_len, &tp, sizeof(tp_info_pid_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it may be a good idea to have tp_info_pid_t
store some sort of magic number - it's highly unlikely, but if someone stores the exact same amount of bytes there, we will be fooled. Who knows how many other people are doing crazy things like this?
Something like [0xb3l4][tp_info_pid_t]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I initially thought of this and said to myself, so if they do, we'll read something random and it's no worse than generating new ID. However, on a second thought, the value someone else puts in could be not random, but a constant and it will make all of our traceIDs be the same. I took your advice to put an ID on the PID (we don't use this field for TCP and it's later overwritten with -1). So I'll check the PID on ingress and if it doesn't match our magic value we'll not take in the packet.
@@ -703,17 +741,25 @@ int app_egress(struct __sk_buff *skb) { | |||
return 0; | |||
} | |||
|
|||
if (!tcp_syn(&tcp) || tcp_ack(&tcp)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a comment such as:
// handle SYN only, ignore SYN+ACK packets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing! As Rafael suggests, I'd write your GitHub explanations as actual code comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR finishes the work required to propagate context with trace context injection through TCP packets. The feature is disabled by default. I'm proposing the PR to ensure we can pass all integration tests without this enabled.
To complete the work, I plan to follow-up with another PR that will do the following:
Relates to #871