Fix delayed SSL server trace cleanup #942

grcevski · 2024-06-17T17:46:19Z

I found one more bug related to the cleanup of server traces for SSL black-box propagation. Since we now delay requests generation for SSL too, so that we can produce accurate response times, we were cleaning up at the time of request generation, which with HTTP keep-alive might be on the next SSL request. This new SSL request might have different *ssl pointer and we won't cleanup the correct server ID, causing context propagation to fail.

With this PR I clean-up the SSL trace info immediately as the request is captured, not waiting for the delayed generation.

grcevski · 2024-06-17T17:47:42Z

bpf/http_sock.h

@@ -279,7 +279,9 @@ static __always_inline void finish_http(http_info_t *info, pid_connection_info_t
            bpf_ringbuf_submit(trace, get_flags());
        }

-        delete_server_trace();
+        if (info->type == EVENT_HTTP_REQUEST) {


This was another bug, we were calling cleanup on the server traces even for client requests. So for Python applications (where everything happens on a single thread), if the server span made more than one client request we'd not propagate the trace id correctly on the second request.

codecov-commenter · 2024-06-17T17:55:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.35%. Comparing base (6162c54) to head (d723a67).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #942      +/-   ##
==========================================
+ Coverage   75.74%   79.35%   +3.61%     
==========================================
  Files         131      131              
  Lines       10154    10154              
==========================================
+ Hits         7691     8058     +367     
+ Misses       1939     1580     -359     
+ Partials      524      516       -8

Flag	Coverage Δ
integration-test	`55.12% <ø> (?)`
k8s-integration-test	`59.24% <ø> (+0.01%)`	⬆️
oats-test	`36.14% <ø> (ø)`
unittests	`47.25% <ø> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mariomac

👏🏻

grcevski added 5 commits June 17, 2024 09:56

Add elixir based test

f9ab1ec

More reliable distributed tracking for SSL

010ae79

Clean-up SSL traces as soon as request is done

850b985

Merge branch 'main' into better_ssl_tid_tracking

18eb80f

fix merge issues

2b2b06f

grcevski requested review from mariomac and marctc as code owners June 17, 2024 17:46

grcevski commented Jun 17, 2024

View reviewed changes

Remove debug info

d723a67

mariomac approved these changes Jun 18, 2024

View reviewed changes

grcevski merged commit 0dc8f55 into grafana:main Jun 18, 2024
6 checks passed

grcevski deleted the fix_delayed_ssl branch June 18, 2024 11:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix delayed SSL server trace cleanup #942

Fix delayed SSL server trace cleanup #942

grcevski commented Jun 17, 2024

grcevski Jun 17, 2024

codecov-commenter commented Jun 17, 2024 •

edited

Loading

mariomac left a comment

Fix delayed SSL server trace cleanup #942

Fix delayed SSL server trace cleanup #942

Conversation

grcevski commented Jun 17, 2024

grcevski Jun 17, 2024

Choose a reason for hiding this comment

codecov-commenter commented Jun 17, 2024 • edited Loading

Codecov Report

mariomac left a comment

Choose a reason for hiding this comment

codecov-commenter commented Jun 17, 2024 •

edited

Loading