Tracking the NodeJS event loop #998

grcevski · 2024-07-05T00:59:45Z

We've had a long-standing issue with context propagation for NodeJS applications: we can't process more than one context at a time, which essentially breaks context propagation for NodeJS.

This PR adds tracking of the active async_id for the NodeJS event loop.

The idea is to base the context tracking in Beyla on the PID:TID plus an additional ID, which will be 0 for languages that don't need another ID. For NodeJS we'll look up the active Node transaction for the PID:TID and match on that.

Ideally only a probe on EmitAsyncInit would be needed, but the IDs are doubles (i.e. the number type in JavaScript). The system ABI passes floating point arguments in registers, and at present it's not possible to read floating point registers in kprobes/uprobes. There is some kernel mailing list discussion about adding support for floats here: https://lore.kernel.org/bpf/CAO658oXvAN12PFQhAQR2UXs78K-1vF3tAefd6-ToEzzQucNM=Q@mail.gmail.com/T/. Since we can't read the float registers, we resort to an additional probe on AsyncReset, which lets us save the AsyncWrap pointer. Then, in EmitAsyncInit, we use that pointer to read the two values, async_id_ and trigger_async_id_, from memory.
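The workaround described above can be illustrated outside of eBPF. The following is a minimal plain C sketch, not the code in this PR: `fake_async_wrap`, its field layout, `read_u64_at`, and `bits_to_double` are hypothetical stand-ins, and `memcpy` plays the role that `bpf_probe_read_user` plays in the probe. It only shows that once a pointer to the object is known, the two double fields can be read from memory as raw 64-bit values without touching floating point registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the part of Node's AsyncWrap we care about.
 * The real layout and offsets differ and depend on the Node version. */
struct fake_async_wrap {
    void *other_state;          /* placeholder for unrelated fields */
    double async_id_;
    double trigger_async_id_;
};

/* Copy the 8 raw bytes found at base + off. In a uprobe this role would be
 * played by bpf_probe_read_user(); memcpy keeps the sketch runnable anywhere. */
uint64_t read_u64_at(const void *base, size_t off) {
    uint64_t bits = 0;
    assert(base != NULL);
    memcpy(&bits, (const char *)base + off, sizeof(bits));
    return bits;
}

/* Reinterpret the raw bits as the double they encode (user space only). */
double bits_to_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

Under this sketch the raw bit pattern is enough to act as a map key: two reads of the same ID yield the same 64 bits, so the probe side never needs floating point arithmetic.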

TODO:

Make the integration tests harder to pass

Limitations:

Doesn't work with SSL yet. SSL seems to increase the async_id_ in JavaScript interpreted code, so we can't match the parent transaction. More work is required here.

codecov-commenter · 2024-07-05T01:13:11Z

Codecov Report

Attention: Patch coverage is 95.29412% with 4 lines in your changes missing coverage. Please review.

Project coverage is 80.84%. Comparing base (31e98da) to head (4686a89).

| Files | Patch % | Lines |
| --- | --- | --- |
| pkg/internal/ebpf/nodejs/nodejs.go | 94.59% | 3 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #998      +/-   ##
==========================================
+ Coverage   80.75%   80.84%   +0.09%     
==========================================
  Files         134      135       +1     
  Lines       10733    10813      +80     
==========================================
+ Hits         8667     8742      +75     
- Misses       1559     1562       +3     
- Partials      507      509       +2

| Flag | Coverage Δ |
| --- | --- |
| integration-test | `56.30% <95.29%> (+0.31%)` ⬆️ |
| k8s-integration-test | `58.91% <4.70%> (-0.32%)` ⬇️ |
| oats-test | `36.18% <4.70%> (-0.26%)` ⬇️ |
| unittests | `50.44% <0.00%> (-0.40%)` ⬇️ |

Flags with carried forward coverage won't be shown.

grcevski · 2024-07-05T20:43:15Z

bpf/http_sock.c

@@ -121,7 +121,6 @@ int BPF_KPROBE(kprobe_tcp_rcv_established, struct sock *sk, struct sk_buff *skb)
        // If the source port for a client call is lower, we'll get this wrong.
        // TODO: Need to fix this. 
        pid_info.orig_dport = pid_info.p_conn.conn.s_port,
-        task_tid(&pid_info.c_tid);


This isn't required anymore. I figured out a better way to track what I need for cleaning up the parent trace information. Essentially, now I store the namespaced threadID and the extra runtime ID on the HTTP request. It's the most accurate information to use on cleanup, since we use those to create the parent request trace.

grcevski · 2024-07-05T20:44:08Z

bpf/http_sock.c

@@ -628,8 +624,8 @@ int BPF_KPROBE(kprobe_sys_exit, int status) {
        return 0;
    }

-    pid_key_t task = {0};
-    task_tid(&task);
+    trace_key_t task = {0};


Extended data structure. It now has an extra ID, which is the current runtime ID. For now it's 0 for all languages other than NodeJS, and it's the current async_id for NodeJS.
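As a rough illustration of why the extra ID matters, here is a minimal plain C sketch. The type `sketch_trace_key_t`, its field names, and the helper functions are hypothetical and are not copied from the PR's `trace_key_t`; the sketch only assumes that the key combines the OS-level identity with one extra 64-bit runtime ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the extended key: the OS-level identity plus an
 * extra runtime ID. Field names are illustrative only. */
typedef struct sketch_trace_key {
    uint32_t pid;       /* process ID */
    uint32_t tid;       /* thread ID */
    uint64_t extra_id;  /* 0 when the runtime needs no extra ID */
} sketch_trace_key_t;

sketch_trace_key_t sketch_make_key(uint32_t pid, uint32_t tid, uint64_t extra_id) {
    sketch_trace_key_t k = { pid, tid, extra_id };
    return k;
}

bool sketch_key_equal(sketch_trace_key_t a, sketch_trace_key_t b) {
    return a.pid == b.pid && a.tid == b.tid && a.extra_id == b.extra_id;
}

/* Usage: two requests handled on the same event loop thread no longer
 * collide, because their async IDs differ. */
void sketch_key_demo(void) {
    sketch_trace_key_t req_a = sketch_make_key(1234, 1234, 7);
    sketch_trace_key_t req_b = sketch_make_key(1234, 1234, 9);
    sketch_trace_key_t other = sketch_make_key(1234, 1234, 0); /* non-NodeJS case */
    assert(!sketch_key_equal(req_a, req_b));
    assert(sketch_key_equal(other, sketch_make_key(1234, 1234, 0)));
}
```

With a plain PID:TID key, `req_a` and `req_b` would be the same entry, which matches the single-context limitation described in the PR description.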

grcevski · 2024-07-05T20:44:53Z

bpf/http_types.h

@@ -96,6 +95,8 @@ typedef struct http_info {
    // with other instrumented processes
    pid_info pid;
    tp_info_t tp;
+    u64 extra_id;


These are now stored here so we can correctly clean up the server trace information when the HTTP request is finished.

grcevski · 2024-07-05T20:45:52Z

bpf/nodejs.c

+    __uint(pinning, LIBBPF_PIN_BY_NAME);
+} async_reset_args SEC(".maps");
+
+SEC("uprobe/node:AsyncReset")


Tracks this NodeJS function to remember the AsyncWrap NodeJS pointer. We then use the pointer to read the async_id_ and trigger_async_id_ in EmitAsyncInit.

grcevski · 2024-07-05T20:46:29Z

bpf/nodejs.c

+            bpf_probe_read_user(&trigger_async_id, sizeof(u64), ((void *)wrap) + async_wrap_trigger_async_id_off);
+
+            if (async_id) {
+                bpf_map_update_elem(&active_nodejs_ids, &id, &async_id, BPF_ANY);


Saves the current async_id_ in play and the child -> parent relationship with the trigger_async_id_.

grcevski · 2024-07-05T20:47:30Z

bpf/runtime.h

+#include "pid_types.h"
+#include "nodejs.h"
+
+static __always_inline u64 extra_runtime_id() {


Meant to support other runtimes that have internal threading models; for now it only checks NodeJS.

grcevski · 2024-07-05T20:48:37Z

bpf/trace_common.h

-            if (p_tid) {
-                // Lookup now to see if the parent was a request
-                c_tid = *p_tid;
+            if (t_key.extra_id) {


If we have a runtime ID, we look it up in the runtime parent table; otherwise we look at the OS-level thread information as before.

grcevski · 2024-07-05T20:49:30Z

pkg/internal/ebpf/nodejs/nodejs.go

+
+func (p *Tracer) UProbes() map[string]map[string]ebpfcommon.FunctionPrograms {
+	return map[string]map[string]ebpfcommon.FunctionPrograms{
+		"node": {


These symbols have been verified with Node 18, 20 and 22.

mariomac

Amazing! I like the idea of extra ID information for some runtimes

grcevski added 3 commits July 4, 2024 17:29

wip: add nodejs ebpf module

57f544b

add to attacher and add probe for async reset

a529d41

Track async ids fully

3db3d66

grcevski added 4 commits July 5, 2024 11:39

Fix linter, refactor code

98ea021

Add use of the runtime id in trace code

1c624eb

Fix for conflicting server spans

c166a1c

Make distributed traces test much harder now

4686a89

grcevski marked this pull request as ready for review July 5, 2024 20:39

grcevski requested review from mariomac and marctc as code owners July 5, 2024 20:39

grcevski changed the title ~~WIP: Tracking the NodeJS event loop~~ Tracking the NodeJS event loop Jul 5, 2024

mariomac approved these changes Jul 8, 2024

grcevski merged commit 31390b2 into grafana:main Jul 8, 2024
6 checks passed

grcevski deleted the nodejs branch July 8, 2024 15:02
