flaky usdt test #2402

xh4n3 · 2022-10-28T07:54:33Z

What reproduces the bug?

Trigger CI on the master branch.
https://github.com/iovisor/bpftrace/actions/runs/3343605763/jobs/5536998903
https://github.com/iovisor/bpftrace/actions/runs/3341593002/jobs/5532960039

2: [ RUN      ] usdt."usdt probes - file based semaphore activation multi process"
2: [  TIMEOUT ] usdt."usdt probes - file based semaphore activation multi process"
2: 	Command: ../src//bpftrace runtime/scripts/usdt_file_activation_multiprocess.bt --usdt-file-activation
2: 	Timeout: 5
2: 	Current output: Attaching 3 probes...
2: __BPFTRACE_NOTIFY_PROBES_ATTACHED
2: found 1 processes

BurntBrunch · 2022-10-28T22:46:11Z

I'm fighting with it in 4eb280b but I can't make it stable yet. Now the pidof invocation to get the child pid after it has started is failing. I may just make that logic only run if the test needs BEFORE_PID.

viktormalik · 2022-11-03T06:44:26Z

I think that this should be resolved by #2370

BurntBrunch · 2022-11-03T17:59:27Z

Unfortunately, I don't think it was fully fixed, just made a bit better. I still don't fully understand how this happens (and why it happens on that test specifically) but here's a run after that PR that timed out again - https://github.com/BurntBrunch/bpftrace/actions/runs/3381817256/jobs/5616467787

As noted in bpftrace#2402, usdt flakiness was made better by 508538a but not fully fixed. This commit is what I should have done all along: it allows the test runner to parse and wait for multiple BEFORE clauses and thus ensures the processes have started before the test runs.

There are two minor changes:

1. The check for child processes is now `ps --ppid` based to eventually allow parallel process runs in the same environment.
2. Because of the `ps` usage, the name check is now truncated to 15 chars, which will fail if TASK_COMM_LEN is not 16. That looks like a constant in the kernel, so I think we're good.
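The approach described in that commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the helper names (`comm_name`, `parse_ps`, `list_children`, `wait_for_befores`), the polling interval, and the default timeout are all assumptions. It assumes the procps `ps`, since the BusyBox one has no `--ppid` option, and it relies on TASK_COMM_LEN being 16, so a process name shows at most 15 characters.

```python
import subprocess
import time

# Assumption: TASK_COMM_LEN is 16 in the kernel, including the trailing NUL,
# so the name reported by `ps -o comm` is at most 15 characters long.
TASK_COMM_LEN = 16


def comm_name(command):
    """Return the name `ps` would report for a BEFORE command line."""
    basename = command.split()[0].rsplit("/", 1)[-1]
    return basename[: TASK_COMM_LEN - 1]


def parse_ps(output):
    """Parse `ps -o pid=,comm=` output into a list of (pid, comm) tuples."""
    children = []
    for line in output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2:
            children.append((int(fields[0]), fields[1].strip()))
    return children


def list_children(parent_pid):
    """List direct children of parent_pid using procps `ps --ppid`."""
    result = subprocess.run(
        ["ps", "--ppid", str(parent_pid), "-o", "pid=,comm="],
        capture_output=True,
        text=True,
        check=False,  # ps exits non-zero when there are no children yet
    )
    return parse_ps(result.stdout)


def wait_for_befores(parent_pid, before_commands, timeout=5.0, lister=list_children):
    """Poll until every BEFORE command has a matching child; return their pids."""
    wanted = [comm_name(cmd) for cmd in before_commands]
    deadline = time.monotonic() + timeout
    while True:
        remaining = list(wanted)
        pids = []
        for pid, comm in lister(parent_pid):
            if comm in remaining:
                remaining.remove(comm)  # one child satisfies one BEFORE clause
                pids.append(pid)
        if not remaining:
            return pids
        if time.monotonic() >= deadline:
            raise TimeoutError(f"BEFORE processes not started: {remaining}")
        time.sleep(0.05)
```

Passing the lister as a parameter keeps the matching logic testable without spawning processes. Scoping the lookup to children of one parent is what would let several runs share an environment, which a global `pidof` lookup cannot do.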

BurntBrunch · 2022-11-04T21:18:34Z

Okay, I give up. #2414 doesn't help either, so this is not about the test runner racing the BEFORE calls. Maybe it's something to do with --usdt-file-activation and whatever kernel the GH runners run?

As noted in bpftrace#2402, usdt flakiness was made better by 508538a but not fully fixed. This commit is what I should have done all along: it allows the test runner to parse and wait for multiple BEFORE clauses and thus ensures the processes have started before the test runs.

There are two minor changes:

1. The check for child processes is now `ps --ppid` based to eventually allow parallel process runs in the same environment. That requires using `ps` from the `procps` package on Alpine, as the default BusyBox one doesn't have the `--ppid` option.
2. Because of the `ps` usage, the name check is now truncated to 15 chars, which will fail if TASK_COMM_LEN is not 16. That looks like a constant in the kernel, so I think we're good.

xh4n3 added the bug label Oct 28, 2022

xh4n3 closed this as completed Nov 3, 2022

BurntBrunch reopened this Nov 3, 2022

BurntBrunch mentioned this issue Nov 3, 2022

Teach runtime runner about multiple BEFORE clauses #2414

Merged

viktormalik mentioned this issue Nov 25, 2022

CI: disable flaky USDT test #2438

Merged

jordalgo added the tests label Jan 4, 2024
