Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of alpine-built static bpftrace is prone to crashing #51

Closed
dalehamel opened this issue Jan 27, 2019 · 3 comments
Closed

Use of alpine-built static bpftrace is prone to crashing #51

dalehamel opened this issue Jan 27, 2019 · 3 comments
Labels
bug Something isn't working

Comments

@dalehamel
Copy link
Member

dalehamel commented Jan 27, 2019

As described in bpftrace/bpftrace#266, it appears that linking to musl instead of glibc is what causes sporadic crashes (deterministically) when loading certain scripts.

For instance, when I run kubectl-trace using tcpconnect.bt from bpftrace/tools, it works fine. But, if I try and load tcpaccept.bt, I get a crash every time:

libclang: crash detected during parsing: {
  'source_filename' : 'definitions.h'
  'command_line_args' : ['clang', '-I', '/bpftrace/include', '-nostdinc', '-isystem', '/virtual/lib/clang/include', '-I/lib/modules/4.14.65+/build/arch/x86/include', '-I/lib/modules/4.14.65+/build/arch/x86/include/generated/uapi', '-I/lib/modules/4.14.65+/build/arch/x86/include/generated', '-I/lib/modules/4.14.65+/build/include', '-I/lib/modules/4.14.65+/build/./arch/x86/include/uapi', '-I/lib/modules/4.14.65+/build/arch/x86/include/generated/uapi', '-I/lib/modules/4.14.65+/build/include/uapi', '-I/lib/modules/4.14.65+/build/include/generated', '-I/lib/modules/4.14.65+/build/include/generated/uapi', '-I./arch/x86/include', '-Iarch/x86/include/generated/uapi', '-Iarch/x86/include/generated', '-Iinclude', '-I./arch/x86/include/uapi', '-Iarch/x86/include/generated/uapi', '-I./include/uapi', '-Iinclude/generated/uapi', '-include', './include/linux/kconfig.h', '-D__KERNEL__', '-D__HAVE_BUILTIN_BSWAP16__', '-D__HAVE_BUILTIN_BSWAP32__', '-D__HAVE_BUILTIN_BSWAP64__', '-Wno-unused-value', '-Wno-pointer-sign', '-fno-stack-protector'],
  'unsaved_files' : [('definitions.h', '...', 22), ('/bpftrace/include/__stddef_max_align_t.h', '...', 1771), ('/bpftrace/include/float.h', '...', 5192), ('/bpftrace/include/limits.h', '...', 3735), ('/bpftrace/include/stdarg.h', '...', 2025), ('/bpftrace/include/stddef.h', '...', 4499), ('/bpftrace/include/stdint.h', '...', 23388)],
  'options' : 0,
}
Clang error while parsing C definitions: 2
Input (22): #include <net/sock.h>

Unknown struct/union: 'sock'
exit status 1

Which is consistent with the error and failure mode of bpftrace/bpftrace#266.

To fix this, we should either use an ubuntu-based bpftrace that is dynamically linked, or if bpftrace figures out to build a statically linked bpftrace with ubuntu (currently ubuntu doesn't patch a static libclang.a), then use that instead. Of course, maybe someone will figure out a way to make the alpine build work, but I think the issue is in musl so I'm not particularly hopeful.

Until this is fixed, some scripts will randomly crash which makes this tool pretty unreliable.

@dalehamel
Copy link
Member Author

dalehamel commented Jan 28, 2019

IMO the most pragmatic thing to do here would be to ditch alpine in favor of ubuntu bionic for the tracerunner, but we'd need to be conscious of container size (which is a major benefit of alpine, as I understand).

To achieve this, we could use a new builder image based on ubuntu bionic (until bpftrace is packaged as a debian package) to generate the dynamic bpftrace executable similar to what Dockerfile.bpftracebase does, and then have Dockerfile.tracerunner copy this dynamic executable in, and use libbcc from a the iovisor apt repo. I bet we could achieve a similarly-sized tracer container with this approach.

The gobuilders can probably stay as alpine, since we just copy the resulting executable in. Otherwise, they can be migrated as well, for consistency.

If a working ubuntu static build comes out or the static build is fixed, I agree that it would be preferable for simplicity's sack to have the single executable (and more consistent with the golang paradigm used here). At that point, the tracerunner container could go back to alpine, as libbcc from an apt repo isn't needed.

For the record, I might be biased - we deploy bpftrace and bcc via an ubuntu bionic based "toolbox" image, and the approach I describe above is how we use bpftrace normally.

@fntlnz WDYT about this? Are there reasons for staying on alpine that I'm missing?

@dalehamel
Copy link
Member Author

dalehamel commented Jan 28, 2019

I hacked something together to see how big the image would be, first attempt weighed-in at 241MB with ubuntu, vs 107MB for alpine. I'll see if I can chop that down at all and try to confirm it solves the crashing issue.

Edit: Final working image with ubuntu was 299MB, so almost 3 times the size 😞 it might still be worth it based on the functionality different, but a static bpftrace is looking better and better. Most of the size comes from libbcc, libclang, and libllvm, and a statically linked version wouldn't include code for symbols that aren't used, resulting in much of this heft being discarded.

@fntlnz fntlnz added the bug Something isn't working label Apr 24, 2019
@fntlnz fntlnz changed the title Bug: Use of alpine-built static bpftrace is prone to crashing Use of alpine-built static bpftrace is prone to crashing Apr 24, 2019
@dalehamel
Copy link
Member Author

Solved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants