RFC: alternatives to static build #342
Comments
Curious, why?
It sounds like we could even consider taking it a step further and create a custom bpf IR + compiler. FWIW, I've been tossing a similar idea around the last couple months. Clang is a bit heavy weight for the kind of bpf programs that bpftrace enables. |
Discussed that with @4ast ? Right now I'm hoping to stabilize the API and fix bugs; changing the compiler chain may be worthwhile, but not high on my list at the moment! |
I think the main issue is with the alpine static build, and the way that it links to musl, per my investigation in #266. Are there any other problems with a static build we should be trying to solve for? If we could link to a standard glibc through another build process (such as with another distro that does provide a static libclang), I think that the resulting artifact would be fine. We really only need one platform for static libclang to get a static bpftrace, after that it should be portable.
For the record, the approach I'm currently using is to package these up in a container, similar to https://github.com/dalehamel/kubectl-trace/blob/ubuntu-images/build/Dockerfile.tracerunner-ubuntu. The resulting container certainly is bigger than the alpine one with a static binary, but it's more reliable. I would love to see a static build based on glibc, which could produce the necessary distributable binary for the cases you mentioned above. It's worth mentioning, though, that the kernel headers are also a significant overhead, and until BTF is widely adopted the dream of a single executable is probably a ways off. Are we averse to having a separate build pipeline for static builds though? One that targeted only a subset of distributions (perhaps even only one)?
I was looking at the cmake files that alpine uses, to see what can be gleaned from there to produce the 'magic sauce' that makes the libclang.a artifact that is needed. I had to switch context to something else, but I believe this could be a promising approach. If the patchset is small enough, then an alternative here could be to have the build process also build the needed libclang.a artifact, and perhaps cache it as a package in an iovisor repository? I really like being able to have bpftrace accept scripts and one-liners, I think it would be a big sacrifice to lose this functionality with the intermediate approaches you suggest, though there are probably some use-cases where this could make a lot of sense. |
If you invoke bpftrace from a container shell, does it still have access to all the information of the machine? Because containers would have set up namespaces etc. which, as one would expect, contain the programs in it. |
IR and compiler would be easy to implement (and maybe easier to use than LLVM's IR), but we're also using libclang to parse structs, this would be the hardest part to move away from. |
@brendangregg correct me if I'm wrong, but AFAIU, with BTF support it would be possible to get rid of parsing structs through libclang. |
BTF would help for kernels that have this type information, and userspace programs which (I think) would need to have it added to their build process. Currently bpftrace allows defining structures inline, so BTF wouldn’t help with that. If we want to retain this functionality, we will probably need to keep libclang. |
We need somewhere to discuss possible compiler alternates, such as including a lightweight BPF instruction generator for when llvm is not available. Maybe rename the title of this ticket and discuss here? If anyone is working/thinking about such compiler alternates, please add comments here! |
I was having a discussion with someone who wanted to roll out bpftrace, but had a hard time making the case for it due to the current size requirements being inhibitive to their use case. To quote:
The static bpftrace docker container based on alpine is about 100MB, about 60MB of which is bpftrace. The dynamically linked version (based on ubuntu) is about 300MB, of which about 200MB are clang/llvm libraries. The static version is able to pull in just the symbols it needs during compile-time linking, so the resulting executable is much smaller, whereas the dynamic version pulls in a bunch of unused clang/llvm libraries, or pulls in a whole library to link against just a few symbols in it. Including an eBPF generating backend has the potential to reduce the footprint substantially, probably to something on the order of 10MB I would guess, depending on how complicated the eBPF generating backend is. However, to put things in perspective, the kernel headers needed for most deployments of bpftrace can be closer to 1GB. I think this is the larger issue for bpftrace's portability, though I think that BTF should help to solve that problem. In our case, we just pre-deploy the kernel headers to all hosts at a predictable location to solve this problem, but it would be great not to have that dependency. A future where bpftrace has its own eBPF generator and can use BTF for kernel type information is, I think, the path forward to a portable, lightweight bpftrace.
Agree that static is only part of the issue here, maybe ticket name should be "RFC: Alternative eBPF generating backend" or similar? |
Maybe we will need to write our own BPF backend eventually, but it would not be trivial and we'd miss out on all the LLVM optimisations (which help us to not blow the stack all the time). We'd also have to implement our own C parser to get rid of libclang as well as LLVM or there'd be no point. For the static bpftrace executable: it doesn't need to run inside the Docker container it's built in - it should be portable. The 60MB can be reduced a little bit further down to 44MB with |
44MB is pretty reasonable, so perhaps the answer in the short term is static builds. 1GB for kernel headers? On my system they add up to 137 MB (Ubuntu linux-header packages). Yes, BTF should take that down and give us everything. |
To clarify, this size is to grab a snapshot of the full linux sources since there is no header package available for the distribution I'm working with most of the time (Google's Container OS on their GKE platform). Once the headers have been stripped down it's closer to the size you mention, but it's not practical to do that in a container as it needs to be portable between different kernel versions. So, our flow involves fetching the sources for the current kernel and pulling out the headers that way. As i mentioned, we do this once per host (in an initContainer as is done with kubectl-trace) and just cache it in a way that all containers can access. See https://github.com/iovisor/kubectl-trace/tree/master/init for the gist of how this is done if you're interested.
Yes, but I believe that alpine is the only platform that static builds work on, as it's the only one that provides the necessary static libs to link against. As far as I know this build is still broken in a few ways, but I think fixing this would be much easier than refactoring bpftrace to use a different backend than LLVM for eBPF generation. The approach I had briefly investigated for that was to see how alpine modifies LLVM/Clang's cmake files in order to produce the static libraries that are missing on other platforms, but that would involve building Clang and/or LLVM from source in order to do a static bpftrace build, which incurs a non-trivial maintenance burden. |
My 2 cents: eBPF generation is the easy part if we want to remove LLVM as a dependency. The hard part is parsing C structures, enumerators, macros, etc. Especially since we allow users to Well, maybe we could come up with a modular approach to bpftrace's parser, where users can plug in their own extra parsers (for instance, a C/C++ parser) which will be used to get struct and type sizes, offsets, etc. We could do this with the current approach: decouple clang_parser from bpftrace and make it work as a plugin. This, alongside our own BPF generator, would considerably reduce the size of bpftrace. Is this a good idea? I don't know, but that would allow us to support other languages without having to implement everything in bpftrace's core.
If we can build libclang static libraries in other platforms, maybe we can advocate to have the static version of libclang included in some distributions. But the first step is getting it to work. Ubuntu already installs LLVM static libraries, it would only be a matter of doing the same for libclang. Worst case we can have a Docker image with libclang compiled statically, which we use to build bpftrace. |
We discussed enabling bpftrace on Android recently. Our constraint is not adding another copy of LLVM to the device (what our system image can support is very much size constrained). I'm not sure how feasible it would be to split precompilation out onto the host, but from what I understand, bpftrace is a JIT, whereas separating compilation from execution would make it more feasible for Android. Not that compilation is necessarily slow, but it would be nice to be able to just "reload" a compiled bpf program. Then for Android, we could cross compile the bpf program on the host, push it to the device, and run it there, all without the need to have LLVM on the device. |
I did a spike of trying to build a static bpftrace with glibc and uclibc. Both of them suffer from the same issue described in #336. I believe this may also explain why the current alpine build with musl works, but is incorrect and crashes, causing #266 and #869. An alternative might be to have a mostly static build, but link to libc dynamically. This would make bpftrace less portable, as the mostly static executable would be tied to glibc versions. On the plus side though, no LLVM or clang would need to be distributed. Or perhaps we can find a way to work around #336 and get a static build working with glibc. |
With a fair bit of hacking, I was able to build bpftrace including llvm and clang libraries statically, the output of ldd on bpftrace:
This is about as minimal as you can get in terms of dynamic dependencies, as those are all supplied by glibc / linux. The resulting bpftrace executable is about 78MB and bpftrace appears to run without crashing (see below, I've attached it). I had to build clang manually to get I then manually edited the generated link.txt on my system to specify the paths to a few static libraries, as cmake wasn't quite doing it properly:
The ugly part is that I had to manually replace LLVM.so with a list of all static llvm libraries (I was lazy about this, that's why there are some duplicates), as well as replace So... why does any of this matter / why do we care? If we use the This way we can include llvm, or at least clang in our CMake build when a similar switch is used (performs STATIC_LLVM or similar), then bpftrace can be distributed without the need for the kitchensink of llvm and clang packages. This should make it more feasible for shipping to other platforms, like to the raspberry pi and android or other embedded environments. So, my proposal here is that we add a new build mode that builds and embeds the necessary LLVM static libraries from source (for example, perhaps via ExternalProject or similar). This would be a much, much longer build time if this flag is enabled (it's building the kitchen sink of LLVM / Clang), but would reduce the overall size of bpftrace, and negate the need for anything but these meat-and-potatoes libc libraries to be available at runtime. Here's the executable I've built, I wonder how portable it it is ... https://drive.google.com/file/d/121l-RAjN4Q2SCUJOapdaKV1V7CCA4z6I/view it was built against glibc-2.29, so I'd expect it to at least work on any system with an ABI compatible glibc... it was built using bpftrace 0.12.0, clang and LLVM 8.0.1, and bpftrace master as of today. |
I documented my build using a Dockerfile in a separate repo: https://github.com/dalehamel/bpftrace-static-clang, anyone interested should be able to reproduce by cloning that repo and just running I tested copying the resulting bpftrace to ubuntu disco (19.04), as that is the first ubuntu using glibc 2.29.0. It worked just fine, despite having actually been built off of the latest Gentoo sources. Theoretically, if the system used to build the semi-static bpftrace is older (eg ubuntu bionic on 2.27, or ubuntu xenial on 2.23), the compatibility should be increased, as glibc is generally backwards compatible. Since it is usually possible, at least loosely, to associate a linux kernel version with the glibc version that distros using that kernel tend to ship with, it should be pretty easy to pick a glibc target that matches kernels with eBPF support. For example, ubuntu 18.04 ships with kernel 4.15, and also with glibc 2.27. By targeting glibc 2.27 then, we can be pretty certain that systems with kernel 4.15 will be supported, comparing with other major distros.
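The glibc compatibility argument above can be sketched as a simple version comparison. This is only an illustration, assuming that a binary linked against glibc X runs on any host with glibc X or newer; `parse_version`, `runs_on` and `host_glibc` are hypothetical helper names and are not part of bpftrace.

```python
import platform


def parse_version(text):
    """Turn a version string such as '2.27' into a comparable tuple (2, 27)."""
    return tuple(int(part) for part in text.split("."))


def runs_on(build_glibc, host_glibc):
    """Assumption: glibc is backwards compatible, so a binary built against
    an older glibc runs on a host with the same or a newer glibc."""
    return parse_version(host_glibc) >= parse_version(build_glibc)


def host_glibc():
    """Return the glibc version the Python interpreter is linked against,
    or None when it cannot be determined (for example on musl)."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None
```

Under that assumption, `runs_on("2.23", "2.29")` holds while `runs_on("2.29", "2.27")` does not, which is why building on the oldest supported distro widens the set of hosts the binary can run on.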
@mmarchini guess I ended up doing basically that |
After re-reading this thread and the surrounding issues, it seems to me there are two main concerns here (please correct me if I'm wrong):
Regarding 1, I'd like to step back and discuss what the goal is. In my experience, distros really dislike shipping static binaries. At FB, we really like shipping static binaries. However, we have our own toolchain for building open source stuff and I've managed to hack it enough to build bpftrace mostly-statically (glibc and friends are still dynamically linked). To clarify, outside of these server environments, what are the advantages of a static build? Regarding 2, I think we must avoid shipping & running a compiler -- regardless if bpftrace is statically built or not. I've been thinking about taking on the
BTF support has already been merged so we're no longer dependent on kernel headers. Note that |
You raise some good points @danobi, here are my thoughts.
To clarify, I actually don't think a 100% static build is realistic; it seems like LLVM won't play nicely with a static libc. What I'm advocating for is a "semi-static/embedded" build, ie, with all dependencies except libc embedded in.
What you've described is exactly what I built in my prototype above, which is basically the embedded/semi-static build. That we each independently arrived at the same practical solution suggests to me there might be something to this approach... Perhaps it is time to codify these hacks? I'll take a look at making a patch to support this with an option in cmake, to see how heavy-weight of an addition this approach would be.
The advantages of this that I see are:
I guess what I really want is to make it easier to get a "batteries included" build of bpftrace that I suspect would be very popular and help to lower the bar to trying out bpftrace for a lot of potential users. For example, the version even the most bleeding edge users of Ubuntu eoan right now will get is 0.9.2, linked with BCC 0.8.0. These same users could benefit from the latest bpftrace builds and functionality with a semi-static build. Right now the only feasible alternative is to build it all yourself.
I would tweak this to: I think we should avoid needing to ship and run a compiler. I think this is a good long-term goal to be able to pre-build and reuse probes, but it takes away some of the flexibility of bpftrace to force this workflow. I think that even lower-resource environments should be able to manage compiling eBPF probes; the issue, I suspect, is more getting the dependencies on there. I think this will also constitute a substantial effort, and until it is done a semi-static build would offer an alternative path to managing the complexity of compiling to BPF bytecode on demand. |
Agreed.
Hmm, but this is only covering the build part. For distribution, are you suggesting something like snapcraft/flatpak? (fwiw I think having our own packages would be a good idea) Overall, your arguments seem pretty convincing. I think it'd be fairly reasonable to get a semi-static build mode set up for bpftrace. |
Actually I think it'd also be pretty cool to upload a binary to our releases tab that will "just run" on most distros. One issue that comes to mind with this is all the compile time OS checks we do (eg which bpf helpers are available). |
@danobi which ones are you referring to? Maybe i'm having a brain-fart, but I don't recall bumping into this, and I grepped through the source code and couldn't find any. In any case, perhaps they could be converted to run-time checks? Am I correct in assuming that these are checking which helpers are available from the Linux API? So no matter how bpftrace is built (static or dynamic), this is still a portability issue that depends on what kernel helpers are available?
It would be great if we could push it to the iovisor apt repo, dockerhub, and github releases page for starters. I'm just barely aware of these newer methods of snapcraft/flatpak, but I'm sure it shouldn't be hard to wrap up the mostly-static binary in whatever packaging system we like, or even for community members to do the work for us if it's easy to get the build artifacts. Looking at flatpaks, the build looks pretty trivial, as even the sample build is just copying in one file, and likewise snapcraft looks to use a similar manifest-driven approach. I think a docker image can also be a great means of distribution. For instance, I'd love if we could also push to quay or dockerhub, so that users can do something like this:
And have a fully-functioning bpftrace without the need to build anything! This image clocks in at just 72MB, of which 52MB is bpftrace itself. I found this image just by searching for "small glibc docker image", there is nothing that special about it - it just shows how portable this can be. I was able to run bpftrace with this (from a mostly-static binary I copied in locally). Being able to do something like above would make the Even if people don't want to use docker images, docker can still be used to distribute the file by just copying it out of the iovisor container.
It would be great to automatically push build artifacts from travis or some other CI to github on tagged releases come to think of it... This could also trigger pushes to quay, apt repos, and wherever else it makes sense to do so. Perhaps we should cut out a separate issue for this discussion on packaging and distribution @danobi ?
I definitely agree! That's what I'm chasing after here. At the least, we could provide a couple of options based on what libc it is linked against, or other compatibility options. I bet that just linking against the older glibc in ubuntu xenial (2.23) would be enough to mostly achieve this. I have already been hacking on a patch to "embed" LLVM and clang, and built an otherwise static executable (minus libc). I have something working and it has turned out to be less messy than I thought it would be. I hope to have it up for review soon, I'm just smoothing out some of the rougher edges first. |
For example, this one:
Right. The portability concern was related to the idea of uploading a binary that "just works" across multiple distros/kernel versions/glibc versions.
+1 |
thanks @danobi now I know exactly what to look for, I had indeed seen this and it had just slipped my mind.
Yeah i agree, I think that for glibc we can start by targeting a relatively old glibc version (2.23), and for the kernel we could tackle that in a few ways, perhaps a combination of runtime checks, a lookup table of kernel versions and the associated symbols that should be present in them, or by building multiple artifacts (against different "BPF milestone" kernel header versions). For example, one targeting 4.14 + glibc 2.23, another targeting 5.3 + glibc 2.30, and then age-out support for these as new kernels are released with more BPF functionality. At a certain point this will probably be less of an issue, once the BPF helper support becomes feature complete. The kernel and glibc that major/popular distros like Fedora and Ubuntu ship with can probably also be used as rubrics for what combinations are reasonable to target. I'll cut a separate issue for this, too. Once the dependencies are built, creating an additional bpftrace output artifact is relatively inexpensive, but I think it is still ideal to have as few artifacts as is reasonable to keep track of and maintain, as each one will add more time to the CI runs. |
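The "lookup table of kernel versions" idea above could look roughly like the sketch below. It is a minimal illustration, not bpftrace code: the table name, the helper functions and the selection of helpers are hypothetical, and the version numbers are quoted from memory of the BCC kernel-versions document, so they should be verified before being relied on.

```python
# Hypothetical lookup table: BPF helper -> first kernel version assumed to
# provide it. Verify these numbers against the kernel sources before use.
HELPER_MIN_KERNEL = {
    "probe_read": (4, 1),
    "get_stackid": (4, 6),
    "probe_read_str": (4, 11),
    "override_return": (4, 16),
    "get_current_cgroup_id": (4, 18),
    "send_signal": (5, 3),
}


def parse_kernel_release(release):
    """Extract (major, minor) from a release string like '4.15.0-72-generic'."""
    major, minor = release.split(".")[:2]
    # The minor field can carry a suffix on some kernels (e.g. '4.19-rc1'),
    # so keep only its leading digits.
    minor_digits = ""
    for ch in minor:
        if not ch.isdigit():
            break
        minor_digits += ch
    return int(major), int(minor_digits)


def available_helpers(release):
    """Return the helpers the table assumes are present on this kernel."""
    kernel = parse_kernel_release(release)
    return sorted(
        name for name, minimum in HELPER_MIN_KERNEL.items() if kernel >= minimum
    )
```

On a live system the release string could come from `os.uname().release`. A table like this is only a heuristic, since distro kernels sometimes backport features, which is one argument for preferring real runtime checks where they are practical.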
This should help embed libbcc into bpftrace: * bpftrace/bpftrace#342 As well as other tools like ebpf_exporter, so that there is no longer a requirement to have a particular version libbcc installed.
At first I was interested if LTO can help with size reduction, since I've seen it working with Rust (where LLVM is the backend: cloudflare/psi_exporter#2), but I don't know much about this stuff and I wasn't able to make it work easily. However, in the process of experimentation, I noticed that there's a significant difference in binary size of
The smallest binary here is 100MB+ behemoth and it still links to
It's possible that this can translate to bpftrace as well, and it may be even better if LTO is enabled. This could already be taken into account, but size reduction seems worth mentioning just in case. |
thanks for the tip @bobrik I'll take a look. For the time being I'm kinda doubtful it can be smaller than 52MB as the linker seems to already be pretty efficient in this regard. I did a quick test with My understanding of linking to static libraries is that the linker will only pull in the symbols from the static libs archive that it needs in order to complete the binary, so I'm not sure how this could really be made more efficient unless it is pulling in things which it doesn't need to... I almost have #1041 ready for review, but feel free to try out a build for yourself and see if you can come up with any further size optimizations. I tried |
My knowledge could be outdated, but my understanding was that if an object file A uses any symbol from object file B, all of object file B is linked in. I believe that's why the toolchain folks added a bunch of symbol visibility control (eg LTO can also do full program analysis and eliminate stuff like dead code that crosses translation units (which the compiler can't normally do on a per-TU basis). In general it can be quite helpful but it can be very expensive to perform. I think that's why stuff like ThinLTO is developed. In general, all this stuff can be helpful, but we definitely have to set up a bunch of instrumentation and follow the data to be certain. |
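The two views above (the linker takes only what it needs vs. a whole object file comes along) can be reconciled with a toy model of archive linking. This is a simplified sketch, not how any real linker is implemented: it ignores command-line ordering and symbol precedence, and the `link` function and the example object names are made up for illustration.

```python
def link(root_undefined, archive):
    """Simulate member-granularity archive linking.

    archive maps an object-file name to (defined symbols, undefined symbols).
    Returns the set of object files pulled into the final binary.
    """
    # Index which archive member defines each symbol.
    definer = {}
    for member, (defined, _) in archive.items():
        for symbol in defined:
            definer.setdefault(symbol, member)

    pulled = set()
    pending = list(root_undefined)
    while pending:
        symbol = pending.pop()
        member = definer.get(symbol)
        if member is None or member in pulled:
            continue
        # The whole member is included, not just the requested symbol,
        # so its own undefined references must be resolved as well.
        pulled.add(member)
        pending.extend(archive[member][1])
    return pulled


example_archive = {
    "a.o": ({"parse", "dump"}, {"alloc"}),
    "b.o": ({"alloc"}, set()),
    "c.o": ({"unused"}, {"alloc"}),
}
```

With `example_archive`, asking only for `parse` pulls in `a.o` and `b.o` but not `c.o`; `dump` still ends up in the binary because it lives in the same object file as `parse`. Getting below object-file granularity is what compiling with `-ffunction-sections` and linking with `--gc-sections`, or LTO, is meant to address.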
I didn't try adding -flto to the C/CXXFLAGs, just to the linker flags.
TIL thank you
Agreed, and for now I'm able to reproduce:
I think the biggest difference has been that ubuntu bionic builds RelWithDebInfo, whereas alpine and the embedded build are MinSizeRel. That release one is pretty compelling compared with the enormous size of the LLVM packages already. If we can get it smaller (@ajor said as low as 40 MB earlier?), that would be awesome. I also considered another option - what if we added google breakpad: https://chromium.googlesource.com/breakpad/breakpad/+/master/docs/linux_starter_guide.md This would allow us to build everything with RelWithDebInfo, generating a fairly large bpftrace executable, then strip off the symbols. This way, if bpftrace crashes, it can dump in the google breakpad format, and the symbols generated at build time when a release/version is cut can be used to interpret these more minimal dumps. I've worked with breakpad before and it is awesome for getting to the root of crashes, and the stripped binary will probably be comparably sized. I'll cut an issue for breakpad support if the static+glibc builds move forward. |
Googled around, seems my knowledge might have some gaps. Seems like ld has
Cool, haven't seen this before. Would have to spend some time looking before I have an opinion. |
The sizes I quoted earlier were from running the |
As an update, #1041 provides an immediate alternative to fully static builds by statically linking everything but libc. My running theory on why a fully static exe is an issue is that LLVM's header parsing is doing dladdr at some point / depends on libdl, which blows up with musl. It may be that a static libc of another flavor could still work, but I haven't dedicated time to testing this. To extend this concept further, I applied the same method used in #1041 to replicate @michalgr 's work in CMake. This links dynamically to bionic libc as well, and also introduces cross-compilation to bpftrace's build system. So the following bpftrace targets should soon be fairly readily available / easy to publish as release objects:
This issue thread has also gotten quite long and introduced some interesting new ideas that we can cut out into a separate issue, as well as discussions on:
Would anyone object to closing this issue to focus the discussion on the topics above in separate issues, at least if #1041 is merged? |
I made some minor changes to the bionic static docker build and was able to build on xenial (glibc 2.23, llvm8). I ran into #507 and fixed it with #507 (comment). I also hit issues with the older version of CMake installed by default on Ubuntu 16. After upgrading CMake and applying the linked fix, I could build with static llvm/clang. Here's my dockerfile as reference: https://github.com/alexeldeib/bpftrace-static/blob/de383badfd9482c419b3790d2f70325f444e50f6/Dockerfile I know there are compatibility issues with the older kernel in xenial, but there's a lot you can still do with bpftrace there and it has a broad deploy base. I'd love to see the published releases target the older glibc 2.23 so they're usable on those machines 🙂 |
@alexeldeib I had some work on a Xenial build for #1041 but discarded it, as I felt the PR was big / complex enough. I still think it's worthwhile to have support for an older glibc, especially with the runtime feature checking that @fbs added recently. Why don't you submit a PR and add a github actions build for the glibc 2.23 build? |
Would generating libbpf compile-once-run-anywhere code (see here) be an alternative to static builds? i.e. as a way to implement point 2 from here: #342 (comment) On your dev box you could write your bpftrace program, which gets compiled into code linking in libbpf making the right macro calls. You could build this code with the toolchain you normally use to deploy binaries onto your production box (and whether you may want to use static linking for it or not is a matter to be solved outside of bpftrace, perhaps you could statically link just libbpf but not libc, etc.). With sufficient automation you could do this nearly "live" if you wanted to (e.g. build in a container matching your production box, and scp over the binary). |
`bcc` and `bpftrace` provide many tools that are useful for obtaining insights into various aspects of the system. Include them in the container. All the required tools are being built for ppc64le, so switch to using the upstream packages in the `edge` repository. After the next alpinelinux release, we can switch over to using alpine:latest
The image size has drastically increased, largely due to the dependencies pulled in by bcc/bpftrace:
- python3 is ~45M
- clang is ~33M
- llvm is ~80M
- bcc itself is ~50M
A static build for bpftrace should help, but has many problems right now and is being discussed in the upstream community:
- bpftrace/bpftrace#342
Signed-off-by: Naveen N. Rao <[email protected]>
During the BPF Microconference at LPC, we got feedback that installing llvm/clang/libbpfcc in production servers just to use bpftrace would be overkill and sometimes unfeasible. An alternative would be linking to those dependencies statically, but most distributions don't provide the static library libclang.a on their libclang packages.
I started to think about alternatives to a static build, and two ideas came up:
Have an option to compile a .bt file into an executable
In this approach, we would add a new option to bpftrace (let's say, -o), which instead of attaching to the probes we provided, would build an executable binary with our bpftrace program. We could then send this binary to a given server without having to install llvm/clang libraries there.
Downsides would be:
Compile a .bt file into an intermediate format, and run with a minimal bpftrace binary
Similar to the previous idea, but instead of creating an executable binary we would have an intermediate format with our probes as well as BPF programs for each probe. We could then use a minimal bpftrace binary (not linked against llvm/clang) to run this intermediate file.
On the upside, I think this would be more maintainable than the previous approach (internally we could also use the intermediate format). Downsides are the same as the previous approach.
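To make the intermediate-format idea a bit more concrete, here is a minimal sketch of what such a file could contain. Everything in it is an assumption for illustration: the `CompiledProbe` type, the JSON layout and the `version` field are hypothetical, and a real format would also have to describe maps, printf format strings and similar metadata.

```python
import base64
import json
from dataclasses import dataclass, asdict


@dataclass
class CompiledProbe:
    attach_point: str  # e.g. "kprobe:do_sys_open"
    bytecode: bytes    # BPF instructions emitted at build time


def dump_bundle(probes):
    """Serialize probes to JSON; bytecode is base64-encoded to stay textual."""
    records = []
    for probe in probes:
        record = asdict(probe)
        record["bytecode"] = base64.b64encode(probe.bytecode).decode("ascii")
        records.append(record)
    return json.dumps({"version": 1, "probes": records})


def load_bundle(text):
    """Parse a bundle produced by dump_bundle back into CompiledProbe objects."""
    data = json.loads(text)
    if data["version"] != 1:
        raise ValueError("unsupported bundle version")
    return [
        CompiledProbe(r["attach_point"], base64.b64decode(r["bytecode"]))
        for r in data["probes"]
    ]
```

In this sketch the full bpftrace (with llvm/clang) would call something like `dump_bundle` on the build machine, and the minimal runner on the server would only need `load_bundle` plus the code that loads and attaches the programs.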
I still think it's worth getting our static builds to work though, but if we can't do that reliably, we could start investigating those alternatives.