[Donation Proposal]: Beyla, eBPF auto-instrumentation tool for metrics and traces #2406

Open
grcevski opened this issue Oct 23, 2024 · 10 comments

grcevski commented Oct 23, 2024

Description

Grafana Labs would like to offer the donation of Beyla to the OpenTelemetry project.

Beyla is a mature eBPF-based auto-instrumentation tool for OpenTelemetry metrics and traces, for multiple languages and protocols. It enables cluster-wide/system-wide auto-instrumentation of applications without the need for application code/configuration changes or application restarts. To achieve this, Beyla uses a combination of protocol-level instrumentation based on network events and language/runtime-level instrumentation where needed. While Beyla works on bare metal installations, virtual machines, etc., the tool is also fully Kubernetes-aware and can be deployed as a daemonset or as a sidecar. Beyla is used by a number of customers in production, including Grafana Labs itself for the Grafana Cloud hosted offering.

Some of the main uses of Beyla are:

  • Provide auto-instrumentation for programming languages where OpenTelemetry SDK zero-code auto-instrumentation is not supported, such as Rust, C++, Erlang, Zig, Ruby, Swift, Perl, Lua, Dart, R, Java GraalVM Native, Julia…
  • Provide auto-instrumentation for legacy applications, where it’s not easy to migrate the codebase to the OpenTelemetry SDK compatible frameworks.
  • Provide auto-instrumentation for applications whose source is not available, or which are proprietary and/or distributed in binary form.
  • Provide a unified way to capture application-level metrics across all different technologies used by a customer.
  • Provide network-level metrics, regardless of the L3/L4/L7 protocol used, for the purpose of building service graphs and reachability reports.
  • Provide process-level metrics for instrumented applications.
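
As a rough illustration of the service-graph use case above, the sketch below folds per-flow byte counts into directed edges and lists expected edges that saw no traffic. The `Flow` record shape and both function names are hypothetical and chosen only for this example; they do not describe Beyla's actual data model or output format.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One observed network flow (hypothetical record shape, not Beyla's)."""
    src: str         # source workload name
    dst: str         # destination workload name
    bytes_sent: int  # bytes observed for this flow


def service_graph(flows: Iterable[Flow]) -> dict[tuple[str, str], int]:
    """Sum bytes per directed (src, dst) edge."""
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for flow in flows:
        edges[(flow.src, flow.dst)] += flow.bytes_sent
    return dict(edges)


def unreachable(expected: Iterable[tuple[str, str]],
                edges: dict[tuple[str, str], int]) -> list[tuple[str, str]]:
    """Expected edges with no observed traffic: a crude reachability report."""
    return [edge for edge in expected if edge not in edges]
```

Calling `service_graph` on a list of flows yields one total per edge, and `unreachable` can then be run against a list of edges that are expected to exist.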

Some of the core features of Beyla include:

  • Application level instrumentation (metrics and traces) for HTTP, HTTPS (libssl3 and Go), HTTP/2, gRPC, SQL, Kafka, Redis, CUDA (Nvidia GPUs).
  • Augments the protocol level instrumentation detection with runtime instrumentation for certain programming languages, e.g. Go and NodeJS.
  • Network level instrumentation for any protocol for the purpose of connectivity monitoring, which doesn’t conflict with any Kubernetes CNI (including Cilium CNI), including send/receive byte level accounting.
  • No root access required: Beyla does not need to run as root, nor in privileged mode in Docker containers. It can use the finer-grained Linux capabilities (permissions) to run with a minimal security configuration, and it gracefully degrades functionality when certain permissions are not granted. For example, Beyla will not use certain helpers like bpf_probe_write_user when CAP_SYS_ADMIN is not granted.
  • Supports multi-process instrumentation: it can run as a daemonset and instrument the whole system/node/Kubernetes cluster from a single Beyla instance.
  • Is OpenTelemetry SDK instrumentation aware and avoids telemetry duplication. When the whole system is instrumented, Beyla will auto-detect if certain applications are already sending traces or metrics and will disable its own instrumentation for those applications, depending on what the application generates. For example, if a web application is generating OpenTelemetry traces, but not HTTP metrics, Beyla will still generate the HTTP metrics for that application and avoid generating traces.
  • Non-intrusive. Requires no additional agents, no application-level modifications, and no access to application source or configuration.
  • Minimal performance/memory overhead. We share all probes and maps among all processes, and since the userspace side of Beyla is built with Go, it often has much lower overhead for metric and trace generation compared to the OpenTelemetry support for certain programming languages (e.g. interpreted languages).
  • It’s built with libbpf (ebpf-go), which means a single compiled binary can be deployed on any Linux kernel version that supports CO-RE. Currently Beyla supports all LTS versions of major Linux distributions, including kernel 4.18 with the patches backported by Red Hat for CO-RE and BTF.
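
Two of the behaviours in the list above, degrading by granted capability and skipping signals an application already emits, boil down to small decision functions. The following is a minimal sketch under stated assumptions, not Beyla's implementation: the function names are hypothetical, the capability bit numbers are the ones defined in `<linux/capability.h>`, and the only rule taken from the proposal is that `bpf_probe_write_user` is not used without `CAP_SYS_ADMIN`.

```python
# Capability bit numbers as defined in <linux/capability.h>.
CAP_SYS_ADMIN = 21
CAP_BPF = 39


def effective_caps(status_text: str) -> int:
    """Parse the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    """True when capability number `cap` is set in the bitmask."""
    return bool((mask >> cap) & 1)


def may_use_probe_write_user(mask: int) -> bool:
    """Hypothetical gate: only use the helper when CAP_SYS_ADMIN is granted."""
    return has_cap(mask, CAP_SYS_ADMIN)


def signals_to_generate(app_emits: set[str]) -> set[str]:
    """Hypothetical rule: generate only signals the app does not already emit."""
    return {"traces", "metrics"} - app_emits
```

With these definitions, an application that already emits traces but no metrics would get only metrics generated for it, matching the web application example in the list.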

Benefits to the OpenTelemetry community

Donating Beyla will fill a gap in the OpenTelemetry application-level instrumentation ecosystem for applications that use programming languages not supported by the OpenTelemetry SDKs, that use proprietary frameworks, or that use older technologies. We also believe it will fill a gap in network-level monitoring for building service graph and connectivity tracking solutions.

This donation has a lot of synergy with the OpenTelemetry Profiling Agent, and we believe that in the future we can build non-intrusive, generic profiling-to-TraceID correlation by leveraging the two projects.

Reasons for donation

We at Grafana Labs prefer that customers use the upstream OpenTelemetry SDKs for application-level instrumentation. However, we often find that certain customers are unable to use the recommended approach because of the technology they currently use. We built Beyla as an easy way for our customers to get started with OpenTelemetry while they are in the process of upgrading their software, which sometimes takes years. Oftentimes, customers also use binary distributions of software and are unable to instrument these applications, depending on the technology the binaries are built with.

We believe that we are not alone in this need to move customers to OpenTelemetry quicker, where they can’t currently leverage the existing OpenTelemetry ecosystem. This is why we’d like to make this project a community project, where multiple companies can be stakeholders and we can build a better community around it, compared to what Grafana Labs can do alone.

Relation with Other OpenTelemetry Projects

We also see this donation as an opportunity to combine the eBPF-based auto-instrumentation efforts in OpenTelemetry. Our project borrows parts of the OpenTelemetry Go Auto-Instrumentation project, and some of our Beyla maintainers participate in that project too. We’d like to fully merge all of our work on Go with the OpenTelemetry Go Auto-Instrumentation project, vendor it in Beyla as an import once the merge is complete, and so avoid the double contribution we do at the moment. Beyla’s support for auto-instrumentation goes well beyond Go, which is why we are proposing a new project donation. We are also open to combining the Go Auto-Instrumentation project into a new project for out-of-process auto-instrumentation with our donation.

We also see this donation as an opportunity to re-invigorate the OpenTelemetry eBPF Networking project. Beyla includes support for the majority of the functionality of that project, but it’s built with eBPF-Go (libbpf), which means it uses CO-RE and can be deployed on any kernel without specific kernel builds or deploying a compilation toolchain on the target system.
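
CO-RE relies on the running kernel's BTF type information, which BTF-enabled kernels usually expose at `/sys/kernel/btf/vmlinux`. A minimal pre-flight check might look like the sketch below; the function name is hypothetical, and this is an illustration of the precondition rather than how Beyla itself detects kernel support.

```python
from pathlib import Path


def kernel_exposes_btf(btf_path: str = "/sys/kernel/btf/vmlinux") -> bool:
    """True when the given path exists, i.e. the kernel publishes its BTF."""
    return Path(btf_path).exists()
```

A deployment script could call `kernel_exposes_btf()` on the target host before starting a CO-RE based agent and report a clear error when it returns `False`.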

Our development stack is identical to what’s used by OpenTelemetry Go Auto-Instrumentation and the OpenTelemetry eBPF Profiler. Developers on those projects will easily be able to contribute to this project, and it will bring all of the OpenTelemetry eBPF tooling to the same level.

Repository

https://github.com/grafana/beyla

Existing usage

Beyla is used by hundreds of users in production, including Grafana Cloud itself. We have strong open-source community usage: our Docker image is pulled around 100,000 times a month, and that number has been growing steadily since the inception of the project. For comparison, in April of 2024 our Docker image pulls were around 30,000 a month.

Maintenance

We have 4 full-time maintainers on the project, who will move to work full-time on the OpenTelemetry project if the donation is accepted. We have over 40 contributors on the project, most of whom are not Grafana Labs employees or affiliated in any way with Grafana Labs.

Licenses

Apache 2.0 License
Our eBPF probe source is dual licensed with GPL/MIT as per the requirements of the Linux Kernel. This is identical to the approach used by OpenTelemetry Go Auto-Instrumentation and OpenTelemetry Profiler.

Trademarks

The name Beyla currently appears in a number of places in the codebase and is a Grafana Labs trademark. We are happy to donate the name too; however, we understand that it’s not compatible with how OpenTelemetry projects are typically named. If the name donation is not acceptable, we are happy to remove these name references when the project is donated.

Other notes

This proposal has been socialized with @MrAlias (maintainer of OpenTelemetry Go Auto Instrumentation) and @atoulme (maintainer of OpenTelemetry eBPF Networking).

@edeNFed
Contributor

edeNFed commented Oct 23, 2024

I’m looking forward to Beyla's potential donation to the OpenTelemetry project, as it helps cover important gaps in auto-instrumentation for unsupported languages and environments.

That said, this donation comes with some challenges since a lot of Beyla’s work overlaps with existing OpenTelemetry projects like Go auto-instrumentation, eBPF Profiler, eBPF networking and OpenTelemetry Operator. The community already has efforts addressing these areas, so it’s important to understand how Beyla will fit in and integrate with these projects.

As part of the donation, it’s crucial to ensure the current core OpenTelemetry repositories remain the main source of truth, and that we avoid duplicating code or functionality. It would be helpful to see how Beyla and existing projects can come together without redundancy.

I’m also interested in how Beyla will eventually be integrated as a collector receiver in the OpenTelemetry architecture. To make this work smoothly, Beyla should be able to use existing components as dependencies rather than duplicating what’s already there.

@grcevski
Author

> That said, this donation comes with some challenges since a lot of Beyla’s work overlaps with existing OpenTelemetry projects like Go auto-instrumentation, eBPF Profiler, eBPF networking and OpenTelemetry Operator. The community already has efforts addressing these areas, so it’s important to understand how Beyla will fit in and integrate with these projects.

Thanks for the comments, Eden. The main overlap in functionality is with Go Auto Instrumentation, for which we propose to merge our functionality there and vendor it in the new project. The main challenge I see is multi-process support, which we need for fleet-wide monitoring, but I'm sure we can overcome it. For eBPF Networking, I think we can use this as an opportunity to bring the functionality to the same level as Go Auto, using a similar development stack and a libbpf CO-RE based approach.

I don't think the donation overlaps in any way with the OpenTelemetry Operator or the OpenTelemetry eBPF Profiler. I think providing a generic way to extract trace/span information for the eBPF Profiler would be a great way to correlate traces with profiles.

@grcevski
Author

> I’m also interested in how Beyla will eventually be integrated as a collector receiver in the OpenTelemetry architecture. To make this work smoothly, Beyla should be able to use existing components as dependencies rather than duplicating what’s already there.

I'm not sure there's much duplication there, except with the eBPF networking component, which we addressed under "Relation with Other OpenTelemetry Projects". There's a recent request to add Beyla as a component in the OpenTelemetry Collector, which this donation would help a lot: open-telemetry/opentelemetry-collector-contrib#34321

@damemi

damemi commented Oct 23, 2024

Thanks for the detailed proposal @grcevski! I think this is great for building progress on OpenTelemetry/eBPF and covering existing gaps.

To mirror what @edeNFed said, avoiding confusion and duplication is important. But I think you have explained that the idea is to vendor the existing Go Auto-Instrumentation as a dependency into the Beyla donation. That makes sense to me, as it fits with the goals we've been working on together in Go Auto (ie, to make that repo a library/API/SDK that can be imported by other implementations).

To that end, it makes sense that OpenTelemetry would provide both (a) an open-source library/framework for eBPF instrumentation with a "raw" agent as the default artifact and (b) an open-source component consuming that framework to provide second-level functionality and usability. @jsuereth and I were actually talking about this, and he compared this situation roughly to how the Collector works.

I think the potential overlap with the OpenTelemetry Operator is in the fact that the Operator does deploy that default agent from Go Auto-Instrumentation, but that's about it. To draw back to the collector comparison, I would say that the Operator is to the Collector as Beyla is to Collector-Contrib: built on a stable, minimal core with added functionality. Both exist to give users options based on their needs.

All that said, we should make sure to apply the same standards for donation that we are also applying to the Compile-time Go Instrumentation donation. Specifically:

  • The relationship between the new repo and existing repo must be well-defined. Will maintainers from existing Go Auto-Instrumentation overlap with the new repo? Will Beyla have its own SIG and meetings? Does this add any burden to the existing project?
  • Are there maintainers from multiple companies? You mentioned that you have other contributors, would it be possible to propose an initial maintainers list for the new repo? (like we are asking from the compile-time proposal)

All in all, I wouldn't be surprised to see these 3 projects collaborate and converge more often as time goes on. Thanks for your work on this @grcevski!

@svrnm
Member

svrnm commented Oct 23, 2024

I am by no means an expert on ebpf but one thing I'd like to ask:

would it be possible to work towards one ebpf solution that combines what beyla does (auto instrumentation with traces, metrics I suppose + networking) + the profiler?

Because in the end what people want (see this discussion for example: open-telemetry/opentelemetry-specification#4255) is a combination of all four signals, but if those 2 projects are separate we either need a way to install them side-by-side or people have to choose.

@damemi

damemi commented Oct 23, 2024

I think that one ebpf solution would be something like Beyla. But, I don't think that idea means all of the code for every signal+language lives in one monorepo with the higher-level component. That's what I mean by separate repos at least

@RonFed

RonFed commented Oct 23, 2024

I agree with @edeNFed's and @damemi's comments.

Having projects that handle auto-instrumentation and, on top of them, higher-level implementations (like the Operator or Beyla) that use multiple other projects is a good structure in my opinion.

As a maintainer in the go-auto-instrumentation project, I'd be happy to accept donations from Beyla to the current project.

@dashpole

I'm excited to see this donation proposal! I have made a few contributions to Beyla in the past, and have found the maintainers knowledgeable, kind, and helpful. I also think Beyla fills an important gap by providing language-agnostic telemetry. There are definitely details to work out, but I'm very supportive of this proposal.

@mtwo
Member

mtwo commented Oct 25, 2024

This looks great, and thanks @grcevski for calling out how this relates to and can merge or interoperate with Go auto-instrumentation, network monitoring (@yonch FYI), and the profiling agent (FYI @christos68k, @petethepig, @felixge, @fabled)! These were going to be the first questions that I asked, and it looks like we already have good notions about how things can proceed with each. Now that we have several projects in flight that use eBPF, it seems sensible to have them inherit from a common base, if possible.

@alolita and I will be on point for this process for the Governance Committee. We'll circle back in a few days with next steps once more community members have time to comment.

@cforce

cforce commented Oct 28, 2024

Don't miss out on insights from OpenTelemetry Network traces! There has always been demand for deeper eBPF integration within the OpenTelemetry Collector 🎉

8 participants