
[Donation Proposal]: OpenTelemetry Instrumentation for Android #1400

Closed
breedx-splk opened this issue Mar 28, 2023 · 30 comments

@breedx-splk
Contributor

breedx-splk commented Mar 28, 2023

Description

Splunk would like to offer the donation of Splunk OpenTelemetry Instrumentation for Android to the OpenTelemetry project. This is also known as OpenTelemetry-based Android RUM (client) instrumentation.

This library provides a fluent way for mobile application authors to configure instrumentation and the OpenTelemetry SDK. Please see the above link for additional features/benefits.

[Image: high-level overview diagram of the current Android SDK design]

The above diagram represents a high-level overview of the current Android SDK design. The bottom section is existing OpenTelemetry functionality, and the topmost level has vendor-specific customizations. The middle section between the two dashed lines is the candidate for donation.

User code is able to create an instance of OpenTelemetryRum, which tracks the RUM session id and the OpenTelemetry API instance. The fluent builder allows user code to configure any number of available (or custom) instrumentations. Current existing instrumentation includes:

  • Activity lifecycle
  • Fragment lifecycle
  • Current “screen” tracking
  • Detect ANRs
  • Detect crashes
  • Carrier/network state
  • Carrier/network change
  • Jank (slow renders)
  • Startup timing
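
The fluent-builder idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `RumSketch`, `RumInstrumentation`, and `NamedInstrumentation` are hypothetical names invented here and are not the classes in the Splunk repository. The sketch shows a builder that collects instrumentations, generates a session id, and installs each instrumentation at build time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch; not the real OpenTelemetryRum class.
final class RumSketch {
    // Hypothetical stand-in for one pluggable instrumentation (e.g. ANR detection).
    interface RumInstrumentation {
        String name();
        void install(String sessionId);
    }

    // Trivial instrumentation used only for the demo below.
    static final class NamedInstrumentation implements RumInstrumentation {
        private final String name;
        NamedInstrumentation(String name) { this.name = name; }
        @Override public String name() { return name; }
        @Override public void install(String sessionId) {
            // A real instrumentation would register lifecycle listeners here.
        }
    }

    static final class Builder {
        private final List<RumInstrumentation> instrumentations = new ArrayList<>();

        // Returning `this` is what makes the configuration calls chainable.
        Builder addInstrumentation(RumInstrumentation instrumentation) {
            instrumentations.add(instrumentation);
            return this;
        }

        RumSketch build() {
            String sessionId = UUID.randomUUID().toString();
            List<String> names = new ArrayList<>();
            for (RumInstrumentation instrumentation : instrumentations) {
                instrumentation.install(sessionId);
                names.add(instrumentation.name());
            }
            return new RumSketch(sessionId, names);
        }
    }

    private final String sessionId;
    private final List<String> installed;

    private RumSketch(String sessionId, List<String> installed) {
        this.sessionId = sessionId;
        this.installed = Collections.unmodifiableList(installed);
    }

    static Builder builder() { return new Builder(); }
    String sessionId() { return sessionId; }
    List<String> installedInstrumentations() { return installed; }

    public static void main(String[] args) {
        RumSketch rum = RumSketch.builder()
            .addInstrumentation(new NamedInstrumentation("anr"))
            .addInstrumentation(new NamedInstrumentation("crash"))
            .build();
        System.out.println(rum.installedInstrumentations());
    }
}
```

Custom instrumentations would plug in through the same `addInstrumentation` call in this sketch, which is one way to read "available (or custom) instrumentations".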

Benefits to the OpenTelemetry community

We believe that the OpenTelemetry community will benefit by strengthening its client instrumentation offerings, specifically in the mobile space with Android. This donation will support the goals of the Client SIG (community PR, doc) and will help shape (and be shaped by) the Real User Monitoring (RUM) OTEP developments.

It has been shown in other areas of OpenTelemetry that a functioning "straw-man" implementation can help accelerate development of standards/specifications. This donation will provide a strong, user-tested foundation for continued RUM community development.

Reasons for donation

Splunk continues its dedication to the OpenTelemetry community and bolsters the project goals of standardizing telemetry, including mobile and RUM telemetry. Standards cannot be developed reliably in isolation, and community involvement strengthens standards.

The Splunk Android instrumentation has been developed over the last couple years and is already based on vanilla/upstream OpenTelemetry components. It has been developed in the open, and carries an Apache 2.0 license already.

Repository

https://github.com/signalfx/splunk-otel-android

Existing usage

The current Android instrumentation is at 1.0.0 and is used in production by Splunk RUM customers.

Maintenance

Splunk developers will help maintain the OpenTelemetry Android project. @breedx-splk has offered to lead maintenance initially, and we are hopeful that community engagement will follow. Community members have also expressed interest in supporting and improving the contribution, specifically @LikeTheSalad (Elastic) and @martinkuba (Lightstep).

Licenses

Apache 2.0

Trademarks

The name "Splunk" currently appears in the codebase, but we do not intend to donate any code that includes the name "splunk". Much of the core functionality will be donated in the io.opentelemetry.rum Java package namespaces (or similar).

Other notes

Tagging @open-telemetry/technical-committee per CONTRIBUTING guidelines.

@theletterf
Member

This is great. I'd be glad to help with the docs.

@tylerbenson
Member

As far as I can tell, the current implementation doesn't allow exporting in anything except Zipkin (seems hard coded in RumInitializer). I suggest that OTLP be implemented as the default exporter format before this be donated.

@bogdandrutu
Member

I suggest that OTLP be implemented as the default exporter format before this be donated.

Can happen also as a first thing after, but definitely needs to happen.

@reyang
Member

reyang commented Mar 29, 2023

"Offline buffering of telemetry via storage" - this is a good candidate for the spec.

@yurishkuro
Member

If this donation is about instrumentation, why is the export format even in question? Wouldn't it be configured in the SDK?

@reyang
Member

reyang commented Mar 29, 2023

If this donation is about instrumentation, why is the export format even in question? Wouldn't it be configured in the SDK?

My current understanding - the donation is a family pack of several components that cover instrumentation, SDK plugins, and other stuff.

@yurishkuro
Member

Sounds like we need a better narrative, some architectural overview based on which we can do due diligence.

@breedx-splk
Contributor Author

Sounds like we need a better narrative, some architectural overview based on which we can do due diligence.

I can work on providing that. I think it would also help to clarify some specifics.

@LikeTheSalad

My current understanding - the donation is a family pack of several components that cover instrumentation, SDK plugins, and other stuff.

Maybe adding Android-specific bytecode instrumentation support could also be one of the tools provided here. Since Android doesn't support runtime instrumentation agents like in plain Java, the auto-instrumentation mechanism provided in this repo doesn't work automatically on Android apps, so one of the benefits of having an Android OTel dedicated repo could be providing auto-instrumentation tools as well. We'd be glad to help add said support to this SDK based on how it's currently done on the Elastic SDK.

@bherbst

bherbst commented Apr 3, 2023

Putting the request around exporters a different way - I'd love to see the relationship between the Android instrumentation and the OTel SDK changed so that instead of configuring an OpenTelemetry instance itself, the Android instrumentation accepts a client-configured instance to support any manner of changing the SDK's behavior and also so clients can use the same SDK instance for other tracing needs.
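
A minimal sketch of the relationship requested above, assuming hypothetical stand-in types: `SpanSink` plays the role that `io.opentelemetry.api.OpenTelemetry` would play in real code, and `ActivityLifecycleInstrumentation` and `InMemorySink` are invented for illustration. The point is only that the instrumentation receives an instance the app has already configured, rather than constructing one itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of constructor injection; none of these types exist in the SDK.
final class InjectedSdkSketch {
    // Stand-in for the SDK surface the instrumentation writes to.
    interface SpanSink {
        void record(String spanName);
    }

    // The instrumentation never builds a sink; it only uses the one handed to it.
    static final class ActivityLifecycleInstrumentation {
        private final SpanSink sink;

        ActivityLifecycleInstrumentation(SpanSink sink) {
            this.sink = Objects.requireNonNull(sink, "sink");
        }

        void onActivityResumed(String activityName) {
            sink.record("Resumed " + activityName);
        }
    }

    // Simple sink that keeps span names in memory so the demo can print them.
    static final class InMemorySink implements SpanSink {
        final List<String> spans = new ArrayList<>();
        @Override public void record(String spanName) { spans.add(spanName); }
    }

    public static void main(String[] args) {
        // The app owns configuration (exporter choice, processors, and so on).
        InMemorySink shared = new InMemorySink();
        new ActivityLifecycleInstrumentation(shared).onActivityResumed("MainActivity");
        // The same instance serves the app's other tracing needs.
        shared.record("checkout");
        System.out.println(shared.spans);
    }
}
```

With this shape, swapping Zipkin for OTLP would be a change in how the app builds the shared instance, not a change in the instrumentation.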

I would love to use a canonical Android instrumentation like this, assuming I can configure it to fit my org's needs (e.g. using the OtlpHttpExporter instead of Zipkin). I'd also be overjoyed to contribute back.

Huge plus one to the desire for buffering on disk too - this is a fairly thorny problem with OTel out of the box on mobile. If the app is killed by the OS or crashes, vital traces leading up to those events are currently lost.
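
To make the disk-buffering idea concrete, here is a minimal sketch, assuming spans have already been serialized to single-line strings. `DiskBufferSketch` and its methods are hypothetical and do not reflect how the Splunk implementation stores data. Each span is appended and synced to a file before the call returns, so it survives a process death and can be drained on the next start.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a line-oriented disk buffer; uses only java.io.
final class DiskBufferSketch {
    private final File file;

    DiskBufferSketch(File file) { this.file = file; }

    // Convenience factory for the demo; a real app would use its own storage directory.
    static DiskBufferSketch inTempFile() {
        try {
            return new DiskBufferSketch(File.createTempFile("spans", ".buf"));
        } catch (IOException e) {
            throw new IllegalStateException("could not create buffer file", e);
        }
    }

    File file() { return file; }

    // Append one serialized span (must not contain newlines) and force it to disk.
    void append(String serializedSpan) {
        try (FileOutputStream out = new FileOutputStream(file, true)) {
            out.write((serializedSpan + "\n").getBytes(StandardCharsets.UTF_8));
            out.getFD().sync();
        } catch (IOException e) {
            throw new IllegalStateException("could not buffer span", e);
        }
    }

    // Return everything buffered so far and delete the file.
    List<String> drain() {
        List<String> spans = new ArrayList<>();
        if (!file.exists()) {
            return spans;
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                spans.add(line);
            }
        } catch (IOException e) {
            throw new IllegalStateException("could not read buffered spans", e);
        }
        if (!file.delete()) {
            throw new IllegalStateException("could not delete " + file);
        }
        return spans;
    }

    public static void main(String[] args) {
        DiskBufferSketch beforeCrash = DiskBufferSketch.inTempFile();
        beforeCrash.append("span-1");
        beforeCrash.append("span-2");
        // A fresh instance over the same file stands in for the next process start.
        System.out.println(new DiskBufferSketch(beforeCrash.file()).drain());
    }
}
```

A production buffer would also need size limits, batching, and handling for a partially written last line, none of which this sketch attempts.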

In the longer term it may make sense to start breaking out some of the individual pieces of this instrumentation into smaller more focused instrumentations. I could see Activity lifecycles, Fragment lifecycles, ANR & crash reporting, jank reporting, and network state all being separate modules, and pieces such as a disk buffer could be easily broken out as well. Breaking them out would help manage complexity & binary size, particularly when we start thinking about continued evolution like extending the "current screen" and Activity/Fragment lifecycle tracking to Jetpack Compose.

@breedx-splk
Contributor Author

@bherbst said:

I would love to use a canonical Android instrumentation like this, assuming I can configure it to fit my org's needs (e.g. using the OtlpHttpExporter instead of Zipkin). I'd also be overjoyed to contribute back.

That's great, love to hear it. The need to support OTLP and not Zipkin is definitely expected, no surprise there.

[hypothetically] Android instrumentation accepts a client-configured instance to support any manner of changing the SDK's behavior and also so clients can use the same SDK instance for other tracing needs.

Of course all of this is changeable, and we're always open to improvements. Right now, the intention was to provide convenience methods for configuring the most common parts of the SDK while limiting other configuration that might be problematic or less useful on mobile. Users do have the ability to fetch the OpenTelemetry instance from OpenTelemetryRum to use in other tracing needs. If we wanted to provide some "advanced" or similar option that allows the entire SDK to be preconfigured and injected, I think there's a bigger discussion to be had.

Huge plus one to the desire for buffering on disk too

Right now the implementation has this tied up in Zipkin, and I would expect this to be an early issue that we could improve, because it sounds like folks want this.

Also +1 to future modularization. There will be an evolution over time.

@pyricau

pyricau commented Apr 24, 2023

This looks interesting, though modularization would likely be a super high priority for this to be adopted. Mobile Observability is all over the place, with a lot of different use cases, different designs, etc. We (as in, the mobile community) need really strong & small cores that are designed for mobile and can be composed, rather than a "framework" with a ton of configuration options.

Some things that come to mind:

To be clear, I'm not trying to argue against this generous donation, but rather pointing out that it's impossible to get an implementation that gets everything "right" and that ideally having composable ways to track all these things would allow for alternate implementations of a number of parts (which could be donated over time, etc)

One last thing, I'm not sure how important this is but the Android community has by and large moved to Kotlin. Java SDKs are of course the most compatible option, but that also leaves on the table some amount of API niceties.

@breedx-splk
Contributor Author

[...] pointing out that it's impossible to get an implementation that gets everything "right" and that ideally having composable ways to track all these things would allow for alternate implementations of a number of parts (which could be donated over time, etc)

@pyricau Thanks for taking the time to provide this feedback, it's appreciated. We are certain that we didn't get things "perfect" out of the box, and I respect this idea that there probably is no "perfect" among a diverse set of user scenarios/use cases.

[...] need really strong & small cores that are designed for mobile and can be composed, rather than a "framework" with a ton of configuration options.

I'm still digesting this, but I think I agree with at least part of this. Composability? Absolutely! Frankly, the current implementation doesn't do nearly enough of that because there's only been one opinionated vendor using it so far. I would definitely expect to see this improve with community involvement.

As far as "framework with a ton of config options", I would expect this conversation to continue for many months....but in concept, ideally, the instrumentation would include some set of out-of-the-box (built-in) features that are useful to most users, along with a configuration mechanism to disable those and to enable other features that might be off by default. I'm not super opinionated (yet?) about what that might look like long term -- whether it be programmatic APIs (like we have now) or config files or build tooling, or something else entirely.

As pluggable components are built in the future, there should be apis or tooling to help include those in projects, and there need to be some user-facing apis for doing o11y things at runtime...ideally all in a way that is "designed for mobile", as you pointed out. I'm not sure exactly what makes this feel "frameworky" (my word), or why that's implicitly a bad thing, heh. If the number of configuration options seems too vast, then definitely there is room for improvement.

Thanks again for your input.

@LikeTheSalad

[...] ideally, the instrumentation would include some set of out-of-the-box (built-in) features that are useful to most users, along with configuration mechanism

I believe this is a key point. While I’m aware that not all Android apps will have the same needs, meaning that there won’t be a “perfect” solution for them all, I’m also aware that not everybody would have the need or time to come up with a tailored solution for their apps, so having a default set of tools will come in handy for a lot of use cases.

At the same time, I believe that having a tool like this would be helpful to provide a baseline set of features where the community would define what are the bare minimum functionalities that an Android observability tool needs to have (while providing ways to extend them for those who need more), it would provide some sort of “structure” of what an Android observability tool should be composed of, which I believe would come in quite handy for people who don’t know what is it that they can or should observe from Android apps, and, after they see those in action by using the out-of-the-box implementations, it should be easier for them to pick and choose what implementations to override with either third party plugins or their own solutions if needed.

Generally speaking, I believe a tool like this would provide a starting point for Android observability which I personally think is needed since it seems like there’s currently no clear path on what Android (and mobile in general) observability should look like for OTel. The Java SDK needs to be quite generic, plus not all of Java’s tools are available in Android runtime and there’s also a bunch of Android-specific use cases that can’t be part of a generic Java solution either.

I think it’s great that, even from its proposal, a tool like this is already bringing some great ideas such as, what could be the best approach to identify ANRs, app startup tracking, and so on, because then those could become the “out-of-the-box” options thanks to the community feedback. It all starts with having a point of reference to compare to and act upon such as an Android-specific OTel SDK as the one here proposed.

@jack-berg
Member

Quick update - the TC due diligence review for this is currently in progress.

@danielbanks

danielbanks commented Jun 1, 2023

@jack-berg I'm looking to integrate OTel with our Android application and a dedicated SDK for Android would be a massive benefit, even if the first iteration isn't perfect. Do you have an update of if/when this will happen?

I'm also happy to help contribute to this, but wasn't sure what state it is in, i.e. close to release vs just reviewing requirements etc.

@jack-berg
Member

Hi all. The TC due diligence for this donation is complete: https://docs.google.com/document/d/1sxsw3Jp3X6mr_gOYR4kuKVOFL2bxsieQkRquud7mIXQ

In summary, we'd like to accept the donation, and have a couple of recommendations on how to proceed. Please review the recommendations. If they are acceptable, we can proceed.

Thanks everyone for your patience!

@LikeTheSalad

Thank you @jack-berg this is great news. There's just one part of the doc that caught me by surprise regarding the alternatives evaluated during the due diligence so I left a comment in there if you can have a look. Apart from that, I agree with the recommendations and I hope the community gets involved in order to fulfill those in the best way possible!

@breedx-splk
Contributor Author

Thanks @jack-berg ! Loving to see this progress. I had one small comment, but it's forward looking and I don't think anything in there is surprising nor problematic. Can the acceptance just be done in writing in this issue? If so, I am confident we can get that completed very soon.

@bidetofevil

bidetofevil commented Jun 15, 2023

It's great to see momentum on improving OTel support for Android! At Embrace we are looking to leverage the OTel Java SDK as well, but we are approaching the problem slightly differently, at least to start.

Our first foray into OTel involves solving performance tracing: logging Spans (as defined by the OTel spec) for specific workflows/operations, using as much of the OTel Java SDK as possible. (We are doing this for iOS too, using the Swift SDK, BTW)

From what I can tell (and correct me if I'm wrong), the Splunk OTel RUM SDK creates an opinionated model of the Android app execution and usage lifecycle via Spans, and records mobile specific things like activity navigation, app crashes, ANRs, etc. into a "current" Span defined by said model. Custom spans and other events are also going to be parented by the current Span.

For us, we already have an opinionated model of the Android app usage and execution lifecycle on top of which we record app events, but all this is not in OTel yet, though we'd like it to be one day. As such, we haven't fully thought through how we'd like to model the app process via OTel Spans, especially because code execution in the app can't really be cleanly attributed to Activities given the number of background threads that could be doing things. The kind of performance tracing that we are looking to do can potentially be agnostic to whatever UI is on the screen, so the concept of a "current" Span being mapped to an Activity doesn't really fit in with my current thinking. But we admittedly have a lot to catch up on with y'all!

Just to be clear - I'm not objecting to the donation or adding additional requests. But conceptually, we want an Android extension to the OTel Java SDK to eventually have the following attributes:

  • A modular and extensible architecture that allows for different opinionated modelings of the app lifecycle and user sessions
  • Strong semantic conventions to define the shapes of "mobile specific things" (as well as what OTel primitives to use) that get logged while the app is running (e.g. ANRs, UI events, etc.)
  • A plugin-able interface for services for writing said mobile specific things to the appropriate places that allows for different implementations to generate the thing differently
  • Implementations for common functionality like a local persistence exporter that are designed for Android devices of all performance levels and that take care of basic concerns like limits, batching, backpressure, etc., but are open as to what data format is used and how it's stored
  • A Kotlin-based, more Android-y API that looks like an Android library rather than a backend Java library

We (Embrace and myself personally) would love to contribute to the development of all of this out in the open and with the community, and we are in the process of open sourcing our SDKs (timeline TBD) in order to further that. I look forward to discussing this further with y'all in the next Client SIG meeting!

@abhaysood

abhaysood commented Jun 16, 2023

I'm really happy to see activity in this area.

I agree with points mentioned by @bidetofevil. In addition, I'd like to add one more attribute: Otel should not dictate the library versions of dependencies, such as AndroidX fragments, navigation, etc. I noticed this in the Splunk SDK. This can become extremely challenging very quickly.

I've been doing some prototypes by using Otel-Java for Android recently, and I think there is a lot to do at the moment. Kudos to Splunk for making this donation and triggering these discussions.

@LikeTheSalad

I agree with and I think I understand the overall idea of the points provided by @bidetofevil - However, there's one that's not quite clear to me, which is the one that reads:

A Kotlin-based, more Android-y API that looks like an Android library rather than a backend Java library

I think it's not specific enough. And just to be clear, I like the idea of having a Kotlin extension library that provides more Kotlin idiomatic ways to interact with the Java SDK, as it's common with Google libraries such as the Jetpack ones that provide both the pure Java and the -ktx artifacts. However, my doubt is more related to what a backend Java library means in this sentence, because apart from the tools provided in the OTel Java Instrumentation repo, the core OTel Java SDK seems to be pretty generic enough for any platform.

So I believe we should come up with a list of the parts of the current API that don't align with Android/Kotlin standards to make it easier to address because I think it's difficult to get a clear idea of what actions to take based only on what an "Android library" vs "backend library" API should look like, as different people might have different views on those. And once we have that defined, I think we could try to address the Kotlin-specific ones in the existing OTel Kotlin extension lib.

@bidetofevil

Haha, yeah, I didn't mean for my list of bullet points to be a technical spec and meant to telegraph that by being pretty imprecise and adjectiving the noun Android. When we get down to it, I will for sure come up with a list of specific things. While I understand the library is generic enough for Android usage, I'm looking for an interface that feels more like an Android library in terms of conventions, not just compatibility.

Off the top of my head, I'm looking for the following:

  • Constructors with named parameters with sensible defaults rather than Builders (where it makes sense)
  • val, var, and defaulting to non-null for parameters
  • Lambdas where appropriate

Again, I want to stress that I know the existing Java API is totally compatible for Android devs - these are just quality of life improvements that will make the API seem more natural for us. I'm not looking to bike shed either or sneak in my favourite conventions - there are just some things that are generally accepted to be idiomatic for modern Android code that makes it different than server-side Java. As much as possible, I want to bring that to an Android extension.

Is that more clear?

@LikeTheSalad

Ah, I got what you're saying, I think it's the Android/backend wording that confused me a little. So yeah I agree for a lot of the Java-style ways to interact with the API we can and should wrap those with Kotlin idioms where it makes sense! The Kotlin extensions lib seems to be a good option to add those in for reusability.

Thanks for the clarification!

@bherbst

bherbst commented Jun 20, 2023

I'll also say that using SPI as the configuration mechanism for some components (e.g. context storage) is a fairly foreign pattern for most Android developers. It certainly works, but is one of those things that just doesn't feel right stepping into the space.
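
For readers unfamiliar with the pattern, a minimal sketch of SPI-style configuration using the standard `java.util.ServiceLoader` looks like this. `StorageProvider` and `DefaultStorageProvider` are hypothetical names standing in for something like a context-storage provider; this is not the OTel code itself. An implementation is discovered from a `META-INF/services` file on the classpath instead of being passed in through code, which is the part that can feel foreign on Android.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical sketch of SPI lookup with a fallback.
final class SpiLookupSketch {
    // Hypothetical service interface that third parties could implement.
    public interface StorageProvider {
        String describe();
    }

    // Used when no provider is registered on the classpath.
    static final class DefaultStorageProvider implements StorageProvider {
        @Override public String describe() { return "default"; }
    }

    // Take the first provider declared under META-INF/services, else fall back.
    static StorageProvider load() {
        Iterator<StorageProvider> found =
            ServiceLoader.load(StorageProvider.class).iterator();
        return found.hasNext() ? found.next() : new DefaultStorageProvider();
    }

    public static void main(String[] args) {
        // No provider file exists in this sketch, so the default is selected.
        System.out.println(load().describe());
    }
}
```

Nothing in application code names the chosen implementation, so the configuration is invisible unless a developer knows to look for the services file.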

Secondly, while I know that OTel's policy is to support back to Java 6 specifically for Android when a particular package is identified as potentially useful for Android developers, the fact that it isn't the default creates friction. I haven't had time to address it yet, but my PR to make the RxJava 2 instrumentation Android-friendly ended up not working because that instrumentation depends on another OTel package (instrumentation-annotations-support) that is also not compatible. This introduces a fair amount of additional work any time an Android team wants to use a new OTel artifact.

@jack-berg
Member

I'll also say that using SPI as the configuration mechanism for some components (e.g. context storage) is a fairly foreign pattern for most Android developers

I'd hope (and assume) that reconfiguring context storage is not a common thing. Configuration of almost all other SDK components does not require SPI.

while I know that OTel's policy is to support back to Java 6

We support java 8+.

@jack-berg
Member

Hi everyone - just wanted to give an update on the donation. As mentioned, the due diligence is complete. We had some followup discussion with various folks and made minor updates.

We're currently blocked on satisfying the acceptance requirement:

Add maintainers from at least two different companies.

As described in the community membership document, there's a fairly high bar for maintainers which includes requirements such as being active members of the community, a deep understanding of the domain, and sustained direct contribution. If any OpenTelemetry community members are interested in working in the Android space in a maintainer capacity, you're encouraged to self nominate. Since this project doesn't yet exist, you can reach out to myself or another TC member via CNCF slack to discuss more.

I'll update this issue if / when we have met the maintainer requirement and are able to proceed.

Thanks for your patience.

@bherbst

bherbst commented Jul 7, 2023

while I know that OTel's policy is to support back to Java 6

We support java 8+.

The instrumentation repo has animalsniffer configured for some packages specifically to support Android versions without full Java 8 support (original issue here: open-telemetry/opentelemetry-java-instrumentation#3913).

Setting the project-wide standard at Java 8 is exactly the problem with properly supporting Android here: while some Java 8 APIs are available on older versions via desugaring, many are not. For example, the instrumentation-annotations-support package used by a few of the instrumentation libraries, such as the RxJava instrumentation, uses MethodHandle, which wasn't added until Android API level 26 (aka Android 8.0, aka Oreo).

While we are starting to get to a point in the Android ecosystem where apps may start setting their minimum support level to 26 and above, many apps are still supporting as low as API 21 and if OTel can't support that, it will be a non-starter for many.

To be clear - I don't think that this is a particularly widespread problem in OTel. Most OTel packages work just fine on Android devices without JDK support. But there are definite rough edges to compatibility.

@jack-berg
Member

opentelemetry-java and opentelemetry-java-instrumentation each have animalsniffer configured, ensuring Android API version 21 compatibility for artifacts we anticipate are likely to be used in Android applications. If you come across a module like instrumentation-annotations-support which does not have animalsniffer enabled but you believe should, please open an issue and I'm sure the maintainers (myself included) will be happy to discuss feasibility.

@jack-berg
Member

jack-berg commented Jul 21, 2023

Hi all -

This donation has been officially accepted! A new repository open-telemetry/opentelemetry-android has been created to house the donated code, with @breedx-splk and @LikeTheSalad as the initial maintainers of the project.

@breedx-splk please seed the project with donated code, following the recommendations from the diligence document.

Thank you @breedx-splk for driving this and thank you everyone for your interest, comments, and feedback on OpenTelemetry Android. I invite you all to follow the opentelemetry-android project and hope to see you participating, whether through issues, PRs, or code reviews!
