
Possible resource leak with opentelemetry-otlp and tonic #1048

Closed
redshirtrob opened this issue Apr 27, 2023 · 5 comments
Labels
A-trace (Area: issues related to tracing), bug (Something isn't working), M-sdk, triage:todo (Needs to be triaged)

Comments

@redshirtrob

I'm experimenting with using opentelemetry-otlp to export spans using tonic. I also have tokio-console wired in to observe behavior.

When exporting spans using otlp, I see an unbounded growth in tonic executors. Here's an example screenshot from tokio-console:

[screenshot: otlp-tokio-console]

Note all the tonic executor tasks with 3 polls. The longer the program runs, the more such tasks pile up.

Here's the code I used to produce this:

use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::Registry;
use opentelemetry::{
    global,
    sdk::{
        propagation::TraceContextPropagator,
        trace::{RandomIdGenerator, Sampler},
    },
};

use opentelemetry_otlp::WithExportConfig;

#[tokio::main]
async fn main() {
    // Propagate trace context using the W3C Trace Context format.
    global::set_text_map_propagator(TraceContextPropagator::new());
    
    // Export spans over OTLP/gRPC via tonic, batched on the current-thread Tokio runtime.
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:8360")
        )
        .with_trace_config(
            opentelemetry::sdk::trace::Config::default()
                .with_id_generator(RandomIdGenerator::default())
                .with_sampler(Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(0.05))))
                .with_resource(opentelemetry::sdk::Resource::new(vec![
                    opentelemetry::KeyValue::new("service.name", "aardvark-service"),
                ]))
        )
        .install_batch(opentelemetry::runtime::TokioCurrentThread)
        .unwrap();

    let layer = tracing_opentelemetry::layer()
        .with_tracer(tracer);

    let subscriber = Registry::default().with(layer);

    let console_layer = console_subscriber::ConsoleLayer::builder()
        .with_default_env()
        .enable_self_trace(true)
        .spawn();

    let subscriber = subscriber.with(console_layer);
    
    tracing::subscriber::set_global_default(subscriber).unwrap();
    
    // Generate a steady stream of spans to exercise the exporter.
    loop {
        let _span = tracing::span!(tracing::Level::INFO, "test-span").entered();
        std::thread::sleep(std::time::Duration::from_millis(10));
    }
}

If I switch to the jaeger exporter, I do not see this kind of behavior:

[screenshot: jaeger-tokio-console]

Here's the same code with a jaeger exporter:

use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::Registry;
use opentelemetry::{
    global,
    sdk::{
        propagation::TraceContextPropagator,
        trace::{RandomIdGenerator, Sampler},
    },
};

#[tokio::main]
async fn main() {
    global::set_text_map_propagator(TraceContextPropagator::new());
    
    // Same setup, but exporting to a Jaeger agent over UDP instead of OTLP/tonic.
    let tracer = opentelemetry_jaeger::new_agent_pipeline()
        .with_endpoint("localhost:6831")
        .with_service_name("aardvark-service")
        .with_instrumentation_library_tags(true)
        .with_trace_config(
            opentelemetry::sdk::trace::Config::default()
                .with_id_generator(RandomIdGenerator::default())
                .with_sampler(Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(0.05))))
        )
        .install_batch(opentelemetry::runtime::TokioCurrentThread)
        .unwrap();

    let layer = tracing_opentelemetry::layer()
        .with_tracer(tracer);

    let subscriber = Registry::default().with(layer);

    let console_layer = console_subscriber::ConsoleLayer::builder()
        .with_default_env()
        .enable_self_trace(true)
        .spawn();

    let subscriber = subscriber.with(console_layer);
    
    tracing::subscriber::set_global_default(subscriber).unwrap();
    
    loop {
        let _span = tracing::span!(tracing::Level::INFO, "test-span").entered();
        std::thread::sleep(std::time::Duration::from_millis(10));
    }
}

Here's my Cargo.toml:

[package]
name = "tracing-repro"
version = "0.1.0"
edition = "2021"

[dependencies]
console-subscriber = { version = "0.1.8" }
opentelemetry = { version = "0.18", features = ["trace", "metrics", "rt-tokio-current-thread",] }
opentelemetry-jaeger = { version = "0.17", features = ["tokio", "rt-tokio-current-thread",] }
opentelemetry-otlp = "0.11"
tokio = { version = "=1.21.2", features = ["full", "tracing"] }
tracing = "0.1.37"
tracing-subscriber = {version = "0.3.16", features = ["env-filter", "fmt", "registry"]}
tracing-opentelemetry = "0.18"

I'm not 100% certain this is an opentelemetry-otlp issue. I realize this could be a tokio-console reporting issue, or perhaps poor coding on my part. I'm starting here because:

  1. This feels like an actual resource leak to this old C programmer
  2. I can alter the behavior by swapping out the exporter

I'm happy to do any other investigation or provide any other information, but I may need guideposts along the way.

@TommyCpp added the A-trace (Area: issues related to tracing) label on May 1, 2023
@MadVikingGod

This looks to be because of tonic. I ran this again with grpc-sys instead of tonic, with results similar to your jaeger example. I then added tokio-console to the tonic example and saw the same problematic behavior.
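For anyone who wants to run the same comparison, here's a rough sketch of the swap I'm describing. It assumes opentelemetry-otlp 0.11 built with the grpc-sys feature instead of grpc-tonic and uses the grpcio() exporter builder; the feature names and endpoint format are my assumptions, and the rest of the reporter's pipeline stays the same.

// Cargo.toml (assumed):
// opentelemetry-otlp = { version = "0.11", default-features = false, features = ["trace", "grpc-sys"] }
let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(
        opentelemetry_otlp::new_exporter()
            .grpcio()                         // grpc-sys transport instead of tonic
            .with_endpoint("localhost:4317"), // bare host:port, no scheme, for the grpcio exporter
    )
    .install_batch(opentelemetry::runtime::TokioCurrentThread)
    .unwrap();

With that change, tokio-console showed behavior comparable to the jaeger example rather than the pile-up of tonic executor tasks.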

@djc (Contributor) commented May 26, 2023

Sorry to hear you're running into issues! I'm afraid I won't be able to dig into this further; maybe ask the tonic folks (I recommend their Discord channel) for some feedback on where the problem could be happening? If you're seeing it with a plain tonic example, that seems like a smoking gun that there might be a problem deeper in the stack.

@pksunkara

Would this cause a delay (of 2-3 minutes) in sending spans (with the simple span processor) and sometimes even crash the program? I ask because I am currently observing this.

I couldn't get grpc-sys to work on my system, so I am unable to test it.
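To clarify what I mean by the simple span processor: the pipeline is installed with install_simple() instead of install_batch(), so each finished span is exported on drop rather than being buffered and sent in batches. A minimal sketch of that setup, with a placeholder endpoint:

let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(
        opentelemetry_otlp::new_exporter()
            .tonic()
            .with_endpoint("http://localhost:4317"), // placeholder endpoint
    )
    // Simple span processor: exports each span synchronously as it ends.
    .install_simple()
    .unwrap();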

@stormshield-fabs (Contributor)

I've updated the dependencies in the provided code and was not able to reproduce this issue. I'm using Jaeger as the OTLP backend (docker run --rm -it -p4317:4317 -p16686:16686 jaegertracing/all-in-one:latest).

I suggest that we close this issue, especially since it hasn't seen any updates in more than 6 months.

Cargo.toml
[package]
name = "tracing-repro"
version = "0.2.0"
edition = "2021"

[dependencies]
console-subscriber = { version = "0.2" }
opentelemetry = { version = "0.21", features = ["trace", "metrics"] }
opentelemetry-otlp = "0.14"
opentelemetry_sdk = { version = "0.21", features = ["rt-tokio-current-thread"] }
tokio = { version = "1.36", features = ["full", "tracing"] }
tracing = "0.1.40"
tracing-opentelemetry = "0.22"
tracing-subscriber = { version = "0.3" }
main.rs
use std::time::Duration;

use opentelemetry_sdk::runtime::TokioCurrentThread;
use opentelemetry_sdk::trace::Config;
use opentelemetry_sdk::Resource;
use tokio::time::sleep;
use tracing::info_span;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::Registry;

use opentelemetry_otlp::WithExportConfig;

#[tokio::main]
async fn main() {
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        .with_trace_config(Config::default().with_resource(Resource::new(vec![
            opentelemetry::KeyValue::new("service.name", "aardvark-service"),
        ])))
        .install_batch(TokioCurrentThread)
        .unwrap();

    let layer = tracing_opentelemetry::layer().with_tracer(tracer);

    let subscriber = Registry::default().with(layer);

    let console_layer = console_subscriber::ConsoleLayer::builder()
        .with_default_env()
        .spawn();

    let subscriber = subscriber.with(console_layer);

    tracing::subscriber::set_global_default(subscriber).unwrap();

    loop {
        let _span = info_span!("test-span").entered();
        sleep(Duration::from_millis(10)).await;
    }
}

@cijothomas (Member)

Thanks @stormshield-fabs. Will close this.
