Exporter jaeger encountered the following error(s): thrift agent failed with message too long #851
I too am experiencing this issue, and the proposed fixes (as seen in other issues) don't work for me.

```
OTEL_BSP_MAX_EXPORT_BATCH_SIZE=25 OTEL_BSP_MAX_QUEUE_SIZE=32768 cargo run
```

```rust
fn init_tracer() -> Result<sdktrace::Tracer, TraceError> {
    opentelemetry_jaeger::new_pipeline()
        .with_service_name("trace-demo")
        .with_max_packet_size(9216) // Default max UDP packet size on OSX
        .with_auto_split_batch(true) // Auto split batches so they fit under packet size
        .install_batch(opentelemetry::runtime::Tokio)
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
    // JAEGER
    // Create a layer with the configured tracer
    let tracer = init_tracer()?;
    let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);
    let subscriber = Registry::default().with(otel_layer);
    tracing::subscriber::set_global_default(subscriber).expect("setting default subscriber failed");
    Ok(())
}
```

But I am still seeing:

```
OpenTelemetry trace error occurred. Exporter jaeger encountered the following error(s): thrift agent failed with message too long
```

Before the changes above I saw:

```
OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full
```

I am running a few hundred async tasks in parallel, and a Jaeger instance locally.
I observed the same issue on our OSS project (quickwit-oss/quickwit#2295). I tried a bunch of different settings but did not manage to make it work. The errors happen when there are a lot of spans; I will try to isolate that and report it here.
I dug into the issue a bit, and the problem comes from the number of bytes of spans that will be sent to Jaeger. I guess the error that you have is the same as mine (I added some
I solved the issue by doing two things:
@bes In your case, what is the max size of your UDP packet? I'm on macOS and I had to run
That whole protocol design seems broken if you need to tweak the max UDP packet size to work around the flaws in its design.
As an alternative, have you tried `with_auto_split_batch`? This config automatically splits a span batch if it exceeds the UDP max size for one packet. Note that it has a performance overhead.
As mentioned in my first post, I did try that and it was just as broken.
Ah, sorry, I missed it. In that case, the most likely cause is that a single span exceeded the UDP packet limit; since we cannot split one span, we have to fail the request. As for the debugging information: we use the Apache Thrift Rust client, so the only information we have is the error passed back to us from the Thrift agent. I will see what's available and add some more context to the error message. As for the protocol design, the UDP limit is a known Jaeger issue (see https://www.jaegertracing.io/docs/1.39/client-libraries/#emsgsize-and-udp-buffer-limits), so I don't think we can do more about that. One suggestion is to switch to the HTTP client with a collector.
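For reference, a minimal sketch of the collector variant mentioned above, assuming opentelemetry-jaeger ~0.16 with the `collector_client` and `reqwest_collector_client` features enabled; the endpoint shown is the default local Jaeger collector port, adjust as needed:

```rust
// Sketch: export spans to the Jaeger collector over HTTP instead of the UDP agent.
// HTTP has no per-datagram size limit, so oversized batches are not dropped.
use opentelemetry::sdk::trace as sdktrace;
use opentelemetry::trace::TraceError;

fn init_tracer() -> Result<sdktrace::Tracer, TraceError> {
    opentelemetry_jaeger::new_pipeline()
        .with_service_name("trace-demo")
        // Default thrift-over-HTTP endpoint of a locally running Jaeger collector
        .with_collector_endpoint("http://localhost:14268/api/traces")
        .install_batch(opentelemetry::runtime::Tokio)
}
```

This trades the fire-and-forget UDP path for HTTP requests, which is slower per batch but avoids the EMSGSIZE failure mode entirely.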
It would probably be useful if we could get the size of the message that failed. The effect of the whole set of parameters I tried is very opaque. If you can't print the size, maybe you could print the content you pass to Thrift. I don't think performance matters very much at the point where nothing is really working.
Could you elaborate on that?
macOS has a max UDP size set to 9216 by default. Would it be acceptable to adjust the values in this library to do the right thing out of the box? Requiring every project to figure this out through trial and error, then fix it, sounds like a waste of time.
The dedicated Jaeger exporter is going to be deprecated, so it's unlikely to get bug fixes. The recommendation is to use the OTLP exporter, as Jaeger can now natively understand OTLP:
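A minimal sketch of the recommended replacement, assuming the opentelemetry-otlp crate with its `tonic` feature and a Jaeger instance started with OTLP ingestion enabled (`COLLECTOR_OTLP_ENABLED=true` on older Jaeger releases); the endpoint is Jaeger's default OTLP gRPC port:

```rust
// Sketch: replace the Jaeger-specific exporter with the generic OTLP exporter.
use opentelemetry::sdk::trace as sdktrace;
use opentelemetry::trace::TraceError;
use opentelemetry_otlp::WithExportConfig;

fn init_tracer() -> Result<sdktrace::Tracer, TraceError> {
    opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                // Jaeger accepts OTLP natively on gRPC port 4317
                .with_endpoint("http://localhost:4317"),
        )
        .install_batch(opentelemetry::runtime::Tokio)
}
```

Because OTLP runs over gRPC or HTTP, the UDP datagram-size failure mode discussed in this thread does not apply.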
As mentioned, we're looking to stop producing this crate; if you have feedback, please leave it on #995
Closing jaeger related issues, given its imminent removal. See #995 |
I am using opentelemetry-jaeger 0.16.0 with tracing-opentelemetry 0.17.4 on Linux along with the latest Jaeger binary (non-docker) release.
I did see the old tickets on this issue (#648 #676 and #759 ) but none of the workarounds described there seem to have any significant impact on the issue.
I tried setting `OTEL_BSP_MAX_EXPORT_BATCH_SIZE` to progressively lower values, down to 1, which did not make the error disappear. I tried using `with_auto_split_batch(true)` with `with_max_packet_size` values of 8192, 4096, 1024, 512, and finally 256, and the error did not disappear. I tried `install_simple()` instead of `install_batch(Tokio)` and the error did not disappear.
I do not produce spans that are particularly large (just half a line of text and a host name as a parameter) but they are relatively short and plentiful.
Some data appears in jaeger but it is just a dozen or two spans before it aborts (per run of my program obviously).
I honestly have my doubts about the UDP packet size explanation in the previous tickets since everyone else seems to be able to handle sending larger amounts of data over UDP just fine.
Maybe it would be useful to add more information to the error message, e.g. how long it was, how many and which spans were batched and what the limit for 'too long' is.