Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The value of parentId depends on sampling #687

Closed
avnersin opened this issue May 6, 2024 · 1 comment · Fixed by #720
Closed

The value of parentId depends on sampling #687

avnersin opened this issue May 6, 2024 · 1 comment · Fixed by #720
Labels
bug A general bug
Milestone

Comments

@avnersin
Copy link

avnersin commented May 6, 2024

Describe the bug

We are using micrometer with otel in a spring boot application.
We have a test where we call various rest endpoints (without any input trace headers) and capture trace information on the server side.
The test has an assertion that the current parent span id (tracer.currentSpan().context().parentId()) is always null.
This assertion should be correct as we do not have any nested spans in this test.
However, intermittently we see a non null parent span id.
This appears to be affected by the sampling configuration. If we set management.tracing.sampling.probability to 0.0 then the issue is not reproducible and the parent span id is always null in this test.

Environment

  • Spring Boot 3.2.5
  • micrometer-tracing 1.2.5
  • OS: macOS
  • Java version: 17

To Reproduce

tracing-test.zip

Attached is a simplified test that reproduces the issue.
Run mvn clean install or run TracingTest from your IDE.
The test runs 100 iterations of calling a rest endpoint that returns the current traceId, spanId and parentSpanId on the server side.
The assertion that the parentSpanId is null fails on an arbitrary iteration.
If we uncomment line 18, that is - if we set management.tracing.sampling.probability to 0.0 then the test does not fail.

Expected behavior

It is counter intuitive that the result of tracer.currentSpan().context().parentId() will be different for different invocations of the exact same flow. We expect to get in this flow a null span parent id, regardless of sampling.

Additional context

In the attached simplified test we could only reproduce a "0000000000000000" parent span id. In our real test (that we cannot attach here), we see "real" parent span ids. So the test attached here does not exactly reproduce the issue we see in our real test. However, in our real test we also see that the issue is not reproduced if we set management.tracing.sampling.probability to 0.0. Therefore, it looks like the cause of the issue in both tests is the decision to record a particular span which is causing a different behaviour with respect to the parent span id value.

@jonatan-ivanov jonatan-ivanov transferred this issue from micrometer-metrics/micrometer May 6, 2024
@jonatan-ivanov
Copy link
Member

jonatan-ivanov commented May 15, 2024

I was able to reproduce this using purely OTel, here is a much simpler reproducer:

// using io.opentelemetry.api.trace.Tracer

@Bean
CommandLineRunner commandLineRunner(Tracer tracer) {
	return args -> {
		Span span = tracer.spanBuilder("test").startSpan();
		try (Scope ignored = span.makeCurrent()) {
			if (span instanceof ReadableSpan readableSpan) { // if the span is sampled				System.out.println(readableSpan.toSpanData().getParentSpanId());
			}
			else { // if the span is not sampled
				System.out.println(span.getClass().getName());
			}
		}
		finally {
			span.end();
		}
	};
}

You were right the behavior depends on sampling. If you add the above to your app, and enable 100% sampling in application.properties:

management.tracing.sampling.probability=1.0

you will get an "invalid" parentId from OTel: 0000000000000000 (since there is no parent).
But if you disable sampling:

management.tracing.sampling.probability=1.0

The Span that OTel returns is a PropagatedSpan which is not a ReadableSpan ( and it is not public either).

I reached out to OTel maintainers, this might be a design flaw in OTel (I'm not sure if this is intentional or not at this point). If this cannot/will not be fixed in OTel (and for earlier versions of Micrometer Tracing), I think we might be able to return the invalid id in OtelTraceContext if OTel does not return a ReadableSpan.

marcingrzejszczak added a commit that referenced this issue May 21, 2024
- without this change we're returning an INVALID parent span even though we define in the javadocs of TraceContext that we would return null
- with this change we're verifying whether the INVALID parent is set in which case we're returning null

fixes gh-687
@jonatan-ivanov jonatan-ivanov added this to the 1.2.7 milestone May 21, 2024
@jonatan-ivanov jonatan-ivanov changed the title The return value of TraceContext.parentId depends on sampling The value of parentId depends on sampling May 21, 2024
marcingrzejszczak added a commit that referenced this issue May 24, 2024
…720)

* Assumes that INVALID parent span id is actually null in TraceContext

- without this change we're returning an INVALID parent span even though we define in the javadocs of TraceContext that we would return null
- with this change we're verifying whether the INVALID parent is set in which case we're returning null

note: OTel doesn't store parent id for non sampled spans in which case we will return null

fixes gh-687
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants