-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Traceid and spanid wrong when using Kotlin Reactor Coroutines #174
Comments
Since I'm not an expert in Kotlin - we have a test here that proves that this is working (https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/test/kotlin/io/micrometer/core/instrument/kotlin/AsContextElementKtTests.kt). How does what we do differ from what you do? |
@marcingrzejszczak To be honest I don't know exactly. I don't use Observation directly but I am using the Reactor Coroutine library like the following:
This |
@marcingrzejszczak @shakuzen
The first statement logs the trace id and span id as expected. The second statement mostly has zeroed values. Of course the difference is that we have thread switch after the |
The problem is related to setting the span to null on the scope reset when there is no other span observation to fall back to, and how Basically, when coroutines being finished, it does the following:
In the end we get the current Otel context being set to the invalid span, even if before it was set to null, which results to a polluted ThreadLocal variable. Somewhere in I'm not entirely sure what is the right approach here, as I don't fully understand the difference between |
AFAIR the problem in OTel is that you can't remove a span from thread local - you can only set it to invalid. OpenTelemetry will never return a public interface Span extends ImplicitContextKeyed {
/**
* Returns the {@link Span} from the current {@link Context}, falling back to a default, no-op
* {@link Span} if there is no span in the current context.
*/
static Span current() {
Span span = Context.current().get(SpanContextKey.KEY);
return span == null ? getInvalid() : span;
}
/**
* Returns an invalid {@link Span}. An invalid {@link Span} is used when tracing is disabled,
* usually because there is no OpenTelemetry SDK installed.
*/
static Span getInvalid() {
return PropagatedSpan.INVALID;
} So if we |
@marcingrzejszczak @vooft What does this mean that the library cannot be used in conjunction with Kotlin Coroutines + Reactor? |
I didn't say that. I'm just replying to @vooft comment |
@marcingrzejszczak I am trying to use logger.info("has context")
CoroutineScope(Dispatchers.Default).launch(errorHandler) {
withContext(observationRegistry.asContextElement()) {
logger.info("has no context")
}
}
logger.info("also has no context") I am using spring boot and injecting observationRegistry, not sure if that is proper usage. Also I can get the "has no context" logging properly if I change |
Hi @marcingrzejszczak. We have the same problem as @akoufa but with coroutines router (we are using Brave insead of OTel). @Configuration(proxyBeanMethods = false)
class RoutesConfiguration(@Autowired private val observationRegistry: ObservationRegistry) {
private val logger = KotlinLogging.logger {}
// traceId&spanId are logged only before delay
@Bean
fun notWorkingCoRouter() = coRouter {
GET("/log1") {
logger.info("before")
delay(500)
logger.info { "after" }
ServerResponse.ok().buildAndAwait()
}
}
// traceId&spanId are logged before and after delay
@Bean
fun workingRouter() = router {
GET("/log2") {
mono(observationRegistry.asContextElement()) {
logger.info("before")
delay(500)
logger.info { "after" }
ServerResponse.ok().buildAndAwait()
}
}
}
} Do you think this issue can be fixed or we should use some kind of workaround? |
Is there any workaround that does not include injecting the observationRegistry to all controllers? And is there any plan to fix this issue? |
I don't have enough Kotlin knowledge. If there's anyone that can help us out with this then please be my guest and comment |
@marcingrzejszczak in general the problem is that when a coroutine opens, the context might be lost in the child coroutines. for example: @RestController
class FooRestController {
@GetMapping
suspend fun foo(): Foo {
logger.info { "before" }
delay(500) // kotlin decide to change to coroutine#2
logger.info { "after" } // we lose the observation context
}
@GetMapping
fun foo(): Mono<Foo> {
return mono(observationRegistry.asContextElement()) {
logger.info { "before" }
delay(500) // kotlin decide to change to coroutine#2
logger.info { "after" } // we do not loss the context, because the parent coroutine was started with the observation context and its being propogated
}
}
} If it was possible to somehow add the Another option is somehow to hook between the MDC context, Reactor context, and Coroutine context. with this library, another working example of propagating properly would be: @RestController
class FooRestController {
@GetMapping
suspend fun foo(): Foo {
logger.info { "before" }
delay(500) // kotlin decide to change to coroutine#2
logger.info { "after" } // we lose the observation context
}
@GetMapping
suspend fun foo(): Foo {
return withContext(MDCContext()) {
logger.info { "before" }
delay(500) // kotlin decide to change to coroutine#2
logger.info { "after" } // we do not loss the context, because kotlinx-coroutines-slf4j handles the binding between MDC and the coroutine context
}
}
} i am not sure where exactly it should be fixed, but in theory if we had an entrypoint to add the observationContext as a coroutine context into the place that spring initialize the coroutine scope for the controller's suspend function.. it should work as expected. |
FYI, on the Spring side, for Controllers, the coroutine entry point is provided by For Web MVC: Its Notably, it does not give a For WebFlux: Similar to Web MVC, it calls Maybe Spring should take whatever is in Reactor Context at this point and copy it into Coroutine Context? EDIT: I don't have a suggested workaround, a fix probably needs to happen in Spring itself. EDIT 2: There are two |
I think we need help from Sebastien. Let us summon our Kotlin Guru @sdeleuze! |
If it helps, i played with spring's code and the only thing that worked for me is getting the observationContext and using it in CoroutineContext combinedContext;
if (observationRegistry != null) {
combinedContext = KCoroutineUtils.asCoroutineContext(observationRegistry)
.plus(context);
} else {
combinedContext = context;
}
Mono<Object> mono = MonoKt.mono(combinedContext, (scope, continuation) ->
KCallables.callSuspend(function, getSuspendedFunctionArgs(method, target, args), continuation))
.filter(result -> !Objects.equals(result, Unit.INSTANCE))
.onErrorMap(InvocationTargetException.class, InvocationTargetException::getTargetException); KCoroutineUtils object KCoroutineUtils {
@JvmStatic
fun asCoroutineContext(observationRegistry: ObservationRegistry): CoroutineContext {
return observationRegistry.asContextElement()
}
} For the POC i just took the observationContext from the applicationContext in if (isSuspendingFunction) {
ObservationRegistry observationRegistry = exchange.getApplicationContext().getBean(ObservationRegistry.class);
value = CoroutinesUtils.invokeSuspendingFunction(method, getBean(), observationRegistry, args);
} If we find a way to inject the observationRegistry at this point, everything will work as intended. I also tried to use |
@marcingrzejszczak Do we have any update on this or a suggestion for a temporary workaround on our side? Thank you |
Be aware there are a bunch of Coroutines context issues still open on Spring Framework, and this area needs refinements to support all the use cases. See for example spring-projects/spring-framework#26977, spring-projects/spring-framework#27522 and spring-projects/spring-framework#27010. It is expected to have seemless context propagation when doing Reactor to Coroutines conversion (this is handled on Coroutines Reactor integration side) but when creating a new Coroutines context and when an existing should be reused likely requires some refinements. We probably need to plan some related discussion with me and @marcingrzejszczak and/or @bclozel to discuss the design of such refined integration. |
Any updates regarding this issue? |
We're updating the issue when there are updates. We need to discuss things with @sdeleuze but there were other priorities so far |
Hey, I am starting working on that and may ask guidance to Kotin team on the best way to fix that. This kind of issue seems to impact both Micrometer and Spring as soon as |
So the challenge was to pass the I made progresses on Spring Framework side, for now focusing on the functional router DSL:
I will try to refine the annotation-based WebFlux support as well. I am not sure what need to be done or not on Micrometer side, but hopefully those progresses on Spring Framework side will help. Feedback on those draft commits welcomed if any, otherwise I will move forward on Spring Framework side and drop a comment here when the various improvement will be available in Spring Framework |
With the 3 issues above fixed, If somebody could build and share a reproducer with Spring |
I have just added the last part of Spring Framework 6.1 |
Could it be related to the usage of |
@meberhard Not sure if this has anything to do, but have you tried declaring |
@grassehh is correct - |
Hint: you can also use |
I tried removing the two lines
-> same result, no context/traceId in the
Also tried that -> moved this to the WebConfig, same result (no context/traceId in the I tried it both - with the
Just FYI - with
good point -> I don't even know why it worked at all with the
Awesome! Thanks a lot for this 🙏 |
I have fixed the nested route + context issue via spring-projects/spring-framework#31831, so please test and report any remaining problem with Spring Framework |
I can confirm that the nesting now works as expected, but the context is still not propagated to the
For the route |
@meberhard This works for me so please double check: @Bean
fun router() = coRouter {
context { CoroutineName("Custom context") }
"/nested".nest {
GET("/") {
throw RuntimeException()
}
}
onError<RuntimeException> { _, _ ->
ok().bodyValueAndAwait(currentCoroutineContext().toString())
}
} If you can reproduce, please create a Spring Framework issue with a self contained repro as an attached archive or a link to a repository. |
I am not sure if this is related but in my original comment #174 (comment), when starting a new coroutine within a contexed span, it would lose all span/trace info. Is this expected when launching a new coroutine? |
We have potentially a repro on spring-projects/spring-framework#31893 (which uses |
See spring-projects/spring-framework#31893 (comment) for ideas on what to explore on Micrometer side to provide proper Coroutines support. |
So, @sdeleuze and I were investigating on a different issue and talked about this one in the process. Upgrade to Spring Boot 3.2.2Sebastien fixed a few things in recent Spring Framework versions, so please check this version first. Contribute a custom CoWebFilterIf the above didn't work, please continue with this step. First, add the following dependencies to your application: implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-reactor'
implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-slf4j:1.7.3' Add the following class to your application in a package that's scanned for components: package com.example.kotlinschedule
import kotlinx.coroutines.slf4j.MDCContext
import kotlinx.coroutines.withContext
import org.springframework.stereotype.Component
import org.springframework.web.server.CoWebFilter
import org.springframework.web.server.CoWebFilterChain
import org.springframework.web.server.ServerWebExchange
@Component
class MDCCoWebFilter : CoWebFilter() {
override suspend fun filter(exchange: ServerWebExchange, chain: CoWebFilterChain) {
withContext(MDCContext()) {
chain.filter(exchange)
}
}
} Then check with one of your controllers whether the traceId and spanId information is available from logs statements in controller handlers. |
@bclozel as I wrote in spring issue I think this may be what you are looking for I can enrich the example and post the zip in this discussion if you want |
@antechrestos I'm not sure I understand. You're saying this filter solution works for your case then? I don't need a sample now just confirmation. For me in some cases it worked even without that filter, hence my question to the community. Thanks! |
@bclozel without filter,, when I get result from a CoroutineCrudRepository the traceid is lost With a filter that get the current observation.asCoroutineContext it works Yet my code does no delay, and so on. |
The code I used @Bean
fun coroutineContextFilter(observationRegistry: ObservationRegistry): CoWebFilter = object : CoWebFilter() {
override suspend fun filter(exchange: ServerWebExchange, chain: CoWebFilterChain) {
withContext(observationRegistry.asContextElement()){
chain.filter(exchange)
}
}
} |
@bclozel I enriched my example A @Bean
fun coroutineContextFilter(observationRegistry: ObservationRegistry): CoWebFilter = object : CoWebFilter() {
override suspend fun filter(exchange: ServerWebExchange, chain: CoWebFilterChain) {
when(exchange.request.uri.path) {
"/api/v1/users-without-context" -> chain.filter(exchange)
"/api/v1/users-with-mdc" -> withContext(MDCContext()){
chain.filter(exchange)
}
else -> withContext(observationRegistry.asContextElement()){
chain.filter(exchange)
}
}
}
} And a controller with different scenario and same treatment @PostMapping("/api/v1/users-without-context")
@ResponseStatus(HttpStatus.CREATED)
suspend fun createUserNoContext(@RequestBody user: CreateUserDto) : UserDto {
return launchUserCreation(user)
}
@PostMapping("/api/v1/users")
@ResponseStatus(HttpStatus.CREATED)
suspend fun createUser(@RequestBody user: CreateUserDto) : UserDto {
return launchUserCreation(user)
}
@PostMapping("/api/v1/users-with-mdc")
@ResponseStatus(HttpStatus.CREATED)
suspend fun createUserMdc(@RequestBody user: CreateUserDto) : UserDto {
return launchUserCreation(user)
}
private suspend fun launchUserCreation(user: CreateUserDto): UserDto {
logger.debug("Start save of user {}", user)
return userRepository.create(
User.newUser(
firstname = user.firstname,
name = user.name,
)
)
.let { toWeb(it) }
.also {
logger.debug("Wait for it")
longTreatment(it)
logger.debug("Created user {}", it)
}
}
private suspend fun longTreatment(userDto: UserDto) {
Mono.just(userDto)
.flatMap { mono(MDCContext()) {
logger.debug("Start long treatment")
delay(Duration.ofSeconds(2).toMillis())
logger.debug("End long treatment")
} }
.tap(
Micrometer.observation(observationRegistry) { registry ->
val observation = Observation.createNotStarted("long-treatment", registry)
.contextualName("long-treatment")
observation
}
)
.awaitSingle()
} Without contexts
We can see that starting a new observation let the However we see thaat soan are well corralated With contextwhatever context ( I prefer using |
@sdeleuze why not putting a default |
@antechrestos Maybe we could use a variant of that combined with @bclozel findings, see spring-projects/spring-framework#32165 (comment) for more details. |
@sdeleuze nice |
I have created a separate issue for extending Micrometer Observation API with some more convenient methods to help using it from Kotlin code: micrometer-metrics/micrometer#4754. Not sure if you folks would agree that it's necessary though. |
@sdeleuze I have been trying to find a ticket in Spring Boot which would take care of adding the following 2 bits of configuration automatically (i.e. things that are needed to ensure context propagation), as otherwise pretty much every micro-service developed with Spring Boot & Coroutines needs to add these (or something similar): Is this something on your radar? |
@ilya40umov Related issue is likely going to be #174 (comment) (we need to update the title to expand its scope) so feel free to comment there after reading my latest proposal. |
@bclozel Hi, I'm trying to start coroutine from GatewayFilter. The tracer works fine till I enter into this flat map " return super.writeWith(fluxBody.buffer().flatMap { dataBuffers: List ->..." https://github.com/component override fun filter(exchange: ServerWebExchange, chain: GatewayFilterChain): Mono {
} } |
Hello. When using Kotlin Reactor Coroutines like the following:
then the traceId and spanId that are being included in the log statement above look like the following:
The traceId and spanId in the logs are always 0 (zeros) .
This happens only for log statements inside such
mono {}
blocks. Other log statements outside of these blocks work as expected.I have created a sample reproducer project here:
https://github.com/akoufa/springbootissues
Just start the Spring Boot app and use the following curl request:
The text was updated successfully, but these errors were encountered: