-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proper context propagation for reactive streams #7644
Conversation
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 54 metrics, 9 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.41.0-SNAPSHOT~c367127391, baseline=1.41.0-SNAPSHOT~70c9598d61
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.067 s) : 0, 1066718
Total [baseline] (10.398 s) : 0, 10398429
Agent [candidate] (1.07 s) : 0, 1069547
Total [candidate] (10.375 s) : 0, 10375452
section appsec
Agent [baseline] (1.208 s) : 0, 1207953
Total [baseline] (10.559 s) : 0, 10559158
Agent [candidate] (1.202 s) : 0, 1202416
Total [candidate] (10.646 s) : 0, 10645657
section iast
Agent [baseline] (1.193 s) : 0, 1193439
Total [baseline] (10.86 s) : 0, 10860255
Agent [candidate] (1.198 s) : 0, 1197828
Total [candidate] (10.888 s) : 0, 10887890
section profiling
Agent [baseline] (1.276 s) : 0, 1275884
Total [baseline] (10.679 s) : 0, 10679104
Agent [candidate] (1.282 s) : 0, 1281796
Total [candidate] (10.637 s) : 0, 10636895
gantt
title petclinic - break down per module: candidate=1.41.0-SNAPSHOT~c367127391, baseline=1.41.0-SNAPSHOT~70c9598d61
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (680.928 ms) : 0, 680928
BytebuddyAgent [candidate] (682.761 ms) : 0, 682761
GlobalTracer [baseline] (310.332 ms) : 0, 310332
GlobalTracer [candidate] (311.365 ms) : 0, 311365
AppSec [baseline] (53.525 ms) : 0, 53525
AppSec [candidate] (53.535 ms) : 0, 53535
Remote Config [baseline] (663.534 µs) : 0, 664
Remote Config [candidate] (670.705 µs) : 0, 671
Telemetry [baseline] (7.61 ms) : 0, 7610
Telemetry [candidate] (7.53 ms) : 0, 7530
section appsec
BytebuddyAgent [baseline] (702.158 ms) : 0, 702158
BytebuddyAgent [candidate] (698.818 ms) : 0, 698818
GlobalTracer [baseline] (309.268 ms) : 0, 309268
GlobalTracer [candidate] (307.88 ms) : 0, 307880
AppSec [baseline] (162.814 ms) : 0, 162814
AppSec [candidate] (162.653 ms) : 0, 162653
IAST [baseline] (21.5 ms) : 0, 21500
IAST [candidate] (20.768 ms) : 0, 20768
Remote Config [baseline] (649.255 µs) : 0, 649
Remote Config [candidate] (632.529 µs) : 0, 633
Telemetry [baseline] (8.575 ms) : 0, 8575
Telemetry [candidate] (8.38 ms) : 0, 8380
section iast
BytebuddyAgent [baseline] (794.96 ms) : 0, 794960
BytebuddyAgent [candidate] (798.638 ms) : 0, 798638
GlobalTracer [baseline] (299.309 ms) : 0, 299309
GlobalTracer [candidate] (299.798 ms) : 0, 299798
AppSec [baseline] (54.488 ms) : 0, 54488
AppSec [candidate] (56.042 ms) : 0, 56042
IAST [baseline] (23.383 ms) : 0, 23383
IAST [candidate] (21.985 ms) : 0, 21985
Remote Config [baseline] (596.164 µs) : 0, 596
Remote Config [candidate] (603.407 µs) : 0, 603
Telemetry [baseline] (7.03 ms) : 0, 7030
Telemetry [candidate] (7.058 ms) : 0, 7058
section profiling
BytebuddyAgent [baseline] (680.233 ms) : 0, 680233
BytebuddyAgent [candidate] (683.346 ms) : 0, 683346
GlobalTracer [baseline] (395.945 ms) : 0, 395945
GlobalTracer [candidate] (398.543 ms) : 0, 398543
AppSec [baseline] (54.953 ms) : 0, 54953
AppSec [candidate] (55.316 ms) : 0, 55316
Remote Config [baseline] (660.296 µs) : 0, 660
Remote Config [candidate] (670.153 µs) : 0, 670
Telemetry [baseline] (7.48 ms) : 0, 7480
Telemetry [candidate] (7.534 ms) : 0, 7534
ProfilingAgent [baseline] (97.718 ms) : 0, 97718
ProfilingAgent [candidate] (97.303 ms) : 0, 97303
Profiling [baseline] (97.744 ms) : 0, 97744
Profiling [candidate] (97.328 ms) : 0, 97328
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.41.0-SNAPSHOT~c367127391, baseline=1.41.0-SNAPSHOT~70c9598d61
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.076 s) : 0, 1075639
Total [baseline] (8.597 s) : 0, 8597143
Agent [candidate] (1.068 s) : 0, 1068009
Total [candidate] (8.549 s) : 0, 8549375
section iast
Agent [baseline] (1.195 s) : 0, 1194959
Total [baseline] (9.07 s) : 0, 9070414
Agent [candidate] (1.218 s) : 0, 1217778
Total [candidate] (9.168 s) : 0, 9167720
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.211 s) : 0, 1211095
Total [baseline] (9.142 s) : 0, 9142052
Agent [candidate] (1.198 s) : 0, 1198464
Total [candidate] (9.107 s) : 0, 9107305
section iast_TELEMETRY_OFF
Agent [baseline] (1.206 s) : 0, 1205807
Total [baseline] (9.149 s) : 0, 9149118
Agent [candidate] (1.201 s) : 0, 1201421
Total [candidate] (9.107 s) : 0, 9106727
gantt
title insecure-bank - break down per module: candidate=1.41.0-SNAPSHOT~c367127391, baseline=1.41.0-SNAPSHOT~70c9598d61
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (686.12 ms) : 0, 686120
BytebuddyAgent [candidate] (681.867 ms) : 0, 681867
GlobalTracer [baseline] (313.289 ms) : 0, 313289
GlobalTracer [candidate] (310.688 ms) : 0, 310688
AppSec [baseline] (54.258 ms) : 0, 54258
AppSec [candidate] (53.617 ms) : 0, 53617
Remote Config [baseline] (672.117 µs) : 0, 672
Remote Config [candidate] (663.064 µs) : 0, 663
Telemetry [baseline] (7.57 ms) : 0, 7570
Telemetry [candidate] (7.521 ms) : 0, 7521
section iast
BytebuddyAgent [baseline] (796.372 ms) : 0, 796372
BytebuddyAgent [candidate] (813.592 ms) : 0, 813592
GlobalTracer [baseline] (299.356 ms) : 0, 299356
GlobalTracer [candidate] (303.262 ms) : 0, 303262
AppSec [baseline] (55.22 ms) : 0, 55220
AppSec [candidate] (57.574 ms) : 0, 57574
IAST [baseline] (22.702 ms) : 0, 22702
IAST [candidate] (21.591 ms) : 0, 21591
Remote Config [baseline] (599.43 µs) : 0, 599
Remote Config [candidate] (620.579 µs) : 0, 621
Telemetry [baseline] (7.024 ms) : 0, 7024
Telemetry [candidate] (7.194 ms) : 0, 7194
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (807.934 ms) : 0, 807934
BytebuddyAgent [candidate] (798.005 ms) : 0, 798005
GlobalTracer [baseline] (303.845 ms) : 0, 303845
GlobalTracer [candidate] (300.697 ms) : 0, 300697
AppSec [baseline] (56.379 ms) : 0, 56379
AppSec [candidate] (55.533 ms) : 0, 55533
IAST [baseline] (21.297 ms) : 0, 21297
IAST [candidate] (22.7 ms) : 0, 22700
Remote Config [baseline] (604.753 µs) : 0, 605
Remote Config [candidate] (615.168 µs) : 0, 615
Telemetry [baseline] (7.14 ms) : 0, 7140
Telemetry [candidate] (7.126 ms) : 0, 7126
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (801.899 ms) : 0, 801899
BytebuddyAgent [candidate] (799.975 ms) : 0, 799975
GlobalTracer [baseline] (303.423 ms) : 0, 303423
GlobalTracer [candidate] (301.599 ms) : 0, 301599
AppSec [baseline] (54.633 ms) : 0, 54633
AppSec [candidate] (54.934 ms) : 0, 54934
IAST [baseline] (24.369 ms) : 0, 24369
IAST [candidate] (23.443 ms) : 0, 23443
Remote Config [baseline] (618.417 µs) : 0, 618
Remote Config [candidate] (630.654 µs) : 0, 631
Telemetry [baseline] (7.033 ms) : 0, 7033
Telemetry [candidate] (7.018 ms) : 0, 7018
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 16 unstable metrics. Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.41.0-SNAPSHOT~c367127391, baseline=1.41.0-SNAPSHOT~70c9598d61
dateFormat X
axisFormat %s
section baseline
no_agent (1.348 ms) : 1328, 1367
. : milestone, 1348,
appsec (1.742 ms) : 1718, 1766
. : milestone, 1742,
appsec_no_iast (1.715 ms) : 1691, 1739
. : milestone, 1715,
iast (1.493 ms) : 1470, 1516
. : milestone, 1493,
profiling (1.524 ms) : 1499, 1549
. : milestone, 1524,
tracing (1.475 ms) : 1451, 1499
. : milestone, 1475,
section candidate
no_agent (1.338 ms) : 1318, 1357
. : milestone, 1338,
appsec (1.717 ms) : 1693, 1741
. : milestone, 1717,
appsec_no_iast (1.732 ms) : 1708, 1756
. : milestone, 1732,
iast (1.488 ms) : 1465, 1510
. : milestone, 1488,
profiling (1.482 ms) : 1457, 1507
. : milestone, 1482,
tracing (1.459 ms) : 1434, 1484
. : milestone, 1459,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.41.0-SNAPSHOT~c367127391, baseline=1.41.0-SNAPSHOT~70c9598d61
dateFormat X
axisFormat %s
section baseline
no_agent (371.734 µs) : 351, 392
. : milestone, 372,
iast (480.042 µs) : 459, 501
. : milestone, 480,
iast_FULL (549.976 µs) : 529, 571
. : milestone, 550,
iast_GLOBAL (505.858 µs) : 484, 528
. : milestone, 506,
iast_HARDCODED_SECRET_DISABLED (486.94 µs) : 466, 508
. : milestone, 487,
iast_INACTIVE (443.831 µs) : 423, 464
. : milestone, 444,
iast_TELEMETRY_OFF (467.967 µs) : 447, 489
. : milestone, 468,
tracing (448.486 µs) : 427, 470
. : milestone, 448,
section candidate
no_agent (368.125 µs) : 349, 387
. : milestone, 368,
iast (480.283 µs) : 459, 501
. : milestone, 480,
iast_FULL (555.022 µs) : 534, 576
. : milestone, 555,
iast_GLOBAL (505.003 µs) : 483, 527
. : milestone, 505,
iast_HARDCODED_SECRET_DISABLED (486.305 µs) : 465, 507
. : milestone, 486,
iast_INACTIVE (447.226 µs) : 426, 469
. : milestone, 447,
iast_TELEMETRY_OFF (469.329 µs) : 448, 490
. : milestone, 469,
tracing (441.077 µs) : 421, 461
. : milestone, 441,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.41.0-SNAPSHOT~c367127391, baseline=1.41.0-SNAPSHOT~70c9598d61
dateFormat X
axisFormat %s
section baseline
no_agent (14.963 s) : 14963000, 14963000
. : milestone, 14963000,
appsec (14.922 s) : 14922000, 14922000
. : milestone, 14922000,
iast (18.597 s) : 18597000, 18597000
. : milestone, 18597000,
iast_GLOBAL (18.002 s) : 18002000, 18002000
. : milestone, 18002000,
profiling (15.076 s) : 15076000, 15076000
. : milestone, 15076000,
tracing (15.295 s) : 15295000, 15295000
. : milestone, 15295000,
section candidate
no_agent (15.681 s) : 15681000, 15681000
. : milestone, 15681000,
appsec (15.138 s) : 15138000, 15138000
. : milestone, 15138000,
iast (18.711 s) : 18711000, 18711000
. : milestone, 18711000,
iast_GLOBAL (17.786 s) : 17786000, 17786000
. : milestone, 17786000,
profiling (15.06 s) : 15060000, 15060000
. : milestone, 15060000,
tracing (15.475 s) : 15475000, 15475000
. : milestone, 15475000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.41.0-SNAPSHOT~c367127391, baseline=1.41.0-SNAPSHOT~70c9598d61
dateFormat X
axisFormat %s
section baseline
no_agent (1.47 ms) : 1458, 1481
. : milestone, 1470,
appsec (2.336 ms) : 2295, 2376
. : milestone, 2336,
iast (2.065 ms) : 2013, 2116
. : milestone, 2065,
iast_GLOBAL (2.107 ms) : 2055, 2159
. : milestone, 2107,
profiling (1.943 ms) : 1901, 1984
. : milestone, 1943,
tracing (1.916 ms) : 1876, 1955
. : milestone, 1916,
section candidate
no_agent (1.461 ms) : 1450, 1472
. : milestone, 1461,
appsec (2.338 ms) : 2298, 2379
. : milestone, 2338,
iast (2.063 ms) : 2012, 2114
. : milestone, 2063,
iast_GLOBAL (2.113 ms) : 2060, 2165
. : milestone, 2113,
profiling (2.414 ms) : 2226, 2602
. : milestone, 2414,
tracing (1.915 ms) : 1876, 1954
. : milestone, 1915,
|
7b21170
to
e3b6a78
Compare
c585613
to
f2dcbe1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@Advice.Thrown final Throwable throwable) { | ||
DECORATE.onError(scope, throwable); | ||
DECORATE.beforeFinish(scope); | ||
scope.close(); | ||
result = DECORATE.wrapAsyncResultOrFinishSpan(result, scope.span()); | ||
// we must check against the method return signature and not the type of the returned object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate this note in the code.
@@ -20,6 +20,7 @@ dependencies { | |||
testImplementation group: 'com.couchbase.client', name: 'java-client', version: '3.1.0' | |||
testImplementation group: 'org.testcontainers', name: 'couchbase', version: libs.versions.testcontainers.get() | |||
testImplementation project(':dd-java-agent:instrumentation:reactor-core-3.1') | |||
testImplementation project(':dd-java-agent:instrumentation:reactive-streams') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any other change in Couchbase instrumentation that do anything with reactive-streams. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are no changes but the reactive-streams module contains instrumentation for reactive streams that should be put into the test classpath to have them running (before similar advices were on reactor-core hence I'm adding a module dependency)
(RedisSubscriptionState) | ||
InstrumentationContext.get( | ||
"io.lettuce.core.RedisPublisher$RedisSubscription", | ||
"datadog.trace.instrumentation.lettuce5.rx.RedisSubscriptionState") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, what is the motivation for not referencing RedisSubscriptionState by a class instead of a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key class is (package) private so I cannot reference directly using class literals. The context api allows using either both arguments string literals either both class literals. So I had to use the string version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it! I didn't know it was either/or.
@@ -81,7 +80,6 @@ class MuleForkedTest extends WithHttpServer<MuleTestContainer> { | |||
container.stop() | |||
} | |||
|
|||
@Flaky("SocketTimeoutException / https://github.com/DataDog/dd-trace-java/issues/4690") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#4690 is still open. Is it irrelevant or should it be closed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'll close as soon as this PR will be merged thanks!
import org.reactivestreams.Subscriber; | ||
|
||
/** | ||
* This instrumentation is responsible for capturing the span when {@link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this description
7ae9d62
to
5d892f2
Compare
...atadog/trace/instrumentation/reactivestreams/ReactiveStreamsAsyncResultSupportExtension.java
Show resolved
Hide resolved
@@ -60,7 +62,7 @@ public WrappedSubscriber(Subscriber<T> delegate, AgentSpan span) { | |||
|
|||
@Override | |||
public void onSubscribe(Subscription s) { | |||
this.delegate.onSubscribe(s); | |||
this.delegate.onSubscribe(new WrappedSubscription(s, this.span)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know s
is not a WrappedSubscription
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have two nested methods we should wrap twice the subscription. Subscription mechanism is mostly recursive so I don't think we can skip wrapping if it's already wrapped because there are use cases (I added a test for it) that should support it
public static AgentScope before(@Advice.This final Subscriber self) { | ||
final AgentSpan span = | ||
InstrumentationContext.get(Subscriber.class, AgentSpan.class).get(self); | ||
return span == null || span == activeSpan() ? null : activateSpan(span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have reference counting at the scope level to guard against creating a new scope for the active span, so this check will just double the number of scope stack accesses we need to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks I will remove this kind of checks in other similar places so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardstartin I was thinking that perhaps I cannot do it since I need to know on the onEnter if the scope was already active in order to know if I have to close or not in the onExit. In fact, if I let flow the already active scope I should not close on the onExit otherwise I would break things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we expand the the AgentTracer
interface to yield a new scope unless the span is already active instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong. There is actually a refcount on scope activation so activating twice does not hurt. I simplified those things accordingly then thank you
@@ -22,7 +22,7 @@ public static void initialize() {} | |||
|
|||
@Override | |||
public boolean supports(Class<?> result) { | |||
return Flux.class.isAssignableFrom(result) || Mono.class.isAssignableFrom(result); | |||
return result == Flux.class || result == Mono.class; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are both abstract classes, which makes sense because of the change here which checks the method return type instead of the result type (which couldn't have been abstract). I think the parameter name result
should be changed to methodReturnType
or similar to avoid questioning whether the result could be abstract or not.
boolean useStrictTraceWrites() { | ||
// there are continuations we are not handling today on reactor internals. | ||
false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we intend to track these down or are the cases impossible to handle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it and actually excluded context propagation on some runnable used internally by the reactor eventloop that's in my mind a better solution
@@ -17,10 +18,11 @@ public static AgentScope before() { | |||
@Advice.OnMethodExit(suppress = Throwable.class, onThrowable = Throwable.class) | |||
public static void after( | |||
@Advice.Enter final AgentScope scope, | |||
@Advice.Origin final MethodType methodType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the Method
origin is expensive, it looks like this works in a similar way. Do we know how often this will be called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called every time a method is annotated with @Trace
(or @DoNotTrace
in this case). Actually that method call frequency depend on the user code since it is linked to the trace annotations (traced methods in general). What would be an alternative way to obtain the same info that's cheaper and that I can use here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to add a field to store the method's return type, then we could get it with a field access instead. I don't think we need to block on this, but maybe we should keep this instrumentation disabled by default until we know how high the overhead is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some performance concerns and think we can improve the tracer API to support this better, but don't want to block the PR, it's better to have something that works than the current state.
c50306c
to
4108acc
Compare
What Does This Do
This PR aims definitely addressing context propagation for reactivestreams
Publisher
andSubscriber
. It addresses also patterns (Mono
,Flux
,Processor
) provided by project reactor.The old advices we had were either not always applied (
TerminalSubscriberInstrumentation
) either not applied by default (ReactorHooks
). In particularTerminalSubscriberInstrumentation
was only applied when the Mono or Flux was peeking elements (i.e. methodsdoOnXXXX
orlog
).Also it had no effects on propagation. The propagation was only handled by our
java_concurrent
advices that was missing use cases like subscription on unhandled thread pools or work queues.This PR also unify the way traced methods finish spans when the return type is a Producer. We had two different extensions:
Considering this use case
In that case there it may happen that
ReactiveStreamsAsyncResultSupportExtension
is used (since Mono implements Publisher). This one was wrapping the publisher in an internal types causing class casts when bytebuddy replaces the return type since our wrapped type cannot be casted back to MonoGenerally speaking this PR avoid rewrapping Publisher and Subscriber in order to save a lot of object allocations (but transforms more types).
Motivation
Additional Notes
Contributor Checklist
type:
and (comp:
orinst:
) labels in addition to any usefull labelsclose
,fix
or any linking keywords when referencing an issue.Use
solves
instead, and assign the PR milestone to the issueJira ticket: AIDM-331