-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support service name overrides for DSM #7798
Conversation
Kafka / producer-benchmarkParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics. See unchanged results
|
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 53 metrics, 10 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.43.0-SNAPSHOT~6856dc5a90, baseline=1.43.0-SNAPSHOT~2f767ab81d
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.078 s) : 0, 1078306
Total [baseline] (10.424 s) : 0, 10424468
Agent [candidate] (1.081 s) : 0, 1080627
Total [candidate] (10.433 s) : 0, 10432782
section appsec
Agent [baseline] (1.214 s) : 0, 1213661
Total [baseline] (10.66 s) : 0, 10660096
Agent [candidate] (1.222 s) : 0, 1222185
Total [candidate] (10.69 s) : 0, 10690301
section iast
Agent [baseline] (1.205 s) : 0, 1204762
Total [baseline] (10.85 s) : 0, 10849908
Agent [candidate] (1.206 s) : 0, 1205704
Total [candidate] (10.832 s) : 0, 10831551
section profiling
Agent [baseline] (1.274 s) : 0, 1274412
Total [baseline] (10.733 s) : 0, 10732795
Agent [candidate] (1.275 s) : 0, 1275480
Total [candidate] (10.724 s) : 0, 10724239
gantt
title petclinic - break down per module: candidate=1.43.0-SNAPSHOT~6856dc5a90, baseline=1.43.0-SNAPSHOT~2f767ab81d
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (685.024 ms) : 0, 685024
BytebuddyAgent [candidate] (686.286 ms) : 0, 686286
GlobalTracer [baseline] (314.364 ms) : 0, 314364
GlobalTracer [candidate] (315.291 ms) : 0, 315291
AppSec [baseline] (53.972 ms) : 0, 53972
AppSec [candidate] (54.121 ms) : 0, 54121
Remote Config [baseline] (678.013 µs) : 0, 678
Remote Config [candidate] (679.547 µs) : 0, 680
Telemetry [baseline] (10.663 ms) : 0, 10663
Telemetry [candidate] (10.595 ms) : 0, 10595
section appsec
BytebuddyAgent [baseline] (703.259 ms) : 0, 703259
BytebuddyAgent [candidate] (708.486 ms) : 0, 708486
GlobalTracer [baseline] (312.009 ms) : 0, 312009
GlobalTracer [candidate] (315.031 ms) : 0, 315031
AppSec [baseline] (166.298 ms) : 0, 166298
AppSec [candidate] (166.657 ms) : 0, 166657
IAST [baseline] (20.216 ms) : 0, 20216
IAST [candidate] (19.564 ms) : 0, 19564
Remote Config [baseline] (625.339 µs) : 0, 625
Remote Config [candidate] (648.237 µs) : 0, 648
Telemetry [baseline] (7.767 ms) : 0, 7767
Telemetry [candidate] (7.894 ms) : 0, 7894
section iast
BytebuddyAgent [baseline] (801.478 ms) : 0, 801478
BytebuddyAgent [candidate] (802.006 ms) : 0, 802006
GlobalTracer [baseline] (304.06 ms) : 0, 304060
GlobalTracer [candidate] (304.26 ms) : 0, 304260
AppSec [baseline] (55.466 ms) : 0, 55466
AppSec [candidate] (56.994 ms) : 0, 56994
IAST [baseline] (22.093 ms) : 0, 22093
IAST [candidate] (20.707 ms) : 0, 20707
Remote Config [baseline] (605.515 µs) : 0, 606
Remote Config [candidate] (612.18 µs) : 0, 612
Telemetry [baseline] (7.432 ms) : 0, 7432
Telemetry [candidate] (7.479 ms) : 0, 7479
section profiling
BytebuddyAgent [baseline] (680.13 ms) : 0, 680130
BytebuddyAgent [candidate] (679.904 ms) : 0, 679904
GlobalTracer [baseline] (397.177 ms) : 0, 397177
GlobalTracer [candidate] (396.676 ms) : 0, 396676
AppSec [baseline] (54.372 ms) : 0, 54372
AppSec [candidate] (54.624 ms) : 0, 54624
Remote Config [baseline] (664.52 µs) : 0, 665
Remote Config [candidate] (661.504 µs) : 0, 662
Telemetry [baseline] (12.656 ms) : 0, 12656
Telemetry [candidate] (11.355 ms) : 0, 11355
ProfilingAgent [baseline] (90.66 ms) : 0, 90660
ProfilingAgent [candidate] (93.528 ms) : 0, 93528
Profiling [baseline] (90.684 ms) : 0, 90684
Profiling [candidate] (93.552 ms) : 0, 93552
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.43.0-SNAPSHOT~6856dc5a90, baseline=1.43.0-SNAPSHOT~2f767ab81d
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.09 s) : 0, 1089634
Total [baseline] (8.575 s) : 0, 8574684
Agent [candidate] (1.079 s) : 0, 1079497
Total [candidate] (8.541 s) : 0, 8540787
section iast
Agent [baseline] (1.205 s) : 0, 1205495
Total [baseline] (9.107 s) : 0, 9107250
Agent [candidate] (1.205 s) : 0, 1205215
Total [candidate] (9.097 s) : 0, 9096963
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.208 s) : 0, 1207667
Total [baseline] (9.127 s) : 0, 9127055
Agent [candidate] (1.209 s) : 0, 1208986
Total [candidate] (9.122 s) : 0, 9122034
section iast_TELEMETRY_OFF
Agent [baseline] (1.21 s) : 0, 1209816
Total [baseline] (9.095 s) : 0, 9094555
Agent [candidate] (1.202 s) : 0, 1202332
Total [candidate] (9.138 s) : 0, 9137535
gantt
title insecure-bank - break down per module: candidate=1.43.0-SNAPSHOT~6856dc5a90, baseline=1.43.0-SNAPSHOT~2f767ab81d
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (692.591 ms) : 0, 692591
BytebuddyAgent [candidate] (685.267 ms) : 0, 685267
GlobalTracer [baseline] (317.565 ms) : 0, 317565
GlobalTracer [candidate] (314.435 ms) : 0, 314435
AppSec [baseline] (54.462 ms) : 0, 54462
AppSec [candidate] (54.055 ms) : 0, 54055
Remote Config [baseline] (680.376 µs) : 0, 680
Remote Config [candidate] (694.94 µs) : 0, 695
Telemetry [baseline] (10.561 ms) : 0, 10561
Telemetry [candidate] (11.398 ms) : 0, 11398
section iast
BytebuddyAgent [baseline] (801.955 ms) : 0, 801955
BytebuddyAgent [candidate] (801.34 ms) : 0, 801340
GlobalTracer [baseline] (303.942 ms) : 0, 303942
GlobalTracer [candidate] (304.42 ms) : 0, 304420
AppSec [baseline] (57.334 ms) : 0, 57334
AppSec [candidate] (55.937 ms) : 0, 55937
IAST [baseline] (20.545 ms) : 0, 20545
IAST [candidate] (21.927 ms) : 0, 21927
Remote Config [baseline] (618.431 µs) : 0, 618
Remote Config [candidate] (601.837 µs) : 0, 602
Telemetry [baseline] (7.442 ms) : 0, 7442
Telemetry [candidate] (7.36 ms) : 0, 7360
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (802.893 ms) : 0, 802893
BytebuddyAgent [candidate] (804.503 ms) : 0, 804503
GlobalTracer [baseline] (304.745 ms) : 0, 304745
GlobalTracer [candidate] (304.718 ms) : 0, 304718
AppSec [baseline] (57.702 ms) : 0, 57702
AppSec [candidate] (57.356 ms) : 0, 57356
IAST [baseline] (20.586 ms) : 0, 20586
IAST [candidate] (20.498 ms) : 0, 20498
Remote Config [baseline] (604.12 µs) : 0, 604
Remote Config [candidate] (615.416 µs) : 0, 615
Telemetry [baseline] (7.47 ms) : 0, 7470
Telemetry [candidate] (7.52 ms) : 0, 7520
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (804.615 ms) : 0, 804615
BytebuddyAgent [candidate] (799.262 ms) : 0, 799262
GlobalTracer [baseline] (305.686 ms) : 0, 305686
GlobalTracer [candidate] (303.69 ms) : 0, 303690
AppSec [baseline] (57.684 ms) : 0, 57684
AppSec [candidate] (56.859 ms) : 0, 56859
IAST [baseline] (20.133 ms) : 0, 20133
IAST [candidate] (20.924 ms) : 0, 20924
Remote Config [baseline] (605.041 µs) : 0, 605
Remote Config [candidate] (613.766 µs) : 0, 614
Telemetry [baseline] (7.338 ms) : 0, 7338
Telemetry [candidate] (7.318 ms) : 0, 7318
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 17 unstable metrics. Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.43.0-SNAPSHOT~6856dc5a90, baseline=1.43.0-SNAPSHOT~2f767ab81d
dateFormat X
axisFormat %s
section baseline
no_agent (1.333 ms) : 1313, 1353
. : milestone, 1333,
appsec (1.746 ms) : 1722, 1771
. : milestone, 1746,
appsec_no_iast (1.727 ms) : 1703, 1752
. : milestone, 1727,
iast (1.466 ms) : 1444, 1489
. : milestone, 1466,
profiling (1.511 ms) : 1487, 1535
. : milestone, 1511,
tracing (1.467 ms) : 1442, 1492
. : milestone, 1467,
section candidate
no_agent (1.331 ms) : 1312, 1350
. : milestone, 1331,
appsec (1.741 ms) : 1717, 1765
. : milestone, 1741,
appsec_no_iast (1.715 ms) : 1690, 1739
. : milestone, 1715,
iast (1.502 ms) : 1480, 1524
. : milestone, 1502,
profiling (1.478 ms) : 1454, 1501
. : milestone, 1478,
tracing (1.468 ms) : 1444, 1492
. : milestone, 1468,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.43.0-SNAPSHOT~6856dc5a90, baseline=1.43.0-SNAPSHOT~2f767ab81d
dateFormat X
axisFormat %s
section baseline
no_agent (370.99 µs) : 350, 392
. : milestone, 371,
iast (482.034 µs) : 461, 503
. : milestone, 482,
iast_FULL (638.608 µs) : 617, 660
. : milestone, 639,
iast_GLOBAL (513.851 µs) : 492, 536
. : milestone, 514,
iast_HARDCODED_SECRET_DISABLED (486.71 µs) : 465, 508
. : milestone, 487,
iast_INACTIVE (447.753 µs) : 426, 469
. : milestone, 448,
iast_TELEMETRY_OFF (478.151 µs) : 456, 500
. : milestone, 478,
tracing (442.097 µs) : 422, 463
. : milestone, 442,
section candidate
no_agent (367.339 µs) : 348, 387
. : milestone, 367,
iast (486.566 µs) : 465, 509
. : milestone, 487,
iast_FULL (642.278 µs) : 621, 664
. : milestone, 642,
iast_GLOBAL (514.931 µs) : 493, 537
. : milestone, 515,
iast_HARDCODED_SECRET_DISABLED (484.927 µs) : 464, 506
. : milestone, 485,
iast_INACTIVE (446.73 µs) : 426, 468
. : milestone, 447,
iast_TELEMETRY_OFF (474.606 µs) : 453, 496
. : milestone, 475,
tracing (445.852 µs) : 425, 467
. : milestone, 446,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.43.0-SNAPSHOT~6856dc5a90, baseline=1.43.0-SNAPSHOT~2f767ab81d
dateFormat X
axisFormat %s
section baseline
no_agent (15.547 s) : 15547000, 15547000
. : milestone, 15547000,
appsec (15.232 s) : 15232000, 15232000
. : milestone, 15232000,
iast (18.435 s) : 18435000, 18435000
. : milestone, 18435000,
iast_GLOBAL (18.084 s) : 18084000, 18084000
. : milestone, 18084000,
profiling (15.03 s) : 15030000, 15030000
. : milestone, 15030000,
tracing (15.077 s) : 15077000, 15077000
. : milestone, 15077000,
section candidate
no_agent (15.166 s) : 15166000, 15166000
. : milestone, 15166000,
appsec (15.12 s) : 15120000, 15120000
. : milestone, 15120000,
iast (18.66 s) : 18660000, 18660000
. : milestone, 18660000,
iast_GLOBAL (18.309 s) : 18309000, 18309000
. : milestone, 18309000,
profiling (14.834 s) : 14834000, 14834000
. : milestone, 14834000,
tracing (15.238 s) : 15238000, 15238000
. : milestone, 15238000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.43.0-SNAPSHOT~6856dc5a90, baseline=1.43.0-SNAPSHOT~2f767ab81d
dateFormat X
axisFormat %s
section baseline
no_agent (1.469 ms) : 1458, 1481
. : milestone, 1469,
appsec (2.348 ms) : 2307, 2390
. : milestone, 2348,
iast (2.079 ms) : 2027, 2132
. : milestone, 2079,
iast_GLOBAL (2.129 ms) : 2076, 2181
. : milestone, 2129,
profiling (1.937 ms) : 1896, 1978
. : milestone, 1937,
tracing (1.919 ms) : 1880, 1958
. : milestone, 1919,
section candidate
no_agent (1.472 ms) : 1461, 1484
. : milestone, 1472,
appsec (2.34 ms) : 2299, 2381
. : milestone, 2340,
iast (2.079 ms) : 2027, 2131
. : milestone, 2079,
iast_GLOBAL (2.121 ms) : 2068, 2174
. : milestone, 2121,
profiling (1.946 ms) : 1904, 1988
. : milestone, 1946,
tracing (1.925 ms) : 1885, 1965
. : milestone, 1925,
|
Kafka / consumer-benchmarkParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics. See unchanged results
|
|
||
if (taskDescription != null) { | ||
try { | ||
Field prop = taskDescription.getClass().getDeclaredField("properties"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should optimize this by unreflecting a method handle for this getter. It will save access checking costs
if (appName != null) { | ||
AgentTracer.get() | ||
.getDataStreamsMonitoring() | ||
.setThreadServiceName(taskRunner.getThreadId(), appName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is product wise coherent to have a different service name from what represented by DSM and what represented by the tracing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense in case with Spark / Flink. It may not be possible to pass the service name from outside in managed environments (for instance spans generated within Databricks for DJM may have service name == "databricks.all-purpose-cluster.dlt-execution-123"). At the same time customers expect to see the Spark Job name, not the cluster node name.
Also there a cases when a single running application (Task Manager in Flink) runs multiple independent stream processing tasks. Each task has it's own "service name", using node name is incorrect.
The better approach would be to dynamically set global service name depending on the context. This seems like a way large change. Happy to discuss this separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually you can do it by calling updatePreferredServiceName
(
Line 293 in 00856e0
void updatePreferredServiceName(String serviceName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK if it depends from the current context then it's not globally applying to the JVM.
List<StatsBucket> includedBuckets = new ArrayList<>(); | ||
Iterator<Map.Entry<Long, StatsBucket>> mapIterator = timeToBucket.entrySet().iterator(); | ||
// stats are grouped by time buckets and service names | ||
Map<String, List<StatsBucket>> includedBuckets = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also put the service name in the statsPoint. What do you think? It would avoid doing this grouping logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The service name is reported per bucket. You can put it inside the stats point but then you'll have to modify the writer to correctly split the bucket. Effectively doing similar thing in a different place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that addition is a good to have to improve the experience of DSM consumers. I've still some concerns especially around adding an unbounded map to store thread id to name correspondences. It should be done in a threadlocal and actually it might not work if the kafka producer (in this example) is called in a thread different from the spark task one. Also it will be nice to have more e2e testing with spark and kafka
@@ -200,7 +200,8 @@ public static void onEnter(@Advice.Argument(value = 0) int estimatedPayloadSize) | |||
saved.getTimestampNanos(), | |||
saved.getPathwayLatencyNano(), | |||
saved.getEdgeLatencyNano(), | |||
estimatedPayloadSize); | |||
estimatedPayloadSize, | |||
saved.getServiceNameOverride()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are message produced in the same thread where the task is running? Also are we testing it? I did not see spark tests with kafka but perhaps I missed it
@@ -74,6 +75,8 @@ public class DefaultDataStreamsMonitoring implements DataStreamsMonitoring, Even | |||
private volatile boolean agentSupportsDataStreams = false; | |||
private volatile boolean configSupportsDataStreams = false; | |||
private final ConcurrentHashMap<String, SchemaSampler> schemaSamplers; | |||
private static final ConcurrentHashMap<Long, String> threadServiceNames = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've some concerns adding unbounded maps here. It might be ok for the spark instrumentation but this has been added to a generic API on DataStreamMonitoring. It can be used in the future in a bad way and start leaking. If service names have to be pinned to a thread, possible a threadlocal variable should be used in order to ensure that that value will go away when the thread is collected.
This PR adds a mechanism which allows overriding service name for DSM checkpoints per thread.
Stats buckets will be split by service name, so per 1 time interval we may have several "named" buckets.
This will be used to correctly capture service (application) names for spark / flink jobs.