Improve data streams performance #7749
Benchmarks
Startup
Summary
Found 0 performance improvements and 1 performance regressions! Performance is the same for 53 metrics, 9 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.069 s) : 0, 1069458
Total [baseline] (10.392 s) : 0, 10391909
Agent [candidate] (1.074 s) : 0, 1074405
Total [candidate] (10.367 s) : 0, 10366964
section appsec
Agent [baseline] (1.202 s) : 0, 1202335
Total [baseline] (10.661 s) : 0, 10661319
Agent [candidate] (1.204 s) : 0, 1204305
Total [candidate] (10.624 s) : 0, 10623803
section iast
Agent [baseline] (1.209 s) : 0, 1209066
Total [baseline] (10.983 s) : 0, 10983177
Agent [candidate] (1.201 s) : 0, 1201424
Total [candidate] (10.856 s) : 0, 10855524
section profiling
Agent [baseline] (1.266 s) : 0, 1265624
Total [baseline] (10.586 s) : 0, 10585556
Agent [candidate] (1.289 s) : 0, 1288707
Total [candidate] (10.775 s) : 0, 10774970
gantt
title petclinic - break down per module: candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (682.934 ms) : 0, 682934
BytebuddyAgent [candidate] (684.357 ms) : 0, 684357
GlobalTracer [baseline] (310.557 ms) : 0, 310557
GlobalTracer [candidate] (314.061 ms) : 0, 314061
AppSec [baseline] (54.006 ms) : 0, 54006
AppSec [candidate] (53.962 ms) : 0, 53962
Remote Config [baseline] (674.808 µs) : 0, 675
Remote Config [candidate] (679.002 µs) : 0, 679
Telemetry [baseline] (7.572 ms) : 0, 7572
Telemetry [candidate] (7.524 ms) : 0, 7524
section appsec
BytebuddyAgent [baseline] (699.666 ms) : 0, 699666
BytebuddyAgent [candidate] (699.307 ms) : 0, 699307
GlobalTracer [baseline] (307.783 ms) : 0, 307783
GlobalTracer [candidate] (310.092 ms) : 0, 310092
AppSec [baseline] (162.248 ms) : 0, 162248
AppSec [candidate] (162.92 ms) : 0, 162920
Remote Config [baseline] (639.909 µs) : 0, 640
Remote Config [candidate] (641.068 µs) : 0, 641
Telemetry [baseline] (8.182 ms) : 0, 8182
Telemetry [candidate] (7.849 ms) : 0, 7849
IAST [baseline] (20.152 ms) : 0, 20152
IAST [candidate] (19.521 ms) : 0, 19521
section iast
BytebuddyAgent [baseline] (805.475 ms) : 0, 805475
BytebuddyAgent [candidate] (799.079 ms) : 0, 799079
GlobalTracer [baseline] (301.801 ms) : 0, 301801
GlobalTracer [candidate] (302.232 ms) : 0, 302232
AppSec [baseline] (55.35 ms) : 0, 55350
AppSec [candidate] (56.468 ms) : 0, 56468
Remote Config [baseline] (617.004 µs) : 0, 617
Remote Config [candidate] (602.487 µs) : 0, 602
Telemetry [baseline] (7.211 ms) : 0, 7211
Telemetry [candidate] (7.108 ms) : 0, 7108
IAST [baseline] (24.772 ms) : 0, 24772
IAST [candidate] (22.176 ms) : 0, 22176
section profiling
ProfilingAgent [baseline] (96.196 ms) : 0, 96196
ProfilingAgent [candidate] (98.296 ms) : 0, 98296
BytebuddyAgent [baseline] (675.185 ms) : 0, 675185
BytebuddyAgent [candidate] (686.927 ms) : 0, 686927
GlobalTracer [baseline] (392.889 ms) : 0, 392889
GlobalTracer [candidate] (400.312 ms) : 0, 400312
AppSec [baseline] (54.569 ms) : 0, 54569
AppSec [candidate] (55.518 ms) : 0, 55518
Remote Config [baseline] (661.2 µs) : 0, 661
Remote Config [candidate] (672.733 µs) : 0, 673
Telemetry [baseline] (7.439 ms) : 0, 7439
Telemetry [candidate] (7.609 ms) : 0, 7609
Profiling [baseline] (96.22 ms) : 0, 96220
Profiling [candidate] (98.32 ms) : 0, 98320
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.069 s) : 0, 1068582
Total [baseline] (8.557 s) : 0, 8556965
Agent [candidate] (1.079 s) : 0, 1078622
Total [candidate] (8.578 s) : 0, 8578128
section iast
Agent [baseline] (1.199 s) : 0, 1198905
Total [baseline] (9.101 s) : 0, 9101009
Agent [candidate] (1.204 s) : 0, 1203672
Total [candidate] (9.148 s) : 0, 9148019
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.216 s) : 0, 1215752
Total [baseline] (9.084 s) : 0, 9084108
Agent [candidate] (1.208 s) : 0, 1208109
Total [candidate] (9.09 s) : 0, 9090031
section iast_TELEMETRY_OFF
Agent [baseline] (1.197 s) : 0, 1196751
Total [baseline] (9.123 s) : 0, 9123391
Agent [candidate] (1.215 s) : 0, 1214601
Total [candidate] (9.205 s) : 0, 9205397
gantt
title insecure-bank - break down per module: candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (682.229 ms) : 0, 682229
BytebuddyAgent [candidate] (686.833 ms) : 0, 686833
GlobalTracer [baseline] (310.625 ms) : 0, 310625
GlobalTracer [candidate] (315.274 ms) : 0, 315274
AppSec [baseline] (53.719 ms) : 0, 53719
AppSec [candidate] (54.473 ms) : 0, 54473
Remote Config [baseline] (671.843 µs) : 0, 672
Remote Config [candidate] (673.813 µs) : 0, 674
Telemetry [baseline] (7.604 ms) : 0, 7604
Telemetry [candidate] (7.572 ms) : 0, 7572
section iast
BytebuddyAgent [baseline] (798.775 ms) : 0, 798775
BytebuddyAgent [candidate] (800.808 ms) : 0, 800808
GlobalTracer [baseline] (299.904 ms) : 0, 299904
GlobalTracer [candidate] (302.649 ms) : 0, 302649
AppSec [baseline] (54.441 ms) : 0, 54441
AppSec [candidate] (58.269 ms) : 0, 58269
IAST [baseline] (24.284 ms) : 0, 24284
IAST [candidate] (20.408 ms) : 0, 20408
Remote Config [baseline] (613.878 µs) : 0, 614
Remote Config [candidate] (604.044 µs) : 0, 604
Telemetry [baseline] (7.093 ms) : 0, 7093
Telemetry [candidate] (7.097 ms) : 0, 7097
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (810.41 ms) : 0, 810410
BytebuddyAgent [candidate] (803.6 ms) : 0, 803600
GlobalTracer [baseline] (303.926 ms) : 0, 303926
GlobalTracer [candidate] (304.272 ms) : 0, 304272
AppSec [baseline] (55.672 ms) : 0, 55672
AppSec [candidate] (57.281 ms) : 0, 57281
IAST [baseline] (23.908 ms) : 0, 23908
IAST [candidate] (21.373 ms) : 0, 21373
Remote Config [baseline] (623.377 µs) : 0, 623
Remote Config [candidate] (606.584 µs) : 0, 607
Telemetry [baseline] (7.19 ms) : 0, 7190
Telemetry [candidate] (7.083 ms) : 0, 7083
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (796.398 ms) : 0, 796398
BytebuddyAgent [candidate] (807.867 ms) : 0, 807867
GlobalTracer [baseline] (300.086 ms) : 0, 300086
GlobalTracer [candidate] (306.089 ms) : 0, 306089
AppSec [baseline] (55.197 ms) : 0, 55197
AppSec [candidate] (55.602 ms) : 0, 55602
IAST [baseline] (23.599 ms) : 0, 23599
IAST [candidate] (23.409 ms) : 0, 23409
Remote Config [baseline] (637.055 µs) : 0, 637
Remote Config [candidate] (608.916 µs) : 0, 609
Telemetry [baseline] (7.026 ms) : 0, 7026
Telemetry [candidate] (7.035 ms) : 0, 7035
Load
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 17 unstable metrics.
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
dateFormat X
axisFormat %s
section baseline
no_agent (370.282 µs) : 348, 392
. : milestone, 370,
iast (481.099 µs) : 460, 502
. : milestone, 481,
iast_FULL (550.495 µs) : 529, 572
. : milestone, 550,
iast_GLOBAL (511.122 µs) : 488, 534
. : milestone, 511,
iast_HARDCODED_SECRET_DISABLED (480.963 µs) : 460, 502
. : milestone, 481,
iast_INACTIVE (440.193 µs) : 420, 461
. : milestone, 440,
iast_TELEMETRY_OFF (468.239 µs) : 447, 489
. : milestone, 468,
tracing (450.043 µs) : 429, 471
. : milestone, 450,
section candidate
no_agent (369.543 µs) : 350, 389
. : milestone, 370,
iast (484.038 µs) : 463, 505
. : milestone, 484,
iast_FULL (551.184 µs) : 530, 573
. : milestone, 551,
iast_GLOBAL (511.819 µs) : 489, 534
. : milestone, 512,
iast_HARDCODED_SECRET_DISABLED (483.626 µs) : 462, 505
. : milestone, 484,
iast_INACTIVE (450.21 µs) : 428, 472
. : milestone, 450,
iast_TELEMETRY_OFF (474.067 µs) : 452, 496
. : milestone, 474,
tracing (442.117 µs) : 421, 463
. : milestone, 442,
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
dateFormat X
axisFormat %s
section baseline
no_agent (1.346 ms) : 1326, 1366
. : milestone, 1346,
appsec (1.694 ms) : 1669, 1719
. : milestone, 1694,
appsec_no_iast (1.724 ms) : 1701, 1748
. : milestone, 1724,
iast (1.473 ms) : 1451, 1496
. : milestone, 1473,
profiling (1.477 ms) : 1453, 1502
. : milestone, 1477,
tracing (1.479 ms) : 1452, 1506
. : milestone, 1479,
section candidate
no_agent (1.332 ms) : 1313, 1351
. : milestone, 1332,
appsec (1.721 ms) : 1696, 1746
. : milestone, 1721,
appsec_no_iast (1.712 ms) : 1688, 1736
. : milestone, 1712,
iast (1.484 ms) : 1462, 1506
. : milestone, 1484,
profiling (1.474 ms) : 1451, 1497
. : milestone, 1474,
tracing (1.447 ms) : 1422, 1472
. : milestone, 1447,
Dacapo
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
dateFormat X
axisFormat %s
section baseline
no_agent (1.464 ms) : 1453, 1475
. : milestone, 1464,
appsec (2.306 ms) : 2265, 2347
. : milestone, 2306,
iast (2.053 ms) : 2002, 2104
. : milestone, 2053,
iast_GLOBAL (2.096 ms) : 2044, 2147
. : milestone, 2096,
profiling (1.926 ms) : 1886, 1967
. : milestone, 1926,
tracing (1.902 ms) : 1863, 1941
. : milestone, 1902,
section candidate
no_agent (1.468 ms) : 1456, 1479
. : milestone, 1468,
appsec (2.315 ms) : 2274, 2356
. : milestone, 2315,
iast (2.062 ms) : 2010, 2114
. : milestone, 2062,
iast_GLOBAL (2.11 ms) : 2058, 2162
. : milestone, 2110,
profiling (1.924 ms) : 1883, 1965
. : milestone, 1924,
tracing (1.901 ms) : 1862, 1940
. : milestone, 1901,
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
dateFormat X
axisFormat %s
section baseline
no_agent (15.466 s) : 15466000, 15466000
. : milestone, 15466000,
appsec (15.119 s) : 15119000, 15119000
. : milestone, 15119000,
iast (18.847 s) : 18847000, 18847000
. : milestone, 18847000,
iast_GLOBAL (18.082 s) : 18082000, 18082000
. : milestone, 18082000,
profiling (15.238 s) : 15238000, 15238000
. : milestone, 15238000,
tracing (15.415 s) : 15415000, 15415000
. : milestone, 15415000,
section candidate
no_agent (15.132 s) : 15132000, 15132000
. : milestone, 15132000,
appsec (15.24 s) : 15240000, 15240000
. : milestone, 15240000,
iast (18.857 s) : 18857000, 18857000
. : milestone, 18857000,
iast_GLOBAL (17.882 s) : 17882000, 17882000
. : milestone, 17882000,
profiling (15.605 s) : 15605000, 15605000
. : milestone, 15605000,
tracing (15.227 s) : 15227000, 15227000
. : milestone, 15227000,
InboxItem payload = inbox.poll();
if (payload == null) {
  Thread.sleep(10);
  continue;
}
Is busy waiting really better than a parked thread?
From the profiles, it looks much better (overhead of this queue went to nearly 0)
I did something similar in Go: https://github.com/DataDog/dd-trace-go/pull/2455/files
Quoting: "From the profiles, it looks much better (overhead of this queue went to nearly 0)"
Maybe the overhead metric calculated from the profile is not right? What is the change in latency?
I second this; it sounds strange that doing more work results in less overhead. 🤔
I don't know Java well enough, but in Go, the overhead of goroutine synchronization was much higher than the overhead of checking whether there is data once every 10 ms.
The problem is that on each write, the Go channel then had to try to wake up the consumer, and that caused a lot of overhead. I expect it's about the same here.
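The trade-off discussed in this thread can be illustrated with a minimal Java sketch. It is not the PR's implementation: the class name `PollingConsumerSketch`, the `String` payload, and the use of `ConcurrentLinkedQueue` as the inbox are assumptions made only for illustration. The producer just enqueues and never signals the consumer; the consumer polls and sleeps 10 ms when the queue is empty, mirroring the reviewed snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: names and queue type are illustrative, not taken from the PR.
final class PollingConsumerSketch {
    // Non-blocking queue standing in for the inbox; offer() never wakes a parked thread.
    private final Queue<String> inbox = new ConcurrentLinkedQueue<>();
    private final List<String> processed = new ArrayList<>();
    private volatile boolean running = true;

    /** Producer side: enqueue only, no signalling of the consumer thread. */
    void publish(String item) {
        inbox.offer(item);
    }

    /** Consumer loop: handle an item if one is available, otherwise sleep 10 ms. */
    void runConsumer() throws InterruptedException {
        while (running || !inbox.isEmpty()) {
            String payload = inbox.poll();
            if (payload == null) {
                Thread.sleep(10);
                continue;
            }
            processed.add(payload);
        }
    }

    /** Asks the consumer to exit once the inbox has been drained. */
    void stop() {
        running = false;
    }

    /** Publishes three items, stops the consumer, and returns what it processed. */
    static List<String> demo() {
        PollingConsumerSketch sketch = new PollingConsumerSketch();
        Thread consumer = new Thread(() -> {
            try {
                sketch.runConsumer();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        for (int i = 0; i < 3; i++) {
            sketch.publish("item-" + i);
        }
        sketch.stop();
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.processed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cost moves from the producer (which no longer has to unpark anyone) to the consumer, which wakes up at a fixed rate whether or not there is work.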
Something similar was done in #4409 to address the same bottleneck: pthread_cond_signal.
I don't think it's helpful to describe this as "strange" without considering the production and consumption rates of this queue. The production rate is equivalent to rps, and the consumption rate is a constant 100 per second. Consider the overhead from the perspective of the producer, which is no longer responsible for unblocking threads and can just drop the item off in the queue (improving the latency of the producer thread). On the other hand, the consumer now wakes up 100 times per second, either to do some work or to go back to sleep, incurring a useless context switch every 10 ms. (However, when it was waiting on a condition, it might have woken up more than 100 times per second anyway if rps were greater than 100, and would only have woken up less often if the traffic were bursty.) So this change should reduce CPU overhead whenever (items produced per second) * (CPU cost of pthread_cond_signal) > (cost of scheduling the thread) * 100. Assuming pthread_cond_signal costs the same as scheduling the thread (we have to assume, because the profiler only samples one of these), this is a win whenever rps > 100. I think it's more likely that the cost of scheduling the thread is 10% of the cost of pthread_cond_signal, meaning this change would win at rps > 10.
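The break-even condition in the comment above can be worked through with a small Java helper. This is only a sketch of the arithmetic: the class and method names are hypothetical, and the costs are in arbitrary relative units because the thread itself gives no measured values for them.

```java
// Hypothetical helper: names and cost units are illustrative, not from the PR.
final class PollingBreakEven {
    /** Consumer wake-ups per second for a given sleep interval in milliseconds. */
    static double wakeupsPerSecond(long sleepMillis) {
        return 1000.0 / sleepMillis;
    }

    /**
     * Production rate (items per second) above which polling costs less CPU than
     * signalling, from: rps * signalCost > scheduleCost * wakeupsPerSecond.
     */
    static double breakEvenRps(double signalCost, double scheduleCost, long sleepMillis) {
        return scheduleCost * wakeupsPerSecond(sleepMillis) / signalCost;
    }

    public static void main(String[] args) {
        // Equal costs with a 10 ms sleep: polling wins above roughly 100 rps.
        System.out.println(breakEvenRps(1.0, 1.0, 10));
        // Scheduling at 10% of the signalling cost: polling wins above roughly 10 rps.
        System.out.println(breakEvenRps(1.0, 0.1, 10));
    }
}
```

With a 10 ms sleep the consumer wakes 100 times per second, so the threshold scales directly with the ratio of scheduling cost to signalling cost.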
If you aim to reduce latency impact on application threads, this change is a win because the application threads are no longer responsible for waking up one of our background threads. If you aim
Thanks for all the context!
8c82831 to fe758d2
Great use of the profiler, and the changes make sense to me.
What Does This Do
We want to enable collection of Data Streams stats by default. To do that, we want to make sure that performance overhead is minimal.
This PR looks at all the biggest culprits for overhead (by using Datadog profiling) and removes as much overhead as possible.
With this PR, compared to APM overhead, DSM overhead will be small.
Motivation
Setting the schema name and schema type takes a lot of CPU.
Most of the DSM overhead is caused by the MpscBlockingConsumerArrayQueue:
Also, some time is spent setting the pathway hash tag on spans, which we don't use:
Additional Notes
Contributor Checklist
Assign the type: and (comp: or inst:) labels in addition to any useful labels.
Do not use close, fix or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue.
Jira ticket: [PROJ-IDENT]