Improve data streams performance #7749

Merged (6 commits) on Oct 16, 2024


piochelepiotr
Contributor

@piochelepiotr piochelepiotr commented Oct 10, 2024

What Does This Do

We want to enable collection of Data Streams stats by default. To do that, we want to make sure that performance overhead is minimal.
This PR looks at all the biggest culprits for overhead (by using Datadog profiling) and removes as much overhead as possible.

With this PR, compared to APM overhead, DSM overhead will be small.

Motivation

[image] Setting the schema name and schema type takes a lot of CPU.

Most of the DSM overhead is caused by the MpscBlockingConsumerArrayQueue:
[image]
Also, some time is spent setting the pathway hash tag on spans, which we don't use:
[image]

Additional Notes

Contributor Checklist

Jira ticket: [PROJ-IDENT]


pr-commenter bot commented Oct 10, 2024

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master piotr-wolski/improve-dsm-perf
git_commit_date 1729007930 1729026848
git_commit_sha a1c2f48 6125e40
release_version 1.41.0-SNAPSHOT~a1c2f48c91 1.41.0-SNAPSHOT~6125e4016d
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1729029137 1729029137
ci_job_id 673304100 673304100
ci_pipeline_id 46668997 46668997
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module Agent Agent
parent None None
variant iast iast

Summary

Found 0 performance improvements and 1 performance regressions! Performance is the same for 53 metrics, 9 unstable metrics.

scenario | Δ mean execution_time | candidate mean execution_time | baseline mean execution_time
scenario:startup:insecure-bank:iast:AppSec | worse [+1.437ms; +6.218ms] or [+2.639%; +11.422%] | 58.269ms | 54.441ms
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.069 s) : 0, 1069458
Total [baseline] (10.392 s) : 0, 10391909
Agent [candidate] (1.074 s) : 0, 1074405
Total [candidate] (10.367 s) : 0, 10366964
section appsec
Agent [baseline] (1.202 s) : 0, 1202335
Total [baseline] (10.661 s) : 0, 10661319
Agent [candidate] (1.204 s) : 0, 1204305
Total [candidate] (10.624 s) : 0, 10623803
section iast
Agent [baseline] (1.209 s) : 0, 1209066
Total [baseline] (10.983 s) : 0, 10983177
Agent [candidate] (1.201 s) : 0, 1201424
Total [candidate] (10.856 s) : 0, 10855524
section profiling
Agent [baseline] (1.266 s) : 0, 1265624
Total [baseline] (10.586 s) : 0, 10585556
Agent [candidate] (1.289 s) : 0, 1288707
Total [candidate] (10.775 s) : 0, 10774970
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.069 s -
Agent appsec 1.202 s 132.877 ms (12.4%)
Agent iast 1.209 s 139.608 ms (13.1%)
Agent profiling 1.266 s 196.166 ms (18.3%)
Total tracing 10.392 s -
Total appsec 10.661 s 269.41 ms (2.6%)
Total iast 10.983 s 591.268 ms (5.7%)
Total profiling 10.586 s 193.647 ms (1.9%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.074 s -
Agent appsec 1.204 s 129.9 ms (12.1%)
Agent iast 1.201 s 127.019 ms (11.8%)
Agent profiling 1.289 s 214.302 ms (19.9%)
Total tracing 10.367 s -
Total appsec 10.624 s 256.839 ms (2.5%)
Total iast 10.856 s 488.56 ms (4.7%)
Total profiling 10.775 s 408.007 ms (3.9%)
gantt
    title petclinic - break down per module: candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (682.934 ms) : 0, 682934
BytebuddyAgent [candidate] (684.357 ms) : 0, 684357
GlobalTracer [baseline] (310.557 ms) : 0, 310557
GlobalTracer [candidate] (314.061 ms) : 0, 314061
AppSec [baseline] (54.006 ms) : 0, 54006
AppSec [candidate] (53.962 ms) : 0, 53962
Remote Config [baseline] (674.808 µs) : 0, 675
Remote Config [candidate] (679.002 µs) : 0, 679
Telemetry [baseline] (7.572 ms) : 0, 7572
Telemetry [candidate] (7.524 ms) : 0, 7524
section appsec
BytebuddyAgent [baseline] (699.666 ms) : 0, 699666
BytebuddyAgent [candidate] (699.307 ms) : 0, 699307
GlobalTracer [baseline] (307.783 ms) : 0, 307783
GlobalTracer [candidate] (310.092 ms) : 0, 310092
AppSec [baseline] (162.248 ms) : 0, 162248
AppSec [candidate] (162.92 ms) : 0, 162920
Remote Config [baseline] (639.909 µs) : 0, 640
Remote Config [candidate] (641.068 µs) : 0, 641
Telemetry [baseline] (8.182 ms) : 0, 8182
Telemetry [candidate] (7.849 ms) : 0, 7849
IAST [baseline] (20.152 ms) : 0, 20152
IAST [candidate] (19.521 ms) : 0, 19521
section iast
BytebuddyAgent [baseline] (805.475 ms) : 0, 805475
BytebuddyAgent [candidate] (799.079 ms) : 0, 799079
GlobalTracer [baseline] (301.801 ms) : 0, 301801
GlobalTracer [candidate] (302.232 ms) : 0, 302232
AppSec [baseline] (55.35 ms) : 0, 55350
AppSec [candidate] (56.468 ms) : 0, 56468
Remote Config [baseline] (617.004 µs) : 0, 617
Remote Config [candidate] (602.487 µs) : 0, 602
Telemetry [baseline] (7.211 ms) : 0, 7211
Telemetry [candidate] (7.108 ms) : 0, 7108
IAST [baseline] (24.772 ms) : 0, 24772
IAST [candidate] (22.176 ms) : 0, 22176
section profiling
ProfilingAgent [baseline] (96.196 ms) : 0, 96196
ProfilingAgent [candidate] (98.296 ms) : 0, 98296
BytebuddyAgent [baseline] (675.185 ms) : 0, 675185
BytebuddyAgent [candidate] (686.927 ms) : 0, 686927
GlobalTracer [baseline] (392.889 ms) : 0, 392889
GlobalTracer [candidate] (400.312 ms) : 0, 400312
AppSec [baseline] (54.569 ms) : 0, 54569
AppSec [candidate] (55.518 ms) : 0, 55518
Remote Config [baseline] (661.2 µs) : 0, 661
Remote Config [candidate] (672.733 µs) : 0, 673
Telemetry [baseline] (7.439 ms) : 0, 7439
Telemetry [candidate] (7.609 ms) : 0, 7609
Profiling [baseline] (96.22 ms) : 0, 96220
Profiling [candidate] (98.32 ms) : 0, 98320
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.069 s) : 0, 1068582
Total [baseline] (8.557 s) : 0, 8556965
Agent [candidate] (1.079 s) : 0, 1078622
Total [candidate] (8.578 s) : 0, 8578128
section iast
Agent [baseline] (1.199 s) : 0, 1198905
Total [baseline] (9.101 s) : 0, 9101009
Agent [candidate] (1.204 s) : 0, 1203672
Total [candidate] (9.148 s) : 0, 9148019
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.216 s) : 0, 1215752
Total [baseline] (9.084 s) : 0, 9084108
Agent [candidate] (1.208 s) : 0, 1208109
Total [candidate] (9.09 s) : 0, 9090031
section iast_TELEMETRY_OFF
Agent [baseline] (1.197 s) : 0, 1196751
Total [baseline] (9.123 s) : 0, 9123391
Agent [candidate] (1.215 s) : 0, 1214601
Total [candidate] (9.205 s) : 0, 9205397
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.069 s -
Agent iast 1.199 s 130.324 ms (12.2%)
Agent iast_HARDCODED_SECRET_DISABLED 1.216 s 147.171 ms (13.8%)
Agent iast_TELEMETRY_OFF 1.197 s 128.17 ms (12.0%)
Total tracing 8.557 s -
Total iast 9.101 s 544.044 ms (6.4%)
Total iast_HARDCODED_SECRET_DISABLED 9.084 s 527.143 ms (6.2%)
Total iast_TELEMETRY_OFF 9.123 s 566.427 ms (6.6%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.079 s -
Agent iast 1.204 s 125.05 ms (11.6%)
Agent iast_HARDCODED_SECRET_DISABLED 1.208 s 129.487 ms (12.0%)
Agent iast_TELEMETRY_OFF 1.215 s 135.979 ms (12.6%)
Total tracing 8.578 s -
Total iast 9.148 s 569.891 ms (6.6%)
Total iast_HARDCODED_SECRET_DISABLED 9.09 s 511.903 ms (6.0%)
Total iast_TELEMETRY_OFF 9.205 s 627.269 ms (7.3%)
gantt
    title insecure-bank - break down per module: candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (682.229 ms) : 0, 682229
BytebuddyAgent [candidate] (686.833 ms) : 0, 686833
GlobalTracer [baseline] (310.625 ms) : 0, 310625
GlobalTracer [candidate] (315.274 ms) : 0, 315274
AppSec [baseline] (53.719 ms) : 0, 53719
AppSec [candidate] (54.473 ms) : 0, 54473
Remote Config [baseline] (671.843 µs) : 0, 672
Remote Config [candidate] (673.813 µs) : 0, 674
Telemetry [baseline] (7.604 ms) : 0, 7604
Telemetry [candidate] (7.572 ms) : 0, 7572
section iast
BytebuddyAgent [baseline] (798.775 ms) : 0, 798775
BytebuddyAgent [candidate] (800.808 ms) : 0, 800808
GlobalTracer [baseline] (299.904 ms) : 0, 299904
GlobalTracer [candidate] (302.649 ms) : 0, 302649
AppSec [baseline] (54.441 ms) : 0, 54441
AppSec [candidate] (58.269 ms) : 0, 58269
IAST [baseline] (24.284 ms) : 0, 24284
IAST [candidate] (20.408 ms) : 0, 20408
Remote Config [baseline] (613.878 µs) : 0, 614
Remote Config [candidate] (604.044 µs) : 0, 604
Telemetry [baseline] (7.093 ms) : 0, 7093
Telemetry [candidate] (7.097 ms) : 0, 7097
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (810.41 ms) : 0, 810410
BytebuddyAgent [candidate] (803.6 ms) : 0, 803600
GlobalTracer [baseline] (303.926 ms) : 0, 303926
GlobalTracer [candidate] (304.272 ms) : 0, 304272
AppSec [baseline] (55.672 ms) : 0, 55672
AppSec [candidate] (57.281 ms) : 0, 57281
IAST [baseline] (23.908 ms) : 0, 23908
IAST [candidate] (21.373 ms) : 0, 21373
Remote Config [baseline] (623.377 µs) : 0, 623
Remote Config [candidate] (606.584 µs) : 0, 607
Telemetry [baseline] (7.19 ms) : 0, 7190
Telemetry [candidate] (7.083 ms) : 0, 7083
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (796.398 ms) : 0, 796398
BytebuddyAgent [candidate] (807.867 ms) : 0, 807867
GlobalTracer [baseline] (300.086 ms) : 0, 300086
GlobalTracer [candidate] (306.089 ms) : 0, 306089
AppSec [baseline] (55.197 ms) : 0, 55197
AppSec [candidate] (55.602 ms) : 0, 55602
IAST [baseline] (23.599 ms) : 0, 23599
IAST [candidate] (23.409 ms) : 0, 23409
Remote Config [baseline] (637.055 µs) : 0, 637
Remote Config [candidate] (608.916 µs) : 0, 609
Telemetry [baseline] (7.026 ms) : 0, 7026
Telemetry [candidate] (7.035 ms) : 0, 7035

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
end_time 2024-10-15T21:24:42 2024-10-15T21:31:33
git_branch master piotr-wolski/improve-dsm-perf
git_commit_date 1729007930 1729026848
git_commit_sha a1c2f48 6125e40
release_version 1.41.0-SNAPSHOT~a1c2f48c91 1.41.0-SNAPSHOT~6125e4016d
start_time 2024-10-15T21:24:28 2024-10-15T21:31:20
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1729028239 1729028239
ci_job_id 673304101 673304101
ci_pipeline_id 46668997 46668997
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 17 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
    dateFormat X
    axisFormat %s
section baseline
no_agent (370.282 µs) : 348, 392
.   : milestone, 370,
iast (481.099 µs) : 460, 502
.   : milestone, 481,
iast_FULL (550.495 µs) : 529, 572
.   : milestone, 550,
iast_GLOBAL (511.122 µs) : 488, 534
.   : milestone, 511,
iast_HARDCODED_SECRET_DISABLED (480.963 µs) : 460, 502
.   : milestone, 481,
iast_INACTIVE (440.193 µs) : 420, 461
.   : milestone, 440,
iast_TELEMETRY_OFF (468.239 µs) : 447, 489
.   : milestone, 468,
tracing (450.043 µs) : 429, 471
.   : milestone, 450,
section candidate
no_agent (369.543 µs) : 350, 389
.   : milestone, 370,
iast (484.038 µs) : 463, 505
.   : milestone, 484,
iast_FULL (551.184 µs) : 530, 573
.   : milestone, 551,
iast_GLOBAL (511.819 µs) : 489, 534
.   : milestone, 512,
iast_HARDCODED_SECRET_DISABLED (483.626 µs) : 462, 505
.   : milestone, 484,
iast_INACTIVE (450.21 µs) : 428, 472
.   : milestone, 450,
iast_TELEMETRY_OFF (474.067 µs) : 452, 496
.   : milestone, 474,
tracing (442.117 µs) : 421, 463
.   : milestone, 442,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 370.282 µs [348.344 µs, 392.22 µs] -
iast 481.099 µs [459.809 µs, 502.388 µs] 110.817 µs (29.9%)
iast_FULL 550.495 µs [529.178 µs, 571.812 µs] 180.213 µs (48.7%)
iast_GLOBAL 511.122 µs [488.401 µs, 533.842 µs] 140.84 µs (38.0%)
iast_HARDCODED_SECRET_DISABLED 480.963 µs [459.698 µs, 502.227 µs] 110.681 µs (29.9%)
iast_INACTIVE 440.193 µs [419.746 µs, 460.639 µs] 69.911 µs (18.9%)
iast_TELEMETRY_OFF 468.239 µs [447.104 µs, 489.374 µs] 97.957 µs (26.5%)
tracing 450.043 µs [428.872 µs, 471.215 µs] 79.761 µs (21.5%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 369.543 µs [349.868 µs, 389.218 µs] -
iast 484.038 µs [462.73 µs, 505.347 µs] 114.496 µs (31.0%)
iast_FULL 551.184 µs [529.828 µs, 572.54 µs] 181.641 µs (49.2%)
iast_GLOBAL 511.819 µs [489.404 µs, 534.234 µs] 142.276 µs (38.5%)
iast_HARDCODED_SECRET_DISABLED 483.626 µs [462.473 µs, 504.78 µs] 114.083 µs (30.9%)
iast_INACTIVE 450.21 µs [428.405 µs, 472.014 µs] 80.667 µs (21.8%)
iast_TELEMETRY_OFF 474.067 µs [452.318 µs, 495.815 µs] 104.524 µs (28.3%)
tracing 442.117 µs [421.46 µs, 462.774 µs] 72.574 µs (19.6%)
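The "Δ no_agent" column in these tables appears to be the variant's mean minus the no_agent mean, also expressed as a percentage of the no_agent mean. A minimal sketch of that arithmetic, checked against the candidate tracing row above; the class and method names are hypothetical and are not part of the benchmark tooling:

```java
import java.util.Locale;

final class OverheadDeltaSketch {
  // Absolute overhead: variant mean minus the no_agent mean (same unit as the inputs).
  static double delta(double variant, double noAgent) {
    return variant - noAgent;
  }

  // Relative overhead, as a percentage of the no_agent mean.
  static double deltaPercent(double variant, double noAgent) {
    return 100.0 * (variant - noAgent) / noAgent;
  }

  public static void main(String[] args) {
    // Candidate results for insecure-bank, in microseconds: tracing vs. no_agent.
    double tracing = 442.117;
    double noAgent = 369.543;
    System.out.println(String.format(Locale.ROOT, "%.3f us (%.1f%%)",
        delta(tracing, noAgent), deltaPercent(tracing, noAgent)));
  }
}
```

The same two functions reproduce the other rows when fed the corresponding means.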
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.346 ms) : 1326, 1366
.   : milestone, 1346,
appsec (1.694 ms) : 1669, 1719
.   : milestone, 1694,
appsec_no_iast (1.724 ms) : 1701, 1748
.   : milestone, 1724,
iast (1.473 ms) : 1451, 1496
.   : milestone, 1473,
profiling (1.477 ms) : 1453, 1502
.   : milestone, 1477,
tracing (1.479 ms) : 1452, 1506
.   : milestone, 1479,
section candidate
no_agent (1.332 ms) : 1313, 1351
.   : milestone, 1332,
appsec (1.721 ms) : 1696, 1746
.   : milestone, 1721,
appsec_no_iast (1.712 ms) : 1688, 1736
.   : milestone, 1712,
iast (1.484 ms) : 1462, 1506
.   : milestone, 1484,
profiling (1.474 ms) : 1451, 1497
.   : milestone, 1474,
tracing (1.447 ms) : 1422, 1472
.   : milestone, 1447,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.346 ms [1.326 ms, 1.366 ms] -
appsec 1.694 ms [1.669 ms, 1.719 ms] 348.06 µs (25.9%)
appsec_no_iast 1.724 ms [1.701 ms, 1.748 ms] 378.468 µs (28.1%)
iast 1.473 ms [1.451 ms, 1.496 ms] 127.204 µs (9.5%)
profiling 1.477 ms [1.453 ms, 1.502 ms] 131.574 µs (9.8%)
tracing 1.479 ms [1.452 ms, 1.506 ms] 133.107 µs (9.9%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.332 ms [1.313 ms, 1.351 ms] -
appsec 1.721 ms [1.696 ms, 1.746 ms] 388.869 µs (29.2%)
appsec_no_iast 1.712 ms [1.688 ms, 1.736 ms] 379.844 µs (28.5%)
iast 1.484 ms [1.462 ms, 1.506 ms] 152.081 µs (11.4%)
profiling 1.474 ms [1.451 ms, 1.497 ms] 142.085 µs (10.7%)
tracing 1.447 ms [1.422 ms, 1.472 ms] 114.689 µs (8.6%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master piotr-wolski/improve-dsm-perf
git_commit_date 1729007930 1729026848
git_commit_sha a1c2f48 6125e40
release_version 1.41.0-SNAPSHOT~a1c2f48c91 1.41.0-SNAPSHOT~6125e4016d
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1729028759 1729028759
ci_job_id 673304102 673304102
ci_pipeline_id 46668997 46668997
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant appsec appsec

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.464 ms) : 1453, 1475
.   : milestone, 1464,
appsec (2.306 ms) : 2265, 2347
.   : milestone, 2306,
iast (2.053 ms) : 2002, 2104
.   : milestone, 2053,
iast_GLOBAL (2.096 ms) : 2044, 2147
.   : milestone, 2096,
profiling (1.926 ms) : 1886, 1967
.   : milestone, 1926,
tracing (1.902 ms) : 1863, 1941
.   : milestone, 1902,
section candidate
no_agent (1.468 ms) : 1456, 1479
.   : milestone, 1468,
appsec (2.315 ms) : 2274, 2356
.   : milestone, 2315,
iast (2.062 ms) : 2010, 2114
.   : milestone, 2062,
iast_GLOBAL (2.11 ms) : 2058, 2162
.   : milestone, 2110,
profiling (1.924 ms) : 1883, 1965
.   : milestone, 1924,
tracing (1.901 ms) : 1862, 1940
.   : milestone, 1901,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.464 ms [1.453 ms, 1.475 ms] -
appsec 2.306 ms [2.265 ms, 2.347 ms] 841.951 µs (57.5%)
iast 2.053 ms [2.002 ms, 2.104 ms] 588.692 µs (40.2%)
iast_GLOBAL 2.096 ms [2.044 ms, 2.147 ms] 631.874 µs (43.2%)
profiling 1.926 ms [1.886 ms, 1.967 ms] 462.345 µs (31.6%)
tracing 1.902 ms [1.863 ms, 1.941 ms] 437.622 µs (29.9%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.468 ms [1.456 ms, 1.479 ms] -
appsec 2.315 ms [2.274 ms, 2.356 ms] 847.019 µs (57.7%)
iast 2.062 ms [2.01 ms, 2.114 ms] 594.685 µs (40.5%)
iast_GLOBAL 2.11 ms [2.058 ms, 2.162 ms] 641.85 µs (43.7%)
profiling 1.924 ms [1.883 ms, 1.965 ms] 456.35 µs (31.1%)
tracing 1.901 ms [1.862 ms, 1.94 ms] 433.028 µs (29.5%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.41.0-SNAPSHOT~6125e4016d, baseline=1.41.0-SNAPSHOT~a1c2f48c91
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.466 s) : 15466000, 15466000
.   : milestone, 15466000,
appsec (15.119 s) : 15119000, 15119000
.   : milestone, 15119000,
iast (18.847 s) : 18847000, 18847000
.   : milestone, 18847000,
iast_GLOBAL (18.082 s) : 18082000, 18082000
.   : milestone, 18082000,
profiling (15.238 s) : 15238000, 15238000
.   : milestone, 15238000,
tracing (15.415 s) : 15415000, 15415000
.   : milestone, 15415000,
section candidate
no_agent (15.132 s) : 15132000, 15132000
.   : milestone, 15132000,
appsec (15.24 s) : 15240000, 15240000
.   : milestone, 15240000,
iast (18.857 s) : 18857000, 18857000
.   : milestone, 18857000,
iast_GLOBAL (17.882 s) : 17882000, 17882000
.   : milestone, 17882000,
profiling (15.605 s) : 15605000, 15605000
.   : milestone, 15605000,
tracing (15.227 s) : 15227000, 15227000
.   : milestone, 15227000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.466 s [15.466 s, 15.466 s] -
appsec 15.119 s [15.119 s, 15.119 s] -347.0 ms (-2.2%)
iast 18.847 s [18.847 s, 18.847 s] 3.381 s (21.9%)
iast_GLOBAL 18.082 s [18.082 s, 18.082 s] 2.616 s (16.9%)
profiling 15.238 s [15.238 s, 15.238 s] -228.0 ms (-1.5%)
tracing 15.415 s [15.415 s, 15.415 s] -51.0 ms (-0.3%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.132 s [15.132 s, 15.132 s] -
appsec 15.24 s [15.24 s, 15.24 s] 108.0 ms (0.7%)
iast 18.857 s [18.857 s, 18.857 s] 3.725 s (24.6%)
iast_GLOBAL 17.882 s [17.882 s, 17.882 s] 2.75 s (18.2%)
profiling 15.605 s [15.605 s, 15.605 s] 473.0 ms (3.1%)
tracing 15.227 s [15.227 s, 15.227 s] 95.0 ms (0.6%)

@piochelepiotr piochelepiotr marked this pull request as ready for review October 10, 2024 14:18
@piochelepiotr piochelepiotr requested review from a team as code owners October 10, 2024 14:18
Comment on lines +316 to +320
InboxItem payload = inbox.poll();
if (payload == null) {
  Thread.sleep(10);
  continue;
}
Contributor

Is busy waiting really better than a parked thread?

Contributor Author

From the profiles, it looks much better (overhead of this queue went to nearly 0)

Contributor

@ddyurchenko ddyurchenko Oct 11, 2024

> From the profiles, it looks much better (overhead of this queue went to nearly 0)

Maybe the overhead metric calculated from the profile is not right? What is the change in latency?
I second this; it sounds strange that doing more work results in less overhead. 🤔

Contributor Author

I don't know Java well enough, but in Go, the overhead of goroutine synchronization was much higher than the overhead of checking whether there is data once every 10 ms.
The problem is that on each write, a Go channel then had to try to wake up the consumer, and that caused a lot of overhead. I expect it's about the same here.
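For readers following the thread, here is a minimal sketch of the polling pattern under discussion. It uses the JDK's ConcurrentLinkedQueue as a stand-in for the tracer's actual inbox, and every class and method name is hypothetical. The point it illustrates is that the producer only enqueues and never signals a waiting thread, while the consumer checks the queue and sleeps 10 ms when it finds nothing:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class PollingConsumerSketch {
  // Hypothetical stand-in for the inbox: offer() enqueues without signalling any waiting thread.
  private final Queue<String> inbox = new ConcurrentLinkedQueue<>();

  // Producer side: a plain enqueue, with no consumer wake-up on the caller's thread.
  void publish(String item) {
    inbox.offer(item);
  }

  // Consumer side: process what is available, and sleep 10 ms whenever the queue is empty.
  int consume(int expected) throws InterruptedException {
    int processed = 0;
    while (processed < expected) {
      String payload = inbox.poll();
      if (payload == null) {
        Thread.sleep(10); // idle cost: roughly one wake-up every 10 ms
        continue;
      }
      processed++;
    }
    return processed;
  }

  // Runs one producer (the calling thread) against one polling consumer thread.
  static int runDemo(int items) {
    PollingConsumerSketch sketch = new PollingConsumerSketch();
    int[] result = new int[1];
    Thread consumer = new Thread(() -> {
      try {
        result[0] = sketch.consume(items);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();
    for (int i = 0; i < items; i++) {
      sketch.publish("item-" + i);
    }
    try {
      consumer.join(); // join() also makes result[0] visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return result[0];
  }

  public static void main(String[] args) {
    System.out.println("processed=" + runDemo(1000));
  }
}
```

A blocking variant would replace the poll-and-sleep with a blocking take, which moves the cost of waking the consumer onto whichever thread enqueues an item while the consumer is parked.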

Member

Something similar was done in #4409 to address the same bottleneck: pthread_cond_signal.

I don't think it's helpful to describe this as "strange" without considering the production and consumption rates of this queue. The production rate is equivalent to rps, and the consumption rate is a constant 100 per second. Consider the overhead from the perspective of the producer: it is no longer responsible for unblocking threads and can just drop the item off in the queue, which improves the latency of the producer thread. On the other hand, the consumer now wakes up 100 times per second, either to do some work or to go back to sleep, incurring a useless context switch every 10 ms. (However, when it was waiting on a condition, it might have woken up more than 100 times per second anyway if rps were greater than 100, and would only have woken up less often if the traffic were bursty.) So this change should reduce CPU overhead whenever (items produced per second) * (CPU cost of pthread_cond_signal) > (cost of scheduling the thread) * 100. Assuming pthread_cond_signal costs the same as scheduling the thread (we have to assume, because the profiler only samples one of these), this is a win whenever rps > 100. I think it's more likely that the cost of scheduling the thread is 10% of the cost of pthread_cond_signal, meaning this change would win at rps > 10.
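The inequality above can be rearranged into a break-even request rate: rps > 100 * (cost of scheduling the thread) / (cost of pthread_cond_signal). A small sketch of that arithmetic follows; the cost values are unitless placeholders chosen only to mirror the two assumptions in the comment, not measurements, and the names are hypothetical:

```java
final class BreakEvenSketch {
  // Request rate above which polling is cheaper than signalling on every enqueue:
  // rps * signalCost > scheduleCost * wakeupsPerSecond
  static double breakEvenRps(double signalCost, double scheduleCost, double wakeupsPerSecond) {
    return wakeupsPerSecond * scheduleCost / signalCost;
  }

  public static void main(String[] args) {
    double wakeupsPerSecond = 100.0; // one wake-up per 10 ms sleep

    // Assumption 1: signalling costs the same as scheduling the thread.
    System.out.println(breakEvenRps(1.0, 1.0, wakeupsPerSecond));

    // Assumption 2: scheduling the thread costs 10% of signalling.
    System.out.println(breakEvenRps(1.0, 0.1, wakeupsPerSecond));
  }
}
```

Under the first assumption the break-even is about 100 rps, and under the second about 10 rps, matching the thresholds stated in the comment.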

If you aim to reduce latency impact on application threads, this change is a win because the application threads are no longer responsible for waking up one of our background threads. If you aim

Contributor Author

Thanks for all the context!

Member

@richardstartin richardstartin left a comment

Great use of the profiler, and the changes make sense to me.

@piochelepiotr piochelepiotr merged commit affb61a into master Oct 16, 2024
104 checks passed
@piochelepiotr piochelepiotr deleted the piotr-wolski/improve-dsm-perf branch October 16, 2024 02:33
@github-actions github-actions bot added this to the 1.41.0 milestone Oct 16, 2024
@nayeem-kamal nayeem-kamal added tag: performance Performance related changes comp: data streams Data Streams Monitoring labels Oct 17, 2024
Labels
comp: data streams Data Streams Monitoring tag: performance Performance related changes
5 participants