Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: integrate real symbol table into stats system #4980

Merged
merged 186 commits into from
Jun 28, 2020

Conversation

jmarantz
Copy link
Contributor

@jmarantz jmarantz commented Nov 7, 2018

Description: Switches the default from using fake symbol tables to using real symbol tables.
Risk Level: medium -- every effort was made to eliminate risk of contention by symbolizing stat names during initialization phases. However, some code paths may have evaded this. Using real symbol tables does appear to be significantly faster. For example, a microbenchmark for the HTTP response code stats shows approximately 40% speed improvement, in addition to a significant reduction in per-cluster memory usage.

----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_AddResponsesFakeSymtab         9116 ns         9116 ns        66687
BM_AddResponsesRealSymtab         5613 ns         5613 ns       125048
BM_ResponseTimingFakeSymtab        678 ns          678 ns      1032669
BM_ResponseTimingRealSymtab        379 ns          379 ns      1840949

Testing: //test/...
Docs Changes: should update stats.md to reflect the default change.
Release Notes: will need those still.
Fixes: #3585, #4196

Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
@jmarantz
Copy link
Contributor Author

jmarantz commented Nov 7, 2018

@ambuc FYI

Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
@jmarantz
Copy link
Contributor Author

jmarantz commented Nov 7, 2018

Got some encouraging perf results to motivate this large complicated scary PR :)

Memory usage is cut by 50% in HeapStatsThreadLocalStoreTest.Memory*:

--- a/test/common/stats/thread_local_store_test.cc
+++ b/test/common/stats/thread_local_store_test.cc
@@ -668,7 +668,7 @@ TEST_F(HeapStatsThreadLocalStoreTest, MemoryWithoutTls) {
-  EXPECT_LT(end_mem - start_mem, 28 * million); // actual value: 27203216 as of Oct 29, 2018
+  EXPECT_LT(end_mem - start_mem, 13 * million); // actual value: 12443472 as of Nov 7, 2018
 
 TEST_F(HeapStatsThreadLocalStoreTest, MemoryWithTls) {
@@ -691,7 +691,7 @@ TEST_F(HeapStatsThreadLocalStoreTest, MemoryWithTls) {
-  EXPECT_LT(end_mem - start_mem, 31 * million); // actual value: 30482576 as of Oct 29, 2018
+  EXPECT_LT(end_mem - start_mem, 16 * million); // actual value: 15722832 as of Nov 7, 2018

Speed tests also look like they doubled as well...but will confirm after a couple of corroborating runs.

@jmarantz
Copy link
Contributor Author

jmarantz commented Nov 7, 2018

Siege results (with a modified siege.py to include contention) tell an interesting story. I was expecting a perf improvement, but it was about a wash. But now that contentions are tracked explicitly we can see that in the current state of this PR, mutex contention increases significantly.

./siege.py ~/git4/envoy/bazel-bin/source/exe/envoy-static ~/git3/envoy/bazel-bin/source/exe/envoy-static  /tmp/envoy-perf/
Logging 10 runs to /tmp/envoy-perf/siege.log ...1...2...3...4...5...6...7...8...9...10

             Clean      Std Dev      Failures  Experimental  Std Dev      Failures  Improvement
             -----      -------      --------  ------------  -------      --------  -----------
Trans Rate   39682.54   3843.494     0         38759.69      2458.056     0         2.326%
Throughput   130.16     12.606       0         127.13        8.062        0         2.328%
Failed       0.0        0.0          0         0.0           0.0          0         0
EnvoyMem     4195776.0  3275.021     0         4162984.0     1233.033     0         0.782%
VSZ          343464.0   2.191        0         343496.0      2.191        0         -0.009%
RSS          51936.0    5102.665     0         54304.0       3349.689     0         -4.559%
Contentions  5.0        4.561        0         208.0         33.147       0         -4060.0%
WaitCycles   297543.0   2381894.537  0         57518417.0    11873445.75  0         -19231.128%

More work is required on the PR to remove the contentions, by symbolizing stat names on startup rather than composing strings on every request to lookup a stat. Or better yet, hold onto the stat objects themselves when possible.

@jmarantz
Copy link
Contributor Author

jmarantz commented Nov 7, 2018

Microbenchmark shows significant improvement. This differs from the siege results in that there will be no contention in the single-threaded benchmark, and of course it measures only the stat lookups and does not include all the other work needed to proxy a few K of lorem ipsum.

old:

--------------------------------------------------------------
Benchmark                       Time           CPU Iterations
--------------------------------------------------------------
BM_StatsNoTls_mean       17363665 ns   17363797 ns         31
BM_StatsNoTls_median     17291555 ns   17291754 ns         31
BM_StatsNoTls_stddev       362025 ns     362036 ns         31
BM_StatsWithTls_mean     20593810 ns   20593997 ns         27
BM_StatsWithTls_median   20169259 ns   20169508 ns         27
BM_StatsWithTls_stddev    1412912 ns    1412939 ns         27

new

--------------------------------------------------------------
Benchmark                       Time           CPU Iterations
--------------------------------------------------------------
BM_StatsNoTls_mean        7010833 ns    7010901 ns         78
BM_StatsNoTls_median      6977452 ns    6977538 ns         78
BM_StatsNoTls_stddev       133710 ns     133713 ns         78
BM_StatsWithTls_mean      9030426 ns    9030522 ns         64
BM_StatsWithTls_median    8972650 ns    8972531 ns         64
BM_StatsWithTls_stddev     171496 ns     171491 ns         64

This is more than a 2x improvement in stats overhead (assuming we are looking up stats from symbolized names rather than from strings).

… it in, and preconstruct strings.

Signed-off-by: Joshua Marantz <[email protected]>
…_view rather than std::string.

Signed-off-by: Joshua Marantz <[email protected]>
@mattklein123 mattklein123 self-assigned this Nov 8, 2018
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
…, to avoid plumbing one common SymbolTable through the object graph.

Signed-off-by: Joshua Marantz <[email protected]>
…ile-system not per-file, and add histogramming of symbol-table encodes.

Signed-off-by: Joshua Marantz <[email protected]>
mattklein123
mattklein123 previously approved these changes Jun 25, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@jmarantz
Copy link
Contributor Author

tsan tests failures due to timeouts, one of which is eds_speed_test_benchmark_test. On my workstation, with tsan:

//test/common/upstream:eds_speed_test_benchmark_test PASSED in 156.6s

So I'm trying to re-run that; I guess maybe I've trigged that already but it will run after 'format'?

@jmarantz
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s), but failed to run 2 pipeline(s).

@mattklein123
Copy link
Member

@jmarantz sorry can you merge master again to fix the conflict? The TSAN flake should be fixed.

/wait

@jmarantz
Copy link
Contributor Author

I think I have another TSAN timeout issue as well, which I am hoping will be mitigated by #11768

but let's see what happens after the merge.

@jmarantz
Copy link
Contributor Author

@mattklein123 this looks like it passes now; I had to de-flake the http2 integration test with a wait-for-counter-ge, and increase the test-size for ads_integration_test. I suspect that tsan overhead disproportionately slows down the tests with real symbol table due to all the locks being taken, regardless of whether they contend.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, LGTM with a small merge issue I think.

/wait

@@ -107,6 +107,9 @@ New Features
* metrics service: added added :ref:`API version <envoy_v3_api_field_config.metrics.v3.MetricsServiceConfig.transport_api_version>` to explicitly set the version of gRPC service endpoint and message to be used.
* network filters: added a :ref:`postgres proxy filter <config_network_filters_postgres_proxy>`.
* network filters: added a :ref:`rocketmq proxy filter <config_network_filters_rocketmq_proxy>`.
* performance: stats symbol table implementation (enabled by default; to disable it, add
`--use-fake-symbol-table 1` to the command-line arguments when starting Envoy).
* prometheus stats: fix the sort order of output lines to comply with the standard.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge issue?

Signed-off-by: Joshua Marantz <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@mattklein123 mattklein123 merged commit 0c03a76 into envoyproxy:master Jun 28, 2020
@jmarantz
Copy link
Contributor Author

Thanks for all the help pushing that through @mattklein123! This has to be close a the record for longest PR issue->merge time delta.

@jmarantz jmarantz deleted the integration-symtab branch June 29, 2020 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stats memory usage can probably be improved quite a bit
3 participants