Prevent caching of PDH counter instances by default #2654

ofek · 2018-11-28T05:14:58Z

Motivation

Some checks' counters may change frequently, e.g. the set of active VMs.

Benchmarks

https://ci.appveyor.com/project/Datadog/integrations-core/builds/20973815#L868

It looks like the change has near-linear scaling: on mocks and real setups, not caching is up to about 3 times slower than caching everything forever (closer to 4 times for the active_directory median, and almost no difference for iis and pdh_check). The timing for each case seems to depend simply on the number of counter instances.

hyperv w/ 2 VMs running:

--------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------
Name (time in ms)        Min                 Max              Mean             StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            2.4076 (1.0)      105.0396 (1.0)      6.6155 (1.0)      19.5643 (1.58)     2.6306 (1.0)      0.2246 (1.0)         13;32  151.1598 (1.0)         334           1
test_no_cache         5.8093 (2.41)     109.4109 (1.04)     7.9470 (1.20)     12.3704 (1.0)      6.2489 (2.38)     0.4086 (1.82)          3;8  125.8331 (0.83)        138           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

active_directory w/ our mocks

------------------------------------------------------------------------------------------ benchmark: 2 tests -----------------------------------------------------------------------------------------
Name (time in us)          Min                    Max                Mean                StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            125.0000 (1.0)      51,377.8000 (1.0)      174.7816 (1.0)      1,023.3445 (1.0)      131.8000 (1.0)       13.4000 (1.0)         4;641        5.7214 (1.0)        4726           1
test_no_cache         429.0000 (3.43)     58,860.6000 (1.15)     613.5881 (3.51)     1,638.4698 (1.60)     515.2000 (3.91)     123.1500 (9.19)         9;97        1.6298 (0.28)       1299           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

aspdotnet w/ our mocks

------------------------------------------------------------------------------------------ benchmark: 2 tests -----------------------------------------------------------------------------------------
Name (time in us)          Min                    Max                  Mean                StdDev              Median                 IQR            Outliers         OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            321.5000 (1.0)      67,419.0000 (1.0)        466.7370 (1.0)      1,959.9679 (1.0)      350.6000 (1.0)       45.2000 (1.0)         5;283  2,142.5344 (1.0)        2370           1
test_no_cache         865.9000 (2.69)     83,835.8000 (1.24)     1,180.8342 (2.53)     2,812.2705 (1.43)     949.0000 (2.71)     144.9000 (3.21)        3;141    846.8589 (0.40)        885           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

dotnetclr w/ our mocks

--------------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------------
Name (time in us)            Min                    Max                  Mean                StdDev                Median                   IQR            Outliers         OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache              464.9000 (1.0)      61,455.1000 (1.0)        882.7009 (1.0)      2,645.8914 (1.0)        580.6000 (1.0)        480.5000 (1.0)          9;21  1,132.8866 (1.0)        1554           1
test_no_cache         1,208.2000 (2.60)     94,526.0000 (1.54)     2,099.1361 (2.38)     4,733.5765 (1.79)     1,483.2000 (2.55)     1,151.3000 (2.40)          3;8    476.3865 (0.42)        770           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

exchange_server w/ our mocks

-------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------
Name (time in ms)        Min                Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            1.4012 (1.0)      72.2088 (1.0)      2.1587 (1.0)      4.2311 (1.0)      1.5840 (1.0)      0.4456 (1.05)         5;54  463.2413 (1.0)         558           1
test_no_cache         3.9492 (2.82)     79.1946 (1.10)     4.8880 (2.26)     5.1619 (1.22)     4.3766 (2.76)     0.4249 (1.0)          2;14  204.5841 (0.44)        212           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

iis on appveyor

-------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------
Name (time in ms)        Min                Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            1.0079 (1.0)       6.5850 (1.0)      1.2504 (1.0)      0.4201 (1.0)      1.0768 (1.0)      0.1810 (1.0)         75;81  799.7350 (1.0)         438           1
test_no_cache         1.0084 (1.00)     41.2769 (6.27)     1.3733 (1.10)     1.7076 (4.06)     1.0980 (1.02)     0.2992 (1.65)        5;126  728.1694 (0.91)        883           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

pdh_check w/ our mocks

-------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------
Name (time in us)         Min                    Max               Mean              StdDev             Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            20.2000 (1.0)       1,941.0000 (1.0)      25.5125 (1.0)       43.7204 (1.0)      21.0000 (1.0)      1.0000 (1.0)       69;2181       39.1964 (1.0)       11948           1
test_no_cache         20.2000 (1.0)      44,776.3000 (23.07)    30.4910 (1.20)     306.4939 (7.01)     21.2000 (1.01)     4.3000 (4.30)      34;7298       32.7965 (0.84)      32680           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
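As a sanity check on the tables above, the median slowdown of `test_no_cache` relative to `test_cache` can be recomputed directly. This is only a small sketch: the numbers are copied from the Median columns, and the `MEDIANS` dictionary and `slowdowns` helper are names made up for this illustration, not part of the PR.

```python
# Median timings copied from the benchmark tables: (test_cache, test_no_cache).
# Units differ per table (ms or us) but cancel out in the ratio.
MEDIANS = {
    "hyperv": (2.6306, 6.2489),
    "active_directory": (131.8, 515.2),
    "aspdotnet": (350.6, 949.0),
    "dotnetclr": (580.6, 1483.2),
    "exchange_server": (1.5840, 4.3766),
    "iis": (1.0768, 1.0980),
    "pdh_check": (21.0, 21.2),
}


def slowdowns(medians):
    """Return {check: no_cache median / cache median}, rounded to 2 places."""
    return {
        name: round(no_cache / cache, 2)
        for name, (cache, no_cache) in medians.items()
    }


if __name__ == "__main__":
    # Print the checks from least to most affected.
    for name, ratio in sorted(slowdowns(MEDIANS).items(), key=lambda kv: kv[1]):
        print(f"{name:<18}{ratio:.2f}x")
```

The ratios match the parenthesised values pytest-benchmark prints in the Median column, ranging from about 1.0x (pdh_check, iis) to about 3.9x (active_directory).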

codecov-io · 2018-11-28T05:20:16Z

Codecov Report

Merging #2654 into master will decrease coverage by 8.3%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2654      +/-   ##
==========================================
- Coverage   84.72%   76.42%   -8.31%     
==========================================
  Files         662       45     -617     
  Lines       37611     3313   -34298     
  Branches     4507      387    -4120     
==========================================
- Hits        31867     2532   -29335     
+ Misses       4428      684    -3744     
+ Partials     1316       97    -1219

masci · 2018-12-11T07:48:22Z

About the benchmarks:

How much data are you querying for?
Do you have a rough idea how this would scale? Does collection time depend on the amount of data we fetch or on the number of calls? How would an integration like Hyper-V, potentially collecting data for thousands of VMs, perform?

ofek · 2018-12-13T07:14:35Z

@masci I updated the OP. In the initial setup my machine was borked and the results were way off. I also confirmed that all tests still pass in #2739.

gzussa · 2018-12-28T17:45:13Z

datadog_checks_base/datadog_checks/base/checks/win/winpdh_base.py

@@ -182,3 +138,51 @@ def check(self, instance):
            except Exception as e:
                # don't give up on all of the metrics because one failed
                self.log.error("Failed to get data for %s %s: %s" % (inst_name, dd_name, str(e)))
+
+    def _make_counters(self, key, counter_data):


Nice factorization here!

gzussa · 2018-12-28T17:49:11Z

datadog_checks_base/datadog_checks/base/checks/win/winpdh_base.py

@@ -167,8 +115,16 @@ def __init__(self, name, init_config, agentConfig, instances, counter_list):
    def check(self, instance):
        self.log.debug("PDHBaseCheck: check()")
        key = hash_mutable(instance)
+        refresh_counters = is_affirmative(instance.get('refresh_counters', True))
+
+        if refresh_counters:


Why do we have this in the check method now?

ofek · 2018-12-28

It's behind a flag in case someone encounters perf issues.
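To make the flag's effect concrete, here is a minimal toy model of the pattern in the diff above. It is a sketch under assumptions, not the real `PDHBaseCheck`: `ToyPDHCheck`, `list_instances`, and the `rebuilds` counter are hypothetical, counters are plain strings, and `is_affirmative` is a simplified stand-in for the helper the diff calls.

```python
def is_affirmative(value):
    """Simplified stand-in for the helper used in the diff (assumption)."""
    if isinstance(value, str):
        return value.strip().lower() in ("yes", "true", "1", "y", "on")
    return bool(value)


class ToyPDHCheck:
    """Toy model only: a 'counter' here is just an instance name."""

    def __init__(self, list_instances):
        # list_instances: hypothetical callable returning current instance names.
        self._list_instances = list_instances
        self._metrics = {}
        self.rebuilds = 0  # how many times counters were (re)built

    def _make_counters(self, key):
        # Snapshot the instances that exist right now.
        self._metrics[key] = list(self._list_instances())
        self.rebuilds += 1

    def check(self, instance):
        key = instance["name"]  # the real code uses hash_mutable(instance)
        # Refreshing is the default; users can opt out per instance.
        refresh_counters = is_affirmative(instance.get("refresh_counters", True))
        if refresh_counters or key not in self._metrics:
            self._make_counters(key)
        return self._metrics[key]


if __name__ == "__main__":
    vms = ["vm1"]
    check = ToyPDHCheck(lambda: vms)
    print(check.check({"name": "hyperv"}))
    vms.append("vm2")  # a new VM starts between two check runs
    print(check.check({"name": "hyperv"}))
```

With the default, the second run picks up the newly started VM. With `refresh_counters: false` the first snapshot is reused on every run, which is the cheaper but potentially stale behavior the flag keeps available.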

gzussa · 2018-12-28T17:49:49Z

datadog_checks_base/datadog_checks/base/checks/win/winpdh_base.py

+
+            entry = [inst_name, dd_name, m, obj]
+            self.log.debug('{}: {}'.format(message, entry))
+            self._metrics[key].append(entry)


Given how this method is used at https://github.com/DataDog/integrations-core/pull/2654/files#diff-14769541de478cc114862b41f3cdcfe2R122, isn't there a risk that the key won't exist?

ofek · 2018-12-28

The key will always exist:

integrations-core/datadog_checks_base/datadog_checks/base/checks/win/winpdh_base.py

Line 55 in 8567b6e

self._metrics[key] = []
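The guarantee in this reply can be sketched as follows, assuming (as the permalink suggests) that the empty list is created for every configured instance before `check()` ever runs. `ToyCheck`, `add_entry`, and `instance_key` are hypothetical names for illustration; `instance_key` merely stands in for `hash_mutable`.

```python
import json


def instance_key(instance):
    """Hypothetical stand-in for hash_mutable: a stable key for a dict."""
    return json.dumps(instance, sort_keys=True)


class ToyCheck:
    def __init__(self, instances):
        self._metrics = {}
        for instance in instances:
            # Mirrors `self._metrics[key] = []` from the permalink above.
            self._metrics[instance_key(instance)] = []

    def add_entry(self, instance, entry):
        # Safe only because __init__ already created the list for this key.
        self._metrics[instance_key(instance)].append(entry)


if __name__ == "__main__":
    configured = {"host": ".", "refresh_counters": True}
    check = ToyCheck([configured])
    check.add_entry(configured, ["inst", "dd.metric", "gauge", None])
    print(check._metrics)
```

The append cannot raise `KeyError` as long as the key is derived from the instance in the same way in both places. An instance that was never configured would still fail, which is the case the question was probing.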

ofek added integration/datadog_checks_base changelog/Added labels Nov 28, 2018

ofek requested review from a team as code owners November 28, 2018 05:14

ofek force-pushed the ofek/pdh branch from 1791a9c to 7e11746 Compare November 28, 2018 05:34

ofek added the do-not-merge/WIP label Nov 29, 2018

ofek added 5 commits December 6, 2018 22:14

Add PDH option to prevent caching of instance counters

da67b85

smarter caching

dd63d20

refactor

772ab44

fix

eb108cd

more efficient

33234c3

ofek force-pushed the ofek/pdh branch from 98ecf11 to 33234c3 Compare December 7, 2018 03:17

ofek removed the do-not-merge/WIP label Dec 10, 2018

cache here too

0a11b5c

ofek added 3 commits December 13, 2018 11:26

better logging

bde512a

debug

e4c7b07

Merge branch 'master' into ofek/pdh

cbd4347

ofek requested a review from a team as a code owner December 17, 2018 22:49

fix

d3c36bf

ofek mentioned this pull request Dec 24, 2018

pdh bench #2739

Closed

fix mocks

72bb99d

ofek force-pushed the ofek/pdh branch from 69a057c to 72bb99d Compare December 25, 2018 05:22

ofek added 3 commits December 27, 2018 16:14

working now, remove test logging

974f393

fix log verbosity

6e47375

make it the default behavior

8567b6e

gzussa reviewed Dec 28, 2018


gzussa approved these changes Dec 28, 2018


ofek merged commit 9893584 into master Dec 28, 2018

ofek deleted the ofek/pdh branch December 28, 2018 18:27

ofek changed the title ~~Add PDH option to prevent caching of instance counters~~ Prevent caching of PDH counter instances by default Dec 28, 2018

ofek mentioned this pull request May 31, 2019

Handle the refresh_counters flag #3840

Merged
