Prevent caching of PDH counter instances by default #2654

Merged
merged 14 commits into from
Dec 28, 2018

@ofek (Contributor) commented Nov 28, 2018

Motivation

Some checks' counter instances may change frequently, e.g. the set of active VMs, so caching them forever can miss instances that appear later.

Benchmarks

https://ci.appveyor.com/project/Datadog/integrations-core/builds/20973815#L868

It looks like the cost of not caching scales roughly linearly with the number of counter instances. On both mocks and real setups it is up to about 4x slower by median time than caching everything forever (3.91x for active_directory, 2.38x for hyperv), while iis and pdh_check show almost no difference (1.02x and 1.01x).
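The slowdown figures can be checked against the per-check median multipliers that pytest-benchmark prints in parentheses. As a small sketch (the MEDIANS dict and slowdown helper are illustrative names; the values are copied from the Median columns of the tables below), the ratio is just the no-cache median divided by the cache median:

```python
# Median times (cache, no_cache) copied from the pytest-benchmark tables.
# Units differ between tables (ms vs us), but only the ratio matters.
MEDIANS = {
    "hyperv": (2.6306, 6.2489),
    "active_directory": (131.8000, 515.2000),
    "aspdotnet": (350.6000, 949.0000),
    "dotnetclr": (580.6000, 1483.2000),
    "exchange_server": (1.5840, 4.3766),
    "iis": (1.0768, 1.0980),
    "pdh_check": (21.0000, 21.2000),
}


def slowdown(cache, no_cache):
    """How many times slower the no-cache run is than the cached run."""
    return no_cache / cache


ratios = {name: slowdown(*pair) for name, pair in MEDIANS.items()}

# Print checks from the largest to the smallest slowdown.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18} {ratio:.2f}x")
```

Medians are used rather than means because the Max column shows occasional large outliers (over 100 ms for hyperv in both runs) that inflate the mean.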

hyperv w/ 2 VMs running:

--------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------
Name (time in ms)        Min                 Max              Mean             StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            2.4076 (1.0)      105.0396 (1.0)      6.6155 (1.0)      19.5643 (1.58)     2.6306 (1.0)      0.2246 (1.0)         13;32  151.1598 (1.0)         334           1
test_no_cache         5.8093 (2.41)     109.4109 (1.04)     7.9470 (1.20)     12.3704 (1.0)      6.2489 (2.38)     0.4086 (1.82)          3;8  125.8331 (0.83)        138           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

active_directory w/ our mocks

------------------------------------------------------------------------------------------ benchmark: 2 tests -----------------------------------------------------------------------------------------
Name (time in us)          Min                    Max                Mean                StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            125.0000 (1.0)      51,377.8000 (1.0)      174.7816 (1.0)      1,023.3445 (1.0)      131.8000 (1.0)       13.4000 (1.0)         4;641        5.7214 (1.0)        4726           1
test_no_cache         429.0000 (3.43)     58,860.6000 (1.15)     613.5881 (3.51)     1,638.4698 (1.60)     515.2000 (3.91)     123.1500 (9.19)         9;97        1.6298 (0.28)       1299           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

aspdotnet w/ our mocks

------------------------------------------------------------------------------------------ benchmark: 2 tests -----------------------------------------------------------------------------------------
Name (time in us)          Min                    Max                  Mean                StdDev              Median                 IQR            Outliers         OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            321.5000 (1.0)      67,419.0000 (1.0)        466.7370 (1.0)      1,959.9679 (1.0)      350.6000 (1.0)       45.2000 (1.0)         5;283  2,142.5344 (1.0)        2370           1
test_no_cache         865.9000 (2.69)     83,835.8000 (1.24)     1,180.8342 (2.53)     2,812.2705 (1.43)     949.0000 (2.71)     144.9000 (3.21)        3;141    846.8589 (0.40)        885           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

dotnetclr w/ our mocks

--------------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------------
Name (time in us)            Min                    Max                  Mean                StdDev                Median                   IQR            Outliers         OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache              464.9000 (1.0)      61,455.1000 (1.0)        882.7009 (1.0)      2,645.8914 (1.0)        580.6000 (1.0)        480.5000 (1.0)          9;21  1,132.8866 (1.0)        1554           1
test_no_cache         1,208.2000 (2.60)     94,526.0000 (1.54)     2,099.1361 (2.38)     4,733.5765 (1.79)     1,483.2000 (2.55)     1,151.3000 (2.40)          3;8    476.3865 (0.42)        770           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

exchange_server w/ our mocks

-------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------
Name (time in ms)        Min                Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            1.4012 (1.0)      72.2088 (1.0)      2.1587 (1.0)      4.2311 (1.0)      1.5840 (1.0)      0.4456 (1.05)         5;54  463.2413 (1.0)         558           1
test_no_cache         3.9492 (2.82)     79.1946 (1.10)     4.8880 (2.26)     5.1619 (1.22)     4.3766 (2.76)     0.4249 (1.0)          2;14  204.5841 (0.44)        212           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

iis on appveyor

-------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------
Name (time in ms)        Min                Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            1.0079 (1.0)       6.5850 (1.0)      1.2504 (1.0)      0.4201 (1.0)      1.0768 (1.0)      0.1810 (1.0)         75;81  799.7350 (1.0)         438           1
test_no_cache         1.0084 (1.00)     41.2769 (6.27)     1.3733 (1.10)     1.7076 (4.06)     1.0980 (1.02)     0.2992 (1.65)        5;126  728.1694 (0.91)        883           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

pdh_check w/ our mocks

-------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------
Name (time in us)         Min                    Max               Mean              StdDev             Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cache            20.2000 (1.0)       1,941.0000 (1.0)      25.5125 (1.0)       43.7204 (1.0)      21.0000 (1.0)      1.0000 (1.0)       69;2181       39.1964 (1.0)       11948           1
test_no_cache         20.2000 (1.0)      44,776.3000 (23.07)    30.4910 (1.20)     306.4939 (7.01)     21.2000 (1.01)     4.3000 (4.30)      34;7298       32.7965 (0.84)      32680           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@codecov-io commented Nov 28, 2018

Codecov Report

Merging #2654 into master will decrease coverage by 8.3%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2654      +/-   ##
==========================================
- Coverage   84.72%   76.42%   -8.31%     
==========================================
  Files         662       45     -617     
  Lines       37611     3313   -34298     
  Branches     4507      387    -4120     
==========================================
- Hits        31867     2532   -29335     
+ Misses       4428      684    -3744     
+ Partials     1316       97    -1219

@masci (Contributor) commented Dec 11, 2018

About the benchmarks:

  1. How much data are you querying for?
  2. Do you have a rough idea of how this would scale? Does collection time depend on the amount of data we fetch or on the number of calls? How would an integration like Hyper-V, potentially collecting data for thousands of VMs, perform?

@ofek (Contributor Author) commented Dec 13, 2018

@masci I updated the OP. My machine's setup for that initial run was borked, so the results were way off. I also confirmed that all tests still pass (#2739).

ofek requested a review from a team as a code owner December 17, 2018 22:49
ofek mentioned this pull request Dec 24, 2018
@@ -182,3 +138,51 @@ def check(self, instance):
            except Exception as e:
                # don't give up on all of the metrics because one failed
                self.log.error("Failed to get data for %s %s: %s" % (inst_name, dd_name, str(e)))

    def _make_counters(self, key, counter_data):
Contributor

Nice factorization here!

@@ -167,8 +115,16 @@ def __init__(self, name, init_config, agentConfig, instances, counter_list):
    def check(self, instance):
        self.log.debug("PDHBaseCheck: check()")
        key = hash_mutable(instance)
        refresh_counters = is_affirmative(instance.get('refresh_counters', True))

        if refresh_counters:
Contributor

Why is this in the check method now?

Contributor Author

It's behind a flag in case someone encounters performance issues.
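The trade-off behind the refresh_counters flag can be sketched in a few lines. This is a simplified, self-contained illustration, not PDHBaseCheck itself: FakePdhSource, SketchPdhCheck, and _is_affirmative are hypothetical names, and real PDH enumeration is replaced by an in-memory list. With the flag off, counters built once at startup go stale when VMs come and go; with the default (on), check() rebuilds them on every run, which is the extra cost the benchmarks measure.

```python
class FakePdhSource:
    """Hypothetical stand-in for PDH: lists whatever counter instances exist right now."""

    def __init__(self, instances):
        self.instances = list(instances)

    def enumerate_instances(self):
        # Return a copy so callers hold a snapshot, like a cached counter list.
        return list(self.instances)


def _is_affirmative(value):
    """Simplified stand-in for datadog_checks' is_affirmative (not the real helper)."""
    if isinstance(value, str):
        return value.strip().lower() in ("yes", "true", "1", "y", "on")
    return bool(value)


class SketchPdhCheck:
    """Illustrative sketch of the refresh_counters trade-off."""

    def __init__(self, instance, source):
        self.source = source
        # Mirrors instance.get('refresh_counters', True): refreshing is the default.
        self.refresh_counters = _is_affirmative(instance.get("refresh_counters", True))
        # Counters are built once up front, as in the old cache-forever behavior.
        self._counters = self._make_counters()

    def _make_counters(self):
        return self.source.enumerate_instances()

    def check(self):
        if self.refresh_counters:
            # Re-enumerate on every run so instances that appeared or
            # disappeared since the last run (e.g. VMs) are reflected.
            self._counters = self._make_counters()
        return list(self._counters)


source = FakePdhSource(["vm1", "vm2"])
cached = SketchPdhCheck({"refresh_counters": False}, source)
refreshed = SketchPdhCheck({}, source)  # flag omitted, so it defaults to refreshing

source.instances.append("vm3")  # a VM starts after both checks were created
print(cached.check())     # ['vm1', 'vm2']  (stale snapshot)
print(refreshed.check())  # ['vm1', 'vm2', 'vm3']
```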


entry = [inst_name, dd_name, m, obj]
self.log.debug('{}: {}'.format(message, entry))
self._metrics[key].append(entry)
Contributor

Because of how this method is used at https://github.com/DataDog/integrations-core/pull/2654/files#diff-14769541de478cc114862b41f3cdcfe2R122, isn't there a risk that the key won't exist?

Contributor Author

The key will always exist.
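The reviewer's concern is that self._metrics[key].append(entry) raises KeyError unless something has already created self._metrics[key]. The PR only states that the key will always exist; the sketch below assumes, for illustration, that the caller creates or resets the list before calling _make_counters, and shows collections.defaultdict as an alternative. MetricsRegistry, refresh, and the entry values are hypothetical.

```python
from collections import defaultdict


class MetricsRegistry:
    """Hypothetical sketch of keeping self._metrics[key] present before appending."""

    def __init__(self):
        # Maps an instance key (e.g. a hash of the instance config) to its entries.
        self._metrics = {}

    def refresh(self, key, entries):
        # Create (or reset) the list for this key *before* _make_counters runs,
        # so the append below can never hit a missing key.
        self._metrics[key] = []
        self._make_counters(key, entries)

    def _make_counters(self, key, entries):
        for inst_name, dd_name, m, obj in entries:
            entry = [inst_name, dd_name, m, obj]
            self._metrics[key].append(entry)


registry = MetricsRegistry()
registry.refresh("instance-hash", [("vm1", "example.metric", None, None)])
registry.refresh("instance-hash", [("vm2", "example.metric", None, None)])

# Alternative: a defaultdict creates the empty list on first access.
metrics = defaultdict(list)
metrics["instance-hash"].append(["vm1", "example.metric", None, None])
```

Resetting the list on each refresh also keeps repeated refreshes from accumulating stale entries, which matters once refreshing on every run is the default.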

ofek merged commit 9893584 into master Dec 28, 2018
ofek deleted the ofek/pdh branch December 28, 2018 18:27
ofek changed the title from "Add PDH option to prevent caching of instance counters" to "Prevent caching of PDH counter instances by default" Dec 28, 2018
4 participants