Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make NumFDs not allocate excessively, implement throttling, and disable Prometheus process collector #1633

Merged
merged 13 commits into from
May 16, 2019

Conversation

richardartoul
Copy link
Contributor

@richardartoul richardartoul commented May 14, 2019

What this PR does / why we need it:
Recent changes to M3DB mean that it uses many more F.Ds than before. This can result in counting the number of F.Ds allocating excessively and wasting a ton of CPU resources.

This P.R prevents counting the F.Ds from allocating excessively and implements self-throttling so that even though counting F.Ds takes longer, it will not cause spikes in CPU usage.

  • Make NumFDs() not allocate excessively
  • Make NumFDs() throttle itself
  • Turn off FD counting in the Prometheus reporter

Screen Shot 2019-05-13 at 4 26 59 PM

Also verified with start_m3 that metrics continue to be emitted

Screen Shot 2019-05-15 at 5 45 09 PM

@richardartoul richardartoul changed the title [WIP] - Make NumFDs not allocate excessively and throttle itself [WIP] - Make NumFDs not allocate excessively, implement throttling, and disable Prometheus process collector May 14, 2019
@richardartoul richardartoul marked this pull request as ready for review May 14, 2019 20:00
@richardartoul richardartoul changed the title [WIP] - Make NumFDs not allocate excessively, implement throttling, and disable Prometheus process collector Make NumFDs not allocate excessively, implement throttling, and disable Prometheus process collector May 14, 2019
@codecov
Copy link

codecov bot commented May 14, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@f0505d6). Click here to learn what that means.
The diff coverage is 58.8%.

Impacted file tree graph

@@           Coverage Diff            @@
##             master   #1633   +/-   ##
========================================
  Coverage          ?   72.1%           
========================================
  Files             ?     958           
  Lines             ?   79587           
  Branches          ?       0           
========================================
  Hits              ?   57405           
  Misses            ?   18445           
  Partials          ?    3737
Flag Coverage Δ
#aggregator 69.3% <ø> (?)
#cluster 67.6% <ø> (?)
#collector 48% <ø> (?)
#dbnode 70.4% <ø> (?)
#m3em 67.9% <ø> (?)
#m3ninx 71% <ø> (?)
#m3nsch 51.1% <ø> (?)
#metrics 100% <ø> (?)
#msg 74.3% <ø> (?)
#query 49.7% <ø> (?)
#x 82.5% <58.8%> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0505d6...5d1ae0f. Read the comment docs.

@Haijuncao
Copy link
Contributor

@richardartoul

It seems that we report num file descriptors during instrumentation and emit it from dbnode as application level stats.

I am wondering if we need to emit FD count/stat from application level, FD stat per process can be queried from os.

@richardartoul
Copy link
Contributor Author

@Haijuncao Yeah its really helpful since the instrument library gets loaded into a lot of our services. I.E we don't need any other software running on the host to get this information which can be really important for debugging internally and in O.S.S

It also makes it really easy to know which process it is.


// NumFDsWithDefaultBatchSleep returns the number of file descriptors for a given process
// and is not available on non-linux systems.
func NumFDsWithDefaultBatchSleep(pid int) (int, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you also need NumFDsReference (or whatever that gets renamed to) from the non-linux source file? Otherwise it won't be able to build on non-linux since it will be missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ended up not exporting it

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants