Make NumFDs not allocate excessively, implement throttling, and disable Prometheus process collector #1633
Codecov Report
@@ Coverage Diff @@
## master #1633 +/- ##
========================================
Coverage ? 72.1%
========================================
Files ? 958
Lines ? 79587
Branches ? 0
========================================
Hits ? 57405
Misses ? 18445
Partials ? 3737
Continue to review full report at Codecov.
It seems that we report the number of file descriptors during instrumentation and emit it from dbnode as an application-level stat. I am wondering whether we need to emit the FD count at the application level, since per-process FD stats can be queried from the OS.
0b091b8 to c9a90c2
@Haijuncao Yeah, it's really helpful since the instrument library gets loaded into a lot of our services, i.e. we don't need any other software running on the host to get this information, which can be really important for debugging both internally and in OSS. It also makes it really easy to know which process it is.
// NumFDsWithDefaultBatchSleep returns the number of file descriptors for a given process
// and is not available on non-linux systems.
func NumFDsWithDefaultBatchSleep(pid int) (int, error) {
Don't you also need NumFDsReference (or whatever that gets renamed to) in the non-linux source file? Otherwise it won't build on non-linux, since the function will be missing.
Ended up not exporting it.
LGTM
What this PR does / why we need it:
Recent changes to M3DB mean that it uses many more FDs than before. As a result, counting the FDs can allocate excessively and waste a lot of CPU.
This PR prevents FD counting from allocating excessively and implements self-throttling, so that even though counting FDs takes longer, it does not cause spikes in CPU usage.
NumFDs() not allocate excessively
NumFDs() throttle itself
Also verified with start_m3 that metrics continue to be emitted.