-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hoststats: add package for collecting host statistics including cpu memory and disk usage #17038
Conversation
2fdd9e4
to
418f04b
Compare
488a7e6
to
a9a5165
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this is a good first attempt that we can iterate on in the future. I've left a couple of nits/change requests, and I think you might have missed a few steps from this doc.
Couple of open questions:
- Are we defaulting to host metrics being enabled or disabled?
- Can you upload some sort of sample of the metrics? As right now it's hard for me to test this locally.
- We should really add some sort of integration testing strategy. I know it's borderline impossible to assert these metrics, but we can at least attempt to run the collector on different platforms.
- We should probably add a document that outlines the metrics we are collecting and the format as well as reasoning.
"github.com/stretchr/testify/require" | ||
) | ||
|
||
func TestHostStats_CPU(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of a better approach, but I don't like the thought of this being a unit test. We're essentially spinning a CPU core to generate some load, and this took almost 13s to run on my local machine.
Maybe we need a whole new integration test suite for this kind of stuff?
Thanks for the review @loshz! I addressed your comments and answered your questions below.
Hmm I'm not sure I see a step I missed. There is no
Yes defaulting to enabled.
Sure heres what I've been testing with:
I refactored the test a bit to be more conservative. I'm hoping long term we can cover this in our e2e testing of the hcp telemetry components.
Any suggestions or prior art to help give me an idea of where and what level of detail something like this would be? |
👋 @nickethier, Can you please give more details on the reasoning behind adding those metrics to be collected by Consul versus using another mean of collecting the host data (an external exporter/collector)? The reason behind my question is that I think this add an extra concern around architecture compatibility. I see that the used library have a wide compatibility architecture range, but from Consul perspective we still need to verify that all the architectures that we support are working correctly and currently we don't have a great coverage for those different architectures. I'm also not sure how accurate those metrics will be in K8S or in containers in general 🤔 |
As I understand this library ( In cases where there isn't platform support I expect the metric to read as
I called this out in the docs. Perhaps there is room in the future to iterate on this and report any container limits in place but this is very platform specific and not something |
eb104c8
to
4307668
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for answering my questions @nickethier 🙏
LGTM!
…memory and disk usage
…default to false unless configured /w cloud integration
4307668
to
bf69a82
Compare
@loshz is OOO this week. changes and comments were reviewed by @dhiaayachi and @boxofrad (thanks!)
Description
This PR adds the ability for Consul to report host statistics through go-metrics. These metrics include cpu utilization by core and in aggregate, system memory usage and disk usage for the disk backing the configured data_dir.
Testing & Reproduction steps
Tests were added to assert cpu utilization calculation is performed accurately. Manual testing was performed on linux and darwin to ensure reported host stats values were accurate.
PR Checklist