-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding optional service_check_rw parameter to check for read-only filesystem #2086
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bberezov, thanks a lot for your contribution. Looks like a nice feature to add to the check.
I just have a few comments. Would you mind addressing them ?
Also, could you please document the new service check in a service_check.json
file (see docs for details), as well as in README.md
under the Service Checks
section (see here for an example) ?
disk/CHANGELOG.md
Outdated
@@ -1,5 +1,9 @@ | |||
# CHANGELOG - disk | |||
|
|||
## 1.3.0 / 2018-08-21 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to update the changelog manually, this will now automatically be taken care of at release time
# | ||
# The (optional) service_check_rw parameter notifies when one of the partitions is in | ||
# read-only mode | ||
# service_check_rw: no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put false
for the default value here as per #2078 ?
disk/datadog_checks/disk/disk.py
Outdated
@@ -127,6 +129,16 @@ def collect_metrics_psutil(self): | |||
self.gauge(metric_name, metric_value, | |||
tags=tags, device_name=device_name) | |||
|
|||
# Add in a disk read write or read only check | |||
if self.instances[0].get('service_check_rw', False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should use self._service_check_rw
that you set here https://github.com/DataDog/integrations-core/pull/2086/files#diff-4d0d0ed796595d875c2f4710bb59b678R63
disk/tests/test_check.py
Outdated
c = Disk('disk', None, {}, instances) | ||
c.check(instances[0]) | ||
|
||
aggregator.assert_service_check('disk.read_write', status=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the constant Disk.CRITICAL
for the status.
disk/datadog_checks/disk/disk.py
Outdated
# Add in a disk read write or read only check | ||
if self.instances[0].get('service_check_rw', False): | ||
rwro = list(set(['rw', 'ro']) & set(part.opts.split(','))) | ||
if len(rwro) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we send a service check with AgentCheck.UNKNOWN
status when neither rw
nor ro
is there ?
disk/datadog_checks/disk/disk.py
Outdated
self.service_check( | ||
'disk.read_write', | ||
AgentCheck.OK if rwro[0] == 'rw' else AgentCheck.CRITICAL, | ||
tags=tags+['device_name:%s' % (device_name)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tag key here should be device
.
Hi @zippolyte I've updated PR according to your comments, let me know if anything else has to be modified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @bberezov !
Only one final detail and this will be 👌
disk/service_checks.json
Outdated
@@ -0,0 +1,11 @@ | |||
[ | |||
{ | |||
"agent_version": "5.22.3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 6.5.0
since it will be released in the next version of the agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry didn't know what that version was for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks
What does this PR do?
This PR adds read-write/read-only service check for disk.py which notifies if any of the disks are in read-only state.
Motivation
We need this functionality in our production environment in addition to standard disk metrics in order to catch any of the disks falling back to read-only mode.
Review checklist
no-changelog
label attachedAdditional Notes
The test for this change passes our local testing environment replicated from TravisCI configuration