Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding optional service_check_rw parameter to check for read-only filesystem #2086

Merged
merged 3 commits into from
Aug 24, 2018

Conversation

bberezov
Copy link
Contributor

What does this PR do?

This PR adds read-write/read-only service check for disk.py which notifies if any of the disks are in read-only state.

Motivation

We need this functionality in our production environment in addition to standard disk metrics in order to catch any of the disks falling back to read-only mode.

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo

Additional Notes

The test for this change passes our local testing environment replicated from TravisCI configuration

@bberezov bberezov requested review from a team as code owners August 21, 2018 12:43
Copy link
Contributor

@zippolyte zippolyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bberezov, thanks a lot for your contribution. Looks like a nice feature to add to the check.
I just have a few comments. Would you mind addressing them ?
Also, could you please document the new service check in a service_check.json file (see docs for details), as well as in README.md under the Service Checks section (see here for an example) ?

@@ -1,5 +1,9 @@
# CHANGELOG - disk

## 1.3.0 / 2018-08-21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to update the changelog manually, this will now automatically be taken care of at release time

#
# The (optional) service_check_rw parameter notifies when one of the partitions is in
# read-only mode
# service_check_rw: no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put false for the default value here as per #2078 ?

@@ -127,6 +129,16 @@ def collect_metrics_psutil(self):
self.gauge(metric_name, metric_value,
tags=tags, device_name=device_name)

# Add in a disk read write or read only check
if self.instances[0].get('service_check_rw', False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c = Disk('disk', None, {}, instances)
c.check(instances[0])

aggregator.assert_service_check('disk.read_write', status=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the constant Disk.CRITICAL for the status.

# Add in a disk read write or read only check
if self.instances[0].get('service_check_rw', False):
rwro = list(set(['rw', 'ro']) & set(part.opts.split(',')))
if len(rwro) == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we send a service check with AgentCheck.UNKNOWN status when neither rw nor ro is there ?

self.service_check(
'disk.read_write',
AgentCheck.OK if rwro[0] == 'rw' else AgentCheck.CRITICAL,
tags=tags+['device_name:%s' % (device_name)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tag key here should be device.

@bberezov
Copy link
Contributor Author

Hi @zippolyte I've updated PR according to your comments, let me know if anything else has to be modified.

Copy link
Contributor

@zippolyte zippolyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @bberezov !
Only one final detail and this will be 👌

@@ -0,0 +1,11 @@
[
{
"agent_version": "5.22.3",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 6.5.0 since it will be released in the next version of the agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry didn't know what that version was for

Copy link
Contributor

@zippolyte zippolyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks

@zippolyte zippolyte merged commit a0a6434 into DataDog:master Aug 24, 2018
shraykay pushed a commit to shraykay/integrations-core that referenced this pull request Aug 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants