This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Update disk check #224

Merged (2 commits) on Jun 22, 2020


@bernardjkim (Contributor) commented Jun 5, 2020

Description

This PR updates the disk check and addresses items 1-3 of gravitational/gravity#1662. The disk/storage check will now report a warning or a critical probe at different thresholds. The default is set to 80% disk usage for a warning probe and 90% disk usage for a critical probe.
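The two-threshold behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Severity` type, the `classify` function, and the constant names are hypothetical, and the strict "greater than" comparison is an assumption based on the "exceeds 80 percent" wording of the status message.

```go
package main

import "fmt"

// Severity is a hypothetical stand-in for the probe severity reported by the checker.
type Severity int

const (
	OK Severity = iota
	Warning
	Critical
)

// String returns a readable name for the severity.
func (s Severity) String() string {
	return [...]string{"ok", "warning", "critical"}[s]
}

// Hypothetical names for the default thresholds from the description (percent used).
const (
	warningWatermark  = 80
	criticalWatermark = 90
)

// classify maps disk utilization to a severity. It uses integer arithmetic
// (used*100 > watermark*total) so that exact boundary values are not affected
// by floating-point rounding.
func classify(totalBytes, availableBytes uint64) Severity {
	if totalBytes == 0 || availableBytes > totalBytes {
		return OK // nothing meaningful to report for an empty or inconsistent reading
	}
	used := totalBytes - availableBytes
	switch {
	case used*100 > criticalWatermark*totalBytes:
		return Critical
	case used*100 > warningWatermark*totalBytes:
		return Warning
	default:
		return OK
	}
}

func main() {
	fmt.Println(classify(100, 50)) // 50% used
	fmt.Println(classify(100, 16)) // 84% used
	fmt.Println(classify(100, 8))  // 92% used
}
```

With this comparison, usage of exactly 80% or 90% stays at the lower severity; whether the real checker treats the boundary the same way is not stated in the PR.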

Linked tickets and PRs

Testing done

Testing done on version 5.5.47. fallocate is a quick way to allocate disk space for a file.

Verify warning when disk usage > 80%

[vagrant@node-1 gravity]$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
[...]
/dev/sdb1       25670884 20356768   3987064  84% /var/lib/gravity

[vagrant@node-1 gravity]$ sudo gravity status
[...]
Cluster nodes:
    Masters:
        * node-1 (172.28.128.101, node)
            Status:             healthy
            [!]                 disk utilization on /var/lib/gravity exceeds 80 percent (4.1GB is available out of 26GB), see https://gravitational.com/telekube/docs/cluster/#garbage-collection
            Remote access:      online
[...]

Verify critical when disk usage > 90%

[vagrant@node-1 gravity]$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
[...]
/dev/sdb1       25670884 22229512   2114320  92% /var/lib/gravity

[vagrant@node-1 gravity]$ sudo gravity status
[...]
Cluster nodes:
    Masters:
        * node-1 (172.28.128.101, node)
            Status:             degraded
            [×]                 disk utilization on /var/lib/gravity exceeds 90 percent (2.2GB is available out of 26GB), see https://gravitational.com/telekube/docs/cluster/#garbage-collection
            Remote access:      online
[...]
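As a cross-check of the two test runs above, the `Use%` column can be reproduced from the `Used` and `Available` columns. The sketch below assumes GNU coreutils `df`, which reports used / (used + available) rounded up; the function name `dfUsePercent` is hypothetical, and the checker in this PR may compute utilization differently (for example from total and available bytes).

```go
package main

import "fmt"

// dfUsePercent reproduces the Use% column of GNU df under the assumption that
// it is used / (used + available), rounded up to the next whole percent.
func dfUsePercent(usedKB, availKB uint64) uint64 {
	total := usedKB + availKB
	if total == 0 {
		return 0
	}
	// Integer ceiling division.
	return (usedKB*100 + total - 1) / total
}

func main() {
	// Values taken from the df output in the two test runs.
	fmt.Println(dfUsePercent(20356768, 3987064)) // warning run
	fmt.Println(dfUsePercent(22229512, 2114320)) // critical run
}
```

The sizes in the status messages are also consistent with the `df` columns converted from 1K-blocks to decimal gigabytes: 3987064 KiB is about 4.1GB and 25670884 KiB is about 26GB.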

}

// DiskSpaceCheckerID is the checker that checks disk space utilization
const DiskSpaceCheckerID = "disk-space"

// DefaultCriticalWatermark is the default critical disk usage percentage threshold.
const DefaultCriticalWatermark = 90

Contributor commented:

I'd move this constant alongside the other one (the low watermark); I think it's somewhere in lib/constants.

Contributor (author) replied:

A watermark constant was not previously defined in satellite. The constant is defined in planet and passed to the storage checker.

TotalBytes: totalBytes,
AvailableBytes: availableBytes,
WatermarkCritical: c.HighWatermark,
WatermarkWarning: c.HighWatermark - 10, // Set warning watermark 10% below the critical watermark

Contributor commented:

I'd just have 2 separate constants TBH for soft/hard limit. Is there any particular reason you wanted to "tie" them one to another this way?

Contributor (author) replied:

We want the watermark to be configurable/overrideable in emergency situations. If the values are separately configurable, we might run into incorrect behavior if the critical watermark is set to a lower value than the warning watermark. Having them tied together will make sure we don't run into that situation.

Contributor commented:

This scenario can be just checked during checker initialization, right? In Check(), as we usually do. And just return an error if that's the case.

Contributor (author) replied:

Yeah, that works. I guess it'd be nicer to have separate control over the two threshold values.
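The approach agreed on in this thread (two independently configurable thresholds, validated when the checker is set up) might look roughly like the sketch below. The `StorageConfig` type, its field names, and the `CheckAndSetDefaults` method are hypothetical and are not taken from satellite or planet; the defaults of 80 and 90 come from the PR description.

```go
package main

import "fmt"

// Hypothetical defaults matching the PR description (percent used).
const (
	defaultWarningWatermark  = 80
	defaultCriticalWatermark = 90
)

// StorageConfig is a hypothetical configuration with separately
// configurable warning and critical watermarks.
type StorageConfig struct {
	WarningWatermark  uint
	CriticalWatermark uint
}

// CheckAndSetDefaults fills in unset (zero) watermarks and rejects
// configurations where the critical watermark is above 100 percent
// or below the warning watermark.
func (c *StorageConfig) CheckAndSetDefaults() error {
	if c.WarningWatermark == 0 {
		c.WarningWatermark = defaultWarningWatermark
	}
	if c.CriticalWatermark == 0 {
		c.CriticalWatermark = defaultCriticalWatermark
	}
	if c.CriticalWatermark > 100 {
		return fmt.Errorf("critical watermark %d exceeds 100 percent", c.CriticalWatermark)
	}
	if c.WarningWatermark > c.CriticalWatermark {
		return fmt.Errorf("warning watermark %d is above critical watermark %d",
			c.WarningWatermark, c.CriticalWatermark)
	}
	return nil
}

func main() {
	cfg := StorageConfig{WarningWatermark: 95, CriticalWatermark: 85}
	if err := cfg.CheckAndSetDefaults(); err != nil {
		fmt.Println("invalid config:", err)
	}
}
```

Validating once at initialization keeps both values overridable in an emergency while still ruling out the inverted configuration that motivated tying them together.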

@bernardjkim marked this pull request as ready for review June 15, 2020 17:02
@bernardjkim requested review from a team, r0mant and knisbet June 15, 2020 17:02
@bernardjkim changed the title from "[WIP] Update disk check" to "Update disk check" Jun 15, 2020
@bernardjkim merged commit 3e27b8e into master Jun 22, 2020
@bernardjkim deleted the bernard/master/disk-check branch June 22, 2020 20:47
@a-palchikov (Contributor) commented:

This should also be taken into account: gravitational/gravity#1748
