Skip to content
This repository has been archived by the owner on Feb 9, 2024. It is now read-only.

[7.0.x] Skip disk usage check during pre-flight checks #1758

Merged
merged 6 commits into from
Jun 24, 2020

Conversation

bernardjkim
Copy link
Contributor

@bernardjkim bernardjkim commented Jun 23, 2020

Description

This PR fixes a bug that was introduced during changes to the storage check due to satellite setting a default HighWatermark value to the storage checker. Storage checkers with unspecified or 0 HighWatermark value will skip the disk usage check.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Regression fix (non-breaking change which fixes a regression)

Linked tickets and other PRs

TODOs

  • Self-review the change
  • Perform manual testing
  • Address review feedback

Testing done

Verify successful install of single node cluster with mounted /proc volume

  • Create mount for /proc node profile in telekube manifest
nodeProfile
  - name: node
     ...
       volumes:
       - path: /proc
          targetPath: /hostproc
          createIfMissing: false
// Before changes install fails due to failed pre-flight check.
TASK [Install gravity application] *******************************************************************************************************************************************
task path: /home/bernard/go/src/github.com/gravitational/gravity/vagrant/ansible/install.yaml:8
fatal: [node-1]: FAILED! => {"changed": true, "cmd": "umask 0066\n cd /vagrant/installer\n  ./gravity install --debug --cloud-provider=generic --flavor=standalone --cluster=dev.test --advertise-addr=172.28.128.101 --token=token123\n ", "delta": "0:00:00.271563", "end": "2020-06-24 01:21:21.531486", "msg": "non-zero return code", "rc": 255, "start": "2020-06-24 01:21:21.259923", "stderr": "\u001b[31m[ERROR]: The following pre-flight checks failed:\n\t[×] disk capacity at /proc is 0 (failed to validate storage requirements)\n\n\u001b[0m", "stderr_lines": ["\u001b[31m[ERROR]: The following pre-flight checks failed:", "\t[×] disk capacity at /proc is 0 (failed to validate storage requirements)", "", "\u001b[0m"], "stdout": "\u001b[1mWed Jun 24 01:21:21 UTC\u001b[0m\tStarting installer", "stdout_lines": ["\u001b[1mWed Jun 24 01:21:21 UTC\u001b[0m\tStarting installer"]}
// After changes install is sucessful
TASK [Install gravity application] *******************************************************************************************************************************************
task path: /home/bernard/go/src/github.com/gravitational/gravity/vagrant/ansible/install.yaml:8
changed: [node-1] => {"changed": true, "cmd": "umask 0066\n cd /vagrant/installer\n  ./gravity install --debug --cloud-provider=generic --flavor=standalone --cluster=dev.test --advertise-addr=172.28.128.101 --token=token123\n ", "delta": "0:11:27.694958", "end": "2020-06-24 01:48:37.844384", "rc": 0, "start": "2020-06-24 01:37:10.149426", ...

@bernardjkim bernardjkim requested review from a team, r0mant and a-palchikov June 23, 2020 18:28
lib/schema/checks.go Outdated Show resolved Hide resolved
@a-palchikov
Copy link
Contributor

I'm not sure this is headed in the right direction.
The original premise is avoiding handling of virtual (or pseudo) file systems in the first place - not adding knobs to conditionally switch off testing. I don't see how skipping disk usage test will help.

I would look in the direction of programmatically detecting a filesystem type and excluding them from test explicitly.
One way this seems this could be achieved is via the stat syscall:

var stat unix.Stat_t
if err := unix.Stat(path, &stat); err != nil {
	return trace.ConvertSystemError(err)
}
if unix.Major(stat.Dev) == 0 && stat.Size == 0 {
	// exclude this filesystem from disk-specific tests
}

This is not 100% as I could not verify it from official documentation certain but close t
Also, this needs to be implemented taking OS environment into account - i.e. only providing a Linux implementation with empty stub for Darwin.

Kernel device documentation
Stat_t
SO discussion

@knisbet
Copy link
Contributor

knisbet commented Jun 24, 2020

I'm not sure this is headed in the right direction.
The original premise is avoiding handling of virtual (or pseudo) file systems in the first place - not adding knobs to conditionally switch off testing. I don't see how skipping disk usage test will help.

I would look in the direction of programmatically detecting a filesystem type and excluding them from test explicitly.
One way this seems this could be achieved is via the stat syscall:

var stat unix.Stat_t
if err := unix.Stat(path, &stat); err != nil {
	return trace.ConvertSystemError(err)
}
if unix.Major(stat.Dev) == 0 && stat.Size == 0 {
	// exclude this filesystem from disk-specific tests
}

This is not 100% as I could not verify it from official documentation certain but close t
Also, this needs to be implemented taking OS environment into account - i.e. only providing a Linux implementation with empty stub for Darwin.

Kernel device documentation
Stat_t
SO discussion

I might be missing something, but do we know why we're even trying to check disk capacity here? I assume it's not specified in the app.yaml? My expectation would be that we're only trying to test or bubble up errors for volumes that specifically indicate a capacity within the app.yaml, and wouldn't require the effort in trying to filter by various types of volumes, as presumably that's a more complicated issue than just detecting virtual filesystems such as /proc.

@r0mant
Copy link
Contributor

r0mant commented Jun 24, 2020

@knisbet @a-palchikov Yes, this is a simple regression that we try to check disk usage where we shouldn't (since it's not specified in the manifest but satellite sets the default and checks it anyway). I discussed it with @bernardjkim yesterday, we're gonna remove setting defaults in satellite.

Copy link
Contributor

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bernardjkim Please fill out testing done before merging to show the scenario that was failing before and that now's passing.

The storage checker did not previously have a default HighWatermark
value set. These preflight checks did not specify a HighWatermark value
and were not intended to run the disk usage checks. Explicitly
specifying 0 HighWatermark will skip disk usage checks.
@bernardjkim bernardjkim merged commit 06ac3eb into version/7.0.x Jun 24, 2020
@bernardjkim bernardjkim deleted the bernard/7.0.x/disk-check branch June 24, 2020 23:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants