Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

volumes: return better error messages for unsupported task drivers #8030

Merged
merged 7 commits into from
May 21, 2020

Conversation

tgross
Copy link
Member

@tgross tgross commented May 20, 2020

Fixes #7774 (partially #6664 as well)

When an allocation runs for a task driver that can't support volume mounts, the mounting will fail in a way that can be hard to understand. With host volumes this usually means failing silently, whereas with CSI the operator gets inscrutable internals exposed in the nomad alloc status.

This changeset adds a MountConfig field to the task driver Capabilities response. We validate this when the csi_hook or volume_hook fires and return a user-friendly error.

Three notes:

  • We don't currently have a way to get driver capabilities up to the server, except through attributes. Validating this when the user initially submits the jobspec would be even better than what we're doing here (and could be useful for all our other capabilities), but that's out of scope for this PR.
  • The MountConfig enum starts with "supports all" in order to support community plugins in a backwards compatible way, rather than cutting them off from volume mounting unexpectedly.
  • I haven't included an update to the internals docs. I'm going to do that in a separate PR so I can include the missing networking fields as well without cluttering this PR.

@tgross tgross added this to the 0.11.3 milestone May 20, 2020
@tgross tgross force-pushed the csi_validate_task_driver branch from da53f64 to e92ed1e Compare May 20, 2020 19:29
@tgross tgross force-pushed the csi_validate_task_driver branch from 60d9d60 to e3af3a6 Compare May 20, 2020 19:40
@tgross tgross changed the title CSI: validate task driver for volume mount volumes: return better error messages for unsupported task drivers May 20, 2020
@tgross tgross marked this pull request as ready for review May 20, 2020 20:43
@tgross tgross requested review from langmartin and notnoop May 20, 2020 20:44
client/allocrunner/csi_hook.go Outdated Show resolved Hide resolved
@tgross tgross merged commit 8860b72 into master May 21, 2020
@tgross tgross deleted the csi_validate_task_driver branch May 21, 2020 13:18
tgross added a commit that referenced this pull request May 27, 2020
…8030)

When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
@tgross
Copy link
Member Author

tgross commented May 27, 2020

Cherry-picked to 0.11.3 branch as 4b0fe71

@shantanugadgil
Copy link
Contributor

hi, this error is still present in 0.12.0-beta1 with host volumes

job "host_volume_1" {
  datacenters = ["dc1"]

  type = "batch"

  group "mygroup" {
    volume "myworkspace" {
      type      = "host"
      source    = "workspace"
      read_only = true
    }

    task "mytask" {
      driver = "raw_exec"

      volume_mount {
        volume      = "myworkspace"
        destination = "/myworkspace"
      }

      template {
        data = <<EOH
#!/bin/bash

set -u

id -a

hostname

env | sort

sleep 30
EOH

        destination = "local/runme.bash"
      }

      config {
        command = "/bin/bash"
        args    = ["-x", "local/runme.bash"]
      }
    }
  }
}

@tgross
Copy link
Member Author

tgross commented Jun 24, 2020

hi, this error is still present in 0.12.0-beta1 with host volumes

Hi @shantanugadgil You should be getting an error because the raw_exec driver has MountConfigSupportNone. If you're not getting an error to that effect, can you open a new issue with the error message you're getting?

@shantanugadgil
Copy link
Contributor

@tgross the issue is that plan and run do not complain about it, and silently accept it.

The error is seen in the status command, but I was hoping that the plan/run would not let this proceed.

@tgross
Copy link
Member Author

tgross commented Jun 24, 2020

Yes, that's a known limitation. See the top of the PR:

We don't currently have a way to get driver capabilities up to the server, except through attributes. Validating this when the user initially submits the jobspec would be even better than what we're doing here (and could be useful for all our other capabilities), but that's out of scope for this PR.

But if you want to open a new issue with that as a feature request, feel free!

@github-actions
Copy link

github-actions bot commented Jan 2, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSI: show a useful error for on schedule / job validation for incompatible task drivers
3 participants