csi: Task unable to run with csi_hook error "Device path not provided" #10432
Hi @jrlonan-gt! The relevant error seems to be this bit:
Can you provide the volume specification you're using here?
AWS EBS gp2 volume
FYI, I applied changes to the config and updated the Nomad jobspec for my other job that has an identical config (it was working fine), and encountered the same issue as above. Were there any changes to the plugin within the last 2 weeks or so?
I have had this same problem. Following the MySQL example from the Nomad tutorial https://learn.hashicorp.com/tutorials/nomad/stateful-workloads-csi-volumes?in=nomad/stateful-workloads with no luck still. This link shows where we're hitting the error; I have not debugged the CSI plugin yet: https://discuss.hashicorp.com/t/ebs-csi-driver-csi-hook-failed-device-path-not-provided/22845/6
Checking the logs, my ebs-controller and ebs-nodes allocations were in a bad state, as at one point the containers were restarted but there was no [...]. I keep seeing these error logs:
@jrlonan-gt same thing. I can restart the controller and nodes and possibly get ONE of my jobs to work when mounting an EBS volume. However, this doesn't always work. I have the same error logs that you mention for all my nodes. CSI plugin 0.10.1. What changed recently that caused this?
Controller messages:
Node messages:
And an example of a volume HCL:
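(The commenter's actual volume HCL was not captured in this thread. For reference, a minimal volume registration for the AWS EBS plugin on a Nomad 1.0-era cluster typically looks like the sketch below; the IDs, plugin name, and filesystem type are illustrative placeholders, not the original values.)

```hcl
# Illustrative sketch only -- not the commenter's original file.
# Registers an existing EBS volume with Nomad (nomad volume register volume.hcl).
id              = "mysql"                    # placeholder volume ID
name            = "mysql"
type            = "csi"
external_id     = "vol-0123456789abcdef0"    # placeholder EBS volume ID
plugin_id       = "aws-ebs0"                 # placeholder CSI plugin ID
access_mode     = "single-node-writer"
attachment_mode = "file-system"

mount_options {
  fs_type = "ext4"
}
```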
@jessequinn For the other job that still failed after I restarted the controller and all the node plugins, it returned a different error message (which I didn't capture, unfortunately, but it implies the volume is not attached to the correct EC2 instance where the job runs), so I had to manually detach the volume from the Nomad client via the AWS console and re-create the Nomad job, and it successfully re-attached the EBS volume to the allocation. Still would like to know the root cause, and hoping for a more stable release 🙏
@jrlonan-gt are you suggesting that the issue comes from the fact that the EBS volumes have not properly been removed from a previous attachment? That could make sense. Let me get our cloud team to confirm that the EBS volumes aren't still attached in some manner. However, it is weird that this would be the case, considering the volumes in Nomad are "schedulable".
@jrlonan-gt BTW, I cannot get any EBS volume to work now, even with restarting nodes and controllers, WHEN I have more than one job using a volume (not the same volumes).
Like @durango, I tested 0.9.0 through 0.10.1, all with the same issues. We tested against Nomad 1.0.0 and 1.0.4: same thing.
In our situation we rebooted the machine and Nomad/the CSI plugin lost those connections; however, the EBS volumes were still attached. Detaching them as suggested by @jrlonan-gt worked.
I have rotated machines through AWS' ASG as well as manually attaching and detaching the EBS drive from each server and client throughout my cluster. Still the same issue. This is also a new volume registered under a unique/different ID and external ID. I have had this same issue since [...]
I had to restart controllers and nodes, remove volumes and jobs from Nomad, and do [...]
I had to do the following to get this to work today:
If I used the previous plugin ID or CSI directory, I would run into the same issue (despite detaching/deregistering/purging jobs/etc.). Hope this helps others down the road.
Following up on this unfortunately-neglected issue with a few notes as I start on related work:
I'm going to close this one out, but if a similar issue comes back up post-1.2.5, please feel free to comment or reopen with more info.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v1.0.2 (4c1d4fc)
Operating system and Environment details
Issue
Stateful job fails with the error "failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = InvalidArgument desc = Device path not provided". The Nomad cluster is running in AWS using EBS volumes and CSI plugin v0.9.0.
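(For reference, the node plugin in this kind of setup is typically run as a Nomad system job with a csi_plugin stanza, as in the sketch below. This follows the pattern from the HashiCorp tutorial linked earlier in the thread; the plugin ID, image tag, and arguments are illustrative, not the reporter's actual configuration.)

```hcl
# Illustrative sketch of an EBS CSI node-plugin job -- not the reporter's config.
job "plugin-aws-ebs-nodes" {
  datacenters = ["dc1"]
  type        = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        image = "amazon/aws-ebs-csi-driver:v0.9.0"
        args = [
          "node",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]
        privileged = true
      }

      csi_plugin {
        id        = "aws-ebs0"   # must match plugin_id on the volume registration
        type      = "node"
        mount_dir = "/csi"
      }
    }
  }
}
```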
Reproduction steps
Re-run the same jobspec several times.
Expected Result
Task running
Actual Result
Task failed to run with the following error
| Time | Type | Description |
|---|---|---|
| 2021-04-23T10:34:52+08:00 | Setup Failure | failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = InvalidArgument desc = Device path not provided |
Job file (if appropriate)
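(The original job file was not attached. A minimal sketch of a job that mounts a registered CSI volume on a Nomad 1.0-era cluster might look like the following; the job name, image, volume source, and mount path are illustrative placeholders.)

```hcl
# Illustrative sketch only -- not the reporter's job file.
job "stateful-example" {
  datacenters = ["dc1"]

  group "db" {
    # Claims the CSI volume previously registered with Nomad.
    volume "data" {
      type      = "csi"
      source    = "mysql"      # placeholder: the registered volume ID
      read_only = false
    }

    task "db" {
      driver = "docker"

      # Mounts the claimed volume into the task's filesystem.
      volume_mount {
        volume      = "data"
        destination = "/var/lib/mysql"
        read_only   = false
      }

      config {
        image = "mysql:8.0"    # placeholder image
      }
    }
  }
}
```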
Nomad Server logs (if appropriate)
N/A
Nomad Client logs (if appropriate)