Dynamic Host Volumes #15489
Comments
Hi @akamensky! I've been working up a proposal for exactly this for a while now ("dynamic host volumes") but it hasn't quite made it over the line in terms of wrapping that design up and getting it implemented. But this is definitely on my radar! |
@tgross thank you for the response. Not implying any rush with this, just wondering what the possible ETA is for this feature landing in a stable release (how long would it normally take from proposal until the feature is available)? We are evaluating Nomad as a replacement for a homebrewed deployment/orchestration system for a mostly legacy stack, and this may be a show-stopper for us. |
For reasons of "we're a public company now and I'll get put in Comms Jail" 😀 I can't give an ETA on new features but it's almost certainly not going to get worked on until after 1.5.0 which will go out early in the new year. |
FYI, I changed the name on this feature to make it easier for us to find internally. We tend to refer to it as "Dynamic Host Volumes" so just updating the title to match that. |
I will ask this here first, as I'm not sure whether this should be bundled in with this FR or raised as a separate ticket. If either of you, @tgross or @mikenomitch, could provide feedback on this, I would appreciate it. I know with
Which then utilizes Docker's functionality to do the bind mount. However, not everything is Docker. I feel that it may be good to provide a similar configuration option for |
@akamensky yeah that'd be simple on its face, with a couple of caveats:
Probably worth splitting that idea out to another issue specific to the |
@tgross not going to ask for an ETA ;) but mind laying out how you want to implement this? I thought about this a bit already and I came up with the following crazy solution (I like it because it is self-contained so to say):
This would imo have quite a few upsides over how host volumes are now and, as an added benefit, would show up automatically in the storage section etc… |
We definitely don't want to implement any kind of CSI-plugin-like interface; I'm going to be honest and say my opinion on CSI is that we implemented it reluctantly as a shared standard and aren't happy with the resulting interface and failure modes. Whatever we do for dynamic host volumes will be Nomad-native and wired into the scheduler directly. |
waiting for this feature |
This feature would be very useful for my deployment as well. |
Hi folks, adding a 👍 reaction to the top-level issue description is enough to make sure your enthusiasm gets recorded for the team. You can also see that it's currently on our "Later Release Shortlist" on the Public Roadmap. If you've got new use cases or design ideas, feel free to post them here though! |
I would also be excited to see this feature implemented. Additionally, I'd like to suggest a design idea: it would be great to have the ability to not only create volumes on the fly (which would be a massive accomplishment), but also to separate the logic of volume creation and binding an allocation to a volume. This concept could be similar to how Kubernetes implements Persistent Volumes (PV) and Persistent Volume Claims (PVC). |
That's how we implemented CSI support, where it makes sense (folks have asked to be able to merge them in the job spec anyways #11195 but I'm not sure it makes sense to have those operations for CSI because the characteristic times for creating CSI volumes are on the scale of minutes for some cloud providers, rather than milliseconds like the rest of the scheduler path). For dynamic host volumes we won't have the same kind of timescale problems so creating them on the fly with the job will be feasible. Fortunately if we can create them on the fly we can create them separately as well. I think the UX challenge will be how to surface placement of those volumes during creation. Unlike CSI (and k8s) dynamic host volumes are initially associated with a specific client node (because they'd have to be created by the client and not by some third-party cloud storage API). Whereas if the creation is tied to a job the volume would be created at the time of placement so the scheduler is just using all the existing scheduler logic to do so. For some additional context, here's a snippet from an internal design doc that I'm otherwise not ready to share yet about some of our underlying assumptions about the design:
I'm not 100% sold yet on the 2nd bullet point. And I'd like to figure out a way to get "except NFS" crammed into that 3rd bullet-point somehow, because it's a widely-supported standard and would cut out a ton of use cases where folks are stuck with CSI when its complexity is unwarranted. But that might be opening a whole other can of worms 😀 |
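To make the placement point above concrete, here is a minimal sketch of creating volumes separately from claiming them, where each volume is tied to the client node that created it. This is not Nomad's implementation or API; `Volume`, `VolumeRegistry`, and their methods are hypothetical names, and the one-claim-per-volume rule is an assumption made only to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    node_id: str  # a host volume lives on exactly one client node
    claims: set = field(default_factory=set)


class VolumeRegistry:
    """Hypothetical in-memory stand-in for server-side volume state."""

    def __init__(self):
        self._volumes = []

    def create(self, name, node_id):
        """Operator-side step: create a volume on a chosen node, ahead of any job."""
        vol = Volume(name=name, node_id=node_id)
        self._volumes.append(vol)
        return vol

    def feasible_nodes(self, name):
        """Scheduler-side step: nodes that hold an unclaimed volume called `name`."""
        return sorted(
            {v.node_id for v in self._volumes if v.name == name and not v.claims}
        )

    def claim(self, name, node_id, alloc_id):
        """Bind an allocation to an unclaimed volume on the node it was placed on."""
        for v in self._volumes:
            if v.name == name and v.node_id == node_id and not v.claims:
                v.claims.add(alloc_id)
                return v
        raise LookupError(f"no unclaimed volume {name!r} on node {node_id!r}")
```

In this model the operator decides where data lives by calling `create` for specific nodes, and the scheduler only ever chooses among nodes returned by `feasible_nodes`, which is one way to read the separation of duties discussed in the following comments.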
Hey @tgross, thanks for sharing! Just wanted to ensure that the "in addition to" part of this message gets considered 😄
Context from my experience: when our jobs require host volumes, scheduling is not dynamic at all. We know up front which hosts should be running these jobs, we create the volumes via config file on those hosts, and the job allocations stay with those hosts practically forever. A bit of manual intervention before scheduling these jobs is not a problem at all and might even be desired in some cases, so we are sure we are placing things where we wanted to. If it was possible to have something like |
Hi Tim, I am not really sold on the 2nd bullet point either. I'd really like to hear actual use cases for it. I doubt HA is a use case for it, because migration can simply take too long or the node might simply die, in which case you cannot migrate any data at all anyway. I agree with @ivantopo that host volumes are most likely used for workloads with fixed allocations (like a cluster of elasticsearch servers, for instance). I honestly don't know how much sense it would make to generate volumes from the jobspec automatically. From an operator perspective I want to be in control of where data ends up. I do not want to allow users to fill my disks… I also don't want the allocations from an elasticsearch cluster (assume three allocations spread over five client nodes) to suddenly run on a new node with an automatically created empty volume (migration as in bullet point two is imo not an option). HA is provided here by running three allocations after all; I can and will deal with one of them dying. As for NFS, I am going out on a limb here and please don't read it as critique -- I know you dislike CSI (I do as well) but I don't think special casing NFS (or more accurately any network filesystem) makes much sense. How would nomad know that hostvolume X on node 1 and hostvolume X on node 2 would use the same backing storage? In CSI that is easy, it is just one volume. Honestly it feels like all the complexity we hate about CSI would end up in host volumes if we started to "support" network filesystems there (even if a user is not using a network filesystem, they would probably pay the price for the increased code complexity that nomad has to support, i.e. more bugs).
Btw, CSI in nomad is quite stable nowadays (you fixed most ugly issues I think, so massive thanks for that) and running my CSI NFS plugin really is not much of a burden (This is not meant as self-promotion, but I would really hate for host volumes to have more complexity than needed, even if it is to "just" support NFS -- I am also not saying that my driver has no bugs, it probably has, but so far it works). That said, for safer operations I'd love to see #6554 fixed especially for CSI. Long story short, if it were possible to create nomad host volumes via the CLI and also set stuff like owner/group/mode it would be a massive improvement over what we have now. I bet it would also be what 99.42% of the userbase would love to see. If you want we can do a call and discuss this further? |
Regarding
I think this goes beyond the scope of this ticket but it is something that I miss as well. Not only in the context of CSI but generally (think consul intentions for the service mesh etc). This is imo something nomad-pack (or something else completely, nomad-pack still hasn't won me over) should provide and is bigger than simply volumes. |
This is great context! I think that's where we originally wanted to go with CSI and why we didn't have the create workflow in place -- the idea was always that you'd create via Terraform or whatever so that it's the responsibility of the cluster administrator rather than the job submitter. What you're saying here has a similar separation of duties, and that makes a lot more sense for host volumes.
Yeah I don't think I disagree with most of what you're saying here. The only way I'd want to be able to support NFS is if we could treat it like an ordinary mount without special casing -- just supporting the right syntax for the |
That begs an interesting question :) I guess in the end it all boils down to how far you are willing to go, starting from the existing stanza:
and transforming this into something along the lines of
would immediately allow for block devices, normal bind mounts and nfs (and everything else). The main questions now become:
|
(sorry for google translate) I would like to add my own context. We use Nomad to run multiple instances of our application for automated testing. These launches are initiated by a daemon based on the merge request status. We have a zfs volume cloning daemon on a few nodes to get our databases up and running quickly. And it looks like this:
To sum it up: we don't care where the volume is. It is important for us that the volume be created and deleted dynamically in the host group. By inserting this into prestart, we are worried that poststop may not work and garbage collection will not be performed. Also, mixing different levels of responsibility looks dirty. The second use case is to mount a subdirectory of a volume declared via the host client configuration. We now mount the entire directory; create subdirectories and update application configuration in entrypoint. It would be great to delegate this to Nomad in volume stanza (mkdir, chmod, chown). |
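For readers unfamiliar with the second use case, the entrypoint workaround described above (mkdir, chmod, chown on a subdirectory of an already mounted host volume) might look roughly like the sketch below. It is an illustration only, not something Nomad provides; `prepare_subdir` and its parameters are hypothetical, and the path-escape check is an added assumption about what a safe version would need.

```python
import os
from pathlib import Path


def prepare_subdir(volume_root, subdir, mode=0o750, uid=None, gid=None):
    """Create `subdir` under `volume_root`, then apply mode and ownership."""
    root = Path(volume_root).resolve()
    target = (root / subdir).resolve()

    # Refuse paths such as "../other" that would escape the mounted volume.
    if root != target and root not in target.parents:
        raise ValueError(f"{subdir!r} escapes the volume root")

    target.mkdir(parents=True, exist_ok=True)  # mkdir -p
    os.chmod(target, mode)                     # chmod
    if uid is not None or gid is not None:
        # chown; -1 leaves that ID unchanged. Changing the owner usually
        # requires root, which is part of why doing this in the task is awkward.
        os.chown(target, -1 if uid is None else uid, -1 if gid is None else gid)
    return target
```

An entrypoint would call something like `prepare_subdir("/data", "app/db", mode=0o700)` before starting the application; the request here is for the volume stanza to take over that step.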
This imo sounds like a job for a CSI plugin. |
I didn't find a working implementation. Making a daemon with two methods turned out to be easier. 🤷‍♂️ |
Ha, yeah I doubt you will find a prewritten CSI plugin for that. What I mainly wanted to say is that something like this is imo out of scope for host volumes. |
But I would like to register these volumes dynamically in Nomad, rather than inserting constraints and absolute path to the volume in Job file. 😄 |
(Edited to prevent confusion - see other comments lower down. This thread isn't really about the docker driver).
Disclaimer: Making use of the |
@heatzync that is a pretty clever work around. If you create an example that uses an |
The whole point of dynamic host volumes is to avoid running Docker in privileged mode. This is a big security gotcha. I don't understand the lifecycle approach. Nomad can create Docker host volumes out of the box if you have configured Docker in privileged mode. |
I feel like this thread lost its course. The request was created specifically for "exec" driver with Nomad "host volume" functionality, not for Docker with Docker volumes. I guess it can be expanded to Docker with Nomad host volumes. Although I imagine for Docker it should be easier to use Docker volumes. |
Please accept the apology of a Nomad n00b for convoluting this thread @akamensky. I also now realised that @suikast42 tried to put me on the right path with:
I will edit the above comments and remove the |
@akamensky All drivers would benefit from dynamic host volumes. I think @heatzync's contributions to this conversation are very relevant. |
Any Progress on this? |
If you look at the roadmap you will see that it is "scheduled" for the "1.9 & 1.10 (uncommitted)" milestone. Given that 1.7 was just released, I'd guess that it won't be here for half a year at the minimum (probably even longer).
|
Thanks. Where can I find the roadmap? |
It is a project on the hashicorp org: https://github.com/orgs/hashicorp/projects/202
Please note that I am not a hashicorp employee and only have the same outside view as you. So take my time guesses as just that -- they are guesses.
|
Just FYI folks, we've just kicked off this project and are planning on shipping it in Nomad 1.10.0. We're going through the initial design work and you can see that I've been landing some preliminary PRs into a feature branch to put together the skeleton of the feature. I'm going to attempt to publish our design document here once that's ready to share (with customer stories or other internal data filed-off, of course). |
@angrycub Do you think this will let us update our Nomad Coder template to use localhost volumes instead of relying on the local HOST CSI driver? |
Add several validation steps in the create/register RPCs for dynamic host volumes. We first check that submitted volumes are self-consistent (ex. max capacity is more than min capacity), then that any updates we've made are valid. And we validate against state: preventing claimed volumes from being updated and preventing placement requests for nodes that don't exist. Ref: #15489
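As a rough illustration of the checks this commit message describes, the sketch below models the three stages (self-consistency, update validity, validation against state) in Python. It is not Nomad's code, which is written in Go; `HostVolume`, `validate_request`, and every field name are hypothetical. The commit message does not say which updates count as valid, so the "a volume cannot change nodes" rule is an assumption, as is treating a capacity of 0 as unset.

```python
from dataclasses import dataclass


@dataclass
class HostVolume:
    # All field names are hypothetical; they only mirror the commit message.
    id: str
    name: str
    node_id: str
    capacity_min: int = 0  # 0 is treated as "unset" in this sketch
    capacity_max: int = 0
    claimed: bool = False


def validate_request(vol, existing, known_nodes):
    """Return a list of error strings; an empty list means the request passes."""
    errors = []

    # 1. Self-consistency of the submitted volume.
    if not vol.name:
        errors.append("volume name is required")
    if vol.capacity_max and vol.capacity_max < vol.capacity_min:
        errors.append("max capacity is less than min capacity")

    # 2. Validity of an update (assumed rule: a volume cannot move nodes).
    if existing is not None and existing.node_id != vol.node_id:
        errors.append("node ID cannot be changed on update")

    # 3. Validation against cluster state.
    if existing is not None and existing.claimed:
        errors.append("cannot update a volume that is claimed")
    if vol.node_id not in known_nodes:
        errors.append("placement requested on a node that does not exist")

    return errors
```

Collecting every error instead of stopping at the first is a design choice of this sketch; it lets a caller report all problems with a submitted volume in one response.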
Proposal
Currently, to create a host volume one needs to edit the agent configuration to add a volume stanza and restart the agent. This is impractical, as Nomad itself may be provisioned using one of numerous tools (Ansible/Salt/etc.), and restarting an agent that may already have other tasks running is far from a good idea.
CSI volumes can be created/allocated on the fly. However, CSI volumes are often networked storage (like NFS). Host volumes are extremely useful for stateful workloads that require high-performance local storage (think local SSD array or NVMe for databases like rocksdb, for example).
I think allowing host volumes to be created on agent nodes on the fly using API calls (and perhaps with corresponding CLI commands) is a sensible and very practical approach.
Use-cases
Attempted Solutions