Some types of task, including both their in-memory state and their on-disk state, can be reconfigured or migrated to another node without being shut down first. We should have a way to handle this gracefully.
We might need:
- A task interface to reconfigure the task without restarting it (if this interface returns a failure, fall back to the stop-then-start logic)
- A task interface to migrate the task's in-memory state to another node
  - Handle one task running on both nodes
- Some interface to migrate the volume to another node
  - Handle one volume on both nodes
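To make the first two items concrete, here is a minimal sketch of how an optional reconfigure capability with a stop-then-start fallback could look. Everything in it is an assumption for illustration: `HotReconfigurer`, `LiveMigrator`, `Restarter`, `ApplyConfig`, `TaskConfig` and the toy `vmDriver` are hypothetical names, not part of Nomad's actual driver plugin API.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskConfig is a stand-in for whatever configuration a driver accepts.
type TaskConfig map[string]string

// Restarter models the stop-then-start path (simplified, hypothetical signatures).
type Restarter interface {
	StopTask(taskID string) error
	StartTask(taskID string, cfg TaskConfig) error
}

// HotReconfigurer is a HYPOTHETICAL optional capability: apply a new
// configuration to a running task without restarting it.
type HotReconfigurer interface {
	ReconfigureTask(taskID string, cfg TaskConfig) error
}

// LiveMigrator is a HYPOTHETICAL optional capability: move a task's
// in-memory state to another node. It is declared only to show the shape.
type LiveMigrator interface {
	MigrateTask(taskID, targetNode string) error
}

// ApplyConfig tries hot reconfiguration first. If the driver lacks the
// capability or the call fails, it falls back to stop-then-start.
func ApplyConfig(d Restarter, taskID string, cfg TaskConfig) (string, error) {
	if hr, ok := d.(HotReconfigurer); ok {
		if err := hr.ReconfigureTask(taskID, cfg); err == nil {
			return "reconfigured", nil
		}
		// A failure here is not fatal; fall through to the restart path.
	}
	if err := d.StopTask(taskID); err != nil {
		return "", fmt.Errorf("stop %s: %w", taskID, err)
	}
	if err := d.StartTask(taskID, cfg); err != nil {
		return "", fmt.Errorf("start %s: %w", taskID, err)
	}
	return "restarted", nil
}

// vmDriver is a toy driver that records which calls it received.
type vmDriver struct {
	hotOK bool
	log   []string
}

func (v *vmDriver) StopTask(id string) error {
	v.log = append(v.log, "stop")
	return nil
}

func (v *vmDriver) StartTask(id string, cfg TaskConfig) error {
	v.log = append(v.log, "start")
	return nil
}

func (v *vmDriver) ReconfigureTask(id string, cfg TaskConfig) error {
	if !v.hotOK {
		return errors.New("hot reconfiguration refused")
	}
	v.log = append(v.log, "reconfigure")
	return nil
}

func main() {
	for _, hot := range []bool{true, false} {
		d := &vmDriver{hotOK: hot}
		how, err := ApplyConfig(d, "vm-1", TaskConfig{"memory": "4096"})
		fmt.Println(how, err, d.log)
	}
}
```

Modelling the new capabilities as optional interfaces, checked with a type assertion, would let existing drivers keep working unchanged; whether that is the right shape for a real plugin boundary is an open question.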
Use-cases
I'm investigating whether I can manage and monitor a bunch of VMs in Nomad with a custom task driver. These VMs might be stateless, but I don't want them to be shut down during a reschedule.
Currently Nomad only allows shutting down the task on the original node and then starting it on the new node. The driver has no high-level way to know that the task is being rescheduled rather than changed.
Writing a remote task driver on top of an existing VM management solution (e.g. Proxmox VE) might be one possible way, but it is limited for my specific use case and does not scale well.
Hi @Jamesits! Yeah, as you've noted, a lot of this kind of thing would be specific to the task drivers, so #2323 and #13785 are blocking for this. I'll keep this issue open to help tie everything together. We've been discussing this kind of thing internally a bit as something that would help Nomad replace VMware deployments.
The major architectural hurdle here is that Nomad doesn't place tasks -- it places allocations, which may have multiple tasks. And those tasks don't all need to have the same task driver! So right out of the gate we'd need to figure out how to migrate multiple tasks simultaneously. For example, what happens if the tasks share state or even just ongoing network communication? We'd also need to figure out what limitations to place on multi-task-driver allocs for this feature.
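As a rough illustration of one possible limitation, the sketch below treats an allocation as live-migratable only if every task's driver reports that it supports migration. The `Task`, `Allocation` and `BlockingTasks` names, the driver names, and the capability table are all hypothetical and deliberately simplified; they are not Nomad's real structs.

```go
package main

import "fmt"

// Task is a simplified stand-in: a task name and the driver that runs it.
type Task struct {
	Name   string
	Driver string
}

// Allocation groups the tasks that are placed (and would migrate) together.
type Allocation struct {
	ID    string
	Tasks []Task
}

// BlockingTasks returns the names of tasks whose driver is not marked as
// supporting live migration. An empty result means the whole allocation
// could be considered for live migration under this (assumed) rule.
func BlockingTasks(a Allocation, migratable map[string]bool) []string {
	var blocked []string
	for _, t := range a.Tasks {
		if !migratable[t.Driver] {
			blocked = append(blocked, t.Name)
		}
	}
	return blocked
}

func main() {
	// Hypothetical capability table keyed by driver name.
	caps := map[string]bool{"custom-vm": true, "log-shipper": false}

	alloc := Allocation{
		ID: "alloc-1",
		Tasks: []Task{
			{Name: "vm", Driver: "custom-vm"},
			{Name: "logs", Driver: "log-shipper"},
		},
	}

	if blocked := BlockingTasks(alloc, caps); len(blocked) > 0 {
		fmt.Println("fall back to stop-then-start; blocking tasks:", blocked)
	} else {
		fmt.Println("allocation can be live-migrated")
	}
}
```

This check says nothing about shared state or in-flight network connections between tasks; it only shows that the decision would have to be made per allocation rather than per task.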
tgross changed the title from "Feature request: hot reconfiguration and live migration interface for a task driver" to "hot reconfiguration and live migration interface for task drivers" on Jan 17, 2024
Other existing needs:
and we might need: #15489