hot reconfiguration and live migration interface for task drivers #19752

Open
Jamesits opened this issue Jan 17, 2024 · 1 comment

Comments

@Jamesits

Jamesits commented Jan 17, 2024

Proposal

Some types of tasks can be reconfigured, or migrated to another node along with both their in-memory and on-disk state, without being shut down first. We should have a way to handle this gracefully.

We might need:

  • A task interface to reconfigure the task without restarting it (if this interface returns a failure, fall back to the stop-then-start logic)
  • A task interface to migrate the task's in-memory state to another node
    • handle the same task running on both nodes during the handover
  • Some interface to migrate the volume to another node
    • handle the same volume being present on both nodes during the handover
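A minimal sketch of the first bullet, written in Go because Nomad and its task driver plugins are written in Go. Every name here (`TaskConfig`, `Driver`, `Reconfigurer`, `ReconfigureTask`, `updateTask`, `vmDriver`) is hypothetical and is not Nomad's actual driver plugin API. The idea shown: reconfiguration is an optional capability, and any failure falls back to the existing stop-then-start path.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskConfig is a hypothetical, simplified stand-in for a task's configuration.
type TaskConfig struct {
	ID     string
	Memory int // MiB
	CPUs   int
}

// ErrNotSupported signals that the driver cannot apply this change in place.
var ErrNotSupported = errors.New("in-place reconfiguration not supported")

// Driver is a hypothetical minimal driver surface (not Nomad's real interface).
type Driver interface {
	StartTask(cfg TaskConfig) error
	StopTask(id string) error
}

// Reconfigurer is an optional, hypothetical capability a driver may implement.
type Reconfigurer interface {
	ReconfigureTask(old, next TaskConfig) error
}

// updateTask tries in-place reconfiguration first and falls back to
// stop-then-start. It returns true if the task was updated without a restart.
func updateTask(d Driver, old, next TaskConfig) (bool, error) {
	if r, ok := d.(Reconfigurer); ok {
		if err := r.ReconfigureTask(old, next); err == nil {
			return true, nil
		}
		// Any reconfiguration failure falls through to the stop-then-start path.
	}
	if err := d.StopTask(old.ID); err != nil {
		return false, fmt.Errorf("stop %s: %w", old.ID, err)
	}
	if err := d.StartTask(next); err != nil {
		return false, fmt.Errorf("start %s: %w", next.ID, err)
	}
	return false, nil
}

// vmDriver is a toy driver that can resize memory live but cannot change CPU count.
type vmDriver struct{ restarts int }

func (v *vmDriver) StartTask(cfg TaskConfig) error { return nil }
func (v *vmDriver) StopTask(id string) error      { v.restarts++; return nil }
func (v *vmDriver) ReconfigureTask(old, next TaskConfig) error {
	if old.CPUs != next.CPUs {
		return ErrNotSupported
	}
	return nil
}

func main() {
	d := &vmDriver{}
	base := TaskConfig{ID: "vm1", Memory: 1024, CPUs: 2}

	inPlace, _ := updateTask(d, base, TaskConfig{ID: "vm1", Memory: 2048, CPUs: 2})
	fmt.Println(inPlace, d.restarts) // true 0

	inPlace, _ = updateTask(d, base, TaskConfig{ID: "vm1", Memory: 1024, CPUs: 4})
	fmt.Println(inPlace, d.restarts) // false 1
}
```

Making the capability an optional interface checked with a type assertion means drivers that never implement it keep today's behavior unchanged.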

Use-cases

I'm investigating whether I can manage and monitor a fleet of VMs in Nomad with a custom task driver. These VMs might be stateless, but I don't want them to be shut down during a reschedule.

Other existing needs: we might also need #15489.

Attempted Solutions

Currently, Nomad only allows shutting down the task on the original node and then starting it on the new node. At a high level, the driver has no way of knowing that the task is being rescheduled rather than changed.

Writing a remote task driver on top of an existing VM management solution (e.g. Proxmox VE) might be one possible approach, but it is limited to my specific use case and does not scale well.

@tgross
Member

tgross commented Jan 17, 2024

Hi @Jamesits! Yeah, as you've noted, a lot of this would be specific to the task drivers, so #2323 and #13785 are blockers for this. I'll keep this issue open to help tie everything together. We've been discussing this kind of thing internally a bit as something that would help Nomad replace VMware deployments.

The major architectural hurdle here is that Nomad doesn't place tasks; it places allocations, which may contain multiple tasks. And those tasks don't all need to use the same task driver! So right out of the gate we'd need to figure out how to migrate multiple tasks simultaneously. For example, what happens if the tasks share state, or even just have ongoing network communication? We'd also need to decide what limitations to place on allocations that mix task drivers for this feature.
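One conservative limitation, sketched in Go under stated assumptions: allow live migration of an allocation only if every task's driver advertises the capability, and otherwise fall back to stop-then-start for the whole allocation. The `Capabilities.LiveMigrate` flag, the `Alloc`/`Task` types, `canLiveMigrate`, and the driver names `custom-vm` and `sidecar-driver` are all hypothetical illustrations, not Nomad's data model. This all-or-nothing gate does not address shared state or in-flight network traffic between tasks; it only rules out mixed allocations where some task cannot migrate at all.

```go
package main

import "fmt"

// Capabilities holds hypothetical feature flags a task driver might advertise.
type Capabilities struct {
	LiveMigrate bool
}

// Task is a simplified task: its name and the driver that runs it.
type Task struct {
	Name   string
	Driver string
}

// Alloc is a simplified allocation; the scheduler places allocations, not tasks.
type Alloc struct {
	ID    string
	Tasks []Task
}

// canLiveMigrate reports whether every task's driver supports live migration,
// and returns the names of the tasks that block it (unknown drivers block too).
func canLiveMigrate(a Alloc, caps map[string]Capabilities) (bool, []string) {
	var blockers []string
	for _, t := range a.Tasks {
		c, ok := caps[t.Driver]
		if !ok || !c.LiveMigrate {
			blockers = append(blockers, t.Name)
		}
	}
	return len(blockers) == 0, blockers
}

func main() {
	// Hypothetical driver capability table.
	caps := map[string]Capabilities{
		"custom-vm":      {LiveMigrate: true},
		"sidecar-driver": {LiveMigrate: false},
	}
	vmOnly := Alloc{ID: "a1", Tasks: []Task{{Name: "vm", Driver: "custom-vm"}}}
	mixed := Alloc{ID: "a2", Tasks: []Task{
		{Name: "vm", Driver: "custom-vm"},
		{Name: "sidecar", Driver: "sidecar-driver"},
	}}

	ok, _ := canLiveMigrate(vmOnly, caps)
	fmt.Println(ok) // true

	ok, blockers := canLiveMigrate(mixed, caps)
	fmt.Println(ok, blockers) // false [sidecar]
}
```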

@tgross changed the title from "Feature request: hot reconfiguration and live migration interface for a task driver" to "hot reconfiguration and live migration interface for task drivers" on Jan 17, 2024
@tgross moved this to Needs Roadmapping in Nomad - Community Issues Triage on Jun 24, 2024
Projects
Status: Needs Roadmapping
Development

No branches or pull requests

2 participants