
Auto extend node subnet #155

Open · 4 tasks
giobart opened this issue Dec 8, 2023 · 4 comments · May be fixed by #156

giobart (Contributor) commented Dec 8, 2023

Short

Each node subnet consists of 64 addresses. If a worker needs more addresses, it should issue an additional request to extend its subnetwork.

Proposal

At worker initialization time, the worker requests a net size of 64 as usual. Then, every time the address space is exhausted because a worker hosts more than 64 networked containers, we should extend it with a new request that assigns an additional subnet to that worker.

One possible solution (sketched below) would be to:
-> request a new subnet whenever the current one is exhausted inside env.generateAddress()
-> store the newly obtained addresses inside env.addrCache
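
A minimal Go sketch of how such an on-demand extension could look. Everything here is illustrative: the Environment struct, its mutex, and the requestSubnet helper are hypothetical stand-ins for the NetManager internals and the cluster-manager call; only the env.generateAddress() and env.addrCache names come from the proposal above.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// Environment is an illustrative stand-in for the NetManager environment.
type Environment struct {
	mu        sync.Mutex
	addrCache []net.IP // free addresses from all subnets assigned to this worker
}

// requestSubnet stands in for the call that asks the cluster manager for an
// additional 64-address subnet for this worker. Here it simply fabricates one.
func (env *Environment) requestSubnet() ([]net.IP, error) {
	_, subnet, err := net.ParseCIDR("10.30.0.0/26") // placeholder subnet
	if err != nil {
		return nil, err
	}
	var addrs []net.IP
	for ip := subnet.IP.Mask(subnet.Mask); subnet.Contains(ip); ip = nextIP(ip) {
		addrs = append(addrs, append(net.IP(nil), ip...))
	}
	return addrs, nil // real code would also skip network/broadcast/gateway addresses
}

// generateAddress hands out the next free address and transparently extends
// the worker's address space when the cache runs dry.
func (env *Environment) generateAddress() (net.IP, error) {
	env.mu.Lock()
	defer env.mu.Unlock()

	if len(env.addrCache) == 0 {
		// Address space exhausted: ask the cluster manager for another subnet.
		addrs, err := env.requestSubnet()
		if err != nil {
			return nil, fmt.Errorf("subnet exhausted and extension failed: %w", err)
		}
		env.addrCache = append(env.addrCache, addrs...)
	}

	ip := env.addrCache[0]
	env.addrCache = env.addrCache[1:]
	return ip, nil
}

// nextIP returns a copy of ip incremented by one.
func nextIP(ip net.IP) net.IP {
	next := append(net.IP(nil), ip...)
	for i := len(next) - 1; i >= 0; i-- {
		next[i]++
		if next[i] != 0 {
			break
		}
	}
	return next
}

func main() {
	env := &Environment{}
	ip, err := env.generateAddress()
	if err != nil {
		panic(err)
	}
	fmt.Println("allocated", ip)
}
```

With this shape, callers of generateAddress() never see the extension; they only pay the extra round-trip to the cluster manager the first time a subnet runs out.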

Rationale

Removes the current limitation on the number of networked containers per worker.

Impact

NetManager, and possibly the cluster manager.

Development time

1 week

Status

Finding a solution

Checklist

  • Discussed
  • Documented
  • Implemented
  • Tested
@giobart added the enhancement (New feature or request) label Dec 8, 2023
@giobart self-assigned this Dec 8, 2023
giobart (Contributor, Author) commented Dec 8, 2023

@smnzlnsk what do you think about it?

smnzlnsk (Collaborator) commented Dec 8, 2023

A couple of points that popped up:

  • When would this be needed? When a cluster is completely out of options? Won't this create a 'super' node if the deployed services are idle, but the scheduler keeps deploying on that one node because, from a scheduling standpoint, it seems fine? That one node would keep requesting address space, or is there an upper limit planned?
  • Are we planning on making the scheduler respect the available addresses of worker nodes?
  • Assume we allow this and have a weak node. If that node keeps getting chosen by the scheduler and keeps deploying services, which then all have a surge in traffic and cause the node to crash, won't this create even more re-scheduling effort?

I think there are a lot of variables we need to respect before going ahead with this. In general this seems like a good idea, though, iff the node will be able to withstand the higher strain in the future (and the scheduler does not discriminate).

giobart (Contributor, Author) commented Dec 11, 2023

Ideally, I think we should be limited not by address space but by actual resources. If a node has run out of resources, the scheduler will not (or should not) send it new deployments anyway. If, instead, a node is capable of handling new workloads according to the SLA but has run out of addresses, it should request more.

@giobart linked a pull request (#156) Dec 11, 2023 that will close this issue
giobart (Contributor, Author) commented Dec 11, 2023

@smnzlnsk What do you think of the solution in #156?
