
Add an intermediate Machine state for non-atomic Cloud Provider APIs #444

Closed
zuzzas opened this issue Apr 14, 2020 · 8 comments
Labels
exp/intermediate: Issue that requires some project experience
kind/design
kind/enhancement: Enhancement, improvement, extension
lifecycle/stale: Nobody worked on this for 6 months (will further age)
priority/backlog: Issues which needs implementation

Comments

@zuzzas
Contributor

zuzzas commented Apr 14, 2020

What would you like to be added:

An intermediate Machine state for non-atomic Cloud Provider APIs. After VM creation is complete, apply a second wave of changes to the machine. Consider Machine Ready only after all API calls are complete.

Why is this needed:

Some Cloud Provider APIs require a second API call (after the creation of a VM) to change certain parameters. A perfect (and dear to my heart) example is setting the source-dest-check option on AWS instances.

If MCM is restarted (or somehow loses its internal state), the second non-atomic call to the Cloud Provider's API will never be made.
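For illustration, a minimal sketch of such a "second wave" call, assuming the AWS SDK for Go v1. disableSourceDestCheck is a hypothetical helper name; ec2.ModifyInstanceAttribute is the real AWS call, and RunInstances itself cannot set this attribute, which is what makes the creation flow non-atomic:

```go
// Hypothetical sketch: the second, non-atomic call after instance creation.
// If MCM restarts between instance creation and this call, the attribute is
// never set, which is exactly the problem described above.
package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func disableSourceDestCheck(instanceID string) error {
	svc := ec2.New(session.Must(session.NewSession()))
	// source-dest-check cannot be set at RunInstances time, so it has to
	// be applied as a separate call once the instance exists.
	_, err := svc.ModifyInstanceAttribute(&ec2.ModifyInstanceAttributeInput{
		InstanceId:      aws.String(instanceID),
		SourceDestCheck: &ec2.AttributeBooleanValue{Value: aws.Bool(false)},
	})
	return err
}
```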

@zuzzas zuzzas added the kind/enhancement Enhancement, improvement, extension label Apr 14, 2020
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Jun 14, 2020
@prashanth26
Contributor

Also, we should update the status on the machine object right before the VM creation request is made to the provider.
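For illustration, a minimal sketch of that ordering. Every name here is hypothetical rather than MCM's real API; the point is only that the status write is persisted before the provider request goes out:

```go
package sketch

import "fmt"

type phase string

const (
	phaseCreating phase = "Creating" // persisted before the provider call
	phaseRunning  phase = "Running"
)

type machine struct {
	name  string
	phase phase
}

// persistStatus stands in for a status update against the apiserver.
func persistStatus(m *machine) error {
	fmt.Printf("machine %s -> %s\n", m.name, m.phase)
	return nil
}

// createMachine records intent before the VM creation request goes out. If
// the controller restarts after the first write, the next reconciliation
// sees phaseCreating and knows a VM may already exist on the provider side.
func createMachine(m *machine, providerCreate func(*machine) error) error {
	m.phase = phaseCreating
	if err := persistStatus(m); err != nil {
		return err
	}
	if err := providerCreate(m); err != nil {
		return err
	}
	m.phase = phaseRunning
	return persistStatus(m)
}
```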

@prashanth26 prashanth26 added exp/intermediate Issue that requires some project experience and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Aug 16, 2020
@hardikdr hardikdr added kind/design priority/backlog Issues which needs implementation labels Aug 28, 2020
@prashanth26
Contributor

prashanth26 commented Oct 21, 2020

Hi @zuzzas,

I have thought a bit more deeply about the issue you raised. I see two separate requirements here; correct me if I have misunderstood.

  1. Having multiple cloud provider calls for a single machine creation, so that operations are continued/retried on failures (e.g. MCM restarts). This should already be achievable with the OOT code, where the CreateMachine() call could be smart enough to decode the previous state using the lastKnownState field; providers can use this field to save partial state during VM creation/deletion. The partial state of the VMs on the provider side can also always be retrieved, as long as the VMs are tagged correctly during the initial VM operation call (see the sketch after this list).

Although the above-mentioned logic is achievable with the OOT, IMO one small caveat is that the OOT would currently delete such partially created machines, because the machineSet controller deletes failed machines. The fix I see would be to keep the machine in a Pending state while machine creation is only partially complete. We could probably also switch between Pending / Failed / other phases based on the type of error returned by the provider's CreateMachine() code. I am currently working on the AWS OOT provider and will try to have an initial release in the coming week. We can probably test it against your use-case of partially created machines with multiple calls (with the source-dest-check option).

  2. However, all this leads into the larger problem of in-place updates, which is yet to be solved in the MCM and probably needs more thought. The way I envision achieving this in the future would be to use machineClass hashes as labels on the VM: a getMachineStatus() call would return this label hash, compare it with the current machineClass hash, and update the VM when they don't match. However, this is just a rough thought which needs more discussion.
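For requirement (1), a rough sketch of what a lastKnownState-driven CreateMachine() could look like. All names (the checkpoint strings, createVM, applySecondWave) are hypothetical and heavily simplified relative to the real OOT driver interface:

```go
package sketch

import "fmt"

// Hypothetical provider driver; createVM and applySecondWave stand in for
// real provider SDK calls. In a real driver, createVM would first look the
// VM up by tag so that repeating it is idempotent.
type driver struct{}

func (d *driver) createVM(name string) error        { fmt.Println("creating VM", name); return nil }
func (d *driver) applySecondWave(name string) error { fmt.Println("configuring VM", name); return nil }

// Checkpoints persisted in the machine's lastKnownState field.
const (
	stateNone         = ""              // nothing created yet
	stateVMCreated    = "vm-created"    // VM exists, second-wave call still pending
	stateVMConfigured = "vm-configured" // all calls complete, machine can be Ready
)

// CreateMachine resumes from whatever checkpoint the previous (possibly
// interrupted) reconciliation reached, making the flow restart-safe.
func (d *driver) CreateMachine(machineName, lastKnownState string) (string, error) {
	switch lastKnownState {
	case stateNone:
		if err := d.createVM(machineName); err != nil {
			return stateNone, err
		}
		fallthrough // continue straight into the second wave
	case stateVMCreated:
		if err := d.applySecondWave(machineName); err != nil {
			// Returning this checkpoint makes the next reconciliation
			// retry only the second wave, not the whole creation.
			return stateVMCreated, err
		}
	}
	return stateVMConfigured, nil
}
```

The key property is that each persisted checkpoint narrows what the next reconciliation has to redo, so an MCM restart no longer loses the pending second-wave call.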

If your requirement is only (1), i.e. VM creation with multiple rounds of CreateMachine() calls, where CreateMachine() would be smart enough to decode which operations still need to be performed (e.g. if the VM is already present, only make an update call to set the source-dest-check option), then this is something I can test in the coming days and update you on. However, if it's about updating VMs in place in the generic case, I think that would be a much larger topic and discussion.

Also, these are just some thoughts I have tried to pen down; please correct me if I am wrong or if something can be improved.

@zuzzas
Contributor Author

zuzzas commented Oct 22, 2020

Hi @prashanth26,

Thanks for the thorough response. Yeah, the first point seems feasible and is exactly what I require.

The reason for #538 is that I still use in-tree providers. It was an attempt to create a simple interim solution, which simply removes the Machine and restarts the creation flow. That helps us avoid adopting half-created Machines, which were created as VMs but failed to have Security Groups added to their interfaces.

@prashanth26
Contributor

prashanth26 commented Oct 22, 2020

That helps us avoid adopting half-created Machines, which were created as VMs but failed to have Security Groups added to their interfaces

But wouldn't this case lead to orphan VMs? Anyway, you can identify VMs using the tags, right? IMO, you can capture this case in the driver Create() call and either fix/adopt the partially created VM, or try to perform cleanup when Create() is called the second time and return an error to declare the machine failed. Wouldn't this be more meaningful than creating an additional phase? Adding this phase seems like a hack to me.

cc: @hardikdr @ggaurav10 @amshuman-kr WDYT?
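For illustration, a rough sketch of this alternative under hypothetical names throughout: on a repeated Create() call, look the VM up by tag; if it exists but is only partially configured, clean it up and return an error rather than introducing a new phase:

```go
package sketch

import "fmt"

type vm struct {
	id              string
	fullyConfigured bool // e.g. source-dest-check / Security Groups applied
}

type provider struct{}

// findVMByTag stands in for a tag-based lookup in the provider's API; tags
// written during the first attempt are what identify partially created VMs.
func (p *provider) findVMByTag(machineName string) (*vm, error) { return nil, nil }
func (p *provider) createVM(machineName string) error           { return nil }
func (p *provider) deleteVM(v *vm) error                        { return nil }

func (p *provider) Create(machineName string) error {
	existing, err := p.findVMByTag(machineName)
	if err != nil {
		return err
	}
	if existing == nil {
		return p.createVM(machineName) // first attempt: nothing to clean up
	}
	if existing.fullyConfigured {
		return nil // an earlier attempt actually succeeded
	}
	// Partially created: clean up and report failure, so the machineSet
	// controller replaces the Failed machine with a fresh creation.
	if err := p.deleteVM(existing); err != nil {
		return err
	}
	return fmt.Errorf("machine %s was partially created; cleaned up and marked failed", machineName)
}
```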

@zuzzas
Contributor Author

zuzzas commented Oct 23, 2020

IMO, you can capture this case in the driver Create() call

This happens after adoption. If a VM was created without the other important pieces being set up (like SGs in OpenStack), the GetVMs() call will still return the VM, and it will get adopted.

@amshuman-kr

@zuzzas We are in the middle of moving from the in-tree providers to the out-of-tree providers; the existing in-tree providers are in different stages of being migrated. As mentioned above by @prashanth26, the out-of-tree approach proposes using the lastKnownState field to maintain the state of reconciliation for the Machine (which covers provisioning, update, and de-provisioning) and to aid decisions during the next reconciliation.

Clearly, something like this doesn't exist in the in-tree approach. But considering that the in-tree approach is being phased out, we are a bit reluctant to introduce new functionality there. Would it work for you to do this in the out-of-tree approach? Also, do you see any aspect of the problem that the lastKnownState field in the out-of-tree approach doesn't address?

@zuzzas
Contributor Author

zuzzas commented Oct 27, 2020

Nah, everything is clear. I had been unsure about the current policy on changing things in the in-tree controller. Thanks, @amshuman-kr.

Also, do you see any aspect of the problem that the lastKnownState field in the out-of-tree approach doesn't address?

It's perfect.

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Apr 26, 2021
@prashanth26
Contributor

This has been taken care of with this field.
/close in favour of it.
