
Add support for running the chroot builder from a VM scale set #184

Merged: 8 commits merged into hashicorp:main on Apr 29, 2022

Conversation

@szamfirov (Contributor) commented Feb 11, 2022

Description

With the current implementation it is impossible to build VM images from a VM scale set machine using the chroot builder. This PR adds support for VM scale set virtual machines by making use of the VirtualMachineScaleSetVMsClient. The original request for this functionality, hashicorp/packer#9348, describes a use case identical to our own.

Changes in detail

The following changes have been made:

  • Add the VirtualMachineScaleSetVMsClient.
  • Add helper functions that use the new client to get VM details, get VM disks, and set VM disks.
  • Make the existing logic fall back to the new implementation on error: it first tries to fetch details as if running from a standalone VM, and if any of these calls fail, it retries with the VirtualMachineScaleSetVMsClient and continues with the rest of the logic (a minimal sketch of this pattern follows this list).
  • Fix unmounting of the filesystem on success: multiple filesystem mountpoints are initially mounted under the same location, which was causing issues with the final cleanup step. The unmount is now done recursively, which solves the issue (see -R, --recursive in the umount manual page). Recursion for each directory stops if any unmount operation in the chain fails for any reason, preserving the previous behavior.
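
A minimal sketch of this fallback pattern: the helper names getDisks and getScaleSetDisks follow the PR, while the disk type, interface, and wrapper function are simplified placeholders rather than the plugin's actual API.

package chroot

import (
	"context"
	"log"
)

// diskLister captures the two lookup paths used by the fallback: the
// standalone-VM path and the VirtualMachineScaleSetVMsClient path.
type diskLister interface {
	getDisks(ctx context.Context) ([]string, error)
	getScaleSetDisks(ctx context.Context) ([]string, error)
}

// currentDisksWithFallback tries the standalone-VM lookup first and only
// retries through the scale set client when that call fails.
func currentDisksWithFallback(ctx context.Context, da diskLister) ([]string, error) {
	disks, err := da.getDisks(ctx)
	if err == nil {
		return disks, nil
	}
	log.Printf("standalone VM disk lookup failed: %v; retrying via the scale set client", err)
	return da.getScaleSetDisks(ctx)
}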

Risks associated with the changes

No risks: there is no negative effect on the existing implementation. We are already using these changes in production and have not discovered any issues.

Related open issue(s):

Closes #9348

@szamfirov (Contributor, Author) commented:

Just to mention that the check-lint failures have nothing to do with the changes in this PR.

Comment on lines +40 to +43:

currentDisks, err = da.getScaleSetDisks(ctx)
if err != nil {
	return err
}
A reviewer (Contributor) commented:

Is there a way to identify the exact error that can be ignored from getDisks?

With this implementation we will lose the original error from calling da.getDisks(). It would be great to have both errors in the case that both function calls fail.

The reviewer (Contributor) added:

I should mention that the Packer SDK does have a multierror type that you could use. I think something like the code below would work, but I have not tested it, so it might need a few changes.

var errs *packersdk.MultiError
currentDisks, err := da.getDisks(ctx)
if err != nil {
	errs = packersdk.MultiErrorAppend(errs, err)
}
if errs != nil && len(errs.Errors) > 0 {
	// The standalone-VM call failed; retry via the scale set client and
	// collect that error too, so neither failure is lost.
	currentDisks, err = da.getScaleSetDisks(ctx)
	if err != nil {
		errs = packersdk.MultiErrorAppend(errs, err)
		return errs
	}
}
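
An alternative sketch (illustrative only, assuming the same helper names and a fmt import) that keeps both errors visible without MultiError, using Go 1.13+ error wrapping:

currentDisks, err := da.getDisks(ctx)
if err != nil {
	// Keep the original error visible even when the scale set retry also fails.
	var ssErr error
	currentDisks, ssErr = da.getScaleSetDisks(ctx)
	if ssErr != nil {
		return fmt.Errorf("getDisks failed: %v; getScaleSetDisks failed: %w", err, ssErr)
	}
}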

@szamfirov (Contributor, Author) replied:

Apologies for the lack of activity, but I was quite stretched this week.

I agree that we should bubble the error up somehow so the user is aware of it, but I'm struggling to find a proper solution. With the current implementation, any error (or MultiError, for that matter) returned from DetachDisk, AttachDisk, and the like results in a multistep.ActionHalt in the main "step", which essentially stops the build process. So this might not be the perfect solution.

Is there a decent way to surface such a message as a "warning"? I'm thinking of something much simpler: just notify the user that the initial call failed and display what the error was before falling back to the VMSS logic.

Let me share an example for a bit better understanding. Something along the lines of:

currentDisks, err := da.getDisks(ctx)
if err != nil {
  log.Printf("DetachDisk.getDisks: error: %+v\nFalling back to the VM scale set implementation...", err)
  // We can even make this explicit in the UI
  ui.Say("Fetching VM instance disks failed. Enable debugging for more details. Falling back to the VM scale set implementation...")
  currentDisks, err = da.getScaleSetDisks(ctx)
  if err != nil {
    return err
  }
}
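
For context on why a returned error stops the build: in a multistep-based builder, the calling step halts once one of its helpers errors. A hedged sketch of that pattern follows; the StepDetachDisk type, its fields, and the "error" state key are illustrative, not the plugin's actual code, and only the DiskAttacher interface comes from this PR.

package chroot

import (
	"context"

	"github.com/hashicorp/packer-plugin-sdk/multistep"
)

// StepDetachDisk is a hypothetical step, shown only to illustrate the halt-on-error flow.
type StepDetachDisk struct {
	attacher DiskAttacher
	diskID   string
}

func (s *StepDetachDisk) Run(ctx context.Context, state multistep.StateBag) multistep.StepAction {
	if err := s.attacher.DetachDisk(ctx, s.diskID); err != nil {
		// Any error returned by the attacher halts the whole build; there is
		// no built-in "warning only" outcome, which is why the fallback lives
		// inside the attacher itself.
		state.Put("error", err)
		return multistep.ActionHalt
	}
	return multistep.ActionContinue
}

func (s *StepDetachDisk) Cleanup(state multistep.StateBag) {}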

@szamfirov (Contributor, Author) commented Mar 4, 2022:

@nywilken, just a friendly nudge. Please let me know what you think on the above ☝️

@nywilken (Contributor) left a review comment:

Overall this looks good. All the acceptance tests pass, but I left a comment about losing potential error information. Thanks for making the fix. Please let us know if you have any questions on the review.

@szamfirov (Contributor, Author) commented:

@nywilken, just a friendly nudge. Please let me know what you think on the above ☝️

@nywilken (Contributor) commented Mar 9, 2022:

> @nywilken, just a friendly nudge. Please let me know what you think on the above ☝️

Hi @szamfirov, apologies, I've been a bit busy myself these past two weeks, so I haven't had much time to look into this further. That said, I will provide a little context here to maybe unblock you; otherwise, I will have more time next week.

> With the current implementation, any error (or MultiError, for that matter) returned from DetachDisk, AttachDisk, and the like results in a multistep.ActionHalt in the main "step", which essentially stops the build process. So this might not be the perfect solution.

Without your change, does the build process fail if there is an error?

Without looking into the callers, I don't know the immediate answer, so I will rely on your response.

If the behavior is to fail, are you suggesting that we change that behavior and warn the user through the UI and logs instead of stopping the build? Or are you suggesting that we don't return immediately after the first getDisks error, warn the user that the first call failed, and retry the getDisks call but for VMSS?

Again, quickly looking at the code, it seems to me that we need to return an error and not proceed if any of the calls to getDisks fail. Is that a correct observation?

If so, I think warning the user that one or more calls failed is helpful, but ultimately we should not continue the build if a valid response from getDisks is needed. Again, I'm saying this without diving into the code. Please let me know if I am mistaken.

> Let me share an example for a bit better understanding. Something along the lines of:

currentDisks, err := da.getDisks(ctx)
if err != nil {
  log.Printf("DetachDisk.getDisks: error: %+v\nFalling back to the VM scale set implementation...", err)
  // We can even make this explicit in the UI
  ui.Say("Fetching VM instance disks failed. Enable debugging for more details. Falling back to the VM scale set implementation...")
  currentDisks, err = da.getScaleSetDisks(ctx)
  if err != nil {
    return err
  }
}

Warning the user via the UI is a good option. It will require the UI to be added to the step structure or pulled from state, if available. That said, I find the wording "falling back to the VM scale set" confusing, as the user may not be working with scale sets.

Maybe something a bit more like:

log.Printf("an attempt to fetch vm instance disks failed: %v\n", err)
log.Printf("checking to see if instance disk is part of a vm scale set\n")
ui.Say("Fetching VM instance disks failed. Checking to see if instance disk is part of a VM scale set before giving up")

@szamfirov (Contributor, Author) commented Mar 11, 2022:

Hi @nywilken, thank you for the detailed response.

> Or are you suggesting that we don't return immediately after the first getDisks error, warn the user that the first call failed, and retry the getDisks call but for VMSS?

That's exactly what I'm suggesting. Otherwise it would be difficult to determine the right time to proceed with the fallback to VMSS.

I've updated the PR and have added more detailed logging (both UI and debug logs). Please have a look when you have the time.

@szamfirov requested a review from nywilken on March 11, 2022 14:28
@hashicorp-cla commented Mar 12, 2022:

CLA assistant check
All committers have signed the CLA.

@szamfirov (Contributor, Author) commented:

Hi @nywilken, a gentle nudge. Let me know if I can do anything more in order to get this PR closer to being merged.

@nywilken (Contributor) commented:

@szamfirov apologies for the delay. I've been away from the machines for the past few weeks (vacation and things). I'll get to this PR next week. Thanks for updating the PR and for keeping a pulse on the changes. Excited to help get these changes in.

@jkurek1 commented Apr 11, 2022:

Hi guys. Any news?

@nywilken (Contributor) left a review comment:

The updated approach works for me. I will pull down the changes and give the PR a run. In the meantime, I left a few suggestions for how to pass a Ui into each of the Attacher types.

Comment on the DiskAttacher interface change:

 )

 type DiskAttacher interface {
-	AttachDisk(ctx context.Context, disk string) (lun int32, err error)
+	AttachDisk(ctx context.Context, state multistep.StateBag, disk string) (lun int32, err error)
@nywilken (Contributor) commented Apr 15, 2022:

As opposed to passing in the entire state bag just to get to the UI, I would recommend adding a field to each of these types for setting the UI. This way you can access the UI simply by calling da.ui.Say(...) and the like.

For example, given the diskAttacher type, you could add the field:

type diskAttacher struct {
	azcli client.AzureClientSet
	vm    *client.ComputeInfo // store info about this VM so that we don't have to ask metadata service on every call
	ui    packersdk.Ui
}

var NewDiskAttacher = func(azureClient client.AzureClientSet, ui packersdk.Ui) DiskAttacher {
	return &diskAttacher{
		azcli: azureClient,
		ui:    ui,
	}
}

func (da *diskAttacher) DetachDisk(ctx context.Context, diskID string) error {
	// ...

	log.Println("Fetching list of disks currently attached to VM")
	currentDisks, err := da.getDisks(ctx)
	if err != nil {
		// ...
		da.ui.Say("Initial call for fetching VM instance disks returned an error. Checking to see if instance is part of a VM scale set before giving up.")

Then, using the unit test as an example, you could write:

	testDiskName := t.Name()

	errorBuffer := &strings.Builder{}
	ui := &packersdk.BasicUi{
		Reader:      strings.NewReader(""),
		Writer:      ioutil.Discard,
		ErrorWriter: errorBuffer,
	}

	state := new(multistep.BasicStateBag)
	state.Put("azureclient", azcli)
	da := NewDiskAttacher(azcli, ui)

TBH it feels a little weird to me to have all these unexported fields, and I would prefer Ui and AZcli as the field names. But I think sticking with the existing style is fine for now.

@szamfirov (Contributor, Author) replied:

Fully agree with you - passing the state felt a bit too much. Thanks for the suggestion. I managed to incorporate the new field in my latest commit (395277a). It should all be fine now.

Commit: Adding field for easy access to the "ui" instead of passing the statebag everywhere.

@szamfirov requested a review from nywilken on April 21, 2022 08:31
@szamfirov (Contributor, Author) commented:

@nywilken, please let me know once you manage to give these changes a spin. I'm looking forward to having this merged.

@nywilken (Contributor) left a review comment:

Thanks for your patience in getting this change in. The changes look good to me. I left a comment explaining my thinking when walking through the multiple-error logic, but I don't think it is an issue. You might have more insight, as you are already using this change. Please let me know if you have thoughts.

Comment on lines +67 to +73:

log.Printf("DetachDisk.setDisks: error: %+v\n", err)
log.Println("Checking to see if instance is part of a VM scale set before giving up.")
da.ui.Say("Initial call for setting VM instance disks returned an error. Checking to see if instance is part of a VM scale set before giving up.")
err = da.setScaleSetDisks(ctx, newDisks)
if err != nil {
	return err
}
A reviewer (Contributor) commented:

So I've walked down this path a bit because it seems like we could be continuing when we should error. Think of the case where the VM is not in a scale set and da.setDisks returns an error, but for some strange reason the call to da.setScaleSetDisks() doesn't. Then the original error, which was printed to the screen, would not be returned, and the build would continue until something downstream fails because the attached disk is not present.

This case seems unlikely to me, as I would expect da.setScaleSetDisks to always error when the VM is not in a scale set. But I did want to mention it because it was not immediately clear whether something downstream checks for the existence of said disks and fails appropriately.
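
One way to rule that case out, purely as a hedged sketch (the isScaleSetInstance helper is hypothetical and not part of this PR), would be to gate the retry on whether the instance actually reports scale set membership, for example via the vmScaleSetName field exposed by the Azure Instance Metadata Service:

err = da.setDisks(ctx, newDisks)
if err != nil {
	// Hypothetical guard, not in this PR: only retry through the scale set
	// client when the instance metadata reports scale set membership (for
	// example, a non-empty IMDS compute.vmScaleSetName); otherwise return
	// the original error instead of continuing.
	inScaleSet, metaErr := da.isScaleSetInstance(ctx)
	if metaErr != nil || !inScaleSet {
		return err
	}
	err = da.setScaleSetDisks(ctx, newDisks)
	if err != nil {
		return err
	}
}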

builder/azure/chroot/diskattacher_test.go: outdated review thread (resolved)
@nywilken merged commit 2ff43f2 into hashicorp:main on Apr 29, 2022
@szamfirov (Contributor, Author) commented:

@nywilken, thank you for merging the changes. I'm curious when the next release will be carved out so we can switch to using the official plugin (as opposed to the forked version)? Thanks in advance!

@nywilken (Contributor) commented May 3, 2022:

> @nywilken, thank you for merging the changes. I'm curious when the next release will be carved out so we can switch to using the official plugin (as opposed to the forked version)? Thanks in advance!

I'm working on getting a release out this week, if not today (EST) then tomorrow. There is an update to the Packer plugin SDK that I would like to get into the next release.

Successfully merging this pull request may close the following issue: Support running packer build (azure-chroot) on Azure Virtual Machine Scale Set