
Add support for running the chroot builder from a VM scale set #184

Merged: 8 commits merged into hashicorp:main on Apr 29, 2022

Conversation

@szamfirov (Contributor) commented Feb 11, 2022

Description

With the current implementation it is impossible to build VM images from a VM scale set machine using the chroot builder. This PR adds support for VM scale set virtual machines by making use of the VirtualMachineScaleSetVMsClient. The original request for this functionality, hashicorp/packer#9348, describes a use case identical to our own.

Changes in detail

The following changes have been made:

  • Add the VirtualMachineScaleSetVMsClient.
  • Add helper functions that use the new client to get VM details, get VM disks, and set VM disks.
  • Make the existing logic fall back to the new implementation on error: it first tries to fetch details as if running from a standalone VM, and if any of these calls fail, it retries with the VirtualMachineScaleSetVMsClient and continues with the rest of the logic (a minimal sketch of this pattern follows this list).
  • Fix unmounting of the filesystem on success: multiple filesystem mountpoints are initially mounted under the same location, which was causing issues with the final cleanup step. The unmount is now done recursively, which solves the issue (see -R, --recursive in the umount manual page). Recursion for each directory stops if any unmount operation in the chain fails for any reason, preserving the previous behavior.
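
A minimal sketch of this fallback pattern: the helper names getDisks and getScaleSetDisks follow the PR, while the disk type, interface, and wrapper function are simplified placeholders rather than the plugin's actual API.

package chroot

import (
	"context"
	"log"
)

// diskLister captures the two lookup paths used by the fallback: the
// standalone-VM path and the VirtualMachineScaleSetVMsClient path.
type diskLister interface {
	getDisks(ctx context.Context) ([]string, error)
	getScaleSetDisks(ctx context.Context) ([]string, error)
}

// currentDisksWithFallback tries the standalone-VM lookup first and only
// retries through the scale set client when that call fails.
func currentDisksWithFallback(ctx context.Context, da diskLister) ([]string, error) {
	disks, err := da.getDisks(ctx)
	if err == nil {
		return disks, nil
	}
	log.Printf("standalone VM disk lookup failed: %v; retrying via the scale set client", err)
	return da.getScaleSetDisks(ctx)
}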

Risks associated with the changes

No risks: there is no negative effect on the existing implementation. We are already using these changes in production and have not discovered any issues.

Related open issue(s):

Closes #9348

@szamfirov (Contributor, Author) commented:

Just to mention that the check-lint failures have nothing to do with the changes in this PR.

Comment on lines +40 to +43:

currentDisks, err = da.getScaleSetDisks(ctx)
if err != nil {
	return err
}
A reviewer (Contributor) commented:

Is there a way to identify the exact error that can be ignored from getDisks?

With this implementation we will lose the original error from calling da.getDisks(). It would be great to have both errors in the case that both function calls fail.

The reviewer (Contributor) added:

I should mention that the Packer SDK does have a multierror type that you could use. I think something like the code below would work, but I have not tested it, so it might need a few changes.

var errs *packersdk.MultiError
currentDisks, err := da.getDisks(ctx)
if err != nil {
	errs = packersdk.MultiErrorAppend(errs, err)
}
if errs != nil && len(errs.Errors) > 0 {
	// The standalone-VM call failed; retry via the scale set client and
	// collect that error too, so neither failure is lost.
	currentDisks, err = da.getScaleSetDisks(ctx)
	if err != nil {
		errs = packersdk.MultiErrorAppend(errs, err)
		return errs
	}
}
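
An alternative sketch (illustrative only, assuming the same helper names and a fmt import) that keeps both errors visible without MultiError, using Go 1.13+ error wrapping:

currentDisks, err := da.getDisks(ctx)
if err != nil {
	// Keep the original error visible even when the scale set retry also fails.
	var ssErr error
	currentDisks, ssErr = da.getScaleSetDisks(ctx)
	if ssErr != nil {
		return fmt.Errorf("getDisks failed: %v; getScaleSetDisks failed: %w", err, ssErr)
	}
}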

@szamfirov (Contributor, Author) replied:

Apologies for the lack of activity, but I was quite stretched this week.

I agree that we should bubble the error up somehow so the user is aware of it, but I'm struggling to find a proper solution. With the current implementation, any error (or MultiError, for that matter) returned from DetachDisk, AttachDisk, and the like results in a multistep.ActionHalt in the main "step", which essentially stops the build process. So this might not be the perfect solution.

Is there a decent way to surface such a message as a "warning"? I'm thinking of something much simpler: just notify the user that the initial call failed and display what the error was before falling back to the VMSS logic.

Let me share an example for a bit better understanding. Something along the lines of:

currentDisks, err := da.getDisks(ctx)
if err != nil {
  log.Printf("DetachDisk.getDisks: error: %+v\nFalling back to the VM scale set implementation...", err)
  // We can even make this explicit in the UI
  ui.Say("Fetching VM instance disks failed. Enable debugging for more details. Falling back to the VM scale set implementation...")
  currentDisks, err = da.getScaleSetDisks(ctx)
  if err != nil {
    return err
  }
}
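
For context on why a returned error stops the build: in a multistep-based builder, the calling step halts once one of its helpers errors. A hedged sketch of that pattern follows; the StepDetachDisk type, its fields, and the "error" state key are illustrative, not the plugin's actual code, and only the DiskAttacher interface comes from this PR.

package chroot

import (
	"context"

	"github.com/hashicorp/packer-plugin-sdk/multistep"
)

// StepDetachDisk is a hypothetical step, shown only to illustrate the halt-on-error flow.
type StepDetachDisk struct {
	attacher DiskAttacher
	diskID   string
}

func (s *StepDetachDisk) Run(ctx context.Context, state multistep.StateBag) multistep.StepAction {
	if err := s.attacher.DetachDisk(ctx, s.diskID); err != nil {
		// Any error returned by the attacher halts the whole build; there is
		// no built-in "warning only" outcome, which is why the fallback lives
		// inside the attacher itself.
		state.Put("error", err)
		return multistep.ActionHalt
	}
	return multistep.ActionContinue
}

func (s *StepDetachDisk) Cleanup(state multistep.StateBag) {}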

@szamfirov (Contributor, Author) commented Mar 4, 2022:

@nywilken, just a friendly nudge. Please let me know what you think on the above ☝️

@nywilken (Contributor) left a review comment:

Overall this looks good. All the acceptance tests pass, but I left a comment about losing potential error information. Thanks for making the fix. Please let us know if you have any questions on the review.

@szamfirov (Contributor, Author) commented:

@nywilken, just a friendly nudge. Please let me know what you think on the above ☝️

@nywilken (Contributor) commented Mar 9, 2022:

> @nywilken, just a friendly nudge. Please let me know what you think on the above ☝️

Hi @szamfirov, apologies, I've been a bit busy myself these past two weeks, so I haven't had much time to look into this further. That said, I will provide a little context here to maybe unblock you; otherwise, I will have more time next week.

> With the current implementation, any error (or MultiError, for that matter) returned from DetachDisk, AttachDisk, and the like results in a multistep.ActionHalt in the main "step", which essentially stops the build process. So this might not be the perfect solution.

Without your change, does the build process fail if there is an error?

Without looking into the callers, I don't know the immediate answer, so I will rely on your response.

If the behavior is to fail, are you suggesting that we change that behavior and warn the user through the UI and logs instead of stopping the build? Or are you suggesting that we don't return immediately after the first getDisks error, warn the user that the first call failed, and retry the getDisks call but for VMSS?

Again, quickly looking at the code, it seems to me that we need to return an error and not proceed if any of the calls to getDisks fail. Is that a correct observation?

If so, I think warning the user that one or more calls failed is helpful, but ultimately we should not continue the build if a valid response from getDisks is needed. Again, I'm saying this without diving into the code. Please let me know if I am mistaken.

> Let me share an example for a bit better understanding. Something along the lines of:

currentDisks, err := da.getDisks(ctx)
if err != nil {
  log.Printf("DetachDisk.getDisks: error: %+v\nFalling back to the VM scale set implementation...", err)
  // We can even make this explicit in the UI
  ui.Say("Fetching VM instance disks failed. Enable debugging for more details. Falling back to the VM scale set implementation...")
  currentDisks, err = da.getScaleSetDisks(ctx)
  if err != nil {
    return err
  }
}

Warning the user via the UI is a good option. It will require the UI to be added to the step structure or pulled from state, if available. That said, I find the wording "falling back to the VM scale set" confusing, as the user may not be working with scale sets.

Maybe something a bit more like:

log.Printf("an attempt to fetch vm instance disks failed: %v\n", err)
log.Printf("checking to see if instance disk is part of a vm scale set\n")
ui.Say("Fetching VM instance disks failed. Checking to see if instance disk is part of a VM scale set before giving up")

@szamfirov (Contributor, Author) commented Mar 11, 2022:

Hi @nywilken, thank you for the detailed response.

> Or are you suggesting that we don't return immediately after the first getDisks error, warn the user that the first call failed, and retry the getDisks call but for VMSS?

That's exactly what I'm suggesting. Otherwise it would be difficult to determine the right time to proceed with the fallback to VMSS.

I've updated the PR and have added more detailed logging (both UI and debug logs). Please have a look when you have the time.

@szamfirov requested a review from nywilken on March 11, 2022 14:28
@hashicorp-cla commented Mar 12, 2022:

CLA assistant check
All committers have signed the CLA.

@szamfirov (Contributor, Author) commented:

Hi @nywilken, a gentle nudge. Let me know if I can do anything more in order to get this PR closer to being merged.

@nywilken (Contributor) commented:

@szamfirov apologies for the delay. I've been away from the machines for the past few weeks (vacation and things). I'll get to this PR next week. Thanks for updating the PR and for keeping a pulse on the changes. Excited to help get these changes in.

@jkurek1 commented Apr 11, 2022:

Hi guys. Any news?

@nywilken (Contributor) left a review comment:

The updated approach works for me. I will pull down the changes and give the PR a run. In the meantime, I left a few suggestions for how to pass a Ui into each of the Attacher types.

Comment on the DiskAttacher interface change:

 )

 type DiskAttacher interface {
-	AttachDisk(ctx context.Context, disk string) (lun int32, err error)
+	AttachDisk(ctx context.Context, state multistep.StateBag, disk string) (lun int32, err error)
@nywilken (Contributor) commented Apr 15, 2022:

As opposed to passing in the entire state bag just to get to the UI, I would recommend adding a field to each of these types for setting the UI. This way you can access the UI simply by calling da.ui.Say(...) and the like.

For example, given the diskAttacher type, you could add the field:

type diskAttacher struct {
	azcli client.AzureClientSet
	vm    *client.ComputeInfo // store info about this VM so that we don't have to ask metadata service on every call
	ui    packersdk.Ui
}

var NewDiskAttacher = func(azureClient client.AzureClientSet, ui packersdk.Ui) DiskAttacher {
	return &diskAttacher{
		azcli: azureClient,
		ui:    ui,
	}
}

func (da *diskAttacher) DetachDisk(ctx context.Context, diskID string) error {
	// ...

	log.Println("Fetching list of disks currently attached to VM")
	currentDisks, err := da.getDisks(ctx)
	if err != nil {
		// ...
		da.ui.Say("Initial call for fetching VM instance disks returned an error. Checking to see if instance is part of a VM scale set before giving up.")

Then, using the unit test as an example, you could write:

	testDiskName := t.Name()

	errorBuffer := &strings.Builder{}
	ui := &packersdk.BasicUi{
		Reader:      strings.NewReader(""),
		Writer:      ioutil.Discard,
		ErrorWriter: errorBuffer,
	}

	state := new(multistep.BasicStateBag)
	state.Put("azureclient", azcli)
	da := NewDiskAttacher(azcli, ui)

TBH it feels a little weird to me to have all these unexported fields, and I would prefer Ui and AZcli as the field names. But I think sticking with the existing style is fine for now.

@szamfirov (Contributor, Author) replied:

Fully agree with you - passing the state felt a bit too much. Thanks for the suggestion. I managed to incorporate the new field in my latest commit (395277a). It should all be fine now.

Commit: Adding field for easy access to the "ui" instead of passing the statebag everywhere.

@szamfirov requested a review from nywilken on April 21, 2022 08:31
@szamfirov (Contributor, Author) commented:

@nywilken, please let me know once you manage to give these changes a spin. I'm looking forward to having this merged.

@nywilken (Contributor) left a review comment:

Thanks for your patience in getting this change in. The changes look good to me. I left a comment explaining my thinking when walking through the multiple-error logic, but I don't think it is an issue. You might have more insight, as you are already using this change. Please let me know if you have thoughts.

Comment on lines +67 to +73:

log.Printf("DetachDisk.setDisks: error: %+v\n", err)
log.Println("Checking to see if instance is part of a VM scale set before giving up.")
da.ui.Say("Initial call for setting VM instance disks returned an error. Checking to see if instance is part of a VM scale set before giving up.")
err = da.setScaleSetDisks(ctx, newDisks)
if err != nil {
	return err
}
A reviewer (Contributor) commented:

So I've walked down this path a bit because it seems like we could be continuing when we should error. Think of the case where the VM is not in a scale set and da.setDisks returns an error, but for some strange reason the call to da.setScaleSetDisks() doesn't. Then the original error, which was printed to the screen, would not be returned, and the build would continue until something downstream fails because the attached disk is not present.

This case seems unlikely to me, as I would expect da.setScaleSetDisks to always error when the VM is not in a scale set. But I did want to mention it because it was not immediately clear whether something downstream checks for the existence of said disks and fails appropriately.
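
One way to rule that case out, purely as a hedged sketch (the isScaleSetInstance helper is hypothetical and not part of this PR), would be to gate the retry on whether the instance actually reports scale set membership, for example via the vmScaleSetName field exposed by the Azure Instance Metadata Service:

err = da.setDisks(ctx, newDisks)
if err != nil {
	// Hypothetical guard, not in this PR: only retry through the scale set
	// client when the instance metadata reports scale set membership (for
	// example, a non-empty IMDS compute.vmScaleSetName); otherwise return
	// the original error instead of continuing.
	inScaleSet, metaErr := da.isScaleSetInstance(ctx)
	if metaErr != nil || !inScaleSet {
		return err
	}
	err = da.setScaleSetDisks(ctx, newDisks)
	if err != nil {
		return err
	}
}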

builder/azure/chroot/diskattacher_test.go: outdated review thread (resolved)
@nywilken merged commit 2ff43f2 into hashicorp:main on Apr 29, 2022
@szamfirov (Contributor, Author) commented:

@nywilken, thank you for merging the changes. I'm curious when the next release will be carved out so we can switch to using the official plugin (as opposed to the forked version)? Thanks in advance!

@nywilken (Contributor) commented May 3, 2022:

> @nywilken, thank you for merging the changes. I'm curious when the next release will be carved out so we can switch to using the official plugin (as opposed to the forked version)? Thanks in advance!

I'm working on getting a release out this week, if not today (EST) then tomorrow. There is an update to the Packer plugin SDK that I would like to get into the next release.

Successfully merging this pull request may close the following issue: Support running packer build (azure-chroot) on Azure Virtual Machine Scale Set