Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate machine status with libvirt domain details #37

Merged
merged 2 commits into from
Oct 10, 2018
Merged

Populate machine status with libvirt domain details #37

merged 2 commits into from
Oct 10, 2018

Conversation

bison
Copy link
Contributor

@bison bison commented Oct 4, 2018

This populates the machine's status subresource with the libvirt domain's details.

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 4, 2018
@bison
Copy link
Contributor Author

bison commented Oct 4, 2018

/hold

Still a WIP. Almost there.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 4, 2018
Copy link
Contributor

@frobware frobware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to me. Going to try it in the wild next.

dom, err := libvirtutils.LookupDomainByUUID(machine, uuid)
if err != nil {
glog.Errorf("Could not lookup created libvirt machine: %v", err)
return fmt.Errorf("error looking up created machine %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Express which machine failed the lookup; ... %q ...", machine...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


if err := a.updateStatus(machine, dom); err != nil {
glog.Errorf("Could not update machine status: %v", err)
return fmt.Errorf("error updating machine status: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto regarding which machine; essentially avoid:

printf("could not open file")

in favour of:

printf("could not open %q: %v", filename, error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

libvirtStatus, err := libvirtutils.ProviderStatus(machine)
if err != nil {
glog.Errorf("Could not get provider status for update: %v", err)
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we, in general, log errors, then return an err? Isn't the caller going to log the error, and perhaps with additional context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly doing this in the "top-level" Update / Create / etc. methods because they are the boundary between the actuator and the upstream controller. I've run into cases where the controller didn't log the returned errors which caused a lot of wasted time debugging.

return nil, fmt.Errorf(LibVirtConIsNil)
}

dom, err := virConn.LookupDomainByUUID(uuid[:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having got this far... I'm wondering why we pass a pointer to the uuid? Is it because it's an underlying C/native type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, but it's represented as an array of 16 bytes, not a slice -- which is why it has to be sliced like this here. It seemed nicer to pass a pointer, especially in the case where the method might return an error and a nil UUID pointer rather than an error and a 16 byte array of zeros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up dropping the UUID thing altogether and just looking up by name. It's simpler and should be fine.


status, ok := obj.(*providerconfigv1.LibvirtMachineProviderStatus)
if !ok {
return nil, fmt.Errorf("unexpected object: %#v", gvk)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would expressing %T be more useful here? i.e., what type did we get?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gvk struct will have the group, version, and kind information. Honestly, this bit is just lifted from the AWS actuator, but it seems like a reasonable thing to log in this situation.

return "Shutoff"
}

return "Unknown"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this the default case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Done.

@bison
Copy link
Contributor Author

bison commented Oct 6, 2018

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 6, 2018
@bison
Copy link
Contributor Author

bison commented Oct 6, 2018

I think this is ready.

cloud/libvirt/actuators/machine/actuator.go Outdated Show resolved Hide resolved
// object.
func EncodeLibvirtMachineProviderStatus(status *providerconfigv1.LibvirtMachineProviderStatus) (*runtime.RawExtension, error) {
// TODO: Modifying this in place feels weird. Should it be copied?
status.TypeMeta = metav1.TypeMeta{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not modify it. Weird you have to set the APIVersion and the Kind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what the AWS actuator does, but yeah, it felt weird. I don't think it's needed if we just set the metadata upfront when we first create a status, so I've done that instead and removed this bit.

return error.WithLog(err, "error getting provider status")
}

uuid, err := uuid.Parse(*libvirtStatus.InstanceID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This panics when libvirtStatus.InstanceID is nil. In case you call the Update from scratch without populating machine status.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fine to look based on domain name (to be define under cloud/libvirt/actuators/machine/utils/domain.go):

func LookupDomainByName(machine *clusterv1.Machine) (*libvirt.Domain, error) {
	machineProviderConfig, err := MachineProviderConfigFromClusterAPIMachineSpec(&machine.Spec)
	client, err := buildClient(machineProviderConfig.URI)
	if err != nil {
		return nil, fmt.Errorf("Failed to build libvirt client: %s", err)
	}

	virConn := client.libvirt
	if virConn == nil {
		return nil, fmt.Errorf(LibVirtConIsNil)
	}

	dom, err := virConn.LookupDomainByName(machine.Name)
	if err != nil {
		return nil, fmt.Errorf("Failed to lookup domain by name: %v", err)
	}

	return dom, nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ditched the whole UUID thing. It didn't make sense to look up the domain via a field stored in the status in order to update the status anyway. The domain is looked up by name now.

@ingvagabund
Copy link
Member

Just few nits, otherwise lgtm. I was able to apply the patch, build it and run it to collect machine status. Thanks @bison !!!

@ingvagabund
Copy link
Member

@enxebre let's merge this PR first before applying #38

@@ -62,11 +105,139 @@ func (a *Actuator) Delete(cluster *clusterv1.Cluster, machine *clusterv1.Machine
// Update updates a machine and is invoked by the Machine Controller
func (a *Actuator) Update(cluster *clusterv1.Cluster, machine *clusterv1.Machine) error {
glog.Infof("Updating machine %v for cluster %v.", machine.Name, cluster.Name)
return fmt.Errorf("TODO: Not yet implemented")
error := errorWrapper{cluster: cluster, machine: machine}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe errorWr? I got confused by the error.WithLog. The error is allocated for error type. We should not override it unless necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed.

Dockerfile Outdated
@@ -15,6 +15,8 @@
# Reproducible builder image
FROM openshift/origin-release:golang-1.10 as builder

RUN yum install -y libvirt-devel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not add this as libvirt-devel is already in openshift/origin-release:golang-1.10 image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was already fixed in master. This bit is gone after a rebase.


// UpdateProviderStatus updates the provider status in-place with info
// from the given libvirt domain.
func UpdateProviderStatus(status *providerconfigv1.LibvirtMachineProviderStatus, dom *libvirt.Domain) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any machine api specific logic does not belong here, we can address it after merging and rebasing #38

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@frobware frobware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to wait until this is re-based.

cloud/libvirt/actuators/machine/actuator.go Outdated Show resolved Hide resolved
if name == "" {
return fmt.Errorf("Failed to create domain, name is empty")
return nil, fmt.Errorf("Failed to create domain, name is empty")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is 'failed to create domain'. Rather, 'name is empty'. :-/

}

// Get values from machineProviderConfig
if err := domainDefInit(&domainDef, name, *machineProviderConfig); err != nil {
return fmt.Errorf("Failed to init domain definition from machineProviderConfig: %v", err)
return nil, fmt.Errorf("Failed to init domain definition from machineProviderConfig: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consistently use %v for error. In the few lines above we use %s.

}
} else {
return fmt.Errorf("machine does not has a IgnKey value")
return nil, fmt.Errorf("machine does not has a IgnKey value")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: machine does not _have_ a IgnKey value.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 9, 2018
@bison
Copy link
Contributor Author

bison commented Oct 9, 2018

Rebased and simplified this quite a bit. I think it's probably ready as-is, but a few open questions:

  1. The status field names are copied from the AWS actuator. Should we update them to make more sense here? e.g. DomainUUID instead of InstanceID, or is there a compelling reason to have them match the AWS status?

  2. I haven't included any conditions in the status. Happy to do that now if we think it's important, or can tackle it in a follow up PR.

@paulfantom
Copy link
Contributor

/approve
/hold

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 10, 2018
@ingvagabund
Copy link
Member

/lgtm
/hold cancel

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Oct 10, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bison, ingvagabund, paulfantom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [bison,ingvagabund,paulfantom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@frobware
Copy link
Contributor

frobware commented Oct 10, 2018

2. I haven't included any conditions in the status. Happy to do that now if we think it's important, or can tackle it in a follow up PR.

I vote we do that in a follow-up PR. Better to keep them small and easily reviewable (and revertable).

@openshift-merge-robot openshift-merge-robot merged commit 8671908 into openshift:master Oct 10, 2018
@bison bison deleted the machine-status branch October 10, 2018 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants