Move tf-state to machine object, and remove file system dependency #264

Merged
merged 5 commits into kubernetes-sigs:master on Jun 1, 2018

Conversation

@karan (Contributor) commented Jun 1, 2018

What this PR does / why we need it:

#224

  1. Bake the terraform providers in the controller image.
  2. Instead of storing the terraform config and state on disk, rely on the named machine config map and machine object. See below for a bit more detail.
  3. Lots of other refactors for readability.

Special notes for your reviewer:

  1. The only flow I have implemented and tested with this model is cluster create. Update will be done after we move to clusterctl and delete vsphere-deployer.

  2. During bootstrap, the machineClient does not exist (we are not in K8s land). This means that to transfer the state for the master machine, we still need to scp the directory to the master. This will be fixed after "Working vsphere clusterctl example" #263 -- in minikube, machineClient will always exist, so we can simply pivot the machine object and remove the volume mount entirely. Expect a follow-up PR.

  3. The current flow is: when we need to create a new machine, we create a staging directory /tmp/cluster-api/machines/$MACHINE_NAME/. After the machine is created, the tfstate is populated in the Machine object as an annotation, and the staging directory is then deleted. (A minimal sketch of this flow follows this list.)

  4. There are a lot of other refactors I want to do, but let's leave those out of this PR.

  5. I am not updating the image. That will happen after I integrate and test cluster creation with clusterctl (and jess's bootstrap change).
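
A minimal sketch of the flow in note 3, for illustration only: the helper name, the annotation key, and the imports (io/ioutil, os, path/filepath, plus the project's clusterv1 types) are assumptions, not taken from this PR's diff.

// stashTfState copies the tfstate produced in the per-machine staging directory
// into the Machine object's annotations and then removes the staging directory,
// mirroring step 3 of the notes above. The caller is expected to persist the
// updated machine through the machine client.
func stashTfState(machine *clusterv1.Machine) error {
	stagingDir := filepath.Join("/tmp/cluster-api/machines", machine.ObjectMeta.Name)
	tfStateBytes, err := ioutil.ReadFile(filepath.Join(stagingDir, "terraform.tfstate"))
	if err != nil {
		return err
	}
	if machine.ObjectMeta.Annotations == nil {
		machine.ObjectMeta.Annotations = map[string]string{}
	}
	// The annotation key is assumed; the PR only says the tfstate lands in an annotation.
	machine.ObjectMeta.Annotations["vsphere-tfstate"] = string(tfStateBytes)
	return os.RemoveAll(stagingDir)
}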

@k8s-ci-robot (Contributor) commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: karan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 1, 2018
@karan (Contributor, Author) commented Jun 1, 2018

/assign @mkjelland

/assign @kcoronado

var machineControllerImage = "gcr.io/k8s-cluster-api/vsphere-machine-controller:0.0.1"

//var machineControllerImage = "gcr.io/k8s-cluster-api/vsphere-machine-controller:0.0.2"
var machineControllerImage = "gcr.io/karangoel-gke-1/vsphere-machine-controller:0.0.2-dev"
Contributor:
Did you intend to leave the dev image uncommented?

Contributor Author:
I guess I can remove it although the image published is a non-dev image.

return err
}

if verr := vc.validateMachine(machine, config); verr != nil {
Contributor:
Is there a reason to use a new error (verr) instead of using the previously declared err?

Contributor Author:
No -- legacy code.

@@ -398,15 +466,17 @@ func (vc *VsphereClient) Update(cluster *clusterv1.Cluster, goalMachine *cluster
// This can only happen right after bootstrapping.
if goalMachine.ObjectMeta.Annotations == nil {
ip, _ := vc.GetIP(goalMachine)
glog.Info("Annotations do not exist. Populating existing state for bootstrapped machine.")
return vc.updateAnnotations(goalMachine, ip)
glog.Info("Annotations do not exist. This happens when for a newly bootstrapped machine.")
Contributor:
remove "when"

Contributor Author:
Done

return string(tfStateBytes), nil
}

return "", errors.New("could not get tfstatae")
Contributor:
typo: should be tfstate

Contributor Author:
Done
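
For the read path in the hunk above, pulling the state back off the Machine object could look roughly like this; the function name and annotation key are assumptions, not the PR's actual code (errors and clusterv1 imports assumed):

func getTfState(machine *clusterv1.Machine) (string, error) {
	if machine.ObjectMeta.Annotations != nil {
		if tfState, ok := machine.ObjectMeta.Annotations["vsphere-tfstate"]; ok {
			return tfState, nil
		}
	}
	return "", errors.New("could not get tfstate")
}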

cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.Run()

return nil
}

func (vc *VsphereClient) updateAnnotations(machine *clusterv1.Machine, masterEndpointIp string) error {
func (vc *VsphereClient) updateAnnotations(machine *clusterv1.Machine, masterEndpointIp, tfState string) error {
Contributor:
Can you comment on why this is an annotation, as opposed to a field in machine status ProviderStatus? Do you see it moving there eventually, or is there a reason why you don't think it makes sense there?

Contributor Author:
I added a comment.

// We are storing these as annotations and not in Machine Status because that's intended for
// "Provider-specific status" that will usually be used to detect updates. Additionally,
// Status requires yet another versioned API resource, which is too heavy just to store the IP and TF state.
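
A rough sketch of what the extended updateAnnotations signature could do, based on the discussion in this thread; the annotation keys and the machineClient.Update call are assumptions, not the actual diff.

func (vc *VsphereClient) updateAnnotations(machine *clusterv1.Machine, masterEndpointIp, tfState string) error {
	if machine.ObjectMeta.Annotations == nil {
		machine.ObjectMeta.Annotations = map[string]string{}
	}
	machine.ObjectMeta.Annotations["vsphere-ip"] = masterEndpointIp // key assumed
	machine.ObjectMeta.Annotations["vsphere-tfstate"] = tfState     // key assumed
	// Persist the updated object through the machine client mentioned in the PR description.
	_, err := vc.machineClient.Update(machine)
	return err
}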

@mkjelland (Contributor) left a comment:

lgtm, leaving for @kcoronado to add the lgtm label

@kcoronado (Contributor) left a comment:

/lgtm
The comments and PR description were helpful for me to understand what was going on, so thanks for that!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2018
@k8s-ci-robot k8s-ci-robot merged commit f104e0e into kubernetes-sigs:master Jun 1, 2018
@karan (Contributor, Author) commented Jun 1, 2018

Thanks for the reviews!

jayunit100 pushed a commit to jayunit100/cluster-api that referenced this pull request Jan 31, 2020
…ubernetes-sigs#264)

* make the new broken type work, pass user, pass to tf

* cleanup logs, bump image

* Progress so far

* make state in machine object work

* Add and fix comments in vsphere provider
jayunit100 pushed a commit to jayunit100/cluster-api that referenced this pull request Jan 31, 2020
* add standalone esx support

* move all glog to klog

* Fixed machine provisioning on ESXi.

- fixed boot sequence on some images (e.g. xenial)
- fixed sudo on machines without DNS access
- fixed cloud provider bootstrap
- fixed rbac role preventing machine deletion
- refactored templates.go and the esx cloning code

Fixed the boot sequence on some images by adding a serial port to allow random
number initialization. This affects some images like Xenial. It currently
adds a serial port to all machines if one isn't already in the VM spec.
Fixed sudo access for machines without DNS access, which is the case for most
development scenarios with nested ESXi on dev laptops. Fixed cloud provider
bootstrapping on infrastructure that does not have cloud provider support (e.g. ESXi).

issue kubernetes-sigs#177