
New Network and Compute services which can be used by top level reconciler #139

Merged Mar 23, 2019 (1 commit)
Conversation

@awesomenix (Contributor)

What this PR does / why we need it:

  • Creates tiny, testable services that replace the current top-level compute and network services
  • Replaces templates with direct ARM calls, which provides better reconciliation
  • Adds the Azure cloud provider config to all master components and the kubelet
  • Creates an internal load balancer for all node roles to join through, instead of going through the public load balancer (not hooked up yet; will be in the next PR)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #119, #105, #111

Special notes for your reviewer:

  • Apologies for the massive PR; I changed only the cluster actuator, but the machine actuator dependency pulled this change in as well. I promise it won't happen again.
  • There will be a follow-up PR to clean up the machine actuator so it is more unit-testable.
  • A follow-up PR will switch nodes to the internal load balancer instead of the public load balancer.
  • Currently, certs are regenerated every time the controller restarts :( ; fixes to come.
  • Availability zones support is also planned.

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:

Complete rewrite of machine and cluster actuators

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/provider/azure Issues or PRs related to azure provider labels Mar 20, 2019
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 20, 2019
@awesomenix (Contributor, Author)

/assign @tariq1890 @justaugustus @juan-lee

@k8s-ci-robot (Contributor)

@awesomenix: GitHub didn't allow me to assign the following users: juan-lee.

Note that only kubernetes-sigs members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @tariq1890 @justaugustus @juan-lee

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@justaugustus (Contributor)

@awesomenix -- Noticed you pushed some extra commits. As a contributor pro-tip, you can mark a PR as work-in-progress by including "WIP" anywhere in the PR title (I usually do something like [WIP] Awesome PR). This will automatically label the PR as do-not-merge/work-in-progress, which prevents preemptive reviews and merges.

That said, let me know if this is ready to review and I'll do so when I have a chance. :)

@awesomenix (Contributor, Author)

Sorry, I was testing out the internal load balancer, so I had to move some settings to a common place so they can be referenced across multiple files. The PR is ready for review. I apologize for the massive PR; I started off with changes to the base network only, but the machine actuator got tangled in during integration.

  1. Define tiny network and compute services
  2. Call these services in the cluster reconciler
  3. Call these services in the machine actuator (will be moved to a reconciler in the next PR, with testing)

Thanks for taking the time to review, @justaugustus

@awesomenix (Contributor, Author)

I have uploaded the azure provider image to a personal docker repo for testing (if you want to do a dry run): quay.io/awesomenix/cluster-api-azure-controller:0.1.0-alpha.3

@awesomenix awesomenix changed the title New Network and Compute services which can be used by top level reconciler [WIP] New Network and Compute services which can be used by top level reconciler Mar 20, 2019
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2019
@awesomenix awesomenix changed the title [WIP] New Network and Compute services which can be used by top level reconciler New Network and Compute services which can be used by top level reconciler Mar 20, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2019
@@ -36,6 +36,9 @@ type AzureClusterProviderSpec struct {
ResourceGroup string `json:"resourceGroup"`
Location string `json:"location"`

// PublicIPName is generated, since it should be unique
PublicIPName string `json:"publicIPName,omitempty"`
Contributor:

If this is generated, I'd put it in Status.

@awesomenix (Contributor, Author), Mar 20, 2019:

True. Currently it uses consistent hashing; if that method changes in the future to something random, then during a pivot the target cluster could end up generating a different name.

Contributor:

This is probably fine to stay in the Spec, as eventually we may want to allow users to supply a PIP that already exists. The PIP reconcile should then check for the field and store it in Status.
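
For illustration, a minimal sketch of the deterministic-naming idea discussed here; the helper name, package, and naming scheme are assumptions, not the PR's actual code:

package names

import (
	"fmt"
	"hash/fnv"
)

// GeneratePublicIPName derives a stable public IP name from the cluster
// name, so repeated reconciles (and a pivoted target cluster) compute the
// same value instead of a random one.
func GeneratePublicIPName(clusterName string) string {
	h := fnv.New32a()
	h.Write([]byte(clusterName))
	return fmt.Sprintf("%s-pip-%x", clusterName, h.Sum32())
}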

Resolved review threads on docs/getting-started.md
@@ -48,6 +51,15 @@ type AzureClusterProviderSpec struct {
// SAKeyPair is the service account key pair.
SAKeyPair KeyPair `json:"saKeyPair,omitempty"`

// AdminKubeconfig is generated using the certificates that are part of the spec
Contributor:

Need to find a better place for these. Specs should not contain any secrets. Since these are already in the model I'm open to addressing this in a separate PR.

Contributor:

Agreed.

Contributor:

Can we add a TODO here to address later?

Contributor (Author):

I think it's more of a design decision on where the secrets should live. Also, not knowing where the secret-generation part is headed (will the upstream cluster-api project own it, or will the provider?), I'd rather have it logged as an issue or a feature request than as a TODO here.

Resolved review threads on pkg/cloud/azure/actuators/cluster/reconciler.go
// return errors.Wrapf(err, "failed to retrieve kubeconfig while creating machine %q", machine.Name)
// }

networkInterfaceSpec := networkinterfaces.Spec{
Contributor:

consider a factory method to return the networkinterfaces.Spec.

e.g.

func makeNICSpec(role NodeRole) *networkinterfaces.Spec

Contributor (Author):

Next PR, please; this is already too much of a change.

Contributor:

Fine with punting that to the next PR.
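
A minimal sketch of the suggested factory, under assumed NodeRole constants and Spec field names (none of these identifiers are from the repository; assumes the surrounding file imports fmt and the networkinterfaces package):

// NodeRole distinguishes control plane NICs from worker NICs (illustrative type).
type NodeRole string

const (
	NodeRoleControlPlane NodeRole = "controlplane"
	NodeRoleNode         NodeRole = "node"
)

// makeNICSpec centralizes per-role NIC wiring in one place.
func makeNICSpec(role NodeRole) *networkinterfaces.Spec {
	spec := &networkinterfaces.Spec{
		Name: fmt.Sprintf("%s-nic", role), // assumed naming scheme
	}
	// Subnet selection is the main per-role difference (field name assumed).
	switch role {
	case NodeRoleControlPlane:
		spec.SubnetName = "controlplane-subnet"
	case NodeRoleNode:
		spec.SubnetName = "node-subnet"
	}
	return spec
}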

return publicKeyPins, nil
}

// CreateNewBootstrapToken creates a new bootstrap token using the in-cluster config.
func CreateNewBootstrapToken() (string, error) {
func CreateNewBootstrapToken(kubeconfig string, tokenTTL time.Duration) (string, error) {
Contributor:

Consider converting the parameters into a struct; this makes plumbing easier in the future.

Contributor:

+1 to a struct here.

@awesomenix (Contributor, Author), Mar 21, 2019:

Keeping it as-is for now; it will be addressed when this is refactored.
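
A minimal sketch of the parameter-struct idea for when this is refactored; the struct, field, and helper names are illustrative assumptions:

import "time"

// BootstrapTokenInput bundles CreateNewBootstrapToken's parameters so new
// ones can be added later without touching every call site.
type BootstrapTokenInput struct {
	Kubeconfig string
	TokenTTL   time.Duration
}

func CreateNewBootstrapToken(in BootstrapTokenInput) (string, error) {
	// Same body as today, reading from the struct instead of positional
	// arguments; newToken is a hypothetical stand-in for the existing logic.
	return newToken(in.Kubeconfig, in.TokenTTL)
}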

return nil
}

// CreateOrUpdate is a helper function so this can be unit-tested.
func (s *Service) CreateOrUpdate(ctx context.Context) error {
func (s *Service) CreateOrUpdate(ctx context.Context, spec interface{}) error {
klog.V(2).Infof("generating certificates")
clusterName := s.scope.Cluster.Name
tmpDirName := "/tmp/cluster-api/" + clusterName
Member:

Consider extracting this to remove the dependency on the path and on the concrete filesystem. Relying on io.Reader makes it easy to test without touching the filesystem.

@awesomenix (Contributor, Author), Mar 20, 2019:

Agreed; this is a side effect of using kubeadm, which always writes to the filesystem. I'm not sure there is an internal API to avoid that. Please file an issue and it will be addressed in a future PR.

Contributor:

Let's punt to another PR. @serbrech -- please file an issue for this.
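
A minimal sketch of the io.Reader-based decoupling proposed above; ReadCert is a hypothetical helper, not code from this PR:

package certs

import "io"

// ReadCert consumes certificate bytes from any io.Reader, so callers are
// not tied to kubeadm's on-disk layout; tests can pass bytes.NewReader(pem)
// instead of touching /tmp/cluster-api/<cluster>.
func ReadCert(r io.Reader) ([]byte, error) {
	return io.ReadAll(r)
}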

limitations under the License.
*/

package internalloadbalancers
Contributor:

Why do we want a separate package for internalloadbalancers?

Contributor (Author):

Each service has a purpose and defines a target behaviour rather than mimicking the Azure service itself. Agreed that the code could be 50% the same, but it's better to keep it in different packages to test the behaviour.

Contributor:

I'd like load balancers to be consolidated into a single package. The requesting code path can do something like:

  • Specify a NetworkSpec
  • NetworkSpec will hold something like NetworkType: {public, private} or defaultLoadBalancerType: {public, internal|private}

From there, load balancer methods should take that as a parameter and do something like:

func (s *Service) ReconcileLoadBalancer(lbType string) error {
	switch lbType {
	case "public":
		// public LB: public IP, NAT rules, ...
	case "internal":
		// internal LB: static address on the master subnet, ...
	}
	return nil
}

Let's minimize duplication of code where we can, as the probe, backend pool, and frontend IP logic will largely be the same as well.

Contributor (Author):

Branches are the cause of most bugs in software; I would rather keep the implementations in separate files and tolerate the duplication, if any. Looking at the differences between the two (as of now, which can change in the future):

Common:

  • Load Balancing rules
  • Probes
  • Get
  • Delete

Differences:

InternalLoadBalancers:

  • Static Address
  • Uses Master Subnet

PublicLoadBalancers:

  • Dynamic
  • Public IP Address
  • NAT rules

There is quite a bit of difference; removing the 25-30% code duplication is not worth sacrificing readability.

Maybe in the future it would be nice to separate the configuration into its own file; I'd rather postpone that until it's absolutely required.
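
For context, the tiny services introduced by this PR share a common shape; a minimal sketch of that pattern, based on the CreateOrUpdate signature shown in the diff above (the Get and Delete signatures are assumptions informed by the "Common" list):

package azure

import "context"

// Service is the behaviour each tiny service package (publicips,
// internalloadbalancers, networkinterfaces, ...) implements, taking a
// package-specific spec value.
type Service interface {
	Get(ctx context.Context, spec interface{}) (interface{}, error)
	CreateOrUpdate(ctx context.Context, spec interface{}) error
	Delete(ctx context.Context, spec interface{}) error
}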

@justaugustus (Contributor) left a comment:

@awesomenix -- Done with a first-pass review; this is more style introspection than logic-specific.
I'll give it another pass once we address some of my comments.

Adding Nick for review as well.
/assign @soggiest

Resolved review threads: .gitignore, Dockerfile, docs/getting-started.md, pkg/cloud/azure/services/publicips/service.go, pkg/deployer/deployer.go
@justaugustus (Contributor)

@awesomenix -- Additional request: Do we know if this results in a "working" (by our current loose definition of working) cluster?

If so, can you post explicit instructions about how you're building the cluster in this branch?

…r to pure ARM calls

fix: New Compute and Network micro services covering machine, cluster to pure ARM calls

fix: New Network services which can be used by top level reconciler

Add Virtual Machine and Virtual Machine Extensions as well; unfortunately this is a massive PR

At least create single node clusters

Add Azure Cloud provider config to all masters and client

Use Internal LB IP for nodes to communicate with Control plane

Move all defaults to a separate file under services

Minor fix to remove any accidental tabs present in yaml file used in startup script

Move AdminKubeconfig and DiscoveryHashes back to ClusterConfig: since kubeadm uses on-disk certificates, we only update if the spec certs are empty, to avoid a mismatch

Address review comments, convert spec to a defined interface and use ref rather than value
@awesomenix (Contributor, Author)

@justaugustus, I updated the PR; can you please take another look?

Additional request: Do we know if this results in a "working" (by our current loose definition of working) cluster?

Yes, the instructions are exactly the same as mentioned in the getting-started guide. It just needs a new release to be cut.

@awesomenix (Contributor, Author)

I0320 22:33:01.828489     534 clusterclient.go:818] Waiting for Machine test1-node-p4sdh to become ready...
I0320 22:33:01.913574     534 clusterdeployer.go:143] Done provisioning cluster. You can now access your cluster with kubectl --kubeconfig kubeconfig
I0320 22:33:01.913898     534 createbootstrapcluster.go:36] Cleaning up bootstrap cluster.
I0320 22:33:01.913912     534 kind.go:57] Running: kind [delete cluster --name=clusterapi]
I0320 22:33:03.645543     534 kind.go:60] Ran: kind [delete cluster --name=clusterapi] Output: Deleting cluster "clusterapi" ...
test1-controlplane-0   Ready     master    6m        v1.13.4
test1-node-p4sdh       Ready     <none>    2m        v1.13.4

[screenshot]

@justaugustus (Contributor)

@awesomenix -- That screenshot is AWESOME! :)
I've had a work event all this week, but will try to give this a final review tonight.

@awesomenix (Contributor, Author)

@justaugustus, if possible can you please do a final review? It would be nice to get this in before the weekend so I can work on further important improvements such as tests, availability zones, and machine actuator cleanup.

@justaugustus (Contributor)

@awesomenix -- Fabulous job!
I'm approving so as not to slow down the work, but in the future we need to be extremely cautious about introducing large changes like this, especially as we're starting to onboard more contributors.

@awesomenix / @juan-lee / @serbrech / @tariq1890 / @soggiest -- As we start to work through more and more of the backlog, I need everyone to explicitly declare the things they'll be working on and create/assign issues, if they don't already exist, to prevent collisions and large PRs.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 23, 2019
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awesomenix, justaugustus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2019
@k8s-ci-robot k8s-ci-robot merged commit b55f066 into kubernetes-sigs:master Mar 23, 2019
@awesomenix awesomenix deleted the networkinterface branch March 23, 2019 17:18
openshift-merge-robot referenced this pull request in openshift/cluster-api-provider-azure Mar 25, 2019
…r to pure ARM calls (#139)
Labels
approved, area/provider/azure, cncf-cla: yes, lgtm, release-note, size/XXL
Development

Successfully merging this pull request may close these issues.

ReconcileLoadBalancer panics on first run due to a check on a nil value
7 participants