[Digital Ocean] Implement KOPS validate cluster #9476
Hi @srikiz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rifelpet, srikiz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
for _, member := range g.Members {

	// DO doesn't have a notion of Auto Scaling Group - use the same config Type for both current config and new config.
	err := cg.NewCloudInstanceGroupMember(member, g.GroupType, g.GroupType, nodeMap)
This means DO won't be able to know when to do a rolling update. All rolling updates would have to use --force.
Looking at the DO Cloud, it doesn't (yet) appear to support deleting instances, so I'm guessing rolling updates don't work at all. Perhaps @srikiz can confirm?
kops/pkg/resources/digitalocean/cloud.go
Lines 101 to 111 in 38195fb
// DeleteInstance is not implemented yet, is func needs to delete a DO instance.
func (c *Cloud) DeleteInstance(i *cloudinstances.CloudInstanceGroupMember) error {
	klog.V(8).Info("digitalocean cloud provider DeleteInstance not implemented yet")
	return fmt.Errorf("digital ocean cloud provider does not support deleting cloud instances at this time")
}

// DetachInstance is not implemented yet. It needs to cause a cloud instance to no longer be counted against the group's size limits.
func (c *Cloud) DetachInstance(i *cloudinstances.CloudInstanceGroupMember) error {
	klog.V(8).Info("digitalocean cloud provider DetachInstance not implemented yet")
	return fmt.Errorf("digital ocean cloud provider does not support surging")
}
yes @rifelpet - that is something I want to check next.
Yeah, looks like DO doesn't support rolling updates. So that's not a blocker for this PR.
The comment is a bit off-point: the second and third parameters just need to be unequal whenever any field of the godo.DropletCreateRequest used to create the instance differs from what would be used to create a new instance. This can be implemented as either a generation id or a hash of the fields.
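To illustrate the hash option, here is a minimal sketch rather than the actual kops implementation. `dropletConfig` is a hypothetical stand-in for whichever `godo.DropletCreateRequest` fields end up defining a droplet's configuration, `configHash` is an illustrative helper name, and the region, size, and image values are placeholders only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// dropletConfig is a hypothetical stand-in for the subset of
// godo.DropletCreateRequest fields that define a droplet's configuration.
type dropletConfig struct {
	Region string   `json:"region"`
	Size   string   `json:"size"`
	Image  string   `json:"image"`
	Tags   []string `json:"tags"`
}

// configHash returns a hex digest of the JSON encoding of the config.
// Struct fields are encoded in declaration order, so equal configs give
// equal digests. Tag order matters here; sort the tags first if it should not.
func configHash(c dropletConfig) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Placeholder values, not taken from the PR.
	running := dropletConfig{
		Region: "nyc1",
		Size:   "s-2vcpu-4gb",
		Image:  "ubuntu-20-04-x64",
		Tags:   []string{"kops-instancegroup:nodes"},
	}
	desired := running
	desired.Size = "s-4vcpu-8gb"

	current, _ := configHash(running)
	wanted, _ := configHash(desired)

	// The two digests would be passed as the second and third parameters;
	// unequal values mark the member as needing an update.
	fmt.Println("needs update:", current != wanted)
}
```

The digest of the running droplet would have to be recorded somewhere at creation time (for example in a tag) so it can be compared later; that storage choice is left open here.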
sure, thanks @johngmyers - I'll keep a note of this for when I implement the rolling update feature.
/retest Review the full test history for this PR. Silence the bot with an |
pkg/resources/digitalocean/cloud.go
Outdated
@@ -38,6 +39,8 @@ import (

const TagKubernetesClusterIndex = "k8s-index"
const TagKubernetesClusterNamePrefix = "KubernetesCluster"
const TagKubernetesInstanceGroup = "kops-instancegroup"
const DOInstanceGroupConfig = "do-instancegroup-config"
Where is this referenced?
thanks, removed DOInstanceGroupConfig
Let's wait a bit before merging this.
@@ -38,6 +39,8 @@ import (

const TagKubernetesClusterIndex = "k8s-index"
const TagKubernetesClusterNamePrefix = "KubernetesCluster"
const TagKubernetesInstanceGroup = "kops-instancegroup"
Why have a duplicate of do.TagKubernetesClusterInstanceGroupPrefix?
@johngmyers - I think this needs refactoring. I tried to use it and it looks like we have a cyclic dependency here. Ideally, I should have kept this logic (and a few other methods) in the do package, but that's a little more work than I expected.
Would it be okay if we get this merged, and I take up the cleanup as a separate PR? Please suggest.
Perhaps move TagKubernetesClusterInstanceGroupPrefix here?
I removed TagKubernetesClusterInstanceGroupPrefix and instead used the above const.
Resolved
The DO code is a bit oversimplified, apparently because DO doesn't support autoscaling groups. But kops doesn't care (much) about the autoscaling aspect of autoscaling groups. The existing For validation,
For rolling update, it needs to identify: For rolling update, it also needs |
To further clarify what I wrote, validation will throw failures if there aren't enough instances in a group to cover what that group reports as its target size or if any instances don't have a corresponding node (haven't joined the cluster). Additionally it will throw a failure if Rolling update needs |
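The two validation failures described above (too few instances for the group's target size, and instances without a corresponding node) can be sketched roughly as follows. This is only an illustration under assumptions: the `group` type and `validateGroup` function are hypothetical simplifications, not the kops validation code or its types.

```go
package main

import (
	"fmt"
	"sort"
)

// group is a hypothetical, simplified view of one instance group.
type group struct {
	Name       string
	TargetSize int
	// InstanceNode maps a cloud instance ID to the name of the node it
	// registered as; an empty string means it has not joined the cluster.
	InstanceNode map[string]string
}

// validateGroup returns one failure message per problem found.
func validateGroup(g group) []string {
	var failures []string

	// Check 1: enough instances to cover the reported target size.
	if len(g.InstanceNode) < g.TargetSize {
		failures = append(failures, fmt.Sprintf(
			"group %q has %d instances, want %d",
			g.Name, len(g.InstanceNode), g.TargetSize))
	}

	// Check 2: every instance has a corresponding node.
	// IDs are sorted so the output order is deterministic.
	ids := make([]string, 0, len(g.InstanceNode))
	for id := range g.InstanceNode {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		if g.InstanceNode[id] == "" {
			failures = append(failures, fmt.Sprintf(
				"instance %q in group %q has not joined the cluster",
				id, g.Name))
		}
	}
	return failures
}

func main() {
	g := group{
		Name:       "nodes",
		TargetSize: 3,
		InstanceNode: map[string]string{
			"droplet-1": "node-a",
			"droplet-2": "",
		},
	}
	for _, f := range validateGroup(g) {
		fmt.Println(f)
	}
}
```

For a provider without autoscaling groups, the target size would have to come from the instance group spec rather than from the cloud; that is an assumption of this sketch.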
Validating cluster dev5.k8s.local
INSTANCE GROUPS
NODE STATUS
Your cluster dev5.k8s.local is ready
pkg/resources/digitalocean/cloud.go
Outdated
for _, member := range g.Members {

	// DO doesn't have a notion of Auto Scaling Group - use the same config Type for both current config and new config.
// DO doesn't have a notion of Auto Scaling Group - use the same config Type for both current config and new config.
// TODO use a hash of the godo.DropletCreateRequest fields for second and third parameters.
sure, updated.
@srikiz Rolling update support does not block this PR. In my mind the thing that blocks this PR is its identifying all non-master instances as being in the "nodes" instance group. The droplet creation code should tag instances with the name of the instance group that created them so this code can assign them to the right group. The code for assigning the group of the masters seems unnecessarily complex. If there's a reason the masters need to have special case string manipulation, please explain. |
I believe there is scope to share code between this and |
Thanks for the inputs - I updated the code to also check for an instance group called "nodes" if it doesn't match the master. While creating the droplet, we currently tag the master with 3 different tags.
For the worker nodes we have the below tags.
I updated code to also check if the matching "kops-instancegroup: nodes" tag is available on the droplet. |
Why don't you put the instancegroup's It looks like the code will only return at most one |
I don't know if there is a way in DO to add more non-master instance groups. DO doesn't support creating new instance groups, so I am currently managing with the help of tags: I tag droplets that act as worker nodes with the "kops-instancegroup:nodes" tag. If there is more than one DO KOPS cluster, I currently retrieve all droplets that match the KOPS cluster name here - https://github.com/kubernetes/kops/pull/9476/files#diff-09d5ce1959bcc4dffcb0d3d88a1ffb55R332
Yeah. I moved the consts from the current package (upup/pkg/fi/cloudup/do) into pkg/resources/digitalocean. I had to modify droplets.go, api_loadbalancer.go, master_volumes.go, etc. I thought it would be safer to make this change as a separate PR. Hope that's okay.
I see, got it. I will put ObjectMeta.Name into the kops-instancegroup tag and verify it. Will update the PR shortly.
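A rough sketch of how droplets could then be assigned to groups from that tag is shown below. It assumes the tag has the form `kops-instancegroup:<name>`, as in the "kops-instancegroup:nodes" example earlier in the thread; the `droplet` type and the helper names are hypothetical and do not come from the PR or from godo.

```go
package main

import (
	"fmt"
	"strings"
)

// tagInstanceGroupPrefix assumes tags of the form "kops-instancegroup:<name>".
const tagInstanceGroupPrefix = "kops-instancegroup:"

// droplet is a hypothetical, minimal stand-in for a cloud droplet.
type droplet struct {
	Name string
	Tags []string
}

// instanceGroupOf returns the instance group name recorded in the
// droplet's tags, or false if no such tag is present.
func instanceGroupOf(d droplet) (string, bool) {
	for _, t := range d.Tags {
		if strings.HasPrefix(t, tagInstanceGroupPrefix) {
			name := strings.TrimPrefix(t, tagInstanceGroupPrefix)
			if name != "" {
				return name, true
			}
		}
	}
	return "", false
}

// groupDroplets buckets droplets by instance group name.
// Droplets without an instance group tag are skipped.
func groupDroplets(ds []droplet) map[string][]droplet {
	groups := make(map[string][]droplet)
	for _, d := range ds {
		if name, ok := instanceGroupOf(d); ok {
			groups[name] = append(groups[name], d)
		}
	}
	return groups
}

func main() {
	ds := []droplet{
		{Name: "d1", Tags: []string{"kops-instancegroup:master-nyc1"}},
		{Name: "d2", Tags: []string{"kops-instancegroup:nodes"}},
		{Name: "d3", Tags: []string{"kops-instancegroup:nodes"}},
		{Name: "d4", Tags: []string{"unrelated"}},
	}
	for name, members := range groupDroplets(ds) {
		fmt.Printf("%s: %d droplet(s)\n", name, len(members))
	}
}
```

Keying on the name in the tag avoids hard-coding "nodes" and would let any number of non-master groups be reported, which is the concern raised above.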
@johngmyers - I have incorporated your comments - that really helped!
/retest |
Much better. I'm putting on a hold in case you want to use the standard type for GroupType. Feel free to cancel the hold if you want it to go in as-is.
/hold
/lgtm
type DOInstanceGroup struct {
	ClusterName       string
	InstanceGroupName string
	GroupType         string // will be either "master" or "worker"
Use pkg/apis/kops/InstanceGroupRole as the type?
I'll keep a note to update this when I touch the rolling update feature. I'll add a new tag when I create a droplet that holds this information, so I can extract it from the droplet tag.
Thanks for your help in reviewing this @johngmyers !
You can get the group role from the instancegroup spec of the ig that was in the tag.
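As a sketch of that lookup, assuming the instance group name has already been read from the droplet tag: the `instanceGroup` and `instanceGroupRole` types below are local stand-ins written for this example, not the real definitions in pkg/apis/kops, and `roleForGroup` is a hypothetical helper.

```go
package main

import "fmt"

// instanceGroupRole is a local stand-in for the role type in pkg/apis/kops.
type instanceGroupRole string

const (
	roleMaster instanceGroupRole = "Master"
	roleNode   instanceGroupRole = "Node"
)

// instanceGroup is a hypothetical, trimmed-down instance group:
// just the name and the role from its spec.
type instanceGroup struct {
	Name string
	Role instanceGroupRole
}

// roleForGroup looks up the role of the instance group whose name was
// found in the droplet's tag.
func roleForGroup(igs []instanceGroup, name string) (instanceGroupRole, error) {
	for _, ig := range igs {
		if ig.Name == name {
			return ig.Role, nil
		}
	}
	return "", fmt.Errorf("no instance group named %q", name)
}

func main() {
	igs := []instanceGroup{
		{Name: "master-nyc1", Role: roleMaster},
		{Name: "nodes", Role: roleNode},
	}
	role, err := roleForGroup(igs, "nodes")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("role:", role)
}
```

With this approach no extra role tag is needed on the droplet; the tag only has to carry the instance group name.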
/hold cancel
Implementation of KOPS validate cluster for Digital Ocean KOPS provider.
Tested with
./kops validate cluster
and it returns the state of your KOPS cluster as expected.