Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ec2.DescribeInstanceTypes for machine type info #8856

Merged
merged 8 commits into from
Jun 9, 2020

Conversation

rifelpet
Copy link
Member

@rifelpet rifelpet commented Apr 7, 2020

The AWS client is not initialized by the time API validation is performed, so I had to move the machineType instance group field validation to apply_cluster.go.

Marking WIP because I still need to update nodeup/pkg/model/kubelet.go's use of GetMachineTypeInfo (only relevant with the AmazonVPC CNI), as well as some tests and mocks. I also realize the validation is no longer returning multiple validation errors, I'll get that fixed.

Feedback is welcome on a better approach to support every use of GetMachineTypeInfo.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 7, 2020
@k8s-ci-robot k8s-ci-robot added area/api area/provider/aws Issues or PRs related to aws provider approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 7, 2020
Copy link
Member

@justinsb justinsb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far!

@@ -49,3 +49,8 @@ func (m *MockEC2) DescribeInstancesPages(request *ec2.DescribeInstancesInput, ca
func (m *MockEC2) DescribeInstancesPagesWithContext(aws.Context, *ec2.DescribeInstancesInput, func(*ec2.DescribeInstancesOutput, bool) bool, ...request.Option) error {
panic("Not implemented")
}

func (m *MockEC2) DescribeInstanceTypes(*ec2.DescribeInstanceTypesInput) (*ec2.DescribeInstanceTypesOutput, error) {
klog.Warningf("MockEc2::DescribeInstanceTypes is stub-implemented")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps, we can try to narrow the instance types we support in testing, or just support fake instance types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function isn't actually used, it just needs to be defined in order to satisfy the mock interface. The actual stub we use in integration testing is in upup/pkg/fi/cloudup/awsup/mock_aws_cloud.go where I defined a reasonable mock behavior.

pkg/apis/kops/validation/aws.go Show resolved Hide resolved
upup/pkg/fi/cloudup/awsup/aws_cloud.go Outdated Show resolved Hide resolved
InstanceIPsPerENI: intValue(info.NetworkInfo.Ipv4AddressesPerInterface),
}
memoryGB := float64(intValue(info.MemoryInfo.SizeInMiB)) / 1024
machine.MemoryGB = float32(math.Round(memoryGB*100) / 100)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we remove the rounding logic once we're happy we get matching values? Maybe just add a todo?

upup/pkg/fi/cloudup/awsup/aws_cloud.go Outdated Show resolved Hide resolved
@mikesplain
Copy link
Contributor

This is great, thanks for your work on this @rifelpet!

@johngmyers
Copy link
Member

Deferring the field validation until cloudup is really unfortunate. It looks like that would mean a typoed machinetype would not be caught during kops edit instancegroup but would fail on the subsequent kops update cluster.

How bad would it be for api validation to try getting a fi.Cloud? If that fails it could silently skip validating the machinetypes at that point.

@rifelpet
Copy link
Member Author

I agree it would be unfortunate. I believe that validation is called by PopulateClusterSpec (which is called by cmd/kops/edit_cluster.go and cmd/kops/create_cluster.go). PopulateClusterSpec is called twice, once with strict=false and then again with strict=true after other fields are populated. The second call occurs after fi.Cloud is initialized so it might be possible to pass in the Cloud object and only validate when strict=true.

Thoughts?

@rifelpet rifelpet force-pushed the machine-types-2 branch 2 times, most recently from e699475 to 066b296 Compare May 27, 2020 03:37
@rifelpet rifelpet force-pushed the machine-types-2 branch 9 times, most recently from f4867a9 to 7d0c704 Compare May 29, 2020 05:01
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. area/documentation labels May 29, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 29, 2020
@rifelpet rifelpet changed the title [WIP] Use ec2.DescribeInstanceTypes for machine type info Use ec2.DescribeInstanceTypes for machine type info Jun 3, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 3, 2020
@rifelpet
Copy link
Member Author

rifelpet commented Jun 8, 2020

As discussed during office hours, I added a release note mentioning the behavior change. Since there isn't much of a remedy we have available other than contacting AWS support for a rate limit increase, I opted to instruct the user to open a GH issue. If anyone has any better ideas please let me know.

/assign @mikesplain
for testing

@johngmyers
Copy link
Member

/retest

upup/pkg/fi/cloudup/awsup/aws_cloud.go Show resolved Hide resolved
pkg/apis/kops/validation/aws.go Outdated Show resolved Hide resolved
pkg/apis/kops/validation/aws.go Outdated Show resolved Hide resolved
pkg/apis/kops/validation/aws.go Outdated Show resolved Hide resolved
pkg/apis/kops/validation/instancegroup.go Outdated Show resolved Hide resolved
docs/releases/1.19-NOTES.md Outdated Show resolved Hide resolved
rifelpet added 7 commits June 9, 2020 10:12
This requires passing a cloud object in additional places throughout the validation package and originating mostly from cmd/kops

This means that some kops commands now require valid cloud provider credentials, but I don't think this is an issue because the vast majority of use-cases already require the same cloud provider credentials in order to interact with the state store.
@mikesplain
Copy link
Contributor

I just tried to test this out twice but both times I think there was a networking issue... kubedns/coredns failed both times. I'm going to dig in and make sure it's not related.

Ratelimiting didn't appear to hamper things too much in one of our busiest accounts so I think we're probably safe there.

@mikesplain
Copy link
Contributor

This works great now with the rebase/fixes. Thanks @rifelpet!

Testing didn't show alot on my side, things worked as expected and the impact of the additional api calls was less than I had expected (very minimal actually), even on a busy api.

I think we should get this rolling and get more people using this asap so I'm going to approve and hold. Feel free to remove the hold as you see fit!

/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 9, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mikesplain, rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [mikesplain,rifelpet]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rifelpet
Copy link
Member Author

rifelpet commented Jun 9, 2020

sounds good, thanks for the review! thoughts on cherry-picking to 1.18? or should we let this sit for a few extra months in 1.19 pre-releases for additional testing?
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 9, 2020
@k8s-ci-robot k8s-ci-robot merged commit d5b8e6c into kubernetes:master Jun 9, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jun 9, 2020
@mikesplain
Copy link
Contributor

I think it makes sense to get into 1.18, less to deal with in terms of machine updates... We could also let it bake for a few days in master if you'd like. Thanks again!

@johngmyers
Copy link
Member

I would prefer we not put larger features into 1.18 at this point. It is in beta and we should be working on getting it out the door.

@olemarkus
Copy link
Member

Second what John said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api area/documentation area/nodeup area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants