v1alpha2/v1alpha3 skew - Bastion does not have Public IP in AWSCluster status #2059

Closed
DheerajSShetty opened this issue Oct 21, 2020 · 14 comments · Fixed by #2074

Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor.

Comments
@DheerajSShetty

DheerajSShetty commented Oct 21, 2020

/kind bug

What steps did you take and what happened:

  1. Applied v1alpha3 webhooks and CRDs on a management cluster
  2. Created a cluster using v1alpha2 controllers CAPA 0.4.10 and CAPI 0.2.11 and CABPK 0.1.7
  3. Cluster creation was successful.
  4. Bastion was created in AWS, with Public IP.

What happened:
The AWSCluster status did not have the Public IP of the bastion. The instance state was pending.

What did you expect to happen:
Expected the Public IP of the bastion in the AWSCluster status.
Expected the instance state to be running

Anything else you would like to add:
Curled the v1alpha2 endpoint of the AWSCluster object and patched with --data '{"status":{"bastion":{"instanceState":"running"}}}'; the data was not changed in the AWSCluster.
Curled the same data to the v1alpha3 endpoint, and the instance state changed to running.

Environment:

  • Cluster-api-provider-aws version: CAPA 0.4.10 and CAPI 0.2.11 and CABPK 0.1.7
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 21, 2020
@ncdc ncdc changed the title Bastion does not have Public IP in AWSCluster status v1alpha2/v1alpha3 skew - Bastion does not have Public IP in AWSCluster status Oct 21, 2020
@ncdc
Contributor

ncdc commented Oct 21, 2020

@DheerajSShetty

The AWSCluster status did not have the Public IP of the bastion. The instance state was pending.

What command/API version did you use when you saw this? Could you please share the full AWSCluster yaml?

@ncdc
Contributor

ncdc commented Oct 21, 2020

cc @sedefsavas

@DheerajSShetty
Author

@DheerajSShetty

The AWSCluster status did not have the Public IP of the bastion. The instance state was pending.

What command/API version did you use when you saw this? Could you please share the full AWSCluster yaml?

Initially I did a kubectl get cluster-api and saw that the old clusters had a BASTION IP:

NAME                                                                    CLUSTER   READY   VPC                     BASTION IP
awscluster.infrastructure.cluster.x-k8s.io/01en1apmejp0ztqv6nte5msdfc             true    vpc-06098575e41666eda   3.1.201.82
awscluster.infrastructure.cluster.x-k8s.io/01en2a6h2ae7vm2nzt50gw5n8t             true    vpc-0012af8628a0283ce

I did a kubectl get awscluster.infrastructure.cluster.x-k8s.io/01en2a6h2ae7vm2nzt50gw5n8t, which shows the object in v1alpha3 format, and a kubectl get awscluster.v1alpha2.infrastructure.cluster.x-k8s.io/01en2a6h2ae7vm2nzt50gw5n8t.

Attaching the awscluster yaml of v1alpha2 type.
v1alpha2-cluster.log

@ncdc
Contributor

ncdc commented Oct 21, 2020

Can you share the conversion data instead of redacting it? Feel free to redact anything sensitive.

@DheerajSShetty
Author

@ncdc
Contributor

ncdc commented Oct 21, 2020

What APIVersion did you use in your client when creating 01en2a6h2ae7vm2nzt50gw5n8t - was that v1alpha2?

Can you gist the CAPA pod logs from when you created that cluster?

@DheerajSShetty
Author

Yes, I used the v1alpha2 client for cluster creation. Looks like the pod got restarted and I don't have the logs for cluster creation.
Meanwhile, I can create another cluster and share the AWSCluster YAML and creation logs.

@ncdc
Contributor

ncdc commented Oct 22, 2020

Thanks. What image tag did you install for the webhooks deployment?

@sedefsavas
Contributor

@DheerajSShetty Can you confirm if my understanding is correct?

The old cluster case (which shows the public IP):
Created a v1alpha2 object that is stored as v1alpha2; it works after upgrading to v1alpha3.

Conversions happening: v1alpha2 (first created and stored) --> v1alpha2 (read by controller and processed) --> v1alpha3 (read by kubectl)
So the v1alpha2 --> v1alpha3 conversion works.

The new cluster case:
Creating a v1alpha2 object that is stored as v1alpha3, then used by the controller as v1alpha2, does not work.

Conversions happening: v1alpha2 (first created) --> v1alpha3 (stored) --> v1alpha2 (read by controller and processed) --> v1alpha3 (stored again and read by kubectl)

I checked the conversion in both directions; they copy the bastion instance in exactly the same way.

@DheerajSShetty
Author

Thanks. What image tag did you install for the webhooks deployment?

I am using the CAPA 0.6.0 webhook and the CAPI 0.3.10 webhook

@DheerajSShetty
Author

DheerajSShetty commented Oct 22, 2020

@DheerajSShetty Can you confirm if my understanding is correct?

The old cluster case (which shows the public IP):
Created a v1alpha2 object that is stored as v1alpha2; it works after upgrading to v1alpha3.

Conversions happening: v1alpha2 (first created and stored) --> v1alpha2 (read by controller and processed) --> v1alpha3 (read by kubectl)
So the v1alpha2 --> v1alpha3 conversion works.

The new cluster case:
Creating a v1alpha2 object that is stored as v1alpha3, then used by the controller as v1alpha2, does not work.

Conversions happening: v1alpha2 (first created) --> v1alpha3 (stored) --> v1alpha2 (read by controller and processed) --> v1alpha3 (stored again and read by kubectl)

I checked the conversion in both directions; they copy the bastion instance in exactly the same way.

That is right.

I tried curling the apiserver endpoint by running a kubectl proxy, just to see if changing the value of instanceState works.

I tried curling the v1alpha2 type; this did not store the status as running:

curl -k -v -XPATCH -H 'Accept: application/json' -H 'Content-Type: application/merge-patch+json'  http://localhost:8080/apis/infrastructure.cluster.x-k8s.io/v1alpha2/namespaces/01ejcndrmwt6ehnn9pqcrcwa66/awsclusters/01emqgatkh2g5tstc9gctkmtry/status --data '{"status":{"bastion":{"instanceState":"running"}}}'

This one stored the instanceState as running:

curl -k -v -XPATCH -H 'Accept: application/json' -H 'Content-Type: application/merge-patch+json'  http://localhost:8080/apis/infrastructure.cluster.x-k8s.io/v1alpha3/namespaces/01ejcndrmwt6ehnn9pqcrcwa66/awsclusters/01emqgatkh2g5tstc9gctkmtry/status --data '{"status":{"bastion":{"instanceState":"running"}}}'

Is that a good way to test the conversion?

@ncdc
Contributor

ncdc commented Oct 26, 2020

The root cause is that the Instance struct changed between v1alpha2 and v1alpha3, but we aren't correctly accounting for the differences when restoring from the v1alpha3 data annotation. We are blindly taking what was in the data annotation, which ignores all changes to .status.bastion when updating the AWSCluster.
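
To make that failure mode concrete, here is a self-contained toy sketch in Go. It is not the actual CAPA conversion code: the type and field names are stand-ins, and the conversion-data annotation key is an assumption. It contrasts blindly restoring the saved v1alpha3 bastion from the data annotation (which drops the controller's fresh v1alpha2 values) with restoring only the fields that have no v1alpha2 counterpart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Toy stand-ins for the v1alpha2 and v1alpha3 Instance structs.
type spokeBastion struct {
	InstanceState string `json:"instanceState,omitempty"`
	PublicIP      string `json:"publicIp,omitempty"`
}

type hubBastion struct {
	InstanceState string   `json:"instanceState,omitempty"`
	PublicIP      string   `json:"publicIp,omitempty"`
	Addresses     []string `json:"addresses,omitempty"` // pretend this field only exists in the hub version
}

// convertUp mimics the spoke (v1alpha2) -> hub (v1alpha3) conversion, including
// the "restore previously saved hub data from an annotation" step.
func convertUp(src spokeBastion, annotations map[string]string) hubBastion {
	// Straight field-by-field conversion of what v1alpha2 knows about.
	dst := hubBastion{InstanceState: src.InstanceState, PublicIP: src.PublicIP}

	if data, ok := annotations["cluster.x-k8s.io/conversion-data"]; ok {
		var restored hubBastion
		if err := json.Unmarshal([]byte(data), &restored); err == nil {
			// Buggy behaviour described above: take the saved hub object wholesale,
			// clobbering the values the v1alpha2 controller just wrote.
			//   dst = restored
			//
			// Safer behaviour: restore only the hub-only fields.
			dst.Addresses = restored.Addresses
		}
	}
	return dst
}

func main() {
	// Hub data saved before the controller's update: bastion still pending, no IP.
	stale, _ := json.Marshal(hubBastion{InstanceState: "pending"})
	annotations := map[string]string{"cluster.x-k8s.io/conversion-data": string(stale)}

	// The v1alpha2 controller has since observed the real bastion state.
	fresh := spokeBastion{InstanceState: "running", PublicIP: "3.1.201.82"}

	// With the selective restore, the running state and public IP survive.
	fmt.Printf("%+v\n", convertUp(fresh, annotations))
}
```

Run as written it prints the bastion with the running state and public IP preserved; swapping in the commented-out dst = restored line reproduces the reported behaviour (stale pending state, no public IP).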

We probably need to audit the entire conversion code logic...

We also need to update the FuzzTestFunc helper in CAPI to catch this, which it doesn't currently do.
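
A hand-rolled round-trip check along the same lines, building on the toy types and convertUp from the sketch above (so again illustrative only, not the real FuzzTestFunc API), shows the ingredient the current fuzz round trip is missing: the spoke object has to be mutated between the two conversions for the stale annotation data to matter.

```go
package main

import (
	"encoding/json"
	"testing"
)

// TestSpokeMutationSurvivesRestore round-trips a bastion through the toy
// convertUp above: save hub data into the annotation, mutate the spoke the way
// the v1alpha2 controller would, convert up again, and assert the mutation
// survived the annotation restore.
func TestSpokeMutationSurvivesRestore(t *testing.T) {
	// First conversion result, saved into the conversion-data annotation.
	saved, err := json.Marshal(convertUp(spokeBastion{InstanceState: "pending"}, nil))
	if err != nil {
		t.Fatal(err)
	}
	annotations := map[string]string{"cluster.x-k8s.io/conversion-data": string(saved)}

	// Controller-side mutation of the spoke object.
	mutated := spokeBastion{InstanceState: "running", PublicIP: "3.1.201.82"}

	// Second conversion, with the stale annotation present.
	got := convertUp(mutated, annotations)
	if got.InstanceState != "running" || got.PublicIP != "3.1.201.82" {
		t.Errorf("controller changes were lost on conversion: %+v", got)
	}
}
```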

@sedefsavas
Contributor

/assign

@sedefsavas
Contributor

/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Oct 26, 2020