Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ComputeBackendService doesn't acquire IAP configurations and defaults to OFF #304

Closed
tonybenchsci opened this issue Nov 11, 2020 · 16 comments
Labels
bug Something isn't working

Comments

@tonybenchsci
Copy link

Describe the bug (This is potentially also a feature request)
We have previously created a ComputeBackendService using Deployment Manager (DM), and then manually turned on Identity-Aware Proxy (IAP), which autogenerates the OAuth2.0 Client ID/secret and attaches it to the ComputeBackendService. We're migrating our GLB resources from DM to KCC, so when the new definition was applied and shows UpToDate, we expected spec.iap to default to the existing GCP configuration. However, the actual behaviour was that because we left spec.iap empty, it defaulted to not populating spec.iap and turned IAP off for the ComputeBackendService leading to downtime. And with the k8s reconciliation loop, manually turning it on gets corrected automatically again.

As a workaround, we went into GCP console page for API Services & Credentials / OAuth 2.0 Client IDs. Copied the ClientID/secret which was generated when we manually flipped the IAP slider to ON. And then added them to spec.iap, and re-applied which works. Using our Infra-as-Code workflow, this now requires: PR1 to create the ComputeBackendService, manually flip IAP on to create OAuth2.0 clientId/secret, PR2 to add spec.iap.

Alternatively, we could have workflow to: pre-create the OAuth2.0 ClientID (assuming there's no special magic behind the scenes that binds IAP created ClientIDs to these IAP-<service-name> IDs), and PR1 create the full ComputeBackendService with spec.iap. This would be a CRD feature request if we cannot simply have spec.iap: true that does the ClientID generation and binding for us.

In order of awesomeness, the fix would be:

  1. spec.iap: true magic that creates Client and binding like the UI experience
  2. CRD for OAuth2.0 Client to pre-create and reference in spec.iap.oauth2ClientRef
  3. Acquire from GCP and not default to OFF

ConfigConnector Version
1.26.0

YAML snippets:

apiVersion: compute.cnrm.cloud.google.com/v1beta1
kind: ComputeBackendService
metadata:
  name: glb-gke-service--site-reagent
spec:
  iap:  # Had to be added for it to work!
    oauth2ClientId: <REDACTED>.apps.googleusercontent.com
    oauth2ClientSecret:
      valueFrom:
        secretKeyRef:
          key: glb-gke-service--site-reagent
          name: iap-client-secrets
  backend:
  - balancingMode: RATE
    maxRatePerEndpoint: 5
    group:
      networkEndpointGroupRef:
        external: projects/b6i-stg/zones/us-east4-a/networkEndpointGroups/<REDACTED>
  - balancingMode: RATE
    maxRatePerEndpoint: 5
    group:
      networkEndpointGroupRef:
        external: projects/b6i-stg/zones/us-east4-b/networkEndpointGroups/<REDACTED>
  - balancingMode: RATE
    maxRatePerEndpoint: 5
    group:
      networkEndpointGroupRef:
        external: projects/b6i-stg/zones/us-east4-c/networkEndpointGroups/<REDACTED>
@tonybenchsci tonybenchsci added the bug Something isn't working label Nov 11, 2020
@jcanseco
Copy link
Member

Hey @tonybenchsci, we're sorry to hear about your downtime. Please give us some time to look into this and let you know what we find.

@jcanseco
Copy link
Member

Hi @tonybenchsci, I put in some time to investigate this issue, and it seems like there may be a few things at play here.

First off: what version of GKE are you running KCC on? And when you say you left spec.iap "empty", do you mean you left the field unspecified? Or, did you specify it, but set all its subfields as empty strings?

Reason I'm asking is: KCC 1.24+ automatically enables server-side apply for resources when in GKE 1.16+. When server-side apply is enabled, KCC only sets the fields that are specified in the YAML that you apply. All other fields are allowed to "float" (i.e. reflect whatever is on the GCP API).

Therefore, as long as server-side apply is enabled and the YAML you apply does not have a spec.iap, KCC shouldn't override the existing spec.iap value. I did some testing on my side where I used a ComputeBackendService with no spec.iap to acquire an existing backend service with IAP enabled, and I was able to acquire the backend service without disabling IAP.

Aside 1: There is a bug where KCC isn't able to update the ComputeBackendService state on the K8s API server due to spec.oauth2ClientSecret being required and KCC not being able to retrieve the live value from GCP, but that's a separate bug.

Aside 2: I would recommend always using server-side apply in any case when applying K8s resources (i.e. `kubectl apply --server-side) to avoid edge cases that arise when mixing client-side apply with server-side apply. Also, using server-side apply always would be good practice since server-side apply is the default apply behavior in K8s 1.18+. Note that you may need to configure your GitOps pipeline to use server-side apply when applying K8s resources.

@jcanseco
Copy link
Member

@tonybenchsci, after some internal discussion, we believe it may actually be best to treat the ability to acquire a resource without specifying a required "unreadable" field as a bug. By "unreadable", we mean fields whose live value cannot be read from the GCP API (e.g. spec.oauth2ClientSecret). The idea right now is to display an error of some kind when this is attempted. We believe this may be for the best since the behavior would be too ambiguous across KCC resources otherwise. It so happens that in this case, IAP is not disabled even if only the spec.oauth2ClientId is specified after acquisition, but a different behavior could easily be the case for another resource. We believe we can eliminate this ambiguity by just error-ing and error-ing early.

I wanted to notify you so that you don't end up building logic around our current buggy behavior.

So with that said, I think your original top suggestion may be the right way to go here:

spec.iap: true magic that creates Client and binding like the UI experience

However, we believe such behavior would fit best in a higher-level component that manages ComputeBackendService (kinda like how ReplicaSetss manage Pods). This would be consistent with the GCP console's "auto-generation" logic which is a higher-level component that sits on top of the GCE API.

There are other ideas for higher-level components that have come up in the past, so we'll take this one into consideration as well when we investigate what the right approach would be.

In the meantime, I regret to say that you will have to rely on your workaround for now.

@tonybenchsci
Copy link
Author

tonybenchsci commented Nov 21, 2020

@jcanseco Thanks for the detailed explanations; very insightful for us. Yes, the workaround is fine as these changes don't happen very frequently, having two PRs via our GitOps pipeline is fine in the short-term.

To answer your questions:

  • we are using KCC 1.26.0 on GKE 1.15.12-gke.20 (Stable release channel) using the manual install/upgrade method
  • we did not specify spec.iap at all; as in no key iap

I'm actually more intrigued by server side apply actually. Are you saying that when GKE goes to 1.16, server-side apply will be enabled (and used) by default? Or do we have to explicitly add --server-side in all our kubectl apply commands in our CICD pipelines (we use Bazel btw for kustomize+kubectl toolchaning) until GKE 1.18 where it becomes implicit.

Also, is it fully backwards compatible in the sense that we won't see any unexpected behaviour between client-side apply and server-side apply?

@jcanseco
Copy link
Member

Hey @tonybenchsci, no problem, and thanks for confirming that the workaround is fine for you in the short-term.

To answer your questions [...]

Makes sense, which means you're on the old client-side apply behavior where KCC continuously updates the GCP resource to match the KCC resource in the K8s API server, regardless if some of those fields were not actually specified by you and were defaulted by KCC/GCP.

This explains why KCC disabled your IAP as well since KCC tries to update your GCP resource with an empty spec.iap.

I'm actually more intrigued by server side apply actually [...]

Server-side apply (SSA) is a K8s feature made available in GKE 1.16+. You need to enable server-side apply for a resource to use the feature for that resource, but KCC 1.24+ enables SSA for all KCC resources automatically for you. In GKE 1.18+, server-side apply is enabled for all resources by default.

Do we have to explicitly add --server-side in all our kubectl apply commands in our CICD pipelines
Also, is it fully backwards compatible in the sense that we won't see any unexpected behaviour between client-side apply and server-side apply?

I'll have to loop back in with you for these questions, since I'm unfortunately not the subject-matter expert on SSA on our team, and that person is currently on vacation.

My understanding though is that there shouldn't be any issues with continuing to use client-side apply, which is what many of our customers do today with no issues.

@jcanseco
Copy link
Member

jcanseco commented Nov 24, 2020

@xiaobaitusi do you think you could look into @tonybenchsci's questions about SSA (the ones quoted below)?

Do we have to explicitly add --server-side in all our kubectl apply commands in our CICD pipelines

Also, is it fully backwards compatible in the sense that we won't see any unexpected behaviour between client-side apply and server-side apply?

@tonybenchsci
Copy link
Author

tonybenchsci commented Feb 4, 2021

@jcanseco We're now on KCC 1.34.0 and GKE (master 1.16.15-gke.6000, nodes still 1.15.12-gke.6002), and not seeing KCC try to flip IAP back to off anymore if we don't specify spec.iap at all after manually turning it on in the GCP console. However, the behaviour has changed from KCC overruling GCP to KCC now forever stuck in Update in progress until we get KCC and GCP in sync to UpToDate via either: turning IAP off in the console OR our re-capture workaround to have IAP on.

edit: It seems like the Update in progress without spec.iap is associated with "error with update call to API server: ComputeBackendService.compute.cnrm.cloud.google.com \"glb-gke-service--blah\" is invalid: spec.iap.oauth2ClientSecret: Required value"

This must be related to

KCC 1.24+ automatically enables server-side apply for resources when in GKE 1.16+. 
When server-side apply is enabled, KCC only sets the fields that are specified in the YAML that you apply. 
All other fields are allowed to "float" (i.e. reflect whatever is on the GCP API).

Clarifying question to: You need to enable server-side apply for a resource to use the feature for that resource, but KCC 1.24+ enables SSA for all KCC resources automatically for you....
We still only use kubectl apply against the KCC cluster, without --server-side, but you're saying that the flag is implicit now? The part about "need to enable" only applies to version below 1.24 right?

@xiaobaitusi
Copy link
Contributor

Hi @tonybenchsci, thanks for circling back with your new finding.

It seems like the Update in progress without spec.iap is associated with "error with update call to API server: ComputeBackendService.compute.cnrm.cloud.google.com "glb-gke-service--blah" is invalid: spec.iap.oauth2ClientSecret: Required value"

Here is my theory about what happens here: KCC sees iap field as a whole is not specified and decides to not touch the current values in GCP, reads the live values and then writes back to the CR object. However, it seems that KCC can only read oauth2ClientId but not oauth2ClientSecret back from API, it will try to write {iap: {oauth2ClientId: "someID"}} to the k8s API server but gets rejected since oauth2ClientSecret field is a required field inside of `iap'.

At this point, unfortunately you have to keep using your re-capture workaround to have IAP on. I'll bring it to our team for a further discussion.

@tonybenchsci
Copy link
Author

tonybenchsci commented Mar 25, 2021

@xiaobaitusi @jcanseco Thank you for the support so far, and our team is happy to see that #325 has been fulfilled as part of https://github.com/GoogleCloudPlatform/k8s-config-connector/releases/tag/v1.43.0

Now what would be Amazing (and close this issue) is:

  • Update ComputeBackendService to have a spec.iap.IAPIdentityAwareProxyClientRef which grabs the ID and secret
  • Provide/explain a way to bind members to role IAP-secured Web App User against ComputeBackendServices (other than using the GCP UI Console)
  • Confirmation that pre-creating OAuth 2.0 Client IDs of format IAP-{ComputeBackendService_NAME} does not conflict with (is equivalent to) flipping the IAP button to "ON" on in the GCP UI.

@tonybenchsci
Copy link
Author

Any updates to my questions above @jcanseco @xiaobaitusi ?

@maqiuyujoyce
Copy link
Collaborator

Hi @tonybenchsci , thank you for your follow up! We are actively working on it and hopefully have a fix for the issue you mentioned in this comment out soon. Will post updates when we have new information.

@jcanseco
Copy link
Member

@maqiuyujoyce I believe @tonybenchsci is talking about a different issue (namely his question here).

Hello @tonybenchsci, let's continue that conversation in the other thread here. I put up a comment there in response to your question.

@tonybenchsci
Copy link
Author

@jcanseco @maqiuyujoyce looks like 1.47 fixes this. What is the new behaviour?

@maqiuyujoyce
Copy link
Collaborator

Hi @tonybenchsci , with version 1.47.0, you'll be able to acquire a Compute backend service which has IAP enabled. There is no field that explicitly represents the enablement of IAP in ComputeBackendService, though the enablement is implied by the existence of fields spec.iap.oauth2ClientId and spec.iap.oauth2ClientSecretSha256 after acquisition.

@tonybenchsci
Copy link
Author

tonybenchsci commented Apr 28, 2021

Thanks @maqiuyujoyce, do you mean that fields spec.iap.oauth2ClientId and spec.iap.oauth2ClientSecretSha256 are defaulted to GCP values, and don't have to be user specified anymore? This is the 3rd solution in my original comment.

Otherwise, the ability to acquire was already functional before 1.47.0, because we just needed to provide the GCP values of those two fields to the manifest, so that KCC didn't turn it back off again.

@maqiuyujoyce
Copy link
Collaborator

Hi @tonybenchsci , thanks for the followup. Yes, you are right. I should have clarified that - fields spec.iap.oauth2ClientId and spec.iap.oauth2ClientSecretSha256 can now be defaulted to GCP values, and don't have to be user specified anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants