Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preview fails when upgrading to version 4 even though server side apply is disabled #2521

Closed
doy-materialize opened this issue Aug 1, 2023 · 3 comments · Fixed by #2522
Closed
Assignees
Labels
customer/feedback Feedback from customers kind/bug Some behavior is incorrect or out of spec resolution/fixed This issue was fixed
Milestone

Comments

@doy-materialize
Copy link

What happened?

when upgrading to pulumi-kubernetes 4.0.3, our previews in ci are now failing with:

error: Preview failed: resource mz-system/mzcloud-staging-eu-west-1-admission-webhook-642b8a34 was not successfully created by the Kubernetes API server : configmaps is forbidden: User "gha-readonly-oidc:GithubActionsSession-pulumi-preview" cannot create resource "configmaps" in API group "" in the namespace "mz-system"

even though our provider is passing enable_server_side_apply=False and we have also set kubernetes: enableServerSideApply: false in our stack configuration. i would have also expected #2419 to have fixed this even if we hadn't been setting the configuration ourselves, so i'm not sure what is wrong here.

Expected Behavior

pulumi preview should succeed without requiring create permissions

Steps to reproduce

our provider definition (python) looks like this:

self.provider = k8s.Provider(
    provider_name,
    kubeconfig=self.kubeconfig,
    enable_server_side_apply=False,
    opts=pulumi.ResourceOptions(
        parent=self,
        ignore_changes=["kubeconfig"],
    ),
)

and we have this in our Pulumi.staging.yaml:

kubernetes:enableServerSideApply: false

i'm not sure what else is relevant here, our pulumi codebase is fairly large

Output of pulumi about

CLI
Version      3.76.1
Go Version   go1.20.6
Go Compiler  gc

Plugins
NAME             VERSION
aws              5.42.0
cloudinit        1.3.0
command          0.7.2
crds             0.0.0
docker-buildkit  0.1.21
fivetran         0.1.12
frontegg         0.2.27
kubernetes       3.30.1
purrl            0.4.1
python           unknown
random           4.13.2
snowflake        0.30.2
tls              4.10.0

Host
OS       arch
Version
Arch     x86_64
This project is written in python: executable='/home/doy/materialize/cloud/venv/
bin/python3' version='3.11.3
'

Current Stack: materialize/mzcloud/doy2

(this is from running on my local machine - we are seeing this issue in a ci run which just bumps the kubernetes plugin version from 3.30.1 to 4.0.3)

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@doy-materialize doy-materialize added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Aug 1, 2023
@mnlumi mnlumi added customer/feedback Feedback from customers impact/regression Something that used to work, but is now broken p1 A bug severe enough to be the next item assigned to an engineer labels Aug 1, 2023
@guineveresaenger guineveresaenger removed the needs-triage Needs attention from the triage team label Aug 2, 2023
@guineveresaenger
Copy link
Contributor

Hi @doy-materialize - thank you for reporting this issue. The team will take a look!

@rquitales rquitales self-assigned this Aug 2, 2023
@rquitales
Copy link
Member

rquitales commented Aug 2, 2023

@doy-materialize I'm sorry that you're experiencing this inconvenience. To get more context on this issue, is the CI environment only doing previews and not attempting to actuate/create the resources on cluster? Would you be able to share the RBAC config for the service account used by CI for previews?

I'm still investigating things, but will document some things as I keep investigating.

  • I̶t̶ ̶s̶e̶e̶m̶s̶ ̶l̶i̶k̶e̶ ̶̶p̶u̶l̶u̶m̶i̶ ̶p̶r̶e̶v̶i̶e̶w̶̶ ̶w̶i̶l̶l̶ ̶a̶l̶w̶a̶y̶s̶ ̶d̶e̶f̶a̶u̶l̶t̶ ̶t̶o̶ ̶S̶S̶A̶ ̶a̶n̶d̶ ̶w̶i̶l̶l̶ ̶n̶o̶t̶ ̶u̶s̶e̶ ̶c̶l̶i̶e̶n̶t̶ ̶s̶i̶d̶e̶ ̶a̶p̶p̶l̶y̶ ̶e̶v̶e̶n̶ ̶i̶f̶ ̶̶e̶n̶a̶b̶l̶e̶_̶s̶e̶r̶v̶e̶r̶_̶s̶i̶d̶e̶_̶a̶p̶p̶l̶y̶=̶F̶a̶l̶s̶e̶̶ ̶i̶s̶ ̶s̶e̶t̶ (Not entirely accurate)
  • My initial suspicion is that the RBAC for previews in CI is insufficient as only interacting with configmaps is disallowed, but the auth error being returned is not being properly captured, so it doesn't fall back to client side apply for preview.
    if req.GetPreview() && apierrors.IsForbidden(awaitErr) {

@rquitales
Copy link
Member

rquitales commented Aug 2, 2023

So I was able to reproduce this issue, both in v3 and v4.
I can cause this error to surface in v3, with SSA or CSA by also specifying PULUMI_K8S_ENABLE_DRY_RUN=true when running pulumi commands.

v3 Codepath:
Without enabling dry-run with that env flag, this logic check causes skipRefresh to be true:

skipPreview := hasComputedValue(newInputs) ||
!k.supportsDryRun(newInputs.GroupVersionKind()) ||
isPatchURN(urn)

As such, the create function returns early here:

if req.GetPreview() && skipPreview {
logger.V(9).Infof("cannot preview Create(%v)", urn)
return &pulumirpc.CreateResponse{Id: "", Properties: req.GetProperties()}, nil
}
This prevents the provider from actually attempting to create/preview the resources against the live API server later in the Create function.


In v4, the logic check is slightly different:

skipPreview := hasComputedValue(newInputs) || !k.gvkExists(newInputs) || isPatchURN(urn)

Here, the default settings will cause the provider to not return early unlike in v3. The provider will proceed further and attempt to create against the live cluster:
initialized, awaitErr := await.Creation(config)
This is where the authentication error is returned since the create method for dry run will fail against the live cluster.


There are 2 potential approaches to fixing this, we can drop the error if it's an authentication error when attempting to do a create dry run. This essentially means preview will not show any error, and indicate to the user that they are able to create the resources even if they are unable to. This would be beneficial in this scenario since I assume CI only cares about preview, and not actuation of resources. This means that preview/diff will be against the resources stored in state.

Alternatively, we can wrap the error to be more descriptive, and return the error in preview as well, not just when running pulumi up, which is what is happening currently.

@mnlumi mnlumi removed p1 A bug severe enough to be the next item assigned to an engineer impact/regression Something that used to work, but is now broken labels Aug 2, 2023
@mikhailshilkov mikhailshilkov added this to the 0.93 milestone Aug 7, 2023
@pulumi-bot pulumi-bot added the resolution/fixed This issue was fixed label Aug 11, 2023
@mikhailshilkov mikhailshilkov modified the milestones: 0.93, 0.92 Aug 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer/feedback Feedback from customers kind/bug Some behavior is incorrect or out of spec resolution/fixed This issue was fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants