Created EKS Addon does not get saved to state if it does not become active #4759

flostadler · 2024-11-12T15:46:39Z

Describe what happened

When creating an EKS Addon, the provider will first send the CreateAddon API call to AWS and then wait for the addon to become active.
Some addons, like coredns, take longer to become active and might hit wait timeouts.
If the resource creation fails while waiting for the Addon to become active, the resource isn't saved to state.

Re-running `pulumi up` is now guaranteed to fail because Pulumi wants to create the addon again, even though it already exists in the cluster.

As a workaround, users either need to delete the addon from the cluster manually or import the Addon into Pulumi state.
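
For the import route, a minimal sketch of building the import ID follows. It assumes the provider expects IDs of the form `<clusterName>:<addonName>` for aws.eks.Addon (worth double-checking against the resource docs); `addonImportId` and the `my-cluster` name are hypothetical and not part of any SDK.

```typescript
// Hypothetical helper: builds the import ID for an existing EKS addon.
// Assumption: the ID format is "<clusterName>:<addonName>".
function addonImportId(clusterName: string, addonName: string): string {
    if (!clusterName || !addonName) {
        throw new Error("clusterName and addonName must be non-empty");
    }
    return `${clusterName}:${addonName}`;
}

// "my-cluster" is a placeholder for the real cluster name.
const importId = addonImportId("my-cluster", "coredns");
console.log(importId);

// The ID would then be passed through the `import` resource option, e.g.:
// new aws.eks.Addon(`${env}-cluster-coredns`, { /* same args as before */ },
//     { import: importId, dependsOn: [mng] });
```

The `import` option takes a plain string, so the actual (auto-named) cluster name has to be looked up first. The CLI equivalent should be along the lines of `pulumi import aws:eks/addon:Addon <name> <clusterName>:<addonName>`.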

Sample program

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";
import * as eks from "@pulumi/eks";

// Grab some values from the Pulumi configuration (or use default values)
const config = new pulumi.Config();
const vpcNetworkCidr = config.get("vpcNetworkCidr") || "10.0.0.0/16";

const env = "aws-addon-bug";

// Create a new VPC
const eksVpc = new awsx.ec2.Vpc("eks-vpc", {
    enableDnsHostnames: true,
    cidrBlock: vpcNetworkCidr,
});

const instanceRole = new aws.iam.Role('testrole', {
    assumeRolePolicy: JSON.stringify({
        Version: '2012-10-17',
        Statement: [
            {
                Effect: 'Allow',
                Principal: {
                    Service: 'ec2.amazonaws.com',
                },
                Action: 'sts:AssumeRole',
            },
        ],
    }),
    managedPolicyArns: [
        'arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy',
        'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly',
        'arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy',
    ],
});

const eksCluster = new eks.Cluster(`${env}-cluster`, {
    vpcId: eksVpc.vpcId,
    authenticationMode: eks.AuthenticationMode.Api,
    corednsAddonOptions: {
        enabled: false,
    },
    createOidcProvider: true,
    enabledClusterLogTypes: ['api', 'audit', 'authenticator'],
    fargate: false,
    instanceRole: instanceRole,
    kubeProxyAddonOptions: {
        enabled: false,
    },
    nodeAssociatePublicIpAddress: false,
    privateSubnetIds: eksVpc.privateSubnetIds,
    publicSubnetIds: eksVpc.publicSubnetIds,

    skipDefaultSecurityGroups: false,
    skipDefaultNodeGroup: true,
    useDefaultVpcCni: false,
    version: '1.25',
});

const mng = new eks.ManagedNodeGroup(`${env}-managed-ng`, {
    cluster: eksCluster,
    instanceTypes: ['t3.medium'],
    scalingConfig: {
        desiredSize: 1,
        maxSize: 1,
        minSize: 1,
    },
    nodeRole: instanceRole,
});

const addonVersion = aws.eks.getAddonVersionOutput({
    addonName: 'coredns',
    kubernetesVersion: eksCluster.eksCluster.version,
    mostRecent: true,
}).version;

// takes ~ 15 minutes to create
new aws.eks.Addon(`${env}-cluster-coredns`, {
    clusterName: eksCluster.eksCluster.name,
    addonName: 'coredns',
    addonVersion: addonVersion,
    resolveConflictsOnCreate: 'OVERWRITE',
    resolveConflictsOnUpdate: 'OVERWRITE',
}, { customTimeouts: { create: '2m', update: '2m' }, dependsOn: [mng] });

// Export some values for use elsewhere
export const kubeconfig = eksCluster.kubeconfig;
export const vpcId = eksVpc.vpcId;

Log output

n/a

Affected Resource(s)

aws.eks.Addon

Output of `pulumi about`

n/a

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).


flostadler · 2024-12-09T08:38:27Z

While working on pulumi/pulumi-eks#1509 I ran into this, but it occurred after updating pulumi-aws from v6.47.0 to v6.63.0: pulumi/pulumi-eks#1519 (comment)

This makes me believe that this is actually a regression. I'll do a bisect over the versions to confirm my suspicion

flostadler · 2024-12-09T09:48:08Z

I bisected the versions and it's indeed a regression that was introduced in v6.51.0.

That one includes multiple upstream upgrades and a bridge upgrade. Looking into these to find the root cause

flostadler · 2024-12-09T09:55:46Z

The upstream upgrades v5.64.0 and v5.65.0 do not include any suspect changes (no changes to any EKS resources). It doesn't repro in Terraform either.
So this seems to be a regression on the Pulumi side.

flostadler · 2024-12-09T10:03:40Z

It's this: pulumi/pulumi-terraform-bridge#2696

We're not returning the partial state to the engine for init errors since enabling PlanResourceChange (PRC). That's why Pulumi doesn't save the Addon to state when it fails to initialize.
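
To make the failure mode concrete, here is a simplified model of the two behaviors. This is only a sketch: the bridge itself is written in Go, and every type and function name below (`ApplyResult`, `EngineResponse`, `createDroppingState`, `createKeepingState`) is hypothetical and does not exist in the bridge.

```typescript
// Simplified, hypothetical model; NOT the bridge's actual (Go) code.
interface ApplyResult {
    state?: { id: string }; // state reported by the TF apply, if any
    error?: Error;          // e.g. a timeout while waiting for the addon
}

interface EngineResponse {
    state?: { id: string };
    initError?: string;     // stands in for a ResourceInitError
    error?: string;         // plain failure, nothing is saved to state
}

// Regressed behavior: any error discards the state from the apply.
function createDroppingState(apply: ApplyResult): EngineResponse {
    if (apply.error) {
        return { error: apply.error.message };
    }
    return { state: apply.state };
}

// Intended behavior: an error together with a state is reported as a
// partially created resource, so the engine can save it.
function createKeepingState(apply: ApplyResult): EngineResponse {
    if (apply.error && apply.state) {
        return { state: apply.state, initError: apply.error.message };
    }
    if (apply.error) {
        return { error: apply.error.message };
    }
    return { state: apply.state };
}

// The addon was created in AWS, but waiting for it to become active timed out.
const timedOut: ApplyResult = {
    state: { id: "my-cluster:coredns" },
    error: new Error("timeout while waiting for addon to become active"),
};

const dropped = createDroppingState(timedOut);
const kept = createKeepingState(timedOut);
console.log("state saved (regressed):", dropped.state !== undefined);
console.log("state saved (intended):", kept.state !== undefined);
```

In the regressed variant the engine never learns that the addon exists, so the next `pulumi up` tries to create it again and fails. In the intended variant the resource lands in state together with the init error and can be updated on the next run.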

In the SDKv2 bridge under PlanResourceChange we are not passing any state we receive during TF Apply back to the engine if we also received an error. This causes us to incorrectly miss any resources which were created but encountered errors during the creation process. The engine should see these as `ResourceInitError`, which allows the engine to attempt to update the partially created resource on the next `up`.

This PR fixes the issue by passing the state down to the engine in the case when we receive an error and a non-nil state from TF during Apply.

related to pulumi/pulumi-gcp#2700
related to pulumi/pulumi-aws#4759
fixes #2696

In the SDKv2 bridge under PlanResourceChange we are not passing any state we receive during TF Apply back to the engine if we also received an error. This causes us to incorrectly miss any resources which were created but encountered errors during the creation process. The engine should see these as `ResourceInitError`, which allows the engine to attempt to update the partially created resource on the next `up`.

This PR fixes the issue by passing the state down to the engine in the case when we receive an error and a non-nil state from TF during Apply.

This is the second attempt at this. The first was #2695 but was reverted because it caused a different panic: #2706. We added a regression test for that in #2710.

The reason for that panic was that we were now creating a non-nil `InstanceState` with a nil `stateValue`, which causes the `ID` function to panic. This PR fixes both issues by not allowing non-nil states with nil `stateValue`s and by preventing the panic in `ID`.

There was also a bit of fun with Go nil interfaces along the way, which is the reason why `ApplyResourceChange` now returns a `shim.InstanceState` interface instead of a `*v2InstanceState2`. Otherwise we end up creating a non-nil interface with a nil value.

related to pulumi/pulumi-gcp#2700
related to pulumi/pulumi-aws#4759
fixes #2696

This PR was generated via `$ upgrade-provider pulumi/pulumi-aws --kind=bridge`.

Upgrading pulumi-terraform-bridge from v3.96.0 to v3.97.1.

Manual Changes: copied from #4898

Fixes #4759
Fixes #4894

pulumi-bot · 2024-12-13T18:07:29Z

This issue has been addressed in PR #4898 and shipped in release v6.65.0.

flostadler added the kind/bug (Some behavior is incorrect or out of spec), needs-triage (Needs attention from the triage team), service/eks (EKS issues), and customer/feedback (Feedback from customers) labels and removed the needs-triage (Needs attention from the triage team) label Nov 12, 2024

t0yv0 self-assigned this Dec 2, 2024

mjeffryes assigned flostadler and unassigned t0yv0 Dec 3, 2024

mjeffryes added this to the 0.114 milestone Dec 3, 2024

flostadler mentioned this issue Dec 9, 2024

Add support for EKS Auto Mode pulumi/pulumi-eks#1519

Open

flostadler added the impact/regression (Something that used to work, but is now broken) label Dec 9, 2024

flostadler mentioned this issue Dec 9, 2024

The SDKv2 Bridge swallows resource init errors when run under PlanResourceChange pulumi/pulumi-terraform-bridge#2696

Closed

flostadler added the blocked (The issue cannot be resolved without 3rd party action.) and awaiting/bridge (The issue cannot be resolved without action in pulumi-terraform-bridge.) labels Dec 9, 2024

VenelinMartinov mentioned this issue Dec 9, 2024

Pass state back to the engine if Apply encountered an error pulumi/pulumi-terraform-bridge#2695

Merged

This was referenced Dec 10, 2024

Workflow failure: Upgrade bridge #4894

Closed

Upgrade pulumi-terraform-bridge to v3.97.0 #4898

Merged

flostadler closed this as completed in #4898 Dec 10, 2024

flostadler closed this as completed in 316ee6a Dec 10, 2024

pulumi-bot added the resolution/fixed This issue was fixed label Dec 10, 2024

VenelinMartinov mentioned this issue Dec 10, 2024

Pass state back to the engine if Apply encountered an error 2 pulumi/pulumi-terraform-bridge#2713

Merged

corymhall mentioned this issue Dec 11, 2024

Upgrade pulumi-terraform-bridge to v3.97.1 #4907

Merged
