
[aws-eks] Cluster.addHelmChart no longer respecting repository field. #11477

Closed
ten-lac opened this issue Nov 15, 2020 · 18 comments
Assignees
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service), guidance (Question that needs advice or information)

Comments

@ten-lac

ten-lac commented Nov 15, 2020

The HelmChartOptions interface for Cluster.addHelmChart exposes a repository field that points to the repository the Helm chart should be pulled from. This no longer works.

Reproduction Steps

        const prometheusHelmChart = cluster.addHelmChart('Prometheus', {
            chart: 'prometheus-operator',
            createNamespace: false,
            namespace: namespacePrometheusName,
            release: `prometheus-operator`,
            version: '9.3.2',
            repository: 'https://kubernetes-charts.storage.googleapis.com/',
            timeout: Duration.minutes(10),
            values: { ... }
        });

Also, the following CloudFormation stanza was generated:

ControlPlaneEKSchartPrometheus72B62D62:
    Type: Custom::AWSCDK-EKS-HelmChart
    Properties:
      ServiceToken:
        Fn::GetAtt:
          - awscdkawseksKubectlProviderNestedStackawscdkawseksKubectlProviderNestedStackResourceA7AEBA6B
          - Outputs.TennaK8Sqa1awscdkawseksKubectlProviderframeworkonEvent689A9A95Arn
      ClusterName:
        Ref: ControlPlaneEKSC6E4919A
      RoleArn:
        Fn::GetAtt:
          - ControlPlaneEKSCreationRole58D1869A
          - Arn
      Release: prometheus-operator
      Chart: prometheus-operator
      Version: 9.3.2
      Wait: true
      Timeout: 300s
      Values: "<...>"
      Namespace: prometheus-system
      Repository: https://kubernetes-charts.storage.googleapis.com/
    DependsOn:
      - ControlPlaneEKSchartPrometheusEFSProvisionerEB6A15C9
      - ControlPlaneEKSKubectlReadyBarrier1C298D16
      - ControlPlaneEKSNodegroupManagedPrivateWorkerPoolc52xlarge118920201112FAFA9E33
      - ControlPlaneEKSNodegroupManagedPublicWorkerPoolc5xlarge118920201112CD94DE82
      - MonitoringNamespaceprometheussystemDockerRegistryGlobalprometheussystemDockerRegistryGlobalprometheussystemmanifest7E9EEA9B
      - MonitoringNamespaceprometheussystemprometheussystemnamespacemanifestB28933AF
      - NetworkVpcIGW6BEA7B02
      - NetworkVpcPrivateSubnet1DefaultRoute08635105
      - NetworkVpcPrivateSubnet1RouteTable7D7AA3CD
      - NetworkVpcPrivateSubnet1RouteTableAssociation327CA62F
      - NetworkVpcPrivateSubnet1Subnet6DD86AE6
      - NetworkVpcPrivateSubnet2DefaultRouteA15DC6D5
      - NetworkVpcPrivateSubnet2RouteTableC48862D1
      - NetworkVpcPrivateSubnet2RouteTableAssociation89A2F1E8
      - NetworkVpcPrivateSubnet2Subnet1BDBE877
      - NetworkVpcPrivateSubnet3DefaultRouteFE7FEBED
      - NetworkVpcPrivateSubnet3RouteTable7FC52A8D
      - NetworkVpcPrivateSubnet3RouteTableAssociation616B0E34
      - NetworkVpcPrivateSubnet3Subnet8ABFAF5C
      - NetworkVpcPublicSubnet1DefaultRoute31EC04EC
      - NetworkVpcPublicSubnet1EIPE0D52090
      - NetworkVpcPublicSubnet1NATGateway64781A21
      - NetworkVpcPublicSubnet1RouteTable30235CE2
      - NetworkVpcPublicSubnet1RouteTableAssociation643926C7
      - NetworkVpcPublicSubnet1Subnet36933139
      - NetworkVpcPublicSubnet2DefaultRoute0CF082AB
      - NetworkVpcPublicSubnet2EIP24F41572
      - NetworkVpcPublicSubnet2NATGateway42CB86F5
      - NetworkVpcPublicSubnet2RouteTable0FACEBB2
      - NetworkVpcPublicSubnet2RouteTableAssociationC662643B
      - NetworkVpcPublicSubnet2SubnetC427CCE0
      - NetworkVpcPublicSubnet3DefaultRoute320997B4
      - NetworkVpcPublicSubnet3EIP22F0C93C
      - NetworkVpcPublicSubnet3NATGateway5DD3AF93
      - NetworkVpcPublicSubnet3RouteTable4F517CA2
      - NetworkVpcPublicSubnet3RouteTableAssociationDBCF32A1
      - NetworkVpcPublicSubnet3Subnet4BBF7F47
      - NetworkVpc7FB7348F
      - NetworkVpcVPCGW8F3799B5
      - PermissionEKSClusterAdminRoleEA0AD2E4
      - PermissionEKSClusterReadonlyRoleFE7F2F3C
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete
    Metadata:
      aws:cdk:path: Tenna-K8S-qa1/ControlPlane/EKS/chart-Prometheus/Resource/Default

What did you expect to happen?

Expected the Helm chart to install without error. I do not expect the chart to be pulled from the charts.helm.sh repo when https://kubernetes-charts.storage.googleapis.com has been specified.

What actually happened?

[Screenshot attached: Screen Shot 2020-11-14 at 7 21 16 PM]

Environment

  • CDK CLI Version: 1.73.0 (build eb6f3a9)
  • Framework Version: all @aws-cdk libraries at 1.73
  • Node.js Version: v12.18.2
  • OS: macOS 10.15.7
  • Language (Version): JavaScript

Other


This is a 🐛 Bug Report.

@ten-lac ten-lac added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 15, 2020
@ten-lac
Author

ten-lac commented Nov 16, 2020

I think I've identified a pattern: this happens when I install multiple Helm charts in one CDK app. Is it possible the repository field is leaking between addHelmChart calls?

@SomayaB SomayaB changed the title [aws-eks] Cluster.addHelmChart no longer respecting repository field. [eks] Cluster.addHelmChart no longer respecting repository field. Nov 16, 2020
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Nov 16, 2020
@ten-lac
Author

ten-lac commented Nov 17, 2020

I've been able to consistently reproduce this problem across different accounts. The error also doesn't always bubble up so neatly in CloudFormation. I've seen another case where the error is similar and points to the same line of code within the Python Lambda function, but the message from the exception is empty.

@iliapolo iliapolo changed the title [eks] Cluster.addHelmChart no longer respecting repository field. [aws-eks] Cluster.addHelmChart no longer respecting repository field. Nov 23, 2020
@iliapolo
Contributor

@ten-lac Taking your example, I'm unable to successfully deploy it even once.

I've dug around the logs and extracted the command that eventually gets executed:

helm upgrade prometheus-operator prometheus-operator --install --repo https://kubernetes-charts.storage.googleapis.com/ --version 9.3.2 --namespace default --timeout 600s --kubeconfig /tmp/kubeconfig

I've tried running this command locally against my EKS cluster:

helm upgrade prometheus-operator prometheus-operator --install --repo https://kubernetes-charts.storage.googleapis.com/ --version 9.3.2 --namespace default --timeout 600s --kubeconfig /tmp/kubeconfig                        [19:36:08]
Release "prometheus-operator" does not exist. Installing it now.
Error: failed to download "https://charts.helm.sh/stable/prometheus-operator-9.3.2.tgz" (hint: running `helm repo update` may help)

I got the same error. What's happening is that because the chart isn't found in the specified repo, helm attempts to fetch it from the default one, making it seem as if it ignores the --repo argument.

Looking at the https://kubernetes-charts.storage.googleapis.com/ repository, I indeed do not see the prometheus-operator chart.

In contrast, if I run the exact same command, but with a chart that does exist in that repo:

helm upgrade acs-engine-autoscaler acs-engine-autoscaler --install --repo https://kubernetes-charts.storage.googleapis.com/ --version 2.1.2 --namespace default --timeout 600s --kubeconfig /tmp/kubeconfig                    [19:39:39]
Release "acs-engine-autoscaler" does not exist. Installing it now.
NAME: acs-engine-autoscaler
LAST DEPLOYED: Mon Nov 23 19:39:47 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
##############################################################################
####  ERROR: You are missing required values in the values.yaml file.     ####
##############################################################################

This deployment will be incomplete until all the required fields in the values.yaml file have been provided.

To update, run:

    helm upgrade acs-engine-autoscaler \
    --set acsenginecluster.resourcegroup=YOUR-RESOURCEGROUP-HERE,acsenginecluster.azurespappid=YOUR-AZURESPAPPID-HERE,acsenginecluster.azurespsecret=YOUR-AZURESPSECRET-HERE,acsenginecluster.azuresptenantid=YOUR-AZURESPTENANTID-HERE,acsenginecluster.kubeconfigprivatekey=YOUR-KUBECONFIGPRIVATEKEY-HERE,acsenginecluster.clientprivatekey=YOUR-CLIENTPRIVATEKEY-HERE stable/acs-engine-autoscaler

Now the chart is recognized, and the install fails because of missing input values, which is the expected result.

Long story short, it seems that the chart you are trying to install does not exist in the repository. Perhaps they removed it recently?

Can you try running the command I posted here locally and see what happens? If you're getting the same error, that means the problem is not with CDK, but most likely external.

Thanks

@iliapolo iliapolo added guidance Question that needs advice or information. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 23, 2020
@ten-lac
Author

ten-lac commented Nov 23, 2020

@iliapolo, I wonder if there is a defect in helm. I get the same error you do.

The source of the helm chart is located here.

https://kubernetes-charts.storage.googleapis.com/prometheus-operator-9.3.2.tgz

I had the same thought as you, that the chart was yanked. I also thought maybe I was seeing "works on my machine" behavior caused by a cached index. So I wiped out my repo reference, added a new one, and used helm to search it.

~/git/tenna-llc
> helm repo add kubernetes-charts1 https://kubernetes-charts.storage.googleapis.com/
"kubernetes-charts1" has been added to your repositories

~/git/tenna-llc took 25s 758ms
> helm search repo prometheus-operator
NAME                                  	CHART VERSION	APP VERSION	DESCRIPTION
kubernetes-charts1/prometheus-operator	9.3.2        	0.38.1     	DEPRECATED Provides easy monitoring definitions...

Version of Helm:

> helm version
version.BuildInfo{Version:"v3.3.3", GitCommit:"55e3ca022e40fe200fbc855938995f40b2a68ce0", GitTreeState:"dirty", GoVersion:"go1.15.2"}

@iliapolo
Contributor

@ten-lac It looks like after you added the repo, it does recognize the chart.

Were you able to get the command working eventually?

@ten-lac
Author

ten-lac commented Nov 23, 2020

Yes, I am able to get it to work with helm locally. Is there a similar tactic to get this to work in CloudFormation?

@iliapolo
Contributor

@ten-lac can you share the exact commands you are running?

@ten-lac
Author

ten-lac commented Nov 23, 2020

@iliapolo, it looks like https://kubernetes-charts.storage.googleapis.com/ is going through a deprecation phase? The index.yaml hosted there no longer mirrors the actual content.

But here is how I got it to work locally.

Works quickly

helm repo add stable https://charts.helm.sh/stable
helm template prometheus-operator stable/prometheus-operator --version 9.3.2 --namespace default --timeout 600s

Works slowly

helm template prometheus-operator prometheus-operator --repo https://charts.helm.sh/stable --version 9.3.2 --namespace default --timeout 600s

When running this with CDK, CloudFormation still fails with an error.

const prometheusHelmChart = controlPlane.cluster.addHelmChart('Prometheus', {
    cluster: controlPlane.cluster,
    chart: 'prometheus-operator',
    createNamespace: false,
    namespace: namespacePrometheusName,
    release: `prometheus-operator`,
    version: '9.3.2',
    repository: 'https://charts.helm.sh/stable/',
    timeout: Duration.minutes(5),
    wait: true,
    // https://github.com/helm/charts/blob/master/stable/prometheus-operator/values.yaml
    values: <....>
});
[ERROR] Exception: b'Release "prometheus-operator" does not exist. Installing it now.\n'Traceback (most recent call last):  File "/var/task/index.py", line 17, in handler    return helm_handler(event, context)  File "/var/task/helm/__init__.py", line 50, in helm_handler    helm('upgrade', release, chart, repository, values_file, namespace, version, wait, timeout, create_namespace)  File "/var/task/helm/__init__.py", line 94, in helm    raise Exception(output)

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 24, 2020
@iliapolo
Contributor

@ten-lac There is definitely a lot of deprecation and relocation happening.

Specifically, the prometheus-operator chart has been deprecated, renamed to kube-prometheus-stack, and relocated to the prometheus-community repository.

I can't say I fully understand what's happening, because it seems prometheus-operator should still be available in the old repository, but changing it to https://prometheus-community.github.io/helm-charts does the trick.
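
For reference, here is a minimal sketch of what the relocated configuration might look like, reusing the fields from the original reproduction. The kube-prometheus-stack chart name and the prometheus-community repository come from the rename noted above; values are omitted, and any version pin would need to be looked up in the new repository.

// Sketch only: chart/repository reflect the rename described above.
cluster.addHelmChart('Prometheus', {
  chart: 'kube-prometheus-stack',
  repository: 'https://prometheus-community.github.io/helm-charts',
  release: 'prometheus-operator',
  namespace: 'prometheus-system',
  createNamespace: false,
  timeout: Duration.minutes(10),
  wait: true,
});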

Since this doesn't seem like a CDK issue, I'm going to close this out. Please let me know if you feel differently.

Thanks

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@iliapolo
Contributor

@ten-lac Sorry, re-opening since I didn't carefully read your last message.

You mention:

When running this with CDK, Cloudformation still ejects with error.
[ERROR] Exception: b'Release "prometheus-operator" does not exist. Installing it now.\n'Traceback (most recent call last): File "/var/task/index.py", line 17, in handler return helm_handler(event, context) File "/var/task/helm/__init__.py", line 50, in helm_handler helm('upgrade', release, chart, repository, values_file, namespace, version, wait, timeout, create_namespace) File "/var/task/helm/__init__.py", line 94, in helm raise Exception(output)

Are you sure this is the same phenomenon? I don't see the "failed to download" message now. Can you share the CloudWatch logs from the Lambda, or the complete error from CloudFormation?

@iliapolo iliapolo reopened this Nov 25, 2020
@ten-lac
Author

ten-lac commented Nov 25, 2020

The error I referenced is all I got from CloudWatch for the ERROR entry. I am sure the run I did generated the error, since I correlated the time and hadn't run any updates on that stack for days prior, which made this error easy to spot.

@iliapolo
Contributor

What do you mean by "the ERROR entry"? A prefix search? Can you share the entire execution log?

@ten-lac
Author

ten-lac commented Nov 26, 2020

@iliapolo, this is the log file that contains the above error.

log-events-viewer-result.csv.zip

@iliapolo
Contributor

iliapolo commented Dec 1, 2020

@ten-lac in the error log you attached I also see:

REPORT RequestId: 76ab1c51-de28-439f-b83d-a01c80dc310f	Duration: 47374.59 ms	Billed Duration: 47400 ms	Memory Size: 256 MB	Max Memory Used: 257 MB

Could you try giving the function more memory through the AWS Lambda console and see if this is the root cause?

We currently don't support passing custom memory limits, but this can be worked around with an escape hatch.
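
One possible escape hatch, as a rough sketch in JavaScript and not an official API: walk the construct tree, find the kubectl provider's Lambda functions, and override their MemorySize. This assumes cluster is the eks.Cluster from the snippets above; the 'KubectlProvider' path filter and the 1024 MB value are illustrative assumptions, and the actual construct path should be confirmed against the synthesized template (cdk synth / tree.json).

const { Stack } = require('@aws-cdk/core');
const { CfnFunction } = require('@aws-cdk/aws-lambda');

// Bump memory on every Lambda created under the kubectl provider's nested stack.
for (const child of Stack.of(cluster).node.findAll()) {
  if (child instanceof CfnFunction && child.node.path.includes('KubectlProvider')) {
    child.addPropertyOverride('MemorySize', 1024); // default was 256 MB per the REPORT line above
  }
}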

@iliapolo iliapolo added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 1, 2020
@iliapolo
Contributor

iliapolo commented Dec 1, 2020

@ten-lac I believe this is the same issue: #11787

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 2, 2020
@ten-lac
Author

ten-lac commented Dec 3, 2020

I think these errors are coming from the stable Helm chart deprecations.

@ten-lac ten-lac closed this as completed Dec 3, 2020
@github-actions

github-actions bot commented Dec 3, 2020

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
