Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EKS: Helm Pull on eks.HelmChart does not work for non-ECR OCI Charts. #24710

Closed
elamaran11 opened this issue Mar 21, 2023 · 7 comments · Fixed by #25237
Closed

EKS: Helm Pull on eks.HelmChart does not work for non-ECR OCI Charts. #24710

elamaran11 opened this issue Mar 21, 2023 · 7 comments · Fixed by #25237
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@elamaran11
Copy link

Describe the bug

We are trying to install an OCI helm chart from ghcr.io (public chart) for Grafana operator using CDK and it consistently fails, while manual installation succeeds with helm (no authn necessary).

CDK Install using:

eks.HelmChart with 
    name: 'grafana-operator',
    chart: 'grafana-operator',
    namespace: 'grafana-operator',
    repository: 'oci://ghcr.io/grafana-operator/helm-charts/grafana-operator',
    version: 'v5.0.0-rc0',
    release: 'grafana-operator'

Helm install manual (succeeds):

helm upgrade grafana-operator ./grafana-operator --install --repo oci://ghcr.io/grafana-operator/helm-charts/grafana-operator --version v5.0.0-rc0

Error:

Error: b'Release "grafana-operator" does not exist. Installing it now.\nError: path "/tmp/tmpb1ykx
ljl/grafana-operator" not found\n'

Looks like the bug is in return os.path.join(tmpdir, repository.rpartition('/')[-1]) the parameter for the chart where CDK passes tmp dir and chart name (we used ./grafana-operator).

Expected Behavior

Successful installation of non-ECR OCI Helm charts using below

eks.HelmChart with 
    name: 'grafana-operator',
    chart: 'grafana-operator',
    namespace: 'grafana-operator',
    repository: 'oci://ghcr.io/grafana-operator/helm-charts/grafana-operator',
    version: 'v5.0.0-rc0',
    release: 'grafana-operator'

Current Behavior

Error:

Error: b'Release "grafana-operator" does not exist. Installing it now.\nError: path "/tmp/tmpb1ykx
ljl/grafana-operator" not found\n'

Looks like the bug is in the parameter for the chart where CDK passes tmp dir and chart name (we used ./grafana-operator) .
The issue is in return os.path.join(tmpdir, repository.rpartition('/')[-1])

Reproduction Steps

Please run the cdk script at https://github.com/aws-samples/containers-blog-maelstrom/blob/main/grafana-operator-AMG/bin/grafana-operator-amg.ts uncommenting line 26 // new GrafanaOperatorHelmAddon(), and you will see the Grafana Operator installation failing.

Possible Solution

The issue is in return os.path.join(tmpdir, repository.rpartition('/')[-1]) It should return right first parameter instead of tmpdir.

Additional Information/Context

No response

CDK CLI Version

2.67

Framework Version

No response

Node.js Version

v19.6.0

OS

Mac/Ubuntu

Language

Typescript

Language Version

No response

Other information

No response

@elamaran11 elamaran11 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 21, 2023
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Mar 21, 2023
@pahud
Copy link
Contributor

pahud commented Mar 21, 2023

According to this:

logger.error("OCI repository format not recognized, falling back to helm pull")
cmnd = ['helm', 'pull', repository, '--version', version, '--untar']

If oci:// prefix comes with non-ecr-public or non-ecr-private repos, it simply run helm pull followed by helm upgrade, rather than helm upgrade against the oci:// URL directly:

helm upgrade grafana-operator ./grafana-operator --install --repo oci://ghcr.io/grafana-operator/helm-charts/grafana-operator --version v5.0.0-rc0

I'm making it p2 bug. Any PR contributions is highly welcome and appreciated.

@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Mar 21, 2023
@shapirov103
Copy link

shapirov103 commented Mar 21, 2023

@pahud helm pull works fine in the lambda function code, the following helm upgrade however, fails with directory not found.

I was able to reproduce this bug and one potential root cause is that the code allocates a temporary directory variable within an if scope here, but once it gets out of scope (before helm upgrade) it is cleaned up automatically.

 if repository is not None and repository.startswith('oci://'):
            tmpdir = tempfile.TemporaryDirectory()
            chart_dir = get_chart_from_oci(tmpdir.name, repository, version)
            chart = chart_dir  
            # End of scope for tmpdir forcing runtime to clean it up - it depends on the python runtime though

# tmpdir  is gone already, but the passed chart variable still contains the path to the clean up dir
helm('upgrade', release, chart, repository, values_file, namespace, version, wait, timeout, create_namespace)

@pahud
Copy link
Contributor

pahud commented Mar 21, 2023

@shapirov103 Thank you for your valuable insight. This helps a lot!

In this case we probably should consider to use mkstemp() and mkdtemp() instead followed by manual cleanup as the doc described here or just directly helm upgrade against the oci:// URL instead?

@shapirov103
Copy link

@pahud

From the performance perspective, if we already pulled the chart, it makes sense to reuse it without an extra lookup, so I would say approach 1 is beneficial. mkdtemp() will work, or just scoping the tmpdir variable with with - I can see that it may need slight refactoring of that code though.

@elamaran11
Copy link
Author

@pahud Do we know when this will be implemented.

@pahud
Copy link
Contributor

pahud commented Apr 18, 2023

@elamaran11 Unfortunately this bug is currently labeled as p2 which means the core team maintainers will not be able to address it. But we will be glad to review any PRs from the community to get it fixed.

@mergify mergify bot closed this as completed in #25237 Apr 24, 2023
mergify bot pushed a commit that referenced this issue Apr 24, 2023
This fixes an issue that occurs when installing Helm charts from OCI repositories other than ECR (e.g. `oci://ghcr.io/grafana-operator/helm-charts/grafana-operator`).

The issue occurs because `subprocess.check_output` is called with `shell=True`, which only invokes the first item of the passed cmnd sequence. So instead of `helm pull $REPO --version $VERSION --untar`, only `helm` is executed, which results in the following output (also observed in the Lambda logs):

<details>
<summary>Output</summary>

```
The Kubernetes package manager

Common actions for Helm:

- helm search:    search for charts
- helm pull:      download a chart to your local directory to view
- helm install:   upload the chart to Kubernetes
- helm list:      list releases of charts

Environment variables:

| Name                               | Description                                                                                       |
|------------------------------------|---------------------------------------------------------------------------------------------------|
| $HELM_CACHE_HOME                   | set an alternative location for storing cached files.                                             |
| $HELM_CONFIG_HOME                  | set an alternative location for storing Helm configuration.                                       |
| $HELM_DATA_HOME                    | set an alternative location for storing Helm data.                                                |
| $HELM_DEBUG                        | indicate whether or not Helm is running in Debug mode                                             |
| $HELM_DRIVER                       | set the backend storage driver. Values are: configmap, secret, memory, sql.                       |
| $HELM_DRIVER_SQL_CONNECTION_STRING | set the connection string the SQL storage driver should use.                                      |
| $HELM_MAX_HISTORY                  | set the maximum number of helm release history.                                                   |
| $HELM_NAMESPACE                    | set the namespace used for the helm operations.                                                   |
| $HELM_NO_PLUGINS                   | disable plugins. Set HELM_NO_PLUGINS=1 to disable plugins.                                        |
| $HELM_PLUGINS                      | set the path to the plugins directory                                                             |
| $HELM_REGISTRY_CONFIG              | set the path to the registry config file.                                                         |
| $HELM_REPOSITORY_CACHE             | set the path to the repository cache directory                                                    |
| $HELM_REPOSITORY_CONFIG            | set the path to the repositories file.                                                            |
| $KUBECONFIG                        | set an alternative Kubernetes configuration file (default "~/.kube/config")                       |
| $HELM_KUBEAPISERVER                | set the Kubernetes API Server Endpoint for authentication                                         |
| $HELM_KUBECAFILE                   | set the Kubernetes certificate authority file.                                                    |
| $HELM_KUBEASGROUPS                 | set the Groups to use for impersonation using a comma-separated list.                             |
| $HELM_KUBEASUSER                   | set the Username to impersonate for the operation.                                                |
| $HELM_KUBECONTEXT                  | set the name of the kubeconfig context.                                                           |
| $HELM_KUBETOKEN                    | set the Bearer KubeToken used for authentication.                                                 |
| $HELM_KUBEINSECURE_SKIP_TLS_VERIFY | indicate if the Kubernetes API server's certificate validation should be skipped (insecure)       |
| $HELM_KUBETLS_SERVER_NAME          | set the server name used to validate the Kubernetes API server certificate                        |
| $HELM_BURST_LIMIT                  | set the default burst limit in the case the server contains many CRDs (default 100, -1 to disable)|

Helm stores cache, configuration, and data based on the following configuration order:

- If a HELM_*_HOME environment variable is set, it will be used
- Otherwise, on systems supporting the XDG base directory specification, the XDG variables will be used
- When no other location is set a default location will be used based on the operating system

By default, the default directories depend on the Operating System. The defaults are listed below:

| Operating System | Cache Path                | Configuration Path             | Data Path               |
|------------------|---------------------------|--------------------------------|-------------------------|
| Linux            | $HOME/.cache/helm         | $HOME/.config/helm             | $HOME/.local/share/helm |
| macOS            | $HOME/Library/Caches/helm | $HOME/Library/Preferences/helm | $HOME/Library/helm      |
| Windows          | %TEMP%\helm               | %APPDATA%\helm                 | %APPDATA%\helm          |

Usage:
  helm [command]

Available Commands:
  completion  generate autocompletion scripts for the specified shell
  create      create a new chart with the given name
  dependency  manage a chart's dependencies
  env         helm client environment information
  get         download extended information of a named release
  help        Help about any command
  history     fetch release history
  install     install a chart
  lint        examine a chart for possible issues
  list        list releases
  package     package a chart directory into a chart archive
  plugin      install, list, or uninstall Helm plugins
  pull        download a chart from a repository and (optionally) unpack it in local directory
  push        push a chart to remote
  registry    login to or logout from a registry
  repo        add, list, remove, update, and index chart repositories
  rollback    roll back a release to a previous revision
  search      search for a keyword in charts
  show        show information of a chart
  status      display the status of the named release
  template    locally render templates
  test        run tests for a release
  uninstall   uninstall a release
  upgrade     upgrade a release
  verify      verify that a chart at the given path has been signed and is valid
  version     print the client version information

Flags:
      --burst-limit int                 client-side default throttling limit (default 100)
      --debug                           enable verbose output
  -h, --help                            help for helm
      --kube-apiserver string           the address and the port for the Kubernetes API server
      --kube-as-group stringArray       group to impersonate for the operation, this flag can be repeated to specify multiple groups.
      --kube-as-user string             username to impersonate for the operation
      --kube-ca-file string             the certificate authority file for the Kubernetes API server connection
      --kube-context string             name of the kubeconfig context to use
      --kube-insecure-skip-tls-verify   if true, the Kubernetes API server's certificate will not be checked for validity. This will make your HTTPS connections insecure
      --kube-tls-server-name string     server name to use for Kubernetes API server certificate validation. If it is not provided, the hostname used to contact the server is used
      --kube-token string               bearer token used for authentication
      --kubeconfig string               path to the kubeconfig file
  -n, --namespace string                namespace scope for this request
      --registry-config string          path to the registry config file (default "/Users/fabian/Library/Preferences/helm/registry/config.json")
      --repository-cache string         path to the file containing cached repository indexes (default "/Users/fabian/Library/Caches/helm/repository")
      --repository-config string        path to the file containing repository names and URLs (default "/Users/fabian/Library/Preferences/helm/repositories.yaml")

Use "helm [command] --help" for more information about a command.
```

</details>

From the `subprocess` logs:

> On POSIX with shell=True, the shell defaults to /bin/sh. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. **If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself.** 

See https://docs.python.org/3/library/subprocess.html#popen-constructor

Merging the `helm` command and its arguments into the first item of the `cmnd` list fixes the issue.

To quickly verify the fix, this code can be added to the file:

```python
logging.basicConfig(level=logging.DEBUG)
tmpdir = tempfile.TemporaryDirectory()
chart_dir = get_chart_from_oci(tmpdir.name, "oci://ghcr.io/grafana-operator/helm-charts/grafana-operator", "v5.0.0-rc0")
```

Without the fix, running the file with Python should then result in the behaviour described above. With the fix the chart should be pulled correctly into the temporary directory.

```sh
python packages/aws-cdk-lib/aws-eks/lib/kubectl-handler/helm/__init__.py
```

Closes #24710 

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. effort/medium Medium work item – several days of effort p2
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants