Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Supporting more artifact retreval methods #817

Open
endzyme opened this issue Dec 2, 2022 · 18 comments
Open

Proposal: Supporting more artifact retreval methods #817

endzyme opened this issue Dec 2, 2022 · 18 comments
Labels
area/multi-index kind/proposal lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@endzyme
Copy link

endzyme commented Dec 2, 2022

Related to #684 and #816

Overview

I am looking for some initial implementation feedback on supporting private GitHub repository releases as well as a few other potential artifact repositories.

The intent here is to support some way of running custom commands, which return an artifact file via stdout. This could solve #684 by running something like gh api -H Accept:application/octet-stream /repos/kubernetes-sigs/krew/releases/assets/55894121 to download artifacts from a private GitHub repo's release assets. It could also allow people to have their on private krew indexes which rely upon local machine tooling to get the artifacts. I'm thinking example commands like aws s3 cp s3://my-fancy-private-bucket/my-cool-krew-plugin.tgz -. This type of support could separate the concerns of auth to the local machine running the installation.

I initially wanted to implement this as an S3 SDK scheme but decided it would be better to enable more options via running commands on the local machine. Happy to discuss alternative approaches here too.

Early feedback request on the design

Keep in mind my testing and work is pretty rough and just an initial concept.

  1. Should this implementation bloat the uri spec key to support other "schemes" (such as file:// and cmd://) or should it really be a separate key in the plugin manifest spec? (Providing string like cmd://run-my-super-sweet-script -i thing -o otherthing just feels kinda not ideal.)

  2. Should this leverage temp files on the file system to "buffer" the stdout from the "artifact download command"? I initially attempted to use exec.Command() and exec.Command().Stdout() but hit issues with the pipe closing before the stream could be read (appears to be how exec.Command() is intended to function).

  3. Is this introducing too much of a security risk, allowing manifests to effectively execute arbitrary code via krew when installing plugins?

Below is an example plugin manifest using this PR to give an example of how this is written right now.

apiVersion: krew.googlecontainertools.github.com/v1alpha2
kind: Plugin
metadata:
  name: testfile
spec:
  version: "v0.0.1"
  homepage: https://somedomain.com/
  shortDescription: Throwin testfiles
  caveats: |
    Something something testfile

  platforms:
  - uri: cmd://gh api -H Accept:application/octet-stream /repos/kubernetes-sigs/krew/releases/assets/55894121
    sha256: 5df32eaa0e888a2566439c4ccb2ef3a3e6e89522f2f2126030171e2585585e4f
    bin: testfile
    files:
    - from: ./krew-linux_amd64
      to: testfile
    selector:
      matchExpressions:
      - key: "os"
        operator: "In"
        values:
        - linux
        - darwin
@endzyme
Copy link
Author

endzyme commented Dec 2, 2022

@ahmetb Sorry for the rather lazy approach of starting the discussion by pasting the PR initial comment. If you'd like a different method of starting the design discussion, let me know what you expect and I can write up a proposal.

@endzyme
Copy link
Author

endzyme commented Dec 29, 2022

@ahmetb Anything I can help with to get the ball rolling here on what you need to have a design discussion? Happy to connect in the Kubernetes Slack too. 😄 Happy Holidays and hope you have a great New Year

@chriskim06
Copy link
Member

chriskim06 commented Jan 10, 2023

i think points 1 and 3 you made are definitely important. i dont like the idea of executing an arbitrary command to get the release assets because its a security risk like you said.

i think homebrew has specific release strategies for downloading assets from non-public github urls (private github repo, s3, etc). maybe something like that could work here as well. speaking about private github specifically, i think if we kept the uri as a url to a private release asset then you have to add a Authorization: token ... header to the request, which presents another challenge of how krew should get that token. a predefined env var would be possible but that adds extra required knowledge on the users.

digging through the github api a bit and its a little difficult to download a release asset from a private repo. the url looks like https://api.github.com/repos/OWNER/REPO/releases/assets/ASSET_ID and i dont see a way to download a release asset by name. i had to first get the release by name and parse out the asset id from that response. maybe there's an easier way though. if this is the way it has to be done then i think we would have to either:

  1. put the responsibility on the users to get the correct api url with asset id
  2. use the combination of owner, repo, and asset name to query the github api and download the asset (might involve changing the plugin schema which would not be great). another thing is if people are self hosting github then the api url will be different [1]

not a huge fan of either of those but option 2 would be a better user experience. another possibly annoying thing is that the access token thats used would have to have repo permissions in all the plugin repos for that custom index.


[1] another issue i ran into while playing around with this: my company uses self hosted github enterprise + okta sso. the typical url used for assets in plugin manifests (github-url/owner/repo/releases/download/...) doesnt actually work since that redirects to an okta page. figure id jot this down here since i think others might have similar issues when self hosting github. the plugins would need to specify the api url and the download would need to be updated

@esatterwhite
Copy link

esatterwhite commented Feb 10, 2023

Most package managers I've encountered that handle this have allowed me to specify a url to the source. In the case of a git repo, it is just the git repo, and a ref.

source = {
  url = 'git+https://github.com/<owner>/<repo>.git',
  tag = 'v2.2.1',
}

where the protocol of the url tells the manager how it is supposed to get the asset.

github+https://github.com/<owner>/<repo>

tells you the asset lives on a github release for example, which means the repo could be private and needs some additional information. Most github things have standardized around having a GITHUB_TOKEN environment variable set on the machine running the command to handle authenticated requests.

@endzyme
Copy link
Author

endzyme commented Feb 12, 2023

@esatterwhite, do you have any examples you can link here. I'm happy to keep driving this forward a little, but I want to see if the examples have use cases for other schemes or protocols like s3/gcs or others. Thanks!

@chriskim06
Copy link
Member

dug around homebrew a bit and they are also using the github api https://github.com/Homebrew/brew/blob/master/Library/Homebrew/utils/github/api.rb#L196-L198 with the token defined in an env var i think. im not exactly sure how a formula defines that the artifact to install is in a private repo though (and homebrew also allows people to define how something is installed instead of just providing a binary to download).

i think if we are going to support this we'll need to use the api as well and come up with a good design around some of the comments i made earlier

@chriskim06
Copy link
Member

@esatterwhite that looks like stuff to download a private repo (which i believe works with the custom indexes feature in krew). the problem we have right now is if a plugin manifest defines a binary from a release in a private repo. i havent figured out a way to download an asset using the https://github.com/{owner}/{repo}/releases/download/{tag}/{artifact} url. i think we need the github api in order to do that which would require some changes to the plugin schema i think and that requires some design

@ahmetb ahmetb added kind/proposal priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/multi-index labels Feb 13, 2023
@chriskim06
Copy link
Member

i think the main points of concern are

  • making sure that existing manifests dont need to be updated
  • doing this in a way that is extensible to other potential artifact sources
  • how to handle authentication credentials

with that being said here are some very preliminary thoughts i had:

we could add a new Release type that implements the download.Fetcher interface and include this in the Plugin type. Release would need to implement the UnmarshalJSON method and use json.RawMessage to allow unmarshaling to different types (in case we support other download options like s3 in the future). a release would look like

platforms:
- selector:
    ...
  release:
    type: github
    metadata:
      baseURL: ...
      repo: ...
      release: ...
      asset: ...
  ...

the implementation of these different ways to download releases would just need to implement Get(string) (io.ReadCloser, error), could have different schemas themselves, and would need to update the Release type's implementation of UnmarshalJSON.

the rough idea above doesnt address how the github token is picked up by krew. that is probably one of the more difficult points to address with all this. we've talked about potentially having a config file in the past but we havent really needed one afaik. an env var is much simpler from a dev perspective but adds a bit of extra required knowledge on the user's part (although i dont think much and is a common paradigm with other tools).

@ahmetb curious to hear your take on all this.

@endzyme
Copy link
Author

endzyme commented Feb 18, 2023

I think this is a fine approach. In terms of a new release type, it could easily be adapted into a plugin ecosystem if needed. The plugin docs for different doc types could either warn users of missing env vars or other things.

My only hesitation is having the top level spec called release, but maybe refer to it as artifact or installer or similar.

@ahmetb
Copy link
Member

ahmetb commented Feb 19, 2023

I think it's probably better to go from concrete use cases to solutions in this scenario. Several use cases I'm hearing in this post are:

  1. I want to let my employees install kubectl plugins (from my private artifact storage): I'm not too convinced that we should try to accommodate this use case.

    Most companies have corporate IT tooling to install binaries on employee machines. Since most plugins are single Go binaries, they should use their existing corp machine management tooling to deliver and update these binaries and not need Krew.

  2. I want to share my plugin from a private artifact repository (superset of 1): This comes down to where the binary is and how it is authenticated. It might be OK to roughly categorize the artifact sources as:

    • Blob is not served over HTTP(S), e.g. ftp or custom curl command: inclined to not accommodate this
    • Blob is on a private HTTPS endpoint with "custom" authentication scheme: For example, the "signed blob" URLs seen in S3 or Azure. This means we need to reformat the URL, it's not ideal.
    • Blob is authenticated via an Authorization header: e.g. AWS S3 can use an "Authorization: AWS xxx" scheme, but it
    • Blob is authenticated via an Authorization: Bearer xxx header: (What kubectl auth exec plugins do): it can accommodate endpoints like Google Cloud Storage or GitHub API.

By far, the last format (Bearer token) is the easiest to support and may be covering the most ground [citation needed] but as soon as we add this, we'll probably get someone from a company saying they use Kerberos/NTLM for authentication (which requires roundtripping the http requests with a sophisticated negotiation protocol like SPNEGO/GSSAPI) and we'd be pulled in direction (and we probably should have a hard line to not support such requests).

Also keep in mind, if a company can deliver something like an auth plugin that's exec'ed on a machine, they might as well deliver the kubectl plugin with the same mechanism, too. :) So this would be only interesting if the user already has CLI tools like gh or gcloud where we can call gh auth token or gcloud auth print-access-token and get a token out of it.

@esatterwhite
Copy link

esatterwhite commented Feb 19, 2023

We want to set up our own krew pugin index in our GitHub org, which is private and pulls artifacts off of private GitHub releases. Very similar to what krew is doing already. That is the use case

@daniel-garcia
Copy link
Contributor

daniel-garcia commented Feb 20, 2023

kubectl provides an elegant albeit complex method for authentication => kubelogin. In fact, krew can install kubelogin plugins (see oidc-login. If krew took a similar approach, the user experience might look like

$ kubectl krew index add myindex https://github.com/myorg/myindex.git
WARNING: You have added a new index from "https://github.com/myorg/myindex.git"
The plugins in this index are not audited for security by the Krew maintainers.
Install them at your own risk.

Some of the artifacts in this index require the 'krew-download-github'. The plugin was not found.

Then the user can install that plugin. Most of these can be public.

$ kubectl krew install krew-download-github

When a kubectl plugin download requires a download extension, krew executes the plugin. A zero exit code would mean that the data written to stdout is the complete artifact. If a non-zero exit code would mean the download failed, the output over stderr could give the user more insight into the issue.

The above experience would ensure that all plugins are verified via sha256 hashes. Social evidence also shows that executing subcommands is acceptable (see kubelogin). Giving arguments to the plugins (defined in the index) would also not expand the security surface area as long as a krew does not interpret the arguments in a shell. In kubeconfig, the commands are defined by an array of strings and not passed to a shell but to exec.Command.

The above approach would cover downloading artifacts FROM ANY source over any protocol provided that the end user is willing to install kubectl plugins that enable those plugins. Presumably, they are already comfortable with that, or they would not be using krew in the first place. Concerns about corporate policies involving the installation of software are an orthogonal concern. Installing krew plugins is no different from installing kubectl, chrome, or any other executable on a corporate asset. Those concerns should be deferred to Device Management software solutions and not directly affect the solutions presented here.

@daniel-garcia
Copy link
Contributor

On a related note, its not obvious this is an issue since krew interprets 4xx as successful artifact downloads. This can be a real time sink because of the misleading error. #819

@chriskim06
Copy link
Member

  1. I want to share my plugin from a private artifact repository

i think this is something we should support, especially for private github repos. it should be doable to implement this in a way where we can add/deny new artifact sources on a case by case basis

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 24, 2023
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 23, 2023
@esatterwhite
Copy link

Has there been any forward progress on this? I know there were a couple pull requests, but they seem to have stalled.
Are we will waiting on agreement around design?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/multi-index kind/proposal lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
None yet
Development

No branches or pull requests

7 participants