
Proposal for Quay as a Proxy Cache #13

Merged · 1 commit merged on Dec 21, 2021

Conversation

Sunandadadi (Contributor)

No description provided.

@Sunandadadi force-pushed the proxy_cache branch 2 times, most recently from 8f1f552 to 98a29fd on December 14, 2021 04:14
@Sunandadadi changed the title from "WIP: Proposal for Quay as a Proxy Cache" to "Proposal for Quay as a Proxy Cache" on Dec 14, 2021
@flavianmissi (Contributor) left a comment:

This looks great, thanks for taking the time to put it down into words.
Left some small comments. I'll hold off on approval until we discuss it with the rest of the team.
Great work! 🚀


### Goals

* A Quay user can define and configure (credentials, staleness period), via the config file/app, a repository in Quay that acts as a cache for a specific upstream registry.


Is this saying a Quay repo is mapped to an entire upstream registry, like docker.io? A couple bullets down it sounds like a Quay org is mapped to a registry.

Contributor:

From the epic's user stories:

As an administrator I want to be able to select between caching an entire upstream registry (e.g. cache all of docker.io, i.e. docker.io/library/postgres:latest -> quay.corp/cache/library/postgres:latest) or just selected namespaces (e.g. cache only docker.io/library, i.e. docker.io/library/postgres:latest -> quay.corp/docker-cache/postgres:latest) so that I can constrain access to potentially untrusted upstream registries

Maybe we should add all user stories to the enhancement so there's no confusion.
Wdyt?


I think it should be the other way around: first we need an enhancement proposal that describes a feature the way it will look in the next release, then we discuss it and accept it, and then we can create stories and implement it.

The enhancement proposals should be detailed enough so that they are useful to the documentation team, QA, support engineers, early adopters that use nightly builds. If we change some of our decisions, we need to update the EPs to keep them useful.

Note: some of my comments include questions that go beyond the level of detail expected of EPs. I'm trying to understand it more deeply and find problems that could make it unimplementable. Implementation details that are not visible to users can be omitted from EPs, but we may still need to discuss them.

@HammerMeetNail:

@Sunandadadi This looks good, a few comments. I'm curious, are all of these changes backend? Is there any UI work expected?


* If the upstream image and cached version of the image are different:
  * The user is authenticated against the upstream registry and only the changed layers of `postgres:14` are pulled.
  * The new layers are updated in the cache and served to the user in parallel.
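The "changed layers only" step above can be sketched as a manifest diff: compare the upstream manifest's layer digests against what the cache already holds. The manifest shape and names here are illustrative, not Quay's actual data model:

```python
def layers_to_pull(upstream_manifest, cached_digests):
    """Return the layer digests present upstream but missing from the cache."""
    upstream = [layer["digest"] for layer in upstream_manifest["layers"]]
    return [d for d in upstream if d not in cached_digests]

# Hypothetical manifest: only sha256:bbb changed since the last pull.
manifest = {"layers": [{"digest": "sha256:aaa"},
                       {"digest": "sha256:bbb"},
                       {"digest": "sha256:ccc"}]}
cached = {"sha256:aaa", "sha256:ccc"}
missing = layers_to_pull(manifest, cached)  # only the changed layer is fetched
```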

What will happen if the same uncached blob is requested by multiple clients at the same time?

Contributor:

I'm not sure. I suspect the desired behaviour is that we don't end up with duplicate blobs. Do we care about how we arrive at that result? 🤔
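One common way to avoid duplicate blobs under concurrent requests is a single-flight guard: the first requester for a digest performs the upstream pull, and later concurrent requesters wait for and reuse its result. A minimal in-process sketch (illustrative only, not Quay's implementation; error handling omitted):

```python
import threading

class SingleFlight:
    """Collapse concurrent fetches of the same blob into one upstream pull."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # digest -> (done event, shared result holder)

    def fetch(self, digest, do_fetch):
        with self._lock:
            entry = self._inflight.get(digest)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[digest] = entry
                leader = True  # this caller performs the real pull
            else:
                leader = False  # someone else is already pulling this digest
        event, result = entry
        if leader:
            result["value"] = do_fetch(digest)
            with self._lock:
                del self._inflight[digest]
            event.set()
        else:
            event.wait()  # reuse the leader's result instead of re-pulling
        return result["value"]

sf = SingleFlight()
blob = sf.fetch("sha256:abc", lambda d: b"blob-bytes")
```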

A user initiates a pull of an image (say `postgres:14`) from an upstream repository via Quay. The repository is checked to see if the image is present.
1. If the image does not exist, a fresh pull is initiated.
  * The user is authenticated against the upstream registry and all the layers of `postgres:14` are pulled.
  * The pulled layers are saved to the cache and served to the user in parallel.

Will serving and caching reuse the same connection to the upstream registry or will they open two different connections?

Contributor:

I have not looked in depth at the technical feasibility of this, but my idea is to pull a layer from the upstream registry and then cache and stream it. So there should be only one pull against the upstream registry, and we'd use the result of that both to cache the layer and to stream it to the user.
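The single-connection idea can be sketched as a generator that tees each chunk: read once from upstream, write it to cache storage, and yield the same bytes to the client. This is an illustrative sketch; `upstream_stream` and `cache_file` are stand-ins for the real transport and storage layers:

```python
import io

def stream_and_cache(upstream_stream, cache_file, chunk_size=1024 * 1024):
    """Read each chunk exactly once from upstream; persist and serve it."""
    while True:
        chunk = upstream_stream.read(chunk_size)
        if not chunk:
            break
        cache_file.write(chunk)  # cache the layer bytes...
        yield chunk              # ...and stream the same bytes to the client

# Demo with in-memory streams standing in for the connection and storage.
upstream = io.BytesIO(b"layer-bytes" * 3)
cache = io.BytesIO()
served = b"".join(stream_and_cache(upstream, cache, chunk_size=8))
```

One design caveat with this approach is partial failure: if the client disconnects mid-stream, the cache write must either complete independently or the partial blob must be discarded.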


### Goals

* A Quay user can define and configure (credentials, staleness period), via the config file/app, a repository in Quay that acts as a cache for a specific upstream registry.

Can you describe what the configuration process will look like? Will it be a parameter in the configuration file, or will it be configured in the web console (or via the Quay API)? If the web console, will it be available to all users or only to the superuser?

Contributor:

Good point, we need to at least mention how the setup will be addressed here.
From the user stories in the epic, I understand that regular users should be able to set up proxy orgs:

  • As a Quay user I want to be able to define an organization in Quay that acts as a cache for a specific upstream registry
  • As a Quay user I want to be able to supply credentials to the upstream registry when defining a cache organization so that I can circumvent/extend possible pull-rate limits or access private repositories

Contributor Author:

Good point. I will look into this and update once we have a clearer understanding.
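For discussion purposes, a proxy cache org definition might carry fields like the ones below, per the user stories quoted above. Every key name here is hypothetical; this is not Quay's actual config schema:

```python
# Hypothetical shape only; none of these keys are real Quay config options.
PROXY_CACHE_ORG = {
    "org_name": "docker-cache",
    "upstream_registry": "https://registry-1.docker.io",
    "upstream_namespace": "library",    # optional: cache just one namespace
    "credentials": {                    # optional: extend pull-rate limits or
        "username": "corp-puller",      # reach private repositories
        "password_secret": "upstream-pull-secret",
    },
    "staleness_period_seconds": 86400,  # serve cache without upstream check
}

def validate(config):
    """Reject definitions missing fields the proposal treats as required."""
    required = {"org_name", "upstream_registry", "staleness_period_seconds"}
    missing = required - config.keys()
    if missing:
        raise ValueError("missing keys: %s" % sorted(missing))
    return True
```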

* If the upstream image and cached version of the image are the same:
  * No layers are pulled from the upstream repository and the cached image is served to the user.

* If the upstream image and cached version of the image are different:

What if the upstream image is deleted, or needs new credentials?

Contributor:

Depending on the staleness period, the client would either get the error propagated to them (image outside the staleness period) or get the version Quay has in cache (image within the staleness period).
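That behaviour can be sketched as a small decision function. Names are illustrative; `cached_at` stands for when the cached copy was last validated:

```python
from datetime import datetime, timedelta

def on_upstream_error(cached_at, staleness_period, now):
    """Decide what the client gets when the upstream pull fails
    (deleted image, revoked credentials, registry down)."""
    if cached_at is not None and now - cached_at <= staleness_period:
        return "serve cached copy"  # still within the staleness period
    raise LookupError("propagate upstream error")  # stale or never cached

now = datetime(2021, 12, 16, 12, 0)
recent = now - timedelta(hours=6)
result = on_upstream_error(recent, timedelta(days=1), now)
```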

![](https://user-images.githubusercontent.com/11522230/145871778-da01585a-7b1b-4c98-903f-809c214578da.png)
Design credits: @fmissi

2. If the image exists in cache:

If the image is pulled by SHA, do we need to perform upstream checks?

Member:

Shouldn't need to, since digests are content-addressable.
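This point can be captured in one helper: tag references are mutable pointers that may have moved upstream, so they need a freshness check, while digest references are immutable, so a cache hit can be served directly. Illustrative only, not Quay's API:

```python
def needs_upstream_check(reference):
    """Tags are mutable pointers; digests are content-addressable and fixed."""
    return not reference.startswith("sha256:")

needs_upstream_check("postgres:14")       # tag pull: check upstream freshness
needs_upstream_check("sha256:deadbeef")   # digest pull: serve cache hit as-is
```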

images from the cache are evicted based on LRU.
* A proxy cache organization will transparently cache and stream images to clients. The images in the proxy cache organization should
support the default expected behaviour (security scanner, time machine, etc.) like other images on Quay.
* Given the sensitive nature of accessing potentially untrusted upstream registries, all cache pulls need to be logged (audit log).
Member:

Pulls are already logged. This can probably be inferred based on the namespace and whether it's set up as a proxy cache or not. Or maybe add additional metadata to the existing pull audit log for these cases.

Contributor:

@kleesc do you mean we currently log pulls done by clients pulling from Quay?

Contributor:

I understand this as being about logging the pulls performed by quay against the upstream registry, not necessarily the pulls performed by the client.

* A proxy cache organization will transparently cache and stream images to clients. The images in the proxy cache organization should
support the default expected behaviour (security scanner, time machine, etc.) like other images on Quay.
* Given the sensitive nature of accessing potentially untrusted upstream registries, all cache pulls need to be logged (audit log).
* A Quay user can flush the cache to eliminate excess storage consumption.
Member:

I don't think this should be handled manually by the user. It should be done automatically, based on some policy that we decide on, e.g. removing the oldest cached tags/repos.
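An automatic policy like this could look roughly as follows: track per-image sizes in recency order and evict the least-recently-pulled images once the org's quota is exceeded. A sketch under assumed names, not Quay's implementation:

```python
from collections import OrderedDict

class LRUCacheIndex:
    """Evict least-recently-pulled images when the org quota is exceeded."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.used = 0
        self._images = OrderedDict()  # name -> size in bytes, oldest first

    def record_pull(self, name, size):
        """Record a pull; return the names evicted to stay under quota."""
        if name in self._images:
            self._images.move_to_end(name)  # cache hit refreshes recency
            return []
        self._images[name] = size
        self.used += size
        evicted = []
        while self.used > self.quota_bytes and len(self._images) > 1:
            old_name, old_size = self._images.popitem(last=False)
            self.used -= old_size
            evicted.append(old_name)
        return evicted
```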

@flavianmissi (Contributor), Dec 16, 2021:

The automatic removal will exist alongside the manual flush. Manual flush is specifically mentioned in the user stories on the epic:

  • As a Quay admin I want to be able to leverage the storage quota of an organization to limit the cache size so that backend storage consumption remains predictable by discarding images from the cache according to least recently used or pull frequency
  • [...]
  • As a user I want to be able to flush the cache so that I can eliminate excess storage consumption

If you think this shouldn't be the case then we need to bring it up with Daniel when he's back.


### Constraints

* If quota management is enabled for a proxy cache organization and, say, an org is set to a max quota of 500 MB while the image the user wants to pull is 700 MB.
Member:

Same comment as above as to what quota means in this context.

@flavianmissi (Contributor) left a comment:

Great work, Sunanda! 🎉
Let's iterate on this as necessary with the rest of the team after holidays.

@Sunandadadi (Contributor Author):

@Sunandadadi This looks good, a few comments. I'm curious, are all of these changes backend? Is there any UI work expected?

@HammerMeetNail Yes, there will be UI work to configure a proxy cache organization, where users can configure credentials, the upstream repo path, staleness period, and such.

@HammerMeetNail left a comment:

LGTM 👍

Can we back burner any UI work that's needed? There's a frontend rewrite planned for next year that this work might fall nicely into.

@Sunandadadi Sunandadadi merged commit 19adb79 into quay:main Dec 21, 2021
@Sunandadadi Sunandadadi deleted the proxy_cache branch December 21, 2021 18:38