Proposal for Quay as a Proxy Cache #13
This looks great, thanks for taking the time to put it down into words.
Left some small comments. I'll hold the approval until we discuss it with the rest of the team.
Great work! 🚀
> ### Goals
>
> * A Quay user can define and configure (credentials, staleness period), via config file/app, a repository in Quay that acts as a cache for a specific upstream registry.
Is this saying a Quay repo is mapped to an entire upstream registry, like docker.io? A couple bullets down it sounds like a Quay org is mapped to a registry.
From the epic's user stories:

> As an administrator I want to be able to select between caching an entire upstream registry (e.g. cache all of docker.io, i.e. docker.io/library/postgres:latest -> quay.corp/cache/library/postgres:latest) or just selected namespaces (e.g. just cache docker.io/library, i.e. docker.io/library/postgres:latest -> quay.corp/docker-cache/postgres:latest) so that I can constrain access to potentially untrusted upstream registries

Maybe we should add all user stories to the enhancement so there's no confusion.
Wdyt?
I think it should be the other way around: first we need an enhancement proposal that describes a feature the way it will look in the next release, then we discuss it and accept it, and then we can create stories and implement it.
The enhancement proposals should be detailed enough that they are useful to the documentation team, QA, support engineers, and early adopters who use nightly builds. If we change some of our decisions, we need to update the EPs to keep them useful.
Note: some of my comments contain questions that go beyond the level of detail expected of EPs. I'm trying to understand it more deeply and find problems that could make it unimplementable. Implementation details that are not visible to users can be omitted from EPs, but we may still need to discuss them.
@Sunandadadi This looks good, a few comments. I'm curious, are all of these changes backend? Is there any UI work expected?
> * If the upstream image and cached version of the image are different:
>   * The user is authenticated into the upstream registry and only the changed layers of `postgres:14` are pulled.
>   * The new layers are updated in cache and served to the user in parallel.
What will happen if the same uncached blob is requested by multiple clients at the same time?
I'm not sure. I suspect that the desired behaviour is that we don't end up with duplicate blobs. Do we care about how we arrive at that result? 🤔
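For discussion, here is a minimal sketch of one way to get that result, assuming a per-digest lock around the fetch. All names here (`cache`, `fetch_from_upstream`) are hypothetical, not Quay's actual internals:

```python
import threading

# Hypothetical de-duplication of concurrent pulls of the same uncached blob:
# the first request fetches from upstream, later ones wait and hit the cache.
_blob_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()

def get_blob(digest, cache, fetch_from_upstream):
    if cache.contains(digest):              # fast path: already cached
        return cache.get(digest)
    with _locks_guard:                      # one lock object per digest
        lock = _blob_locks.setdefault(digest, threading.Lock())
    with lock:
        if cache.contains(digest):          # another client won the race
            return cache.get(digest)
        blob = fetch_from_upstream(digest)  # exactly one upstream pull
        cache.put(digest, blob)
        return blob
```

With this shape the question of "how" mostly reduces to where the lock lives (in-process vs. shared, e.g. in the database or Redis) when Quay runs multiple workers.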
> A user initiates a pull of an image (say a `postgres:14` image) from an upstream repository on Quay. The repository is checked to see if the image is present.
> 1. If the image does not exist, a fresh pull is initiated.
>    * The user is authenticated into the upstream registry and all the layers of `postgres:14` are pulled.
>    * The pulled layers are saved to cache and served to the user in parallel.
Will serving and caching reuse the same connection to the upstream registry or will they open two different connections?
I have not looked in depth at the technical feasibility of this, but my idea is to pull a layer from the upstream registry and then cache and stream it. So there should only be one pull against the upstream registry and we'd use the result of that to both cache and stream the layer to the user.
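A rough sketch of that idea, assuming a chunked HTTP response object (`iter_content` as in the `requests` library) and a hypothetical `cache_writer`; this is illustrative, not the planned implementation:

```python
# Read each chunk from the single upstream connection once, persist it to
# cache storage, and stream the same bytes to the client as we go.
def tee_layer(upstream_response, cache_writer, chunk_size=1024 * 1024):
    for chunk in upstream_response.iter_content(chunk_size):
        cache_writer.write(chunk)   # save to cache
        yield chunk                 # serve to the client in parallel
    cache_writer.commit()           # mark the cached layer as complete
```

One open question with this shape is what happens to the partially written cache entry if the client disconnects mid-stream.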
> ### Goals
>
> * A Quay user can define and configure (credentials, staleness period), via config file/app, a repository in Quay that acts as a cache for a specific upstream registry.
Can you describe what the configuration process will look like? Will it be a parameter in the configuration file, or will it be configured in the web console (or via the Quay API)? If the web console, will it be available to all users or only to the superuser?
Good point, we need to at least mention how the setup will be addressed here.
From the user stories in the epic, I understand that regular users should be able to set up proxy orgs:

> - As a Quay user I want to be able to define an organization in Quay that acts as a cache for a specific upstream registry
> - As a Quay user I want to be able to supply credentials to the upstream registry when defining a cache organization so that I can circumvent / extend possible pull-rate limits or access private repositories
Good point. I will look into this and update once we have a clearer understanding.
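To make the open question concrete, one purely hypothetical shape such a configuration could take is below. None of these keys are confirmed Quay settings, and whether this lives in a config file, the web console, or the API is exactly what's being discussed in this thread:

```python
# Hypothetical proxy cache org configuration, expressed as a plain dict.
proxy_cache_org = {
    "name": "docker-cache",
    "upstream_registry": "docker.io",
    "upstream_namespace": "library",   # optional: cache only one namespace
    "credentials": {
        "username": "corp-bot",
        # a reference rather than a plaintext secret, if stored server-side
        "password_ref": "vault://quay/docker-cache",
    },
    "staleness_period_s": 86400,       # how long a cached tag is trusted
}
```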
> * If the upstream image and cached version of the image are the same:
>   * No layers are pulled from the upstream repository and the cached image is served to the user.
>
> * If the upstream image and cached version of the image are different:
What if the upstream image is deleted? Or needs new credentials?
Depending on the staleness period, the client would either get the error propagated to them (image outside the staleness period) or get the version Quay has in cache (image within the staleness period).
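A sketch of that behaviour, with assumed helper names (`cache`, `upstream`, and `UpstreamError` are illustrative stand-ins, not Quay's actual code):

```python
import time

class UpstreamError(Exception):
    """Stand-in for any upstream failure: deleted image, bad credentials, etc."""

def resolve_tag(tag, cache, upstream, staleness_period_s):
    cached = cache.lookup(tag)
    try:
        manifest = upstream.get_manifest(tag)
    except UpstreamError:
        if cached and time.time() - cached.fetched_at < staleness_period_s:
            return cached.manifest    # within staleness period: serve cache
        raise                         # outside staleness period: propagate
    if cached and cached.digest == manifest.digest:
        return cached.manifest        # upstream unchanged, serve cache
    cache.store(tag, manifest)        # refresh cache with the new version
    return manifest
```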
> ![](https://user-images.githubusercontent.com/11522230/145871778-da01585a-7b1b-4c98-903f-809c214578da.png)
> Design credits: @fmissi
>
> 2. If the image exists in cache:
If the image is pulled by SHA, do we need to perform upstream checks?
Shouldn't need to, since digests are content-addressable.
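As a sketch (helper names are assumed), a by-digest pull that hits the cache can skip upstream entirely, while tag pulls go through the staleness logic:

```python
# Digests are content-addressable: upstream cannot serve a different object
# under the same digest, so a cache hit by digest is always valid.
def get_manifest(reference, cache, resolve_via_upstream):
    if reference.startswith("sha256:"):        # pull by digest
        cached = cache.lookup_by_digest(reference)
        if cached is not None:
            return cached                      # no upstream check needed
    return resolve_via_upstream(reference)     # tags need freshness checks
```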
> images from cache are evicted based on LRU.
> * A proxy cache organization will transparently cache and stream images to clients. The images in the proxy cache organization should support the default expected behaviour (like security scanner, time machine, etc.) as other images on Quay.
> * Given the sensitive nature of accessing a potentially untrusted upstream registry, all cache pulls need to be logged (audit log).
Pulls are already logged. This can probably be inferred based on the namespace and whether or not it's set up as a proxy cache. Or maybe add additional metadata to the existing pull audit log for these cases.
@kleesc do you mean we currently log pulls done by clients pulling from Quay?
I understand this as being about logging the pulls performed by Quay against the upstream registry, not necessarily the pulls performed by the client.
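If the existing pull audit log were reused, one possible shape for distinguishing the two kinds of pulls might look like the sketch below. This is hypothetical, not Quay's actual logging API:

```python
# Illustrative only: tag audit entries so that client pulls from the cache
# org and the upstream pulls Quay performs on their behalf are both visible.
def log_proxy_pull(audit_log, repo, upstream_ref, kind):
    audit_log.log_action(
        "pull_repo",
        repository=repo,
        metadata={
            "proxy_cache": True,
            "pull_kind": kind,          # "client" or "upstream"
            "upstream": upstream_ref,   # e.g. "docker.io/library/postgres:14"
        },
    )
```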
> * A proxy cache organization will transparently cache and stream images to clients. The images in the proxy cache organization should support the default expected behaviour (like security scanner, time machine, etc.) as other images on Quay.
> * Given the sensitive nature of accessing a potentially untrusted upstream registry, all cache pulls need to be logged (audit log).
> * A Quay user can flush the cache to eliminate excess storage consumption.
I don't think this should be handled manually by the user. It should be done automatically, based on some policy that we decide on, e.g. removing the oldest cached tags/repos.
The automatic removal will exist alongside the manual flush. Manual flush is specifically mentioned in the user stories on the epic:

> - As a Quay admin I want to be able to leverage the storage quota of an organization to limit the cache size so that backend storage consumption remains predictable by discarding images from the cache according to least recently used or pull frequency
> - [...]
> - As a user I want to be able to flush the cache so that I can eliminate excess storage consumption

If you think this shouldn't be the case then we need to bring it up with Daniel when he's back.
enhancements/proxy-cache.md (outdated):

> ### Constraints
>
> * If quota management is enabled with a proxy cache organization, and say an org is set to a max quota of 500mb and the image the user wants to pull is 700mb.
Same comment as above as to what quota means in this context.
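Whatever quota ends up meaning here, the 500mb/700mb case above implies a pre-pull check along these lines (names hypothetical): when a single image exceeds the org's entire quota, no amount of eviction can make room.

```python
class QuotaExceeded(Exception):
    pass

# 700 MB image vs. 500 MB org quota: no LRU eviction can make room, so the
# pull has to be rejected up front (or the quota treated as a soft limit).
def check_fits_quota(org_quota_bytes, image_size_bytes):
    if image_size_bytes > org_quota_bytes:
        raise QuotaExceeded("image larger than the org's entire cache quota")
```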
Great work, Sunanda! 🎉
Let's iterate on this as necessary with the rest of the team after holidays.
@HammerMeetNail Yes, there will be UI work to configure a proxy cache organization, where users can configure credentials, the upstream repo path, the staleness period, and such.
LGTM 👍
Can we back burner any UI work that's needed? There's a frontend rewrite planned for next year that this work might fall nicely into.