Add endpoints to obtain files from within registry tarballs #210
Comments
This would be doable, of course, but it does add complexity, work and risk to the package server. We want to minimize that, especially on the public package servers, which are exposed to the entire world. Anyone can put up an arbitrary tarball via automated package registration, so adding this would allow them to then trigger extracting that tarball on our servers, which is a lot riskier than just serving arbitrary tarballs as opaque blobs. Of course, I think we're already rewriting tarballs on storage servers, so there's some risk already, but that doesn't actually touch disk at all—it's implemented entirely in-memory in
@staticfloat, any thoughts?
I'm a little against doing this kind of processing on the server side. We would need to decompress, extract, recompress and transmit the piece of the data that you're asking for, and while that's not that much work to do, we should assume that any feature we add here could potentially become extremely popular. I'm imagining people using this to extract files from large artifacts when they only want a subset of what the artifact contains. Of course we could then cache these results separately from the full artifact, but it's a decent amount of work to save not a lot of work on the client side. In general, I'd much rather burn client-side CPU time than server CPU time, because if we offer it up via the server I'm pretty sure we're going to run out of server CPU eventually.
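For concreteness, the decompress / extract / recompress / cache path described in this comment might look roughly like the Python sketch below. This is only an illustration and not how the package server is implemented: `TARBALLS`, `file_response`, and the cache size are hypothetical names and values chosen for the example.

```python
import functools
import gzip
import io
import tarfile

# Hypothetical in-memory store mapping a tarball's content hash to its bytes.
TARBALLS: dict[str, bytes] = {}


@functools.lru_cache(maxsize=1024)  # cache extracted results separately from the full tarball
def file_response(tarball_hash: str, path: str) -> bytes:
    """Decompress a stored tarball, extract one member, and recompress it."""
    stored = io.BytesIO(TARBALLS[tarball_hash])
    # "r:*" lets tarfile detect the compression format on its own.
    with tarfile.open(fileobj=stored, mode="r:*") as archive:
        member = archive.getmember(path)  # raises KeyError if the path is absent
        extracted = archive.extractfile(member)
        if extracted is None:
            raise KeyError(f"{path} is not a regular file")
        payload = extracted.read()
    # Recompress only the requested file for transmission.
    return gzip.compress(payload)
```

Every cache miss pays for a full decompression pass over the tarball, which is the server CPU cost this comment is worried about.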
Right, but what if this was only on private package servers like the ones JuliaHub offers?
If it's just a private server, I think I would instead just create a second service that gets registries from an upstream package server and does whatever this application requires. I don't think the motivation behind this (the Renovate app) is meant to work with private package servers, is it? It looks to me like it's supposed to work with the open-source package server, although maybe that's just an example?
I have been working on adding package servers as a "datasource" to the dependency manager Renovate (renovatebot/renovate#29623). This seems like a nice approach as it ensures the dependency manager uses the same mechanism as (most) Julia clients and is therefore more likely to have accurate data (i.e. instead of directly interacting with the underlying Git repository).
One of the things I have run into is that, as far as I can tell, I can only obtain the full tarball of a registry (at a particular "state") and need to handle all the extraction and parsing locally. Given that these tarballs are relatively big and we often only need one or two files from them, it'd be nice if the package server had endpoints for "indexing" into the tarballs, handling the necessary extraction and caching in a centralized location. I'm not sure how well this plays with the infrastructure and architecture, but having this information available through specific endpoints would make it a lot more straightforward to write tools like Renovate against package servers.
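To illustrate the client-side work this currently implies, here is a minimal Python sketch that pulls a couple of files out of a registry tarball that has already been downloaded into memory. It assumes the tarball is a plain or compressed tar archive; `read_members` is a hypothetical helper name, and the file paths in the usage note are only illustrative.

```python
import io
import tarfile


def read_members(tarball_bytes: bytes, wanted: set[str]) -> dict[str, bytes]:
    """Return the contents of the requested regular files from an in-memory tarball."""
    found: dict[str, bytes] = {}
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:*") as archive:
        for member in archive:
            # Some archives prefix member names with "./"; normalize that away.
            name = member.name.removeprefix("./")
            if name in wanted and member.isfile():
                extracted = archive.extractfile(member)
                if extracted is not None:
                    found[name] = extracted.read()
                if len(found) == len(wanted):
                    break  # stop scanning once every requested file is found
    return found
```

A caller would pass something like `read_members(data, {"Registry.toml", "E/Example/Versions.toml"})`. Nothing is written to disk, but the whole tarball still has to be downloaded and decompressed up to the last requested member.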
Some additional notes as to why I consider relying on package servers a nicer approach over directly interacting with the underlying Git repositories: