Download the list of packages before requesting from a registry #1375

Open
s-ludwig opened this issue Feb 12, 2018 · 6 comments
Comments

@s-ludwig
Member

Currently, when downloading package information, each registry is queried in sequence, falling back to the next one if a package doesn't exist or isn't accessible due to any other kind of error.

This has a few implications:

  • Performance: A failing query usually takes about as long as a successful one, meaning that with multiple additional registries configured, the total time required for querying all packages can go up proportionally. Note that apart from the approach proposed here, this point can also be mitigated by allowing multiple packages to be queried at once.

  • Privacy: The current approach potentially gives away the names of the used packages to registries other than the one actually containing those packages. This can leak private information from an internal company network whenever the internal registry server is not available.

  • Bad error messages: It's uncertain whether a package really doesn't exist on a particular server, or if there is some kind of more general error condition. If downloading the package list fails, it is a clear indication of the latter. Also, with the list of packages cached locally, the error message can even list a specific error for the registr(y/ies) that are supposed to contain the package in question, instead of just "Couldn't be found on any of the registries".
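The performance and privacy points above can be illustrated with a small sketch. It is not dub's implementation: the registry URLs, package names, and both function names are hypothetical, and the per-registry package lists are assumed to have been downloaded already (`None` stands for a list that could not be fetched).

```python
from typing import Dict, List, Optional, Set

# Hypothetical data: registry URL -> set of package names it advertises,
# or None if its package list could not be downloaded.
PackageLists = Dict[str, Optional[Set[str]]]


def sequential_fallback_order(package: str, lists: PackageLists) -> List[str]:
    """Model of the current behavior: registries are contacted in configured
    order until one of them actually has the package."""
    contacted = []
    for url, names in lists.items():
        contacted.append(url)  # the package name is sent to this registry
        if names is not None and package in names:
            break
    return contacted


def registries_to_query(package: str, lists: PackageLists) -> List[str]:
    """Proposed behavior: only contact registries whose list names the package."""
    return [url for url, names in lists.items()
            if names is not None and package in names]


lists: PackageLists = {
    "https://registry.internal.example": {"acme-core", "acme-auth"},
    "https://public.example": {"vibe-d", "mir-algorithm"},
}

print(sequential_fallback_order("vibe-d", lists))  # both registries contacted
print(registries_to_query("vibe-d", lists))        # only the public one
```

With the internal registry marked as unavailable (`None`), `registries_to_query("acme-core", ...)` returns an empty list, so the internal name is never sent to the public registry, whereas the sequential model contacts both.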

@wilzbach
Member

See #1366 for using a single request to download the package with its dependencies.

@MartinNowak
Member

MartinNowak commented Feb 13, 2018

Please consider the bad experience we had with a metadata cache in the past (see #755 and #528). Too often the cached information is outdated, and the commands that require registry interaction are expected to deliver up-to-date information. Downloading a full index every time is likely slower than what we have atm. I considered differential updates of the index edits (like a MySQL binlog), but the effort didn't seem appropriate compared to #1366.

> Privacy: The current approach potentially gives away the names of the used packages to other registries than the one actually containing those packages.

Dub clients do query custom servers before the default server and defaults can be turned off, so that should be fine for such use-cases.

> Bad error messages: It's uncertain whether a package really doesn't exist on a particular server, or if there is some kind of more general error condition.

That's handled by 404 vs. 5xx errors.

So overall I think a full index on the client side comes with a lot of new technical issues and requires quite some work, while hardly solving our current problems. The only plus side: on a registry outage, dub could fall back to the local index (possibly using outdated information). But I think it's simpler to improve the availability of the registry to solve that problem.

@s-ludwig
Member Author

No, the plan would be to always download the list first before trying to download further package information from a particular registry; caching is not the intention. It would of course be implemented with proper support for modification checks, so that it only gets re-downloaded when things have actually changed. Also note that the file is currently 13 KB uncompressed, 6 KB compressed, so that's also not going to be a problem in the foreseeable future.
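A rough sketch of what "modification checks" could look like, assuming the registry supports HTTP conditional requests (an ETag sent back via `If-None-Match`, answered with `304 Not Modified`). Whether the dub registry does this is not stated in this thread; the `PackageList` class, the `Fetch` callable, the in-memory fake registry, and the whitespace-separated list format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple

# A fetch function receives the previously seen ETag (or None) and returns
# (status, etag, body). It stands in for an HTTP GET with If-None-Match.
Fetch = Callable[[Optional[str]], Tuple[int, Optional[str], Optional[str]]]


@dataclass
class PackageList:
    etag: Optional[str] = None
    names: Optional[Set[str]] = None

    def refresh(self, fetch: Fetch) -> bool:
        """Revalidate the list; return True if the body was re-downloaded."""
        status, etag, body = fetch(self.etag)
        if status == 304:  # Not Modified: keep the names we already have
            return False
        if status == 200 and body is not None:
            self.etag = etag
            self.names = set(body.split())  # assumed format: names separated by whitespace
            return True
        raise RuntimeError(f"package list unavailable (HTTP {status})")


def make_fake_registry(state: Dict[str, str]) -> Fetch:
    """In-memory stand-in for a registry; `state` holds 'etag' and 'body'."""
    def fetch(known_etag: Optional[str]):
        if known_etag == state["etag"]:
            return 304, state["etag"], None
        return 200, state["etag"], state["body"]
    return fetch


state = {"etag": "v1", "body": "vibe-d mir-algorithm"}
package_list = PackageList()
fetch = make_fake_registry(state)
print(package_list.refresh(fetch))  # first call downloads the body
print(package_list.refresh(fetch))  # unchanged list: only a revalidation
```

The list is still requested on every run, so it is never trusted blindly, but the body only travels over the wire when the registry reports a change.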

> Dub clients do query custom servers before the default server and defaults can be turned off, so that should be fine for such use-cases.

Yes, but if such a server is not reachable for some reason, then the request will simply fall back to the public registry, revealing internal package names to it. I'm assuming that even internal projects will use some public packages and do not necessarily mirror them on their internal registry.

> That's handled by 404 vs. 5xx errors.

Getting a 5xx error will still not tell whether a package is found on that server. A 404 could also happen due to a misconfiguration of the web server, for example, and would also lead to the wrong conclusions. Having (or not having) the full list of packages up front, on the other hand, makes it possible to generate a very precise error message.
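To make the error-message argument concrete, here is a minimal sketch of how per-registry diagnostics could be derived once the package lists are known. The function name, the message wording, and the data layout are hypothetical and not taken from dub.

```python
from typing import Dict, Optional, Set


def explain_failure(package: str,
                    lists: Dict[str, Optional[Set[str]]],
                    query_status: Dict[str, int]) -> str:
    """Build one diagnostic line per registry.

    `lists` maps a registry to its package names (None if the list download
    failed); `query_status` holds the HTTP status of the follow-up package
    query for registries that were actually queried.
    """
    lines = []
    for registry, names in lists.items():
        if names is None:
            lines.append(f"{registry}: package list could not be downloaded, "
                         f"so it is unknown whether '{package}' exists there")
        elif package not in names:
            lines.append(f"{registry}: does not list '{package}'")
        elif registry in query_status:
            lines.append(f"{registry}: lists '{package}', but the query "
                         f"failed with HTTP {query_status[registry]}")
        else:
            lines.append(f"{registry}: lists '{package}', but was not queried")
    return "\n".join(lines)


print(explain_failure(
    "acme-core",
    {"https://registry.internal.example": None,
     "https://public.example": {"vibe-d"}},
    {},
))
```

Here an unreachable internal registry is reported as such, instead of being folded into a generic "couldn't be found on any of the registries" message.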

It should also not be underestimated that configuring a single private registry will almost double the amount of time spent to fetch information for a public package, because the private registry will always be queried first.

@MartinNowak
Member

> Also note that the file is currently 13 KB uncompressed, 6 KB compressed, so that's also not going to be a problem in the foreseeable future.

That's just the package index list without version information, right?
I'm not sure that I really want to get into dealing with privacy concerns; it's only the package names after all. There are interesting techniques like search indexing of encrypted files that might work here as well, also somewhat similar to https://www.signal.org/blog/private-contact-discovery/. But are we really up for adding complexity for that?

@s-ludwig
Member Author

Yes, just the names. It's also the error messages that would suddenly make sense. I'm not saying that we have to make this a priority, though, but it's something that I would not dismiss.

Speaking of multiple registries, does the multi-package query currently work correctly for dependency trees that span multiple registries?

@MartinNowak
Member

MartinNowak commented Feb 24, 2018

> Speaking of multiple registries, does the multi-package query currently work correctly for dependency trees that span multiple registries?

Yes, it will remove packages from the query list that are in the response from a registry, still WIP though.
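The behavior described here (dropping packages from the query list once a registry has answered for them) could look roughly like the following sketch. It is an illustration under assumptions, not the work-in-progress dub code: `resolve_across_registries`, the `Query` callable, and the fake registries are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A query function takes the package names still unresolved and returns
# metadata for the subset this registry knows about.
Query = Callable[[List[str]], Dict[str, dict]]


def resolve_across_registries(
    packages: Iterable[str],
    registries: List[Tuple[str, Query]],
) -> Tuple[Dict[str, Tuple[str, dict]], List[str]]:
    """Query registries in order, shrinking the request list as packages are found."""
    remaining = list(dict.fromkeys(packages))  # de-duplicate, keep order
    found: Dict[str, Tuple[str, dict]] = {}
    for name, query in registries:
        if not remaining:
            break
        response = query(remaining)
        for pkg, info in response.items():
            found[pkg] = (name, info)
        # Packages answered by this registry are not sent to later ones.
        remaining = [p for p in remaining if p not in response]
    return found, remaining


def make_fake_registry(contents: Dict[str, dict], log: List[List[str]]) -> Query:
    """In-memory registry that records every request it receives."""
    def query(requested: List[str]) -> Dict[str, dict]:
        log.append(list(requested))
        return {p: contents[p] for p in requested if p in contents}
    return query


internal_log: List[List[str]] = []
public_log: List[List[str]] = []
registries = [
    ("internal", make_fake_registry({"acme-core": {"version": "1.0.0"}}, internal_log)),
    ("public", make_fake_registry({"vibe-d": {"version": "0.8.3"}}, public_log)),
]
found, missing = resolve_across_registries(["acme-core", "vibe-d", "nope"], registries)
print(found)
print(missing)
```

In this run the second registry only receives the names the first one did not answer for, and anything left over after the last registry is returned as unresolved.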
