-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fast-deps] Add a hook to download all "skipped" wheels #8638
Conversation
a34332a
to
191f962
Compare
@McSinyx and I had a chat about this -- the goal here is to implement "do the downloads" as a separate loop within the resolver for now. We've decided to factor out the RequirementSet creation logic into a dedicated function for now, and to have a separate stage prior to creating the RequirementSet where we iterate through and perform the full downloads. The main design issue we're facing is figuring out how to implement that separate stage without changing behaviors for the rest of the resolver. I have a feeling we'd need to make a lot of changes here. :) |
0af3a4c
to
5726655
Compare
content_file.write(chunk) | ||
return file_path, response.headers.get('content-type', '') | ||
|
||
def __call__(self, link, location=None, hashes=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this class and function call did essentially 1 thing: download a file with progress. That was a good thing because it could be understood as a single small chunk, passed around, and used without much effort. When we add more responsibilities it gets harder to understand and maintain.
Instead of pulling functionality from RequirementPreparer into this, and then invoking this from the new resolver, what if we:
- move the functionality from RequirementPreparer somewhere else and invoke it from RequirementPreparer and the new resolver, or
- (preferred) move downloading out of RequirementPreparer all together, and do it either:
- internally in AbstractDistribution/Candidate when required to satisfy some part of processing (transparently to the caller), or
- in a common place after resolving has happened, if still needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(preferred) move downloading out of RequirementPreparer all together, and do it either:
I haven't looked at this PR yet, but IIUC, this was the conclusion from what @McSinyx and I discussed. I do think I agree that Downloader
might not be the best place to put this though. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On current master, network.download.Downloader only initiates the download and passes the lazy response to the preparation code which then performs the actual download (to a file). What 744896f aims to do is making sure that Downloader.__call__
will perform the download, with addition to hash checking to make the interface consistent when caching is involved. In the way I see it, Downloader should be able to act analogous to wget, which is given an URL/Link, optional output target (-O
). I assume more responsibilities refers to the hash checking.
The reason I placed the call to Downloader directly in resolve
is that it is easier to ensure that such call is thread-safe than a higher-level wrapper like a method in RequirementPreparer
. It would also be more trivial to handle the progress bar for a batch of downloads directly within the Downloader than extracting it from the preparer.
If I understand correctly, then RequirementPreparer seems to be the preparer for candidates of the new resolver, and downloading something happens to be part of the work for some candidates, i.e. RequirementPreparer sometimes need to download to be able to construct an (Abstract)Distribution to give to the candidate, so I'm not sure if I understand the model (2.i) is referring to.
Regarding (2.ii), I completely agree that it would be really nice if we can have this call outside of the resolver code. However, in a discuss with @pradyunsg, we came to a conclusion that InstallRequirement is currently to tangled with the preparation code (or that it logically requires the underlying file to exist), so that we will move the call out once we reworked RequirementSet. @chrahunt, I'm not sure if I've correctly understood and addressed your points since my head is buzzing as I try to visualize this but I hope I am able to deliver some reasonings. Do they change your mind about 744896f or do you have any further suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pradyunsg, we came to a conclusion that InstallRequirement is currently to tangled with the preparation code (or that it logically requires the underlying file to exist), so that we will move the call out once we reworked RequirementSet.
The issue is that the code that gets run after Resolver.resolve
returns, is closely tied to the assumption that the underlying file exists. We don't need for reworking of RequirementSet, but rather, we need to cleanly handle the "I need to be fully downloaded" state without needing to touch the old resolver code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@McSinyx I think some of my concern was an extension of my original comments in #8532, which I did not follow up on in #8588 (sorry). I played with it a bit and came up with #8685, which I think we can use as a pretty clean base for your download work. Please take a look!
Regarding the downloader, I think we can probably achieve the desired effect by creating a new function or class that uses it as-is instead of putting more logic into it. Since our fast-deps feature isn't enabled when doing hash checking, the wrapper won't need any of that stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need for reworking of RequirementSet, but rather, we need to cleanly handle the "I need to be fully downloaded" state without needing to touch the old resolver code.
Thank you for the correction.
I played with it a bit and came up with 8685, which I think we can use as a pretty clean base for your download work. Please take a look!
Thanks, I haven't read it carefully but the resulting hook seems really nice!
Regarding the downloader, I think we can probably achieve the desired effect by creating a new function or class that uses it as-is instead of putting more logic into it. Since our fast-deps feature isn't enabled when doing hash checking, the wrapper won't need any of that stuff.
This sounds good to me too. @pradyunsg said that you two had a discussion on this and I assume this is the consensus that came out of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about the hashing part, and IMHO apparently we should check them, just to make sure that a wheel is not corrupted during the download process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about the hashing part, and IMHO apparently we should check them, just to make sure that a wheel is not corrupted during the download process.
If we're in "require-hashes" mode, doing partial downloads does not make sense, since we're only permitted to trust artifacts/files with that specific hash. And the only way to compute that hash is to download the entire file. Since we're going to download the entire file (and we know that if you have hashes, you also have performed dependency resolution already; so we're definitely using that downloaded file), we should not use partial downloads/fast-deps when hash-checking mode is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I didn't communicate this right 😛 Let me provide more context (or at least the context I'm assuming):
- We want a routine taking multiple URLs and download them in parallel, with UI taken into account
- Before that's done, let's have a
map
/for
that call something download of each of them, and that something should be thread-safe - It'll be difficult to keep all preparations thread-safe, even it is at the moment (I think)
pip/src/pip/_internal/operations/prepare.py
Lines 493 to 498 in 8b838eb
def prepare_linked_requirement_more(self, req, parallel_builds=False): # type: (InstallRequirement, bool) -> None """Prepare a linked requirement more, if needed.""" if not req.needs_more_preparation: return self._prepare_linked_requirement(req, parallel_builds) - Rather, let what to be parallelized to be solely downloading the file to (usually) a temp dir and let the preparation code use it as cache latter. Both (after download and when verifying the cache) needs hash checking because the file might corrupt for some reason.
- I'm not sure if it's the effect of
--no-cache-dir
, but some wheels just don't get cached within a session. This doesn't seem to sufficient:pip/src/pip/_internal/network/download.py
Lines 63 to 66 in 370322e
if is_from_cache(resp): logger.info("Using cached %s", logged_url) else: logger.info("Downloading %s", logged_url) pip/src/pip/_internal/operations/prepare.py
Lines 290 to 300 in 370322e
def _check_download_dir(link, download_dir, hashes): # type: (Link, str, Optional[Hashes]) -> Optional[str] """ Check download_dir for previously downloaded file with correct hash If a correct file is found return its path else None """ download_path = os.path.join(download_dir, link.filename) if not os.path.exists(download_path): return None # If already downloaded, does its hash match? - If we use a common temp dir for all downloads, this can be down automatically and this common state really well suit
Downloader
. Furthermore the similarity between Downloader.download_one (or something to be renamed from current.__call__
) and Downloader.download_batch would be really nice.
I am truly sorry for not clarifying this earlier in a discussion point and ended up causing a decent amount of confusion 😅
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
This creates a hook to explicitly download distributions of requirements whose metadata are obtain from their wheels lazily (GH-8588) so that
pip install --dry-run
can be cleanly implemented by interject an interruption right before the downloadAlthough a decent amount of refactoring occurred, this should only affects the behavior of the
fast-deps
feature.