Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split out a LinkCollector class from PackageFinder #6910

Merged
merged 4 commits into from
Sep 13, 2019

Conversation

cjerdonek
Copy link
Member

@cjerdonek cjerdonek commented Aug 23, 2019

This PR splits out from the PackageFinder class a new LinkCollector class responsible solely for gathering all the links, but without doing any filtering of them. This reduces the PackageFinder class down to ~360 lines and makes it so the rest of PackageFinder doesn't have a dependency on PipSession or making networking requests (just filtering and sorting strings).

The code in this PR should be logically equivalent to what was there before (so in particular no behavior change).

This PR makes progress e.g. on issue #6430 because the LinkCollector class's main collect_links() method provides the "output all possible links" functionality discussed in that issue. The PR also updates the index.py architecture section created in PR #6787 to reflect the new class.

A follow-up PR can move LinkCollector and the index.py functions it depends on to a separate module.

@cjerdonek cjerdonek added C: finder PackageFinder and index related code skip news Does not need a NEWS file entry (eg: trivial changes) type: refactor Refactoring code labels Aug 23, 2019
@cjerdonek cjerdonek force-pushed the link-collector branch 2 times, most recently from f9120d3 to b9a24b2 Compare August 23, 2019 05:55
@cjerdonek
Copy link
Member Author

PR updated to use a setter, as @uranusjr requested. (Thanks for taking a look, @uranusjr!)

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not looked at the tests yet and haven't 100% reviewed this code. This PR is just slightly too big to review comfortably for me. Or... I'm just tired after writing 2 exams today. 🙃

Anyway, posting my review eagerly because shorter feedback loops are better.

src/pip/_internal/index.py Outdated Show resolved Hide resolved
# (1) links from file locations,
# (2) links from find_links, and
# (3) a dict mapping HTML page url to links from that page.
CollectedLinks = Tuple[List[Link], List[Link], Dict[str, List[Link]]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good follow up might be to convert this to a TypedDict, so that this value has some hint in the key names, about what the different values are for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a namedtuple, you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for namedtuple (and I prefer it over typeddict)

@cjerdonek
Copy link
Member Author

cjerdonek commented Aug 23, 2019

This PR is just slightly too big to review comfortably for me.

Anyway, posting my review eagerly because shorter feedback loops are better.

@pradyunsg Thanks! I can try breaking this up into two or three PR's so the number of lines will be smaller in each. Many of the lines are due to moving methods with no change in content, which gives the appearance of more happening than there really is.

@cjerdonek
Copy link
Member Author

Okay, I created PR #6913 to do separately, which will make this PR a lot smaller afterwards.

cjerdonek added a commit that referenced this pull request Aug 24, 2019
Move some PackageFinder methods (preparation for PR #6910)
@pradyunsg
Copy link
Member

Thanks for breaking this up! Even doing separated commits would've worked and this works well too. 🙃

Many of the lines are due to moving methods with no change in content, which gives the appearance of more happening than there really is.

Yea, it's difficult to "see" what's a functionally equivalent change vs what's a moved method.

@cjerdonek
Copy link
Member Author

No problem!

Yea, it's difficult to "see" what's a functionally equivalent change vs what's a moved method.

I agree! I'm always more than happy to break commits and/or PR's up for you or others if it helps make it easier to review. So don't ever hesitate to ask..

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Aug 24, 2019
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Sep 10, 2019
@cjerdonek cjerdonek force-pushed the link-collector branch 4 times, most recently from adccd7c to 96fd8c1 Compare September 10, 2019 08:53
@cjerdonek
Copy link
Member Author

cjerdonek commented Sep 10, 2019

Okay, this PR is ready to review again. The main hold-up was that I wanted to add at least one non-trivial unit test of the LinkCollector class's main collect_links() method. I did that, and it's a unit test (no network requests) since I mocked _get_html_response() for it, which is the underlying function responsible for making PackageFinder's network requests.

130 lines of this PR are for that unit test, and another 50 lines are for updating the package-finding.rst architecture document, so the PR isn't as big as it seems. The main change is to PackageFinder.find_all_candidates(), along with moving PackageFinder._get_pages() to LinkCollector.

I also addressed the previous review comments.

@cjerdonek
Copy link
Member Author

Also, with this PR, LinkCollector is just 86 lines, and PackageFinder is down to 367 lines.

@cjerdonek cjerdonek force-pushed the link-collector branch 2 times, most recently from e65a3e5 to 2464559 Compare September 10, 2019 16:57
@cjerdonek
Copy link
Member Author

I just did some commit history hygiene so it's easier for people to see what's happening in the commit that adds the LinkCollector class, FYI.

Copy link
Member

@chrahunt chrahunt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just 2 minor comments.

given names, a link whose URL has a base name matching that name.
"""
for name in names:
for link in links:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do something like

assert any(
    link.url.endswith(name) for link in links
), message_from_exception_below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn’t use assert here, Pytest can emit preplexing errors for asserts in functions. Personally I’d write a custom assertion, but Chris’s version would work well enough for me.

Maybe a middle of the road alternative would be satisfactory:

for name in names:
    if not any(link.url.endswith(name) for link in links):
        raise RuntimeError(...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pytest can emit perplexing errors for asserts in functions

Do you have an example? I've used asserts in helper functions, classes, and fixtures and haven't noticed any issues so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, both of you. I've updated the PR. I didn't feel strongly either way here, so I just went with @chrahunt's original suggestion. I do agree with @uranusjr that pytest often includes extra info that looks confusing and isn't especially helpful, e.g. the following in this case:

E  assert False
E   +  where False = any(<generator object check_links_include.<locals>.<genexpr> at 0x106edb518>)

but because we're including a custom error message, we can just ignore those other portions and just look at the custom message.

2. Constructs a ``CandidateEvaluator`` object and uses that to determine
the best candidate. It does this by calling the ``CandidateEvaluator``
class's ``compute_best_candidate()`` method on the return value of
``find_all_candidates()``. This corresponds to steps 4-5 of the Overview.


.. _link-collector-class:

The ``LinkCollector`` class
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this discussion was had before, but would it be better to keep these kinds of docs in the code itself? Then we could just include it here in any number of ways and I think it would be more likely to be kept up to date.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look! I don't really have a preference for how this type of documentation is done. The reason I added it in the first place is that there was a request from a number of people for me to document the various classes in a separate rst file, so I did that. And I was just updating it here. (I will note that one advantage of doing it directly in the rst is that it lets you use the various reStructuredText formatting, links, etc, whereas having that in a docstring would be a bit weird. I'm also not sure the extent to which it would carry over.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original discussion was here: #6787

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, it would look kind of strange to have to the rst in the docstring, given that we don't currently do that anywhere else.

@cjerdonek cjerdonek merged commit 084d797 into pypa:master Sep 13, 2019
@cjerdonek cjerdonek deleted the link-collector branch September 13, 2019 15:50
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Oct 13, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation C: finder PackageFinder and index related code skip news Does not need a NEWS file entry (eg: trivial changes) type: refactor Refactoring code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants