Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CandidateEvaluator class #6424

Merged
merged 1 commit into from
Apr 22, 2019

Conversation

cjerdonek
Copy link
Member

This is to further simplify the PackageFinder class.

@cjerdonek cjerdonek added type: refactor Refactoring code skip news Does not need a NEWS file entry (eg: trivial changes) C: finder PackageFinder and index related code labels Apr 21, 2019
@cjerdonek
Copy link
Member Author

@uranusjr This PR is something that occurred to me when reviewing one of your earlier PR's. A subsequent PR could be to change PackageFinder to accept a CandidateEvaluator object instead of the five arguments of versions, platform, abi, implementation, and prefer_binary. Similar to FoundCandidates, there can be a separate "create" function or method, if necessary, accepting these additional arguments.

@cjerdonek cjerdonek force-pushed the finder-candidate-evaluator branch from 630361f to a6c8d05 Compare April 21, 2019 02:58
@cjerdonek
Copy link
Member Author

In addition, if CandidateEvaluator becomes responsible for creating the FoundCandidates object, passing a CandidateEvaluator object can replace passing not just five but six arguments to PackageFinder -- the five I mentioned above as well as allow_all_prereleases.

@uranusjr
Copy link
Member

uranusjr commented Apr 21, 2019

This sounds like a great idea! I’ve been thinking about splitting PackageFiner in three steps: Find all links, filter out applicable links, and choose the best out of them. This is definitely going the same direction, refactoring out the final part out of it.

I might start trying to split index.py into its own package so I can use it in other projects, e.g. Pipenv…

@cjerdonek
Copy link
Member Author

Great, I'm glad you like it! Do you approve of this PR so it can be merged?

Regarding index.py, I would really rather keep refactoring it in place from where it is now IMO as there is a large amount of work that can be done to break it down and decouple it from pip's innards, etc. I think it is a series of easy, incremental steps.

@uranusjr
Copy link
Member

Regarding index.py, I would really rather keep refactoring it in place from where it is now IMO as there is a large amount of work that can be done to break it down and decouple it from pip's innards, etc. I think it is a series of easy, incremental steps.

I agree. What I meant is I could try to get a better idea what else needs to be done 😃

@pradyunsg
Copy link
Member

I like the direction you're taking this @cjerdonek @uranusjr! :D

I'll bring up the idea that it might also make sense to make index into a sub-package within pip and spread out components within that sub package.

@cjerdonek
Copy link
Member Author

Thanks, @uranusjr and @pradyunsg!

@cjerdonek cjerdonek merged commit dddd28b into pypa:master Apr 22, 2019
@cjerdonek cjerdonek deleted the finder-candidate-evaluator branch April 22, 2019 10:40
@cjerdonek
Copy link
Member Author

cjerdonek commented Apr 22, 2019

By the way, re: splitting things out further, I noticed that _link_package_versions() and _package_versions() and their supporting functions can be moved to become methods of CandidateEvaluator with very little change.

I believe that would bring the size of the PackageFinder class down to about 450 lines.

@uranusjr
Copy link
Member

My gut feeling is they should be in their own class (CandidateFinder or something)? The candidate selection consists of three steps: collecting links, filtering applicable, and select best. FoundCandidates deals the last, CandidateEvaluator the second, so maybe the first should be its own component as well.

@cjerdonek
Copy link
Member Author

cjerdonek commented Apr 22, 2019

I think the methods I mentioned aren’t named well. They’re also responsible for filtering (no “collection” takes place in them). And that filtering also depends exactly on the set of valid tags which CandidateEvaluator already has. It will be nice because it will collect together all the methods that filter and sort by the valid tags. It will be a very functional (in the sense of pure function), testable class.

@cjerdonek
Copy link
Member Author

The candidate selection consists of three steps: collecting links, filtering applicable, and select best. FoundCandidates deals the last, CandidateEvaluator the second,

Also, I think it might be worth broadening how we think about the process beyond how it is currently structured because I feel like a number of simplifications might be possible. For example, "filtering" is currently also done inside "select best," as well as when collecting links, so the responsibilities are spread across which isn't so good. As another example, I think there is a chance that the filtering done inside "select best" could be happening a lot earlier in the process (even before the InstallationCandidate objects are created).

@cjerdonek
Copy link
Member Author

I just posted PR #6425 to do part of what I suggested in my previous comment. It moves _link_package_versions() to CandidateEvaluator (but not the other methods). It also renames _link_package_versions() to evaluate_link() to make clearer what the method actually does (evaluate whether a link is a candidate for install).

@uranusjr
Copy link
Member

uranusjr commented Apr 23, 2019

Reading the source, I think the functions are probably both inadaquently named and implemented (in the three-step sense). _link_package_versions() does filtering (based on search.formats etc.), but is also responsible for excluding href values that don’t point to a package. I would hope the latter happens during the finding step instead. Maybe some further refactoring can happen before (or after) the PR?

@cjerdonek
Copy link
Member Author

Currently, the _link_package_versions() method's only dependency is on the list of valid tags, so I think it's an improvement to move that method to the class that encapsulates the valid tags. (Indeed, _link_package_versions()'s only method call is to call a method on the CandidateEvaluator class.) It also cuts down the size of the PackageFinder class and reduces what it has to "know about," which makes it easier to reason about. We should definitely keep refactoring after any PR (that's what I was advocating above). I wasn't suggesting this would be the final place for anything to live. :) By refactoring things in easy, incremental steps, each step will be an improvement but not necessarily the way we want it in the end.

@lock
Copy link

lock bot commented May 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label May 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation C: finder PackageFinder and index related code skip news Does not need a NEWS file entry (eg: trivial changes) type: refactor Refactoring code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants