Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor proposal for isort implementation #7738

Closed
bluthej opened this issue Oct 1, 2023 · 5 comments · Fixed by #7963
Closed

Refactor proposal for isort implementation #7738

bluthej opened this issue Oct 1, 2023 · 5 comments · Fixed by #7963
Labels
internal An internal refactor or improvement isort Related to import sorting

Comments

@bluthej
Copy link
Contributor

bluthej commented Oct 1, 2023

I've been working on #1567 and I got a working implementation but after taking the time to make a small refactor in #7665 and coming back to the original issue I'm thinking another refactor might be welcome regarding extensibility of the isort functionality.

Current state

Currently, the formatting is performed by format_import_block which relies on order_imports from isort/order.rs, and there are different methods that are called along the way depending on the kinds of imports we are comparing (they are kind of intertwined).

The ordering rules themselves are implemented in isort/sorting.rs, and here each different combination of import types has its own method.

The main drawback I see from this architecture is that when trying to add a new functionality one needs to sprinkle some changes in the different methods instead of making changes in a single place.

Proposal

I thus propose to reverse the logic, i.e. have a single entry point for comparisons (like the current cmp_either_import) that would chain the different sorting rules (cmp_force_to_top, etc...), and each of them would match on the import types.

I think this would make the isort functionality more easily extensible. With that change, to implement #1567 we just need to add a cmp_string_length method and insert it in the chain of sorting rules. This will simplify the process for similar settings like length sort straight and length sort sections.

What do you think? Any feedback is welcome 😃

@charliermarsh
Copy link
Member

So we'd merge into a vector of EitherImport earlier, and then do a generalized comparison between imports on that vector, rather than ordering twice? Seems reasonable to me, but it's hard to say for certain without looking at the proposed changes. Happy to review a PR.

@charliermarsh charliermarsh added internal An internal refactor or improvement isort Related to import sorting labels Oct 2, 2023
@bluthej
Copy link
Contributor Author

bluthej commented Oct 3, 2023

I guess we don't necessarily need to always merge the two Vecs of EitherImport, we can keep them separate if we don't sort within sections, like in the original Python isort implementation.

I understand it's tough to judge without seeing a PR, I wanted to see if in principle that sounded ok before spending some time on it :)

BTW, I noticed that initially the implementation looked a lot more like the original isort (using a module_key method to sort by key), and then it was changed to the current form with some cmp_* methods to compare pairs of imports. It's tough for me to pinpoint exactly when this happened (the history on Github is not helpful because the folders were renamed I think) so I'm wondering what was the reason for that? I'm asking because the Python implementation looks simpler to me.

@aspizu
Copy link
Contributor

aspizu commented Oct 3, 2023

what do you guys think about my implementation in https://github.com/aspizu/imp?

@bluthej
Copy link
Contributor Author

bluthej commented Oct 3, 2023

@aspizu thanks for sharing!

I see you are relying on the Ord trait to sort the imports. I did give it some thought, but the main issue I see with this is that the imports are not just sorted based on their internal state, they are sorted based on the settings passed in the configuration as well. You could make the argument that the settings could be stored within some import struct and used in the trait implementation but then the issue is that self and other would need to have consistent settings.

So I personally prefer the implementations based on functions that take one or two imports and the settings separately (either in the sort_by_key style like the original isort, or in the cmp_* style, like it's currently the case in ruff).

@bluthej
Copy link
Contributor Author

bluthej commented Oct 6, 2023

Hey,

So I did some digging and eventually found that the cmp_* based implementation of the sorting logic was introduced in #1374 in order to respect natural ordering.

I looked at the original Python implementation and saw that they're splitting the keys based on the numbers found in the string. This is a neat little trick, but this is not directly applicable in Rust since we can't have a Vec with a mix of strings and integers. I guess that's what guided the current implementation towards using the natord crate, which provides the compare and compare_ignore_case functions.

What we could do though is to introduce a NatOrdString wrapper for a String that implements the Ord trait based on the compare function from natord. This type would be used for the return type of the *_key functions, and we could thus use sort_by_cached_key again. This is pretty easy, and I do think this would lead to a more straightforward implementation.

Any thoughts?

@bluthej bluthej mentioned this issue Oct 15, 2023
charliermarsh pushed a commit that referenced this issue Oct 30, 2023
## Summary

Refactor for isort implementation. Closes #7738.

I introduced a `NatOrdString` and a `NatOrdStr` type to have a naturally
ordered `String` and `&str`, and I pretty much went back to the original
implementation based on `module_key`, `member_key` and
`sorted_by_cached_key` from itertools. I tried my best to avoid
unnecessary allocations but it may have been clumsy in some places, so
feedback is appreciated! I also renamed the `Prefix` enum to
`MemberType` (and made some related adjustments) because I think this
fits more what it is, and it's closer to the wording found in the isort
documentation.

I think the result is nicer to work with, and it should make
implementing #1567 and the like easier :)

Of course, I am very much open to any and all remarks on what I did!

## Test Plan

I didn't add any test, I am relying on the existing tests since this is
just a refactor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement isort Related to import sorting
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants