Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Add Length-Sort config to isort #1567

Closed
Tracked by #6190
einarwar opened this issue Jan 2, 2023 · 20 comments · Fixed by #8841
Closed
Tracked by #6190

Feature request: Add Length-Sort config to isort #1567

einarwar opened this issue Jan 2, 2023 · 20 comments · Fixed by #8841
Labels
good first issue Good for newcomers isort Related to import sorting

Comments

@einarwar
Copy link

einarwar commented Jan 2, 2023

Thanks for an awsome library. I am currently trying to replace the old flake8 and isort setup on our repo with ruff.

We can almost replace isort with ruff, but we are currently using the Length-Sort configuration from isort, which currently can't be found in ruff.

@charliermarsh
Copy link
Member

Love the Annoy-o-Tron.

@charliermarsh charliermarsh added isort Related to import sorting good first issue Good for newcomers labels Jan 2, 2023
@RobertCraigie
Copy link
Contributor

We also need this to be able to migrate to ruff!

@mihirsn
Copy link

mihirsn commented Apr 8, 2023

Hi @charliermarsh 👋🏼 I've started learning Rust for a while but haven't contributed yet and want to start from somewhere. So if you allow can I pick this one ?

@mihirsn
Copy link

mihirsn commented Apr 14, 2023

Hi @charliermarsh, just wanted to know that if you have any idea for the implementation approach for this..if so could let me know I am thinking to pick this up. I saw the comment, a bit confused by it.. so if you can elaborate it would be great.

@vadim-su
Copy link
Contributor

vadim-su commented Jun 1, 2023

Hi there 👋
I will try to implement this option, but I have a question about the already implemented method.
Do I understand correctly that order_by_type and length-sort are mutually exclusive options? If so, should I add an enum with the options?

@warsaw
Copy link

warsaw commented Jun 8, 2023

While you're at supporting length-sort, please also support length-sort-straight. That's probably the biggest isort feature I rely on that is missing from ruff.

@warsaw
Copy link

warsaw commented Jun 17, 2023

Do I understand correctly that order_by_type and length-sort are mutually exclusive options? If so, should I add an enum with the options?

I don't know the answer to that. My guess is that they are not mutually exclusive since different isort options control them, but I don't know how they interact if both are set.

For length-sort and length-sort-straight my guess is that the latter is a subset of the former, but I'm not sure how they interact if say the former were true and the latter were false.

@bluthej
Copy link
Contributor

bluthej commented Sep 25, 2023

Hey! I'm currently working on this one and I have a working implementation of the basic functionality. @suharnikov if you're still working on it as well I'll leave it for you of course :)

I do have one question though, I couldn't help but notice that there are a bunch of functions that take in a whole lot of isort settings (like format_imports in rules/isort/mod.rs), and I was wondering why they don't just take in a reference to an isort::settings::Settings? Is there a particular reason for that? I tried changing the function signatures and it seems to work alright that way too.
Having all the settings passed around separately yields a fair amount of tedious work when you're trying to add a new setting because you have to chase down all the places that they need to be added and change all the function signatures.
But if there's a good reason that I just haven't figured out I'd love to know about it 🙂

@charliermarsh
Copy link
Member

@bluthej - I think it's probably okay to refactor those methods to accept Settings. If you're able to work on that, do you mind putting it up as a standalone PR, with just that refactor, to make it easier to review and merge?

@bluthej
Copy link
Contributor

bluthej commented Sep 25, 2023

@charliermarsh absolutely! That's no problem at all :)

Then I'll start with the refactor and come back to this one when it's merged 😊

zanieb pushed a commit that referenced this issue Sep 26, 2023
## Summary

Pass around a `Settings` struct instead of individual members to
simplify function signatures and to make it easier to add new settings.

This PR was suggested in [this
comment](#1567 (comment)).

## Note on the choices

I chose which functions to modify based on which seem most likely to use
new settings, but suggestions on my choices are welcome!
@bluthej bluthej mentioned this issue Oct 15, 2023
charliermarsh pushed a commit that referenced this issue Oct 30, 2023
## Summary

Refactor for isort implementation. Closes #7738.

I introduced a `NatOrdString` and a `NatOrdStr` type to have a naturally
ordered `String` and `&str`, and I pretty much went back to the original
implementation based on `module_key`, `member_key` and
`sorted_by_cached_key` from itertools. I tried my best to avoid
unnecessary allocations but it may have been clumsy in some places, so
feedback is appreciated! I also renamed the `Prefix` enum to
`MemberType` (and made some related adjustments) because I think this
fits more what it is, and it's closer to the wording found in the isort
documentation.

I think the result is nicer to work with, and it should make
implementing #1567 and the like easier :)

Of course, I am very much open to any and all remarks on what I did!

## Test Plan

I didn't add any test, I am relying on the existing tests since this is
just a refactor.
@charliermarsh
Copy link
Member

#7963 is merged, which should pave the way for this.

@bluthej
Copy link
Contributor

bluthej commented Nov 2, 2023

Hey, so I got back to working on this, and that led me to the following question.

When using length-sort, isort will count the dots of relative imports in the length instead of sorting by distance and then length of the module name itself (without the dots). Is that the behavior we want? Or do we want to first sort based on distance and then length of module name?

@MichaReiser
Copy link
Member

Note: We should clarify whether the length means number of characters or the unicode width. Or are unicode names not supported by python (whether they're discouraged is another question ;))

@bluthej
Copy link
Contributor

bluthej commented Nov 2, 2023

That's a fair point!

On my system (Python 12 on Arch Linux) I cannot seem to be able to import a module with a non-ASCII character, I get a module not found error. However importing members with non-ASCII characters works. Now isort uses the built-in len, which yields the number of characters, not the number of bytes or whatever.

I guess we can use .chars().count() to get the number of characters for members and .len() for modules (I assume the len method is faster because it doesn't iterate over the characters, it just accesses the length stored in the str/String). We might not sort modules properly if there are non-ASCII characters in some of them but as far as I can tell that means the input is not valid Python so it's probably worth the speedup.

@flying-sheep
Copy link
Contributor

flying-sheep commented Nov 3, 2023

Now isort uses the built-in len, which yields the number of characters

What’s a “character”? I think len does codepoints and not, say, grapheme clusters …

For a more accurate length, I think we’d have to iterate grapheme clusters and check which ones are likely to render at double width. (Don’t let yourself be confused by “wide character” sometimes meaning storage size and sometimes literal width on screen)

E.g. check out how my terminal renders “平”, which of course is a valid Python identifier:

image

@bluthej
Copy link
Contributor

bluthej commented Nov 3, 2023

What’s a “character”? I think len does codepoints and not, say, grapheme clusters …

According to the Python docs:

Since Python 3.0, the language’s str type contains Unicode characters

and they add:

The Unicode standard describes how characters are represented by code points

So you are correct, len does codepoints. The Rust equivalent would be .chars().count() right?

For a more accurate length, I think we’d have to iterate grapheme clusters

I wouldn't necessarily say "more accurate", but the argument can be made that it's more "natural" for humans. The downside is that it would make ruff behave differently from isort, so there's a bit of a tradeoff.

E.g. check out how my terminal renders “平”, which of course is a valid Python identifier:

Interesting! I'm going to have to figure out why my example didn't work :)

@charliermarsh
Copy link
Member

I would prefer that we use Unicode width, which is what we use everywhere else (even though we use the term "length" -- we nearly renamed them all when we released the formatter but decided to defer that decision).

@charliermarsh
Copy link
Member

(You should be able to call .width() on any string using the use unicode_width::UnicodeWidthChar trait.)

@flying-sheep
Copy link
Contributor

That’s exactly the concept I was talking about btw 😃

@bluthej
Copy link
Contributor

bluthej commented Nov 4, 2023

(You should be able to call .width() on any string using the use unicode_width::UnicodeWidthChar trait.)

I think you meant the unicode_width::UnicodeWidthStr trait for strings :)
I switched to .width() and added some tests with non-ASCII characters.

What about relative imports? Should we follow isort and count the dots in the length?

charliermarsh pushed a commit that referenced this issue Nov 28, 2023
## Summary

Closes #1567.

Add both `length-sort` and `length-sort-straight` settings for isort.

Here are a few notable points:
- The length is determined using the
[`unicode_width`](https://crates.io/crates/unicode-width) crate, i.e. we
are talking about displayed length (this is explicitly mentioned in the
description of the setting)
- The dots are taken into account in the length to be compatible with
the original isort
- I had to reorder a few fields of the module key struct for it all to
make sense (notably the `force_to_top` field is now the first one)

## Test Plan

I added tests for the following cases:
- Basic tests for length-sort with ASCII characters only
- Tests with non-ASCII characters
- Tests with relative imports
- Tests for length-sort-straight
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers isort Related to import sorting
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants