Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New simplify rule: use sum over len + if #13050

Open
janosh opened this issue Aug 22, 2024 · 5 comments
Open

New simplify rule: use sum over len + if #13050

janosh opened this issue Aug 22, 2024 · 5 comments
Labels
needs-decision Awaiting a decision from a maintainer rule Implementing or modifying a lint rule

Comments

@janosh
Copy link

janosh commented Aug 22, 2024

len([val for val in iterable if cond(val)]) # bad

sum(cond(val) for val in iterable) # good

example

len([i**2 for i in range(10) if i**2 < 36])  # bad
>>> 6
sum(i**2 < 36 for i in range(10))  # good
>>> 6
janosh added a commit to materialsproject/pymatgen that referenced this issue Aug 22, 2024
janosh added a commit to materialsproject/pymatgen that referenced this issue Aug 22, 2024
* simplify: prefer sum over len + if

astral-sh/ruff#13050

* fix ruff PD901 pandas-df-variable-name

* fix cast df_energies as float in cp2k parse_energy_file
@tjkuson
Copy link
Contributor

tjkuson commented Aug 22, 2024

Your example suggests using the sum of a generator over the length of a list comprehension. Not sure if this was deliberate, but if this rule were to be accepted, it should probably persist iterable type given the discourse (#10838).

@janosh
Copy link
Author

janosh commented Aug 22, 2024

That was deliberate. I would always prefer generator for brevity.

@DanielYang59
Copy link

DanielYang59 commented Aug 23, 2024

Randomly join the conversation wonderful work Janosh, but we should also be aware of the slight performance penalty for the sum method.

With script:

import timeit


def condition(val):
    return True if val % 2 else False


for seq_len in (10, 100, 1000, 10000, 100000):
    sequence = list(range(seq_len))

    execution_time_a = timeit.timeit(
        'len([val for val in sequence if condition(val)])',
        globals=globals(), number=1000
    )

    execution_time_b = timeit.timeit(
        'sum(condition(val) for val in sequence)',
        globals=globals(), number=1000
    )

    print(f"Run time of length {seq_len}:")
    print(f"With len if: {execution_time_a:.4f}")
    print(f"With sum: {execution_time_b:.4f}")
    print(f"Ratio: {execution_time_a / execution_time_b:.4f}\n")

I got (Python 3.12.5, MacOS 14.6.1, M3 chip):

Run time of length 10:
With len if: 0.0005
With sum: 0.0007
Ratio: 0.6783

Run time of length 100:
With len if: 0.0042
With sum: 0.0054
Ratio: 0.7937

Run time of length 1000:
With len if: 0.0321
With sum: 0.0345
Ratio: 0.9320

Run time of length 10000:
With len if: 0.2593
With sum: 0.3485
Ratio: 0.7441

Run time of length 100000:
With len if: 2.6005
With sum: 3.4534
Ratio: 0.7530

@dhruvmanila dhruvmanila added rule Implementing or modifying a lint rule needs-decision Awaiting a decision from a maintainer labels Aug 23, 2024
@tjkuson
Copy link
Contributor

tjkuson commented Aug 23, 2024

Running the same script on Python 3.12 with a sum of a list comprehension (sum([condition(val) for val in sequence)]) narrows the difference and is sometimes faster.

Run time of length 10:
With len if: 0.0006
With sum: 0.0007
Ratio: 0.8689

Run time of length 100:
With len if: 0.0048
With sum: 0.0052
Ratio: 0.9311

Run time of length 1000:
With len if: 0.0380
With sum: 0.0341
Ratio: 1.1151

Run time of length 10000:
With len if: 0.3036
With sum: 0.3147
Ratio: 0.9647

Run time of length 100000:
With len if: 3.0352
With sum: 3.1247
Ratio: 0.9713

On 3.9,

Run time of length 10:
With len if: 0.0008
With sum: 0.0008
Ratio: 0.9064

Run time of length 100:
With len if: 0.0062
With sum: 0.0067
Ratio: 0.9225

Run time of length 1000:
With len if: 0.0456
With sum: 0.0442
Ratio: 1.0327

Run time of length 10000:
With len if: 0.3862
With sum: 0.4296
Ratio: 0.8991

Run time of length 100000:
With len if: 3.8578
With sum: 4.2645
Ratio: 0.9047

@DanielYang59
Copy link

Thanks a lot for the input @tjkuson. Interestingly my previous results were also generated with Python 3.12 on MacOS 14.6.1 with M3 chip (sorry I forgot to mention these details, would update the results).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-decision Awaiting a decision from a maintainer rule Implementing or modifying a lint rule
Projects
None yet
Development

No branches or pull requests

4 participants