New simplify rule: use `sum` over `len` + `if` #13050

janosh · 2024-08-22T09:34:26Z

len([val for val in iterable if cond(val)]) # bad

sum(cond(val) for val in iterable) # good

example

len([i**2 for i in range(10) if i**2 < 36])  # bad
>>> 6
sum(i**2 < 36 for i in range(10))  # good
>>> 6

astral-sh/ruff#13050

* simplify: prefer sum over len + if astral-sh/ruff#13050 * fix ruff PD901 pandas-df-variable-name * fix cast df_energies as float in cp2k parse_energy_file

tjkuson · 2024-08-22T18:40:47Z

Your example suggests using the sum of a generator over the length of a list comprehension. Not sure if this was deliberate, but if this rule were to be accepted, it should probably persist iterable type given the discourse (#10838).

janosh · 2024-08-22T19:31:16Z

That was deliberate. I would always prefer generator for brevity.

DanielYang59 · 2024-08-23T03:34:20Z

Randomly join the conversation wonderful work Janosh, but we should also be aware of the slight performance penalty for the sum method.

With script:

import timeit


def condition(val):
    return True if val % 2 else False


for seq_len in (10, 100, 1000, 10000, 100000):
    sequence = list(range(seq_len))

    execution_time_a = timeit.timeit(
        'len([val for val in sequence if condition(val)])',
        globals=globals(), number=1000
    )

    execution_time_b = timeit.timeit(
        'sum(condition(val) for val in sequence)',
        globals=globals(), number=1000
    )

    print(f"Run time of length {seq_len}:")
    print(f"With len if: {execution_time_a:.4f}")
    print(f"With sum: {execution_time_b:.4f}")
    print(f"Ratio: {execution_time_a / execution_time_b:.4f}\n")

I got (Python 3.12.5, MacOS 14.6.1, M3 chip):

Run time of length 10:
With len if: 0.0005
With sum: 0.0007
Ratio: 0.6783

Run time of length 100:
With len if: 0.0042
With sum: 0.0054
Ratio: 0.7937

Run time of length 1000:
With len if: 0.0321
With sum: 0.0345
Ratio: 0.9320

Run time of length 10000:
With len if: 0.2593
With sum: 0.3485
Ratio: 0.7441

Run time of length 100000:
With len if: 2.6005
With sum: 3.4534
Ratio: 0.7530

tjkuson · 2024-08-23T09:52:20Z

Running the same script on Python 3.12 with a sum of a list comprehension (sum([condition(val) for val in sequence)]) narrows the difference and is sometimes faster.

Run time of length 10:
With len if: 0.0006
With sum: 0.0007
Ratio: 0.8689

Run time of length 100:
With len if: 0.0048
With sum: 0.0052
Ratio: 0.9311

Run time of length 1000:
With len if: 0.0380
With sum: 0.0341
Ratio: 1.1151

Run time of length 10000:
With len if: 0.3036
With sum: 0.3147
Ratio: 0.9647

Run time of length 100000:
With len if: 3.0352
With sum: 3.1247
Ratio: 0.9713

On 3.9,

Run time of length 10:
With len if: 0.0008
With sum: 0.0008
Ratio: 0.9064

Run time of length 100:
With len if: 0.0062
With sum: 0.0067
Ratio: 0.9225

Run time of length 1000:
With len if: 0.0456
With sum: 0.0442
Ratio: 1.0327

Run time of length 10000:
With len if: 0.3862
With sum: 0.4296
Ratio: 0.8991

Run time of length 100000:
With len if: 3.8578
With sum: 4.2645
Ratio: 0.9047

DanielYang59 · 2024-08-23T10:05:01Z

Thanks a lot for the input @tjkuson. Interestingly my previous results were also generated with Python 3.12 on MacOS 14.6.1 with M3 chip (sorry I forgot to mention these details, would update the results).

janosh added a commit to materialsproject/pymatgen that referenced this issue Aug 22, 2024

simplify: prefer sum over len + if

18cfe85

astral-sh/ruff#13050

dhruvmanila added rule Implementing or modifying a lint rule needs-decision Awaiting a decision from a maintainer labels Aug 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New simplify rule: use `sum` over `len` + `if` #13050

New simplify rule: use `sum` over `len` + `if` #13050

janosh commented Aug 22, 2024

tjkuson commented Aug 22, 2024

janosh commented Aug 22, 2024

DanielYang59 commented Aug 23, 2024 •

edited

Loading

tjkuson commented Aug 23, 2024 •

edited

Loading

DanielYang59 commented Aug 23, 2024

New simplify rule: use sum over len + if #13050

New simplify rule: use sum over len + if #13050

Comments

janosh commented Aug 22, 2024

tjkuson commented Aug 22, 2024

janosh commented Aug 22, 2024

DanielYang59 commented Aug 23, 2024 • edited Loading

tjkuson commented Aug 23, 2024 • edited Loading

DanielYang59 commented Aug 23, 2024

New simplify rule: use `sum` over `len` + `if` #13050

New simplify rule: use `sum` over `len` + `if` #13050

DanielYang59 commented Aug 23, 2024 •

edited

Loading

tjkuson commented Aug 23, 2024 •

edited

Loading