Unnecessary list comprehension in `sum`, `min`, `max` #3259
Comments
Hmm, it looks like these used to be covered by
C407 was dropped in its entirety in 2021, mostly due to bugs introduced by the subtle behavior changes: adamchainz/flake8-comprehensions#247

There was a proposal in 2022 to add it back for

The proposal did not get implemented because

Short-circuiting aside, they all reduce memory consumption.
This would be a great candidate for an opinionated RUFF rule, though, as these are useful checks in many cases even if they do have some false positives. I would even go so far as to recommend including itertools functions as well, since they all take iterables anyway.
The more, the merrier! :)
We should also include
An autofix for this is not particularly safe, of course: it can change the behavior of code even when there is no short-circuiting involved:

    COUNTER = 0


    class Addable:
        def __init__(self, i, source=None):
            # Every construction bumps the global counter, so the result
            # depends on when each instance is created.
            global COUNTER
            COUNTER += 1
            self.i = i
            print(f"__init__({self.i}, source={source})")

        def __add__(self, other):
            return Addable(COUNTER + self.i + other.i, source="__add__")

        def __radd__(self, other):
            # sum() starts from the int 0, so `other` may be a plain number.
            return Addable(
                COUNTER + self.i + (other.i if isinstance(other, Addable) else other),
                source="__radd__",
            )

        def __repr__(self):
            return str(self.i)


    print("List comprehension:")
    COUNTER = 0
    result_listcomp = sum([Addable(i, source="listcomp") for i in range(3)])
    print(result_listcomp)
    print()

    print("Generator:")
    COUNTER = 0
    result_generator = sum(Addable(i, source="generator") for i in range(3))
    print(result_generator)
    print()

    # This assertion fails: the two versions produce different results.
    assert result_listcomp == result_generator

Output:
The rationale for this lint in the case of For cases like
The current documentation for the any/all version of the lint is a bit misleading, IMO. It says the performance benefit is from avoiding overhead of list creation, but it uses an example case that (accidentally?) maximizes the short-circuiting benefits (the first item in
The reason for this is that while the comprehension does have overhead to create a list, generators also have quite a bit of interpreter overhead, as they have to yield on each iteration; in particular, a function like the

I don't think we should over-pivot here based on current CPython performance, as performance characteristics change over time, and there's a reasonable possibility that faster-cpython may be able to improve the performance of generators in future versions. And even today, the generator version will be much more memory-efficient for large data than the comprehension one.

But I do want to double-check: given the above performance context, do we still think it makes sense to extend this autofix to
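For anyone who wants to check the trade-off described above on their own machine, here is a rough measurement sketch. The input size `N` and the helper `peak_bytes` are assumptions made up for illustration; timings depend on the CPython version and hardware, so no particular numbers are claimed.

```python
import timeit
import tracemalloc

N = 200_000  # hypothetical input size, chosen only for illustration


def sum_listcomp():
    # Builds the full list in memory before sum() starts.
    return sum([i * 2 for i in range(N)])


def sum_generator():
    # Feeds sum() one item at a time; no intermediate list.
    return sum(i * 2 for i in range(N))


def peak_bytes(func):
    """Return the peak traced allocation (in bytes) while running func."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


list_peak = peak_bytes(sum_listcomp)
gen_peak = peak_bytes(sum_generator)
list_time = timeit.timeit(sum_listcomp, number=5)
gen_time = timeit.timeit(sum_generator, number=5)

print(f"list comprehension: {list_time:.4f}s, peak {list_peak} bytes")
print(f"generator:          {gen_time:.4f}s, peak {gen_peak} bytes")
```

The memory difference should be stable across runs; which version is faster may go either way.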
I'm a bit torn. I'm not sure if I'm convinced of the "unsafety" from #3259 (comment). I feel like we might have other safe fixes that can be broken by operator overloads and global state. Is there more that makes it semantically unsafe?
Wow, learned a lot from your comment -- thank you! I think the "unsafety" is okay given that we already have the same problem (in theory) with

Perhaps we leave this as an open question for a day or so, though, to give others on the issue time to chime in.
I don't think there's more than that; the unsafety is due to laziness. With the comprehension version, the value expression (i.e.

The existing autofix for

My feeling is that unsafety is unsafety, even if you have to use advanced features of the language to trigger it -- people stretch Python in all kinds of ways! So I would consider this autofix unsafe in all variants. But I'm not opposed to extending it anyway! It makes sense to me that lots of code might not really care about the perf either way and may as well just be simpler.

I'll push a docs-only PR for now to clarify the current docs, and wait a day or so on the actual autofix extension.
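A minimal sketch of how laziness changes which side effects run, using a hypothetical `check` function that records every call. With the list comprehension, every element is evaluated before `any` sees anything; with the generator, `any` stops pulling items after the first truthy result.

```python
calls = []  # records every argument that check() was called with


def check(x):
    # Hypothetical predicate with a visible side effect.
    calls.append(x)
    return x > 0


# List comprehension: all three calls happen before any() runs.
calls.clear()
result_list = any([check(x) for x in (1, 2, 3)])
calls_with_list = list(calls)

# Generator: any() short-circuits, so check(2) and check(3) never run.
calls.clear()
result_gen = any(check(x) for x in (1, 2, 3))
calls_with_generator = list(calls)

print(result_list, calls_with_list)
print(result_gen, calls_with_generator)
```

Both calls return the same value here, but any code that relies on `check` having been called for every element would behave differently after the rewrite.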
Sounds good to me! Thanks for the response.
…-all (C419) (#10744)

Ref #3259; see in particular #3259 (comment)

## Summary

Improve the accuracy of the docs for this lint rule/fix.

## Test Plan

Generated the docs locally and visited the page for this rule:

![Screenshot 2024-04-02 at 4 56 40 PM](https://github.com/astral-sh/ruff/assets/61586/64f25cf6-edfe-4682-ac8e-7e21b834f5f2)

---------

Co-authored-by: Zanie Blue <[email protected]>
Agree with @carljm's insightful and well-presented analysis. Thanks for fixing the original docs - yes, the

Also agree with @charliermarsh's comment "yes, we should still add sum, min, and max" and the reasoning. The performance penalty is small and may change as CPython evolves, and I wonder to what extent it's offset by time savings at the next GC due to the reduced memory consumption. I also like the simpler code.

I think they should all be marked unsafe, as the "all" and "any" fixes can change both execution extent and order, and the others at least the order. If there are other autofixes currently marked as safe which could similarly change execution semantics in the context of dunder overload magic (with or without side effects on shared state), given how widespread that is, my vote would be to mark them unsafe too.
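The "order" part can be seen even without dunder overloads. A small sketch, assuming hypothetical `load` and `weight` helpers that log their calls: the same calls happen in both versions, but the generator interleaves them. This only illustrates laziness; it is not a claim about which call shapes the lint rule actually flags.

```python
events = []  # shared log of side effects, in the order they happen


def load(x):
    events.append(f"load {x}")
    return x


def weight(x):
    events.append(f"weight {x}")
    return -x


# List comprehension: every load() finishes before min() calls the key.
events.clear()
smallest_eager = min([load(x) for x in (1, 2)], key=weight)
eager_order = list(events)

# Generator: min() pulls one item, applies the key, then pulls the next.
events.clear()
smallest_lazy = min((load(x) for x in (1, 2)), key=weight)
lazy_order = list(events)

print(eager_order)
print(lazy_order)
```

The results are identical, but anything observing the log (or shared state touched by the helpers) sees a different sequence.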
Attached PR adds

I'm inclined to call that complete for this issue, and defer itertools and more_itertools funcs. One thought there (thanks @AlexWaygood) is that in the future if ruff gains type support, we could do this generally for any function typed as taking an iterable, which would naturally adapt to future additions to itertools. That seems a lot nicer than hardcoding lists of functions for modules that may change over time.
…check (C419) (#10759)

Fixes #3259

## Summary

Renames `UnnecessaryComprehensionAnyAll` to `UnnecessaryComprehensionInCall` and extends the check to `sum`, `min`, and `max`, in addition to `any` and `all`.

## Test Plan

Updated snapshot test. Built docs locally and verified the docs for this rule still render correctly.
It's not part of PIE802 so maybe this should go somewhere else, but I noticed #3149 doesn't include built-in aggregators like `sum()`, `min()`, `max()`.

Suggestion is to auto-fix:

Maybe this is covered by `flake8-comprehensions`? Looked but couldn't find it.
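The snippet that originally accompanied the auto-fix suggestion is not preserved in this copy of the issue. As an illustration only, the kind of rewrite being requested looks roughly like this; `values` is a made-up input, not taken from the original report.

```python
values = [3, 1, 4, 1, 5]  # hypothetical data

# Before: a list is built only to be consumed immediately.
total_before = sum([v * 2 for v in values])
low_before = min([v * 2 for v in values])
high_before = max([v * 2 for v in values])

# After: the brackets are dropped, so a generator feeds the call directly.
total_after = sum(v * 2 for v in values)
low_after = min(v * 2 for v in values)
high_after = max(v * 2 for v in values)

print(total_after, low_after, high_after)
```

For side-effect-free expressions like these, both forms give the same results; the generator form just avoids materializing the intermediate list.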