Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short-circuiting of all on generators #19151

Closed
cstjean opened this issue Oct 28, 2016 · 8 comments
Closed

Short-circuiting of all on generators #19151

cstjean opened this issue Oct 28, 2016 · 8 comments
Labels
breaking This change will break code
Milestone

Comments

@cstjean
Copy link
Contributor

cstjean commented Oct 28, 2016

This was touched upon in #14782 (related: #11774), but there's no open issue for it. Currently,

all(x->(print(x); x<5), 1:9)
> 12345
> false

# but

all((print(x); x<5) for x in 1:9)
> 123456789
> false
@JeffBezanson
Copy link
Member

I think any and all should always short circuit. I don't understand why they call mapreduce_sc_impl and reduce; something to do with vectorization maybe? If there are special cases like BitArray that don't want short-circuiting (and where you can't tell the difference) those can be separate narrowly-applicable methods.

@fcard
Copy link
Contributor

fcard commented Oct 29, 2016

My memory of it is foggy, but I think that the reason why it doesn't short-circuit always is to keep things like this from happening:

julia> all(Any[true,false,1]) # if short-circuiting, returns false, if not, errors out on 1

And other non-boolean hijinks.
So it only short-circuits on collections with eltype(A)===Bool (actually, it used to not even let other types of collections to be used, but we've grown soft in our demands and now they just don't short-circuit)

My original implementation wouldn't even have allowed generators as arguments, but things have changed in this land since then, so maybe it's time to take another look at this decision... ...?

Related: #18969

@cstjean
Copy link
Contributor Author

cstjean commented Oct 29, 2016

Whether to short-circuit is a semantic choice. It should be done consistently, otherwise it's really hard to know when to expect it. My vote is also to always short-circuit.

My memory of it is foggy, but I think that the reason why it doesn't short-circuit always is to keep things like this from happening

all(a->a, Any[true, true, true, true, false, 10, 23]) is false at the moment, and doesn't trigger an error. all(Any[true, true, true, true, false, 10, 23]) is an error.

@fcard
Copy link
Contributor

fcard commented Oct 29, 2016

@cstjean that's because f in all(f, A) is expected to always return Bool and errors otherwise, so all assumes it's safe to short-circuit. I had some complex runtime type inference mechanism to check that the function was indeed a predicate, but it was scrapped, (thankfully, in retrospect. Function types would serve this purpose better) so we're just relying on your good faith.

So, passing a->a to all/any is meant to be undefined behavior. Maybe. The conversation about this started (#11750 (comment)) but I am not sure it concluded :P

@fcard
Copy link
Contributor

fcard commented Oct 29, 2016

Keep in mind that until 06c93ce any/all always short-circuited, or would not start at all and throw an error. That behavior made sense back then because there were no generators and the recommendation was always to use either all(pred, Coll) or all(BoolColl). Now, with generators, it might make sense to either special-case them or remove this safety check.

@JeffBezanson
Copy link
Member

Yes @fcard I think you're right about the type-based motivation for this. Looking at it again it doesn't seem important to me. There's a lot to be said for specifying the semantics of operations as obvious, simple definitions like

function all(itr)
    for x in itr
        !x && return false
    end
    return true
end

If you have a mixed-type array, the behavior is not surprising since it just corresponds to this obvious definition.

@fcard
Copy link
Contributor

fcard commented Oct 29, 2016

@JeffBezanson Right! Now that I think about it I think the reason for the complexity of the original code was to reuse the optimizations already in place and avoid some other performance pitfalls, (mostly slow anonymous functions) and the resulting semantics could've been seen as a plus since they encouraged better use of the short-circuiting behavior (e.g. the deprecation caught several uses of all(map(f, A))), but now that we have generators and can write all(f(x) for x in A), plus a lot of the performance concerns that existed then don't anymore, the definitions can now probably be simplified to their obvious implementations.

tl;dr I agree :P

@nalimilan
Copy link
Member

Since this change would be breaking, should we target it for 0.6?

@JeffBezanson JeffBezanson added this to the 0.6.0 milestone Nov 1, 2016
@JeffBezanson JeffBezanson added the breaking This change will break code label Nov 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code
Projects
None yet
Development

No branches or pull requests

4 participants