Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

any(pred, itr) and all(pred, itr) do not short-circuit. #11750

Closed
sbromberger opened this issue Jun 18, 2015 · 32 comments
Closed

any(pred, itr) and all(pred, itr) do not short-circuit. #11750

sbromberger opened this issue Jun 18, 2015 · 32 comments
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@sbromberger
Copy link
Contributor

From https://groups.google.com/d/msg/julia-users/uGVQvhY64S4/KrBEYsMTUUIJ:

It seems 'any' calls 'mapreduce' which calls 'mapfoldl' which has a specialization that will stop computing in the case of searching for a single true or false value. However, it seems the call to mapreduce instead goes to a more specific method that doesn't implement this shortcut. If 'any' were to call mapfoldl directly, it would do the lazy computing.

It would be great to be able to short circuit any() and all().

@johnmyleswhite
Copy link
Member

I think there's an argument for having two functions. The current any, IIUC, emulates | rather than ||. That behavior isn't useless.

@sbromberger
Copy link
Contributor Author

Bikeshed on the name, or do we make it a default argument short_circuit=false?

@ScottPJones
Copy link
Contributor

cond_any and cond_all maybe? Always adding ; short_circuit=true seems like a bit of a pain... but maybe that's just me...

@ScottPJones
Copy link
Contributor

Also, this might just be a bug... if it was written to call mapreduce, which doesn't guarantee that it won't short-circuit, then I'd say this just needs to be fixed, and if you really want to force accessing everything, you'd need a ; short_circuit=false (the default being true).

@sbromberger
Copy link
Contributor Author

Yeah. Given the way it's being called, I'm not sure the 'evaluate everything' behavior is intentional. I think @lindahua might be able to provide some insight.

ETA: given https://groups.google.com/d/msg/julia-users/uGVQvhY64S4/OlztaVGQjW4J, I'm pretty sure this counts as "unintended behavior".

@jdlangs
Copy link
Contributor

jdlangs commented Jun 18, 2015

A possible complication is that any will work for non-boolean predicates even though it's not really documented for that.

julia> f(x) = (println(x); x)
f (generic function with 1 method)

julia> any(f, rand(0:1,5)) === 1
1
1
0
0
1
true

whereas, if it were to call mapfoldl directly:

julia> mapfoldl(f, Base.OrFun(), rand(0:1,5))
1
ERROR: TypeError: non-boolean (Int64) used in boolean context
 in mapfoldl at reduce.jl:310

This might be another advantage since it is enforcing that the predicate return a Bool.

@ScottPJones
Copy link
Contributor

Couldn't mapfoldl be changed to be more relaxed, and accept other values? What happens to performance if you have to add a wrapper function always for cases like that?

@jdlangs
Copy link
Contributor

jdlangs commented Jun 18, 2015

To relax mapfoldl you could just change f(x) && return true to f(x) != 0 && return true.

@ScottPJones
Copy link
Contributor

Yes, precisely, and I imagine that that would perform better than having to add a wrapper around the function, just to do f(x) != 0

@sbromberger
Copy link
Contributor Author

I don't like the idea of "allowing" non-bool results in the predicate. It appears to violate the documentation:

any(p, itr) → Bool
Determine whether predicate p returns true for any elements of itr.

and it also requires zero(typeof(f(x))) to be defined and equal to 0 which might not always be true.

I think it's better to error out if the predicate does not return Bool. You can always typecast f(x) if you need to.

@ScottPJones
Copy link
Contributor

I think it boils down to an argument over keeping things pure, or making them easier to use...
I've seen things decided both ways in Julia, like having all hex literals automatically be unsigned,
and their type determined by their length, while decimal literals are signed, and start out at Int64...
the rationale there was that it was more convenient...

@simonster
Copy link
Member

Ref #11540

@StefanKarpinski
Copy link
Member

It seems to me that predicates should not have side effects, so it's a reasonable optimization to evaluate past the first true for any or the first false for all. If that's the behavior, it should be documented, however. Short-circuiting at some point does make sense and it's not clear to me that going beyond the first point you know the answer is actually a performance win.

@nalimilan
Copy link
Member

I think I've read several times that it's your responsibility to write != 0 to make this explicit, rather than letting the conversion to boolean be done silently (and sometimes incorrectly). What's the point in changing the | and & methods for mapfoldl to accept any type?

Regarding the issue at stake: +1 for short-circuiting any.

@sbromberger
Copy link
Contributor Author

@StefanKarpinski

It seems to me that predicates should not have side effects, so it's a reasonable optimization to evaluate past the first true for any or the first false for all.

Did you mean "it's a reasonable optimization to NOT evaluate..."?

My f(x) is very expensive (per-vertex graph cycle detection) so short-circuiting is a big deal. Right now I'm doing

function has_cycles{T<:AbstractGraph}(g::T)
    for v in vertices(g)
        if is_cyclic(g, v)
            return true
        end
    end
    return false
end

which does the same thing, but it'd be much cleaner with any().

@StefanKarpinski
Copy link
Member

Ok, that's a pretty convincing case that this should short-circuit.

@jdlangs
Copy link
Contributor

jdlangs commented Jun 18, 2015

I would very surprised if anyone relied on the non-short-circuit behavior. Having the predicate do some sort of data logging is the only case I can think of.

@sbromberger
Copy link
Contributor Author

I would very surprised if anyone relied on the non-short-circuit behavior. Having the predicate do some sort of data logging is the only case I can think of.

... in which case they'd be able to use a standard iterator, right?

The thing is this: any() actually apparently DOES short-circuit in some cases: when you have an AbstractArray of Bool with more than 16 values, it will short-circuit (see https://groups.google.com/d/msg/julia-users/uGVQvhY64S4/OlztaVGQjW4J). Surely this wasn't the intended behavior.

@sbromberger
Copy link
Contributor Author

Just so that people are convinced:

julia> function bar(x)
         info("bar $x")
         x
       end
bar (generic function with 1 method)

julia> i1 = zeros(Bool,10);

julia> i2 = zeros(Bool,20);

julia> i1[2] = true;

julia> i2[2] = true;

julia> any(v->bar(v), i1)
INFO: bar false
INFO: bar true
INFO: bar false
INFO: bar false
INFO: bar false
INFO: bar false
INFO: bar false
INFO: bar false
INFO: bar false
INFO: bar false
true

julia> any(v->bar(v), i2)
INFO: bar false
INFO: bar true
true

@fcard
Copy link
Contributor

fcard commented Jun 19, 2015

Since it's uncommon to call any/all with a predicate and a boolean array, the introduction of the Bool parameter here and here may have been a mistake. I think. (If the specialization was meant to be for the non-predicate version, I'd expect the f parameter to be identified as the identity function, f::IdFun, and calling f(x) would be unnecessary.)

If the non-short-circuiting behavior is desired still, would an optional and general mechanism to break from reduce calls be useful? Like clojure's reduced (but maybe implemented with a predicate argument isreduced instead of a special form). That way, any and all can still use mapreduce without needing their own particular implementation.

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2015

@fcard a tip - if you hit y while viewing code in github, it will refresh the page you're looking at but replacing the branch in the url with a sha hash. That way people can look at your links in the future and they'll still be pointing to a line of code that makes sense, as opposed to something different if code gets added or removed earlier in the file.

@fcard
Copy link
Contributor

fcard commented Jun 19, 2015

@tkelman That's neat! I will do that from now on, thanks.

@yuyichao
Copy link
Contributor

I'm wondering how many people have learnt this trick from @tkelman. I certainly did not too long ago.

Edit: proof #11395 (comment)

@sbromberger
Copy link
Contributor Author

@yuyichao count me in that figure... that's a really great trick.

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2015

I learned it from @hayd originally I think.

@ScottPJones
Copy link
Contributor

Cool! Thanks for the tip!

@hayd
Copy link
Member

hayd commented Jun 19, 2015

IMO semantically "any" means using ||. If you need non-short-circuiting then you should be explicit e.g. use reduce(&, it) or mapreduce(f, &, it).... this is shorter/more concise than a kwarg. Just mention this alternative in the any/all docstring :)

....Or is the issue that you may not want to run this iteratively/in order for performance e.g. it's faster with SIMD but you may run over a few items (which are evaluated after the short-circuit). So not advertising it as short-circuit may be preferable?? :s


For some reason (not quite sure why) I'm reminded of python code like:

[f(x) for x in it]
# f has side effects, so use a list comprehension to evaluate! (creates a list [None, None, ...])
# There's a better / more explicit way!! (and faster too)
for x in it:
    f(x)

@StefanKarpinski
Copy link
Member

I'm totally sold. If someone wants to make a PR to change this I think it's pretty clear this should short circuit.

@mbauman mbauman added the help wanted Indicates that a maintainer wants help on an issue or pull request label Jun 20, 2015
@fcard
Copy link
Contributor

fcard commented Jun 20, 2015

I already know what to change, so let me go ahead and...

fcard added a commit to fcard/julia that referenced this issue Jun 20, 2015
@ScottPJones
Copy link
Contributor

Nice! Test to demonstrate it does short-circuit now?

@fcard
Copy link
Contributor

fcard commented Jun 20, 2015

@ScottPJones Worked on my local tests. If the tests here pass, it should be alright.

Here's the PR: #11774

fcard added a commit to fcard/julia that referenced this issue Jun 20, 2015
fcard added a commit to fcard/julia that referenced this issue Jul 8, 2015
fcard added a commit to fcard/julia that referenced this issue Jul 8, 2015
fcard added a commit to fcard/julia that referenced this issue Jul 8, 2015
fcard added a commit to fcard/julia that referenced this issue Jul 15, 2015
fcard added a commit to fcard/julia that referenced this issue Jul 18, 2015
@sbromberger
Copy link
Contributor Author

confirmed this is working via #11774.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

No branches or pull requests