Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Make any and all always short-circuit #19543

Merged
merged 5 commits into from
Dec 17, 2016

Conversation

ararslan
Copy link
Member

@ararslan ararslan commented Dec 9, 2016

Fixes #19151

This implements @JeffBezanson's suggested simplification. With this change, any and all will short circuit in all circumstances. Currently they short circuit for arrays (we even have a test for that) but not for generators.

Copy link
Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I enjoy that this code is much simpler. Honestly, I think I'd much rather move all of our reduction code back to this sort of simple definition unless there's a really compelling reason. It's gotten pretty hairy and out of control.

@fcard
Copy link
Contributor

fcard commented Dec 9, 2016

This code here could be removed also: base/reduce.jl#L561-L569

@ararslan
Copy link
Member Author

Good catch, @fcard, thanks!

@ararslan
Copy link
Member Author

ararslan commented Dec 10, 2016

These changes make me wonder whether mapreduce_sc and mapreduce_no_sc are necessary...

Side note: The 32-bit Windows build failure looks like one of those intermittent Git issues

@fcard
Copy link
Contributor

fcard commented Dec 10, 2016

@ararslan Do you mean that as "mapreduce should always short-circuit as well" or "mapreduce should never short-circuit and just leave that to any/all"? Tbh I should have just left that function alone but back in the day it was hip to reduce everything to mapreduce. (or map everything to it, eh? eh?)

eltype(itr) <: Bool ?
mapreduce_sc_impl(f, |, itr) :
reduce(or_bool_only, false, itr)
function any(f::Predicate, itr)
Copy link
Contributor

@pabloferz pabloferz Dec 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also change mapreduce_sc_impl in terms of these for now and leave the cleaning of the rest of the reduce code for another PR. That is

mapreduce_sc_impl(f, ::typeof(|), itr) = any(f, itr)
mapreduce_sc_impl(f, ::typeof(&), itr) = all(f, itr)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks as though if I change mapreduce to do that directly, there's no need for any of the other mapreduce_*sc* functions.

@ararslan
Copy link
Member Author

ararslan commented Dec 10, 2016

@fcard I mean that—assuming I'm following all the logic correctly—the mapreduce_*sc* functions are only being used for any, all, and mapreduce where the function is a predicate and the reduction operator is | or &. All of these cases short circuit. mapreduce for | and & could just use any/all, rendering all of the mapreduce_*sc* functions unused.

or map everything to it

groan 🙂

@ararslan
Copy link
Member Author

As an aside, I just realized that in the linked issue, the reason why the generator wasn't being short circuited was because the eltype wasn't being inferred as Bool. It was being inferred as Any, so mapreduce_no_sc was called instead.

@fcard
Copy link
Contributor

fcard commented Dec 10, 2016

The only thing that short-circuits other than any and all is reduce(|/&, itr) where eltype(itr) == Bool, which indeed isn't a very impressive case for keeping this in Base. My intention when initially adding those was to eventually extend the number of cases where short-circuiting happens safely, which could be accomplished generically with traits like so:

function mapreduce{R}(f::PureFunction, op::ShortCircuiting{R}, init::R, itr)
  local r::R = init
  local terminal = terminal_values(op,R)
  for x in itr
    r = op(r, f(x))
    r in terminal && return r
  end
  return r
end

ShortCircuiting{R} being a subtrait of PureFunction (explains itself) that requires the function type Op=typeof(op) to be in a terminal_values(op::Op, ::Type{R}) method, which returns a S::Collection{R} such that ∀(x∈S, y)(op(x,y) == x).

...But that's not happening any time soon is it? Even if it eventually becomes possible and is something wanted in Base (it could also go in a package), it's not going to reuse any of the code here. (my original code looked a bit more like this, but really I shouldn't have based it in a dream so far in the future)

What I am trying to say is that unless anyone cares about short-circuiting reduce(|/&, itr) despite there already existing any/all, there is indeed no point in keeping the mapreduce_sc definitions.

... Goodbye, my beautiful children. I hope you served your purpose well 😢>


This is the third or fourth reason that I give for making the code like this, but they were all true in different ways:

  • The reason I didn't go for the simple algorithm was that back then anonymous functions were slow and also there were so many optimizations for mapreduce that I felt like I needed to fall back to that (plus it was hip and cool and everybody in townreduce.jl was doing it)
  • The reason I made short-circuiting so restrictive was because if an f in any/all (f, itr) 1) caused side-effects or 2) was type unstable and returned a non-bool at any point that caused any/all to throw an error, then it would mean that short-circuiting could alter the behavior of the program, and that's terrible.
  • After the goal was set to both extend short-circuiting to mapreduce and filter out anything that could make short-circuiting a behavior-altering property, I came up with this totally uber cool algorithm that could ensure that not only for any and all but anything passed to mapreduce, but I couldn't write it just yet so I made a restricted prototype, which is what is being put to rest here. (back then I thought arrow functions were the way to go, but eventually I concluded traits were more doable and simpler)

Sorry if I am being long-winded and over-explaining myself, I am trying to cope, it was my first big contribution to anything... But I still wholeheartedly approve this simplification.

@ararslan
Copy link
Member Author

ararslan commented Dec 11, 2016

@fcard You have every reason to be proud of your children, for they have served valiantly. You've made an invaluable contribution to Julia through them, and they will live eternally in Julia versions past.

@StefanKarpinski Would you mind taking another look at this? It's changed a fair bit since you submitted your review.

Edit: Oops! Ambiguities galore. Will fix. Fixed.

mapreduce(f, op::ShortCircuiting, n::Number) = n
mapreduce(f, op::ShortCircuiting, itr::AbstractArray) = mapreduce_sc(f,op,itr)
mapreduce(f, op::ShortCircuiting, itr::Any) = mapreduce_sc(f,op,itr)
mapreduce(f, op::Union{typeof(&), typeof(|)}, n::Number) = n
Copy link
Contributor

@fcard fcard Dec 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually should be

mapreduce(f, op::Union{typeof(&), typeof(|)}, n::Number) = f(n)
                                                           ^^^^

I just now noticed this dumdum mistake of mine (thankfully calling mapreduce(f, rand([&,|]), x::Number) instead of f(x) never took off, it had all the potential to be the new x |> f)

But maybe you should just remove all short-circuiting mapreduce methods all together for now, since the only thing using Predicate is any/all and nothing else will short-circuit. You could also test to see if thePredicate thing still has any performance advantages and remove that as well if it doesn't.

Or you could just focus on the any/all part and leave me to clean up after my own mess in a followup PR, all valid options in my book :P

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha yeah, I noticed the f(n) thing as I was going through here. I've simplified things in this PR but if people end up wanting me to undo stuff here, I'll leave the rest to you. 😉

@martinholters
Copy link
Member

Can't we get rid of Predicate altogether? If we make the API that any and all always short-circuit, while mapreduce never does, that seems pretty usable and also allows for the simplest implementation, I guess.

Presently, Predicate is used to annotate a function to mark it as pure and returning a bool by hand. So the current logic only allows optimization for mapreduce(p::Predicate, f::Function, ...) where, when the code is being written, the programmer knows that the p is a valid predicate (hence he has wrapped it in a Predicate), while he cannot know what f is, hence cannot use any or all. Then, in case f is & or |, short-circuiting would happen automatically. Seems to be a rare situation, probably not worth all the code complexity. Or am I missing something?

@ararslan
Copy link
Member Author

@martinholters Sounds excellent to me! I'll make those changes later today. We should probably run Nanosoldier afterward as well, just in case.

@ararslan ararslan force-pushed the aa/all-short-circuit branch 2 times, most recently from e0e7fe1 to e59a277 Compare December 13, 2016 00:54
@Sacha0
Copy link
Member

Sacha0 commented Dec 13, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@martinholters
Copy link
Member

Those regressions look unrelated. Is your branch missing the commits from #19512/#19491? That might be the reason.

@KristofferC
Copy link
Member

KristofferC commented Dec 13, 2016

Nanosoldier benchmarks on the merge commit if I recall correctly. However, these benchmarks have popped up before so I think they are a bit noisy.

@ararslan
Copy link
Member Author

ararslan commented Dec 13, 2016

Those regressions look unrelated

That's what I was thinking but I'm never sure...

Is your branch missing the commits from #19512/#19491? That might be the reason.

Quite possibly, though I thought I had rebased from master before the most recent push. And if Nanosoldier benchmarks on the merge commit that should make it irrelevant anyway.

It this okay then?

Also: Does this warrant a news item? It changes the behavior such that mapreduce never short circuits whereas any and all always do. Previously mapreduce would sometimes short circuit and any/all would do so when the element type of the iterator was inferrable as Bool.

@tkelman
Copy link
Contributor

tkelman commented Dec 13, 2016

yes, should go in news

@ararslan
Copy link
Member Author

And gone in NEWS it has. 🙂 Thanks, @tkelman.

@@ -74,6 +74,10 @@ Library improvements

* New `titlecase` function, which capitalizes the first character of each word within a string ([#19469]).

* `any` and `all` now always short-circuit, and `mapreduce` never short-circuits ([#19543]).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also mention reduce here, since it is equally affected by the change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, added. Thanks!

or_bool_only(a::Bool, b::Bool) = a|b
and_bool_only(a, b) = nonboolean_error(:all, :&)
and_bool_only(a::Bool, b::Bool) = a&b

"""
any(p, itr) -> Bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention short-circuiting behaviour in the docs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with an example, e.g.:

any(i -> (println(i); i > 3), 1:10)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Look okay? Thanks!

@@ -524,6 +476,7 @@ end
any(itr) -> Bool

Test whether any elements of a boolean collection are `true`.
Not all items in `itr` will be visited if a `true` value is found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe "Returns true as soon as the first true value is found in itr (short circuiting)." ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mix of imperative ("test whether") and non-imperative ("returns true") seems a little odd to me but maybe it's fine. How about something like

Test whether any elements of a boolean collection are true, returning true as soon as the first true value in itr is encountered (short-circuiting).

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, went with that then. Thanks!

@ararslan
Copy link
Member Author

I just now realized that any! and all! exist. They call mapreduce! with | and & as the reducer, respectively. mapreduce! doesn't have the short-circuiting methods that its cousin mapreduce had, so any! and all! have never short-circuited AFAICT. Assuming I correctly understanding what those functions are supposed to do, those shouldn't change, right? It seems like if you short circuit then you may not fill the destination array entirely. Just want to make sure...

@ararslan
Copy link
Member Author

Is this good to go then?

@tkelman
Copy link
Contributor

tkelman commented Dec 17, 2016

any! and all! are only defined for reducing across a given dimension, right? so they could short circuit in that dimension but still have to fill each element in the orthogonal dimensions. That's probably best left to a separate PR if we decide we want to change that though.

@ararslan
Copy link
Member Author

ararslan commented Dec 17, 2016

Kind of? any!(r, A) detects true values in A along the singleton dimensions in r and overwrites r; there's no dimension specification as such. Disclaimer: I don't fully understand what that means or what the function actually does.

Agreed regarding those changes being a separate PR, if they're necessary.

@tkelman
Copy link
Contributor

tkelman commented Dec 27, 2016

We should probably put back Predicate and ShortCircuiting as non-exported deprecated bindings, since removing them broke several packages.

@ararslan
Copy link
Member Author

ararslan commented Dec 27, 2016

If you can point me to which packages in particular broke I can submit PRs. I've done so for Lazy.jl and ColorVectorSpace.jl so far.

@tkelman
Copy link
Contributor

tkelman commented Dec 27, 2016

Grepping registered packages, it looks like ColorVectorSpace was the only user of ShortCircuiting, and aside from Lazy the only package that uses Base.Predicate is DistributedArrays. Several packages define their own type called Predicate, but they won't notice this.

@ararslan
Copy link
Member Author

Cool, I just submitted a PR to DistributedArrays, so I think I've got everything covered now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants