Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add mapaccumulate method #21152

Open
CarloLucibello opened this issue Mar 24, 2017 · 14 comments
Open

add mapaccumulate method #21152

CarloLucibello opened this issue Mar 24, 2017 · 14 comments
Labels
feature Indicates new feature / enhancement requests fold sum, maximum, reduce, foldl, etc.

Comments

@CarloLucibello
Copy link
Contributor

since reduce -> mapreduce and foldl -> mapfoldl, I think that also accumulate deserves an extension which can accept a function argument. See also #21150 for a related issue (more specific)

@bramtayl
Copy link
Contributor

bramtayl commented Mar 24, 2017

Why not just deprecate mapf functions for f function methods for generators? Speaking of generators, I think the Base.Generator constructor should be exported.

@bramtayl
Copy link
Contributor

In fact, that's something I'd be interested in making a PR for if anyone is interested.

@CarloLucibello
Copy link
Contributor Author

CarloLucibello commented Mar 24, 2017

could the fact that

julia> eltype(i for i in 1:10)
Any

have some impact on performances?
BTW should an issue be filed regarding this behaviour of even the simplest generators? Isn't it a potential cause of type instabilitites?

@ararslan
Copy link
Member

I am very, very adamantly against deprecating map and the passing of functions as arguments. This construct is absolutely central to functional programming and is something that Julia does exceptionally well. It's one of the things I find most appealing about the language. I tend to think of Generators as a "use this when expressing something would otherwise be difficult." The fact that generators exist permits people to use that Pythonic style if they want. But we should absolutely not enforce that as the way to map functions.

@CarloLucibello
Copy link
Contributor Author

@ararslan what is your thought on mapreduce, mapfoldl, mapaccumulate...? Could they be sostituted by reduce(f(v) for v in v) and so on?

@bramtayl
Copy link
Contributor

I'm not suggesting getting rid of function passing. In fact I like that syntax. Instead I'm suggesting syntax like reduce(Generator(x -> x + 1, [1, 2]), +)

@ararslan
Copy link
Member

what is your thought on mapreduce, mapfoldl, mapaccumulate...? Could they be sostituted by reduce(f(v) for v in v) and so on?

We can allow them but we should absolutely not deprecate anything in favor of them.

I'm suggesting syntax like reduce(Generator(x -> x + 1, [1, 2]), +)

I'd be fine with allowing that (but with the argument order reversed; + comes first in reduce) but I still think it should not replace anything.

@bramtayl
Copy link
Contributor

Julia's long standing policy, from what I understand, is to avoid multi-concept functions because they are incompletely factored. In fact, Julia is so adamant about it as to nix using _ in function names. Well, anyway, this is a clear case, it seems to me, of a set of easily "factorable" functions.

@martinholters
Copy link
Member

As long as creating Generators allocates, that causes overhead which function-to-be-mapped-as-argument versions don't have.

@bramtayl
Copy link
Contributor

I actually don't quite understand this. Is it possible to use things like generators and avoid allocation?

@ararslan
Copy link
Member

Functions that accomplish multiple tasks at once are often good for efficiency. Plus it can't really be a long-standing policy to avoid them when, e.g. mapreduce has been around since at least 0.1.

@TotalVerb
Copy link
Contributor

There's no reason why generators would cause type instability here, as we don't need to know the eltype to implement reduce, etc. Generators are lazy so there is no big efficiency difference either. We should benchmark the various approaches before making assumptions about performance.

@TotalVerb
Copy link
Contributor

@martinholters I think this doesn't really matter. The extra cost of a heap allocation is negligible unless the arrays being worked with are nearly empty and the operation is nearly free, which seems like a niche use case.

@CarloLucibello
Copy link
Contributor Author

CarloLucibello commented Mar 25, 2017

a small experiment shows that the implementation for generators (or generic iterators?) of reduce needs some love. On master:

julia> r=rand(10000);

julia> @benchmark mapreduce(x->x^2,+,r)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     4.349 μs (0.00% GC)
  median time:      4.907 μs (0.00% GC)
  mean time:        5.596 μs (0.00% GC)
  maximum time:     1.861 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark reduce(+,(r^2 for r in r))
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  3
  --------------
  minimum time:     30.702 μs (0.00% GC)
  median time:      42.943 μs (0.00% GC)
  mean time:        44.460 μs (0.00% GC)
  maximum time:     1.170 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

#######
julia> @benchmark mapreduce(x->x^2,+,$r)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.296 μs (0.00% GC)
  median time:      3.752 μs (0.00% GC)
  mean time:        4.254 μs (0.00% GC)
  maximum time:     583.003 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> g = (r^2  for r in r)
Base.Generator{Array{Float64,1},##47#48}(#47, [0.932495, 0.325073, 0.240037, 0.576765, 0.173194, 0.820185, 0.854522, 0.481677, 0.567246, 0.100028  …  0.862458, 0.13161, 0.220919, 0.524313, 0.904255, 0.339595, 0.55281, 0.942101, 0.0240301, 0.892888])

julia> @benchmark reduce(+,$g)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     13.163 μs (0.00% GC)
  median time:      14.126 μs (0.00% GC)
  mean time:        15.798 μs (0.00% GC)
  maximum time:     3.626 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

In this case the generators' construction time has a huge impact.
I'm not sure why those $ in the last two benchmarks matter (tried multiple times with or without)

@brenhinkeller brenhinkeller added feature Indicates new feature / enhancement requests fold sum, maximum, reduce, foldl, etc. labels Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests fold sum, maximum, reduce, foldl, etc.
Projects
None yet
Development

No branches or pull requests

6 participants