
Faster cache lookup in broadcast! via nested Dicts and get! macro #6107

Merged: carlobaldassi merged 3 commits into master from cb/fasterbroadcast on Mar 12, 2014

Conversation

carlobaldassi
Member

See JuliaLang/LinearAlgebra.jl#89. This uses nested Dicts instead of indexing with a Tuple. Ugly but effective. The commit also avoids some code duplication via @eval tricks.
My only concern with this is the introduction of the @get! macro for Dicts (not exported, though); however, it is critical for the performance boost.
I'll merge if there are no objections.
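
For context, here is a minimal sketch of what such a lazily-evaluating get!-style macro can look like; it illustrates the technique, not necessarily the exact code merged in this PR:

# Illustrative sketch of a get!-style macro whose default expression
# is evaluated only on a cache miss; not the PR's exact implementation.
macro get!(dict, key, default)
    quote
        local d = $(esc(dict))
        local k = $(esc(key))
        if haskey(d, k)
            d[k]
        else
            # The default is evaluated only here, so the hot (hit)
            # path never pays for constructing it.
            local v = $(esc(default))
            d[k] = v
            v
        end
    end
end

(A production version would likely avoid the double hash lookup of haskey plus indexing by using Dict internals; this sketch favors clarity.)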

@kmsquire
Member

FWIW, changing this to use the existing get! do-block interface still provides a significant speedup, but it is somewhat slower than using @get!, most visibly in the small 10 x 10 cases below (and a bit uglier!):

With @get!:

add, 10 x 10: 0.039655778 s
add, 100 x 100: 0.033199164 s
add, 1000 x 1000: 0.031801015 s
add_bc, 10 x 10: 0.10324522 s
add_bc, 100 x 100: 0.034570543 s
add_bc, 1000 x 1000: 0.032525607 s
add_bc_prealloc, 10 x 10: 0.025206082 s
add_bc_prealloc, 100 x 100: 0.007657803 s
add_bc_prealloc, 1000 x 1000: 0.017938188 s

With get! do-blocks:

add, 10 x 10: 0.040053297 s
add, 100 x 100: 0.033253148 s
add, 1000 x 1000: 0.032585559 s
add_bc, 10 x 10: 0.119408809 s
add_bc, 100 x 100: 0.034852499 s
add_bc, 1000 x 1000: 0.03241402 s
add_bc_prealloc, 10 x 10: 0.032296623 s
add_bc_prealloc, 100 x 100: 0.007994024 s
add_bc_prealloc, 1000 x 1000: 0.017604986 s

Diff:

diff --git a/base/broadcast.jl b/base/broadcast.jl
index f467df6..89050ae 100644
--- a/base/broadcast.jl
+++ b/base/broadcast.jl
@@ -219,9 +219,15 @@ for (Bsig, Asig, gbf, gbb) in
         function broadcast!(f::Function, B::$Bsig, As::$Asig...)
             nd = ndims(B)
             narrays = length(As)
-            cache_f    = @get!(broadcast_cache, f,       Dict{Int,Dict{Int,Function}}())
-            cache_f_nd = @get!(cache_f,         nd,      Dict{Int,Function}())
-            func       = @get!(cache_f_nd,      narrays, $gbf($gbb, nd, narrays, f))
+            cache_f    = get!(broadcast_cache, f      ) do
+                Dict{Int,Dict{Int,Function}}()
+            end
+            cache_f_nd = get!(cache_f,         nd     ) do
+                Dict{Int,Function}()
+            end
+            func       = get!(cache_f_nd,      narrays) do
+                $gbf($gbb, nd, narrays, f)
+            end
             func(B, As...)
             B
         end
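
To make the two call forms concrete, here is a hedged, self-contained comparison (the cache and key are made-up names, and @get! is the macro from this PR, or the sketch above):

cache = Dict{Int,Function}()

# do-block form: constructs a fresh anonymous function on every call,
# which carried a real cost in Julia at the time of this PR
f1 = get!(cache, 2) do
    x -> 2x
end

# macro form: inlines the lookup and evaluates the default expression
# only on a miss, with no closure allocated on the hit path
f2 = @get!(cache, 2, x -> 2x)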

@carlobaldassi
Member Author

@kmsquire: Yes, I had tested it at some point; I was under the impression that the gap was bigger, but in fact I see a performance difference of the same order as what you report, i.e. operations on 10x10 matrices with preallocation incur a 1.2x to 1.3x penalty when using get! instead of @get!.

Maybe that's acceptable. But I actually still lean towards using the macro, since it's for internal use only (until passing functions around becomes faster) and 25% is not really negligible. Opinions?

@timholy
Member

timholy commented Mar 11, 2014

It seems fine to me. One small comment: we may want to think about indexing things in terms of function, narrays, nd rather than function, nd, narrays. For the vast majority of uses, narrays=2.

Regarding @get!, I've always liked that proposal, and indeed copied it into Images. But I know others had differing opinions, and so I'll wait for others to chime in on this change.

@carlobaldassi
Member Author

One small comment: we may want to think about indexing things in terms of function, narrays, nd rather than function, nd, narrays. For the vast majority of uses, narrays=2.

You're right, I have updated the PR.
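
Concretely, the reordered lookup chain would look roughly like this (adapted from the diff above; it lives inside the same @eval loop where gbf and gbb are bound, and is not copied verbatim from the merged commit):

# Keys now nest as f -> narrays -> nd, per timholy's suggestion.
cache_f    = @get!(broadcast_cache, f,       Dict{Int,Dict{Int,Function}}())
cache_f_na = @get!(cache_f,         narrays, Dict{Int,Function}())
func       = @get!(cache_f_na,      nd,      $gbf($gbb, nd, narrays, f))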

@carlobaldassi
Member Author

Travis failure is unrelated (it's in the arpack test).

Also, here's a link to the issue where @get! was first discussed: #1764

@kmsquire
Member

Yeah, I think I like the macro syntax better as well (and it is faster). Cheers!

@carlobaldassi
Member Author

Since there seem to be no objections, I'll rebase and merge.

Commit messages:

- Same as the get! function, but does not evaluate the default argument unless needed.
- Change the broadcast! cache to use nested Dicts instead of a Dict with Tuple keys; use the @get! macro to access its fields. Related to #6041.
- Also reduces code duplication in the broadcast! definitions via the @eval macro.
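
As a rough illustration of the @eval deduplication pattern mentioned above (the signature pairs and the function name here are invented, not the PR's):

# Splice one shared body into a definition per signature pair,
# instead of hand-writing each near-identical method.
for (Bsig, Asig) in ((BitArray, BitArray), (Array, Array))
    @eval function my_broadcast_sketch!(f::Function, B::$Bsig, As::$Asig...)
        # Same-shape arrays assumed; the real broadcast! also
        # handles dimension expansion.
        for I in eachindex(B)
            B[I] = f(map(A -> A[I], As)...)
        end
        return B
    end
end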
carlobaldassi added a commit that referenced this pull request Mar 12, 2014
Faster cache lookup in `broadcast!` via nested Dicts and get! macro
carlobaldassi merged commit 6b88580 into master on Mar 12, 2014
carlobaldassi deleted the cb/fasterbroadcast branch on March 12, 2014 at 10:59