
Faster cache lookup in broadcast! via nested Dicts and get! macro #6107

Merged: carlobaldassi merged 3 commits into master from cb/fasterbroadcast on Mar 12, 2014

Conversation

carlobaldassi
Member

See JuliaLang/LinearAlgebra.jl#89. This uses nested Dicts instead of indexing with a Tuple. Ugly but effective. The commit also avoids some code duplication via @eval tricks.
My only concern with this is the introduction of the @get! macro for Dicts (not exported, though); however, it is critical for the performance boost.
I'll merge if there are no objections.
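
For context, here is a minimal sketch of what such a lazily-evaluating get!-style macro can look like; it illustrates the technique, not necessarily the exact code merged in this PR:

# Illustrative sketch of a get!-style macro whose default expression
# is evaluated only on a cache miss; not the PR's exact implementation.
macro get!(dict, key, default)
    quote
        local d = $(esc(dict))
        local k = $(esc(key))
        if haskey(d, k)
            d[k]
        else
            # The default is evaluated only here, so the hot (hit)
            # path never pays for constructing it.
            local v = $(esc(default))
            d[k] = v
            v
        end
    end
end

(A production version would likely avoid the double hash lookup of haskey plus indexing by using Dict internals; this sketch favors clarity.)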

@kmsquire
Member

FWIW, changing this to use the existing get! do-block interface still provides a significant speedup, but it is somewhat slower than using @get!, most visibly in the small 10 x 10 cases below (and a bit uglier!):

With @get!:

add, 10 x 10: 0.039655778 s
add, 100 x 100: 0.033199164 s
add, 1000 x 1000: 0.031801015 s
add_bc, 10 x 10: 0.10324522 s
add_bc, 100 x 100: 0.034570543 s
add_bc, 1000 x 1000: 0.032525607 s
add_bc_prealloc, 10 x 10: 0.025206082 s
add_bc_prealloc, 100 x 100: 0.007657803 s
add_bc_prealloc, 1000 x 1000: 0.017938188 s

With get! do-blocks:

add, 10 x 10: 0.040053297 s
add, 100 x 100: 0.033253148 s
add, 1000 x 1000: 0.032585559 s
add_bc, 10 x 10: 0.119408809 s
add_bc, 100 x 100: 0.034852499 s
add_bc, 1000 x 1000: 0.03241402 s
add_bc_prealloc, 10 x 10: 0.032296623 s
add_bc_prealloc, 100 x 100: 0.007994024 s
add_bc_prealloc, 1000 x 1000: 0.017604986 s

Diff:

diff --git a/base/broadcast.jl b/base/broadcast.jl
index f467df6..89050ae 100644
--- a/base/broadcast.jl
+++ b/base/broadcast.jl
@@ -219,9 +219,15 @@ for (Bsig, Asig, gbf, gbb) in
         function broadcast!(f::Function, B::$Bsig, As::$Asig...)
             nd = ndims(B)
             narrays = length(As)
-            cache_f    = @get!(broadcast_cache, f,       Dict{Int,Dict{Int,Function}}())
-            cache_f_nd = @get!(cache_f,         nd,      Dict{Int,Function}())
-            func       = @get!(cache_f_nd,      narrays, $gbf($gbb, nd, narrays, f))
+            cache_f    = get!(broadcast_cache, f      ) do
+                Dict{Int,Dict{Int,Function}}()
+            end
+            cache_f_nd = get!(cache_f,         nd     ) do
+                Dict{Int,Function}()
+            end
+            func       = get!(cache_f_nd,      narrays) do
+                $gbf($gbb, nd, narrays, f)
+            end
             func(B, As...)
             B
         end
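
To make the two call forms concrete, here is a hedged, self-contained comparison (the cache and key are made-up names, and @get! is the macro from this PR, or the sketch above):

cache = Dict{Int,Function}()

# do-block form: constructs a fresh anonymous function on every call,
# which carried a real cost in Julia at the time of this PR
f1 = get!(cache, 2) do
    x -> 2x
end

# macro form: inlines the lookup and evaluates the default expression
# only on a miss, with no closure allocated on the hit path
f2 = @get!(cache, 2, x -> 2x)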

@carlobaldassi
Member Author

@kmsquire: Yes, I had tested it at some point; I was under the impression that the gap was bigger, but in fact I see a performance difference of the same order as what you report, i.e. operations on 10x10 matrices with preallocation incur a 1.2x to 1.3x penalty when using get! instead of @get!.

Maybe that's acceptable. But I actually still lean towards using the macro, since it's for internal use only (until passing functions around becomes faster) and 25% is not really negligible. Opinions?

@timholy
Member

timholy commented Mar 11, 2014

It seems fine to me. One small comment: we may want to think about indexing things in terms of function, narrays, nd rather than function, nd, narrays. For the vast majority of uses, narrays=2.

Regarding @get!, I've always liked that proposal, and indeed copied it into Images. But I know others had differing opinions, and so I'll wait for others to chime in on this change.

@carlobaldassi
Member Author

One small comment: we may want to think about indexing things in terms of function, narrays, nd rather than function, nd, narrays. For the vast majority of uses, narrays=2.

You're right, I have updated the PR.
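
Concretely, the reordered lookup chain would look roughly like this (adapted from the diff above; it lives inside the same @eval loop where gbf and gbb are bound, and is not copied verbatim from the merged commit):

# Keys now nest as f -> narrays -> nd, per timholy's suggestion.
cache_f    = @get!(broadcast_cache, f,       Dict{Int,Dict{Int,Function}}())
cache_f_na = @get!(cache_f,         narrays, Dict{Int,Function}())
func       = @get!(cache_f_na,      nd,      $gbf($gbb, nd, narrays, f))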

@carlobaldassi
Member Author

Travis failure is unrelated (it's in the arpack test).

Also, here's a link to the issue where @get! was first discussed: #1764

@kmsquire
Member

Yeah, I think I like the macro syntax better as well (and it is faster). Cheers!

@carlobaldassi
Member Author

Since there seem to be no objections, I'll rebase and merge.

Commit messages:

- Same as the get! function, but does not evaluate the default argument unless needed.
- Change the broadcast! cache to use nested Dicts instead of a Dict with Tuple keys; use the @get! macro to access its fields. Related to #6041.
- Also reduces code duplication in the broadcast! definitions via the @eval macro.
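
As a rough illustration of the @eval deduplication pattern mentioned above (the signature pairs and the function name here are invented, not the PR's):

# Splice one shared body into a definition per signature pair,
# instead of hand-writing each near-identical method.
for (Bsig, Asig) in ((BitArray, BitArray), (Array, Array))
    @eval function my_broadcast_sketch!(f::Function, B::$Bsig, As::$Asig...)
        # Same-shape arrays assumed; the real broadcast! also
        # handles dimension expansion.
        for I in eachindex(B)
            B[I] = f(map(A -> A[I], As)...)
        end
        return B
    end
end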
carlobaldassi added a commit that referenced this pull request Mar 12, 2014
Faster cache lookup in `broadcast!` via nested Dicts and get! macro
carlobaldassi merged commit 6b88580 into master on Mar 12, 2014
carlobaldassi deleted the cb/fasterbroadcast branch on March 12, 2014 at 10:59