Poor performance when using in-place operator for arrays #7726

Closed
allanleal opened this issue Jul 25, 2014 · 7 comments

@allanleal

The following code benchmarks the operation B = -A', where A and B are both matrices. The first function uses the in-place array assignment B[:] = -A', the second combines Base.copy_transpose! (why is this function not exported from Base, like copy!?) with scale!, and the third uses a for loop.

using Benchmark

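function inplace()
    n = 1000
    A = rand(n, n)
    B = similar(A)
    # Evaluate -A' first, then copy the result into B
    B[:] = -A'
end

function alternative()
    n = 1000
    A = rand(n, n)
    B = similar(A)
    # First pass: write the transpose of A into B
    Base.copy_transpose!(B, 1:n, 1:n, A, 1:n, 1:n)
    # Second pass: negate B in place
    scale!(-1.0, B)
end

function forloop()
    n = 1000
    A = rand(n, n)
    B = similar(A)
    # Single pass: transpose and negate element by element
    for i = 1:n, j = 1:n
        B[i, j] = -A[j, i]
    end
end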

compare([inplace, alternative, forloop], 10)

The result below shows that the in-place version is considerably slower than using a simple for-loop:

| Row # | Function      | Average   | Relative | Replications |
|-------|---------------|-----------|----------|--------------|
| 1     | "inplace"     | 0.0389499 | 1.56895  | 10           |
| 2     | "alternative" | 0.027447  | 1.1056   | 10           |
| 3     | "forloop"     | 0.0248255 | 1.0      | 10           |

I am aware of a similar issue, #3424, which has been open for more than a year.

Is there any estimate of when this will be fixed?

Julia Version

julia> versioninfo()
Julia Version 0.3.0-rc1+54
Commit 4e92487 (2014-07-17 05:40 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
  WORD_SIZE: 64
  BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
@johnmyleswhite
Member

Your inplace function doesn't operate in place: it generates two different temporary arrays that require memory allocation.
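
As a rough sketch of that point, the two temporaries behind B[:] = -A' can be written out by hand. The names T and N are illustrative only, and the snippet assumes a recent Julia 1.x release (not the 0.3.0-rc1 build from the report), where permutedims is used to force an eager transposed copy:

```julia
n = 4
A = rand(n, n)
B = similar(A)

T = permutedims(A, (2, 1))  # temporary 1: a freshly allocated transposed copy of A
N = -T                      # temporary 2: a freshly allocated negated copy of T
B[:] = N                    # only this last step writes into B's existing memory
```

Only the final assignment reuses B; the work before it allocates and fills two full-size arrays.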

@timholy
Member

timholy commented Jul 25, 2014

I don't think #3400 is related (and it's closed), did you mean to link to a different issue?

@johnmyleswhite is correct that your inplace is nowhere near in-place. The reason forloop is faster than alternative is that with alternative you have to make two passes through the array, whereas forloop performs both operations using only a single pass.
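
The one-pass versus two-pass difference can be sketched as follows. The function names two_pass! and one_pass! are hypothetical, the code assumes square matrices of equal size, and it is written for a recent Julia 1.x release rather than the version in the report:

```julia
# Hypothetical helpers; both assume A and B are square and the same size.
function two_pass!(B, A)
    n = size(A, 1)
    for j in 1:n, i in 1:n      # pass 1: transposed copy of A into B
        B[i, j] = A[j, i]
    end
    for k in eachindex(B)       # pass 2: negate B in place
        B[k] = -B[k]
    end
    return B
end

function one_pass!(B, A)
    n = size(A, 1)
    for j in 1:n, i in 1:n      # single pass: transpose and negate together
        B[i, j] = -A[j, i]
    end
    return B
end

A = rand(4, 4)
B1 = two_pass!(similar(A), A)
B2 = one_pass!(similar(A), A)
```

Both produce the same matrix; the second simply touches each element of B once instead of twice.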

@allanleal
Author

The issue number has been corrected: it is #3424.

Are there any plans to make operations like B[:] = some_array_expression faster, along the lines of what the Eigen library does with template metaprogramming in C++?

@JeffBezanson
Member

Yes, this is a well-known performance issue and we plan to address it. Though I can't promise we'll use template metaprogramming to do it :)
It is captured in issues such as #249, #1115, #3424, and #1168.
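
For readers on a later release: the sketch below assumes Julia 1.x, which postdates this thread and is not what the report was run on. There, dotted operators fuse into a single loop and transpose(A) is a lazy wrapper, so an assignment of this kind can be written without materializing intermediate arrays:

```julia
A = rand(4, 4)
B = similar(A)

# `transpose(A)` wraps A lazily, and `.=` with `.-` fuses into one loop
# that writes each negated, transposed element straight into B.
B .= .-transpose(A)
```

This is one way the language later handled such expressions, not a description of how the linked issues were resolved.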

@timholy
Member

timholy commented Jul 25, 2014

There's the Devectorize package...

It's not an easy problem. At least a few years ago when I last used Eigen, I found that I usually had to turn on some warning (can't remember the details now) so that I was notified when it was allocating memory behind my back. I frequently had to intervene manually or occasionally submit patches to Eigen.

But if you're not dissuaded, expression-parsing is Julia's equivalent to template metaprogramming. So my bet is that your best hope is to contribute to the Devectorize package.

@lindahua
Contributor

Currently, Devectorize works at the macro level. Not knowing type information severely limits its capability. I think staged functions will help a lot when they land. I may do a major overhaul of Devectorize when this happens.
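
To illustrate the macro-level approach and its limitation, here is a toy sketch. It is not Devectorize's implementation or API; the names indexed and @toy_devec are hypothetical, and the code assumes Julia 1.x. The macro rewrites dst = f(a, b, ...) into a single loop by blindly turning every variable into an indexed reference:

```julia
# Hypothetical helper: rewrite every variable in `ex` to `var[i]`,
# leaving function names and literals alone.
indexed(ex, i) = ex                      # literals pass through unchanged
indexed(s::Symbol, i) = :($s[$i])        # variable -> variable[i]
function indexed(ex::Expr, i)
    ex.head === :call || error("toy_devec only handles function calls")
    Expr(:call, ex.args[1], (indexed(a, i) for a in ex.args[2:end])...)
end

# Hypothetical macro: turn `dst = expression` into one loop over dst.
macro toy_devec(assignment)
    assignment.head === :(=) || error("expected `dst = expression`")
    dst, rhs = assignment.args
    i = gensym(:i)
    body = indexed(rhs, i)
    esc(quote
        for $i in eachindex($dst)
            $dst[$i] = $body
        end
        $dst
    end)
end

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
r = similar(a)
@toy_devec r = -(a + 2 * b)   # expands to a loop computing r[i] = -(a[i] + 2 * b[i])
```

Because the macro only sees symbols, it cannot tell an array from a scalar: a scalar variable c on the right-hand side would also be rewritten to c[i]. That is the kind of missing type information the comment above refers to.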

@Jutho
Contributor

Jutho commented Jul 25, 2014

Note that instead of copy_transpose! there is also transpose! (also unexported), which should be faster for larger matrices.
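
A minimal sketch of that alternative, assuming Julia 1.x, where transpose! and rmul! are available from the LinearAlgebra standard library (in the 0.3 build discussed here, transpose! lived unexported in Base):

```julia
using LinearAlgebra

A = rand(4, 4)
B = similar(A)

transpose!(B, A)   # write the transpose of A into B without allocating a new matrix
rmul!(B, -1)       # negate B in place (a second pass over B)
```

As with the copy_transpose! variant, this still makes two passes over B.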
