Matrix multiplications: stack overflow & crash! #3256

Closed
anoe opened this issue May 30, 2013 · 20 comments
Labels
system:mac Affects only macOS

Comments

@anoe

anoe commented May 30, 2013

In the latest OS X binary, 0.2-pre, simple matrix multiplications cause stack overflows and crashes - when the matrices get just a little large:

julia> rand(100,100)*rand(100,100)
ERROR: stack overflow
in gemm! at linalg/blas.jl:347
in gemm_wrapper at linalg/matmul.jl:299
in gemm_wrapper at linalg/matmul.jl:280
in * at linalg/matmul.jl:94

julia> rand(100,100)*rand(100,100)
Illegal instruction: 4

NOTE: this was reproduced multiple times on a freshly restarted iMac with 16GB RAM, running OS X 10.7.5

@ViralBShah
Member

@xianyi is helping figure out this one. I will track it down and upload new binaries.

@ViralBShah
Member

Are you using Version 0.2.0-1689.r72e1ffa8.dirty - the latest one? These used to crash for me but I had uploaded new binaries that force the use of single-threaded BLAS. This and larger examples no longer crash for me. If it is still crashing, I will switch to using Apple BLAS for the time being and upload new libraries.

Try ENV["OPENBLAS_NUM_THREADS"]=1 in your existing setup - that was essentially the fix.
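
For readers trying this workaround: OpenBLAS most likely reads this variable when the library initializes, so setting it in the shell before launching Julia may be more reliable than setting it from a session that is already running. A minimal sketch, assuming a `julia` executable on the `PATH`; the helper name `with_single_threaded_blas` is hypothetical and not part of Julia or OpenBLAS:

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative only): run any external command
# with OpenBLAS limited to a single thread.
# OPENBLAS_NUM_THREADS is the variable named in this thread.
with_single_threaded_blas() {
    # The assignment applies only to the environment of the command being run.
    OPENBLAS_NUM_THREADS=1 "$@"
}

# Example use (assumes `julia` is on the PATH):
#   with_single_threaded_blas julia
```

Because the variable is placed in the child process's environment, it is already present when that process loads its libraries.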

@anoe
Author

anoe commented May 31, 2013

Yes, that's the version I am using. I just tried ENV["OPENBLAS_NUM_THREADS"]=1 - but that doesn't help :-/

@xianyi

xianyi commented May 31, 2013

Hi @ViralBShah,

Are you using 64-bit integers? I noticed that OpenBLAS is built with INTERFACE64=1.

Xianyi

@ViralBShah
Member

Yes, we have been doing this for a while now, and even have runtime checks to make sure that all these things match correctly. I will take one more look, just to be sure.

@ViralBShah
Member

@xianyi Can you download julia 0.2-pre binaries and try them out? The launcher script forces OPENBLAS_NUM_THREADS=1, but if you delete it, perhaps you can reproduce the crash? It may be a bug on the julia side too - but I am just not sure how to figure this one out.

@ViralBShah
Member

@JeffBezanson Did anything change recently in ccall, especially type promotion stuff?

@JeffBezanson
Member

I don't believe so. And I've never seen this problem on linux.

@anoe
Author

anoe commented Jun 2, 2013

Any progress on this one? :-)

@ViralBShah
Member

I will check whether Apple BLAS is multi-threaded and try to use that while this is sorted out.

@ViralBShah
Member

#3365 shows the difficulty of using Apple BLAS. No easy solution there.
#3369 is potentially the same underlying issue.

@ViralBShah
Member

@anoe Can you provide the details of your CPU?

@anoe
Author

anoe commented Jun 12, 2013

My machine is a 21.5" iMac (mid 2011); Processor 2.5 GHz Intel Core i5

@anoe
Author

anoe commented Jun 13, 2013

Interesting... It works fine on my MacBook Air (mid 2011); Processor 1.8 GHz Intel Core i7
Maybe this is strictly an i5 problem then?

@ViralBShah
Member

I use a core i5 mobile without issues.

@ViralBShah
Member

@xianyi I am using gfortran 4.8.1 from brew. Do you think that could be causing these problems that we are observing with openblas on mac? The openblas tests run fine for me.

@xianyi

xianyi commented Jun 14, 2013

Hi @ViralBShah, @anoe,

I will try julia master branch on my MacBook Air. I am afraid I cannot reproduce this issue.

@anoe , could you run this test on your MacBook?

  1. Download time_dgemm_int64.c from https://gist.github.com/xianyi/5778561
  2. Build OpenBLAS with make CC=clang NO_AFFINITY=1 INTERFACE64=1 BINARY=64 USE_THREAD=1
  3. Compile and run:
     clang -o time_dgemm_int64 time_dgemm_int64.c /your/path/OpenBLAS/libopenblas.a
     ./time_dgemm_int64 500 500 500

Xianyi
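
Since the earlier workaround in this thread was to force single-threaded BLAS, it may help to run the test binary from step 3 under several thread counts and compare the results. A minimal sketch, assuming the binary was built as described above; the helper name `thread_sweep` and the chosen thread counts are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative only): run a command once per
# thread count and report its exit status.
# Usage: thread_sweep "1 2 4" ./time_dgemm_int64 500 500 500
thread_sweep() {
    counts=$1
    shift
    for n in $counts; do
        # Set OPENBLAS_NUM_THREADS for this one invocation only.
        OPENBLAS_NUM_THREADS=$n "$@" >/dev/null 2>&1
        # $? here is the exit status of the command that just ran.
        echo "threads=$n exit=$?"
    done
}
```

A process killed by a signal is reported by the shell with an exit status above 128; an illegal-instruction crash (signal 4) would appear as `exit=132`.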

@anoe
Author

anoe commented Jun 14, 2013

@xianyi, you want me to run that test on my MacBook - which does not have this matrix mul. problem?
Or do you want me to run it on my iMac, which does?

Let me know :-)

@ViralBShah
Member

Could you try on both?

@ViralBShah
Member

This is fixed now.
