OpenBLAS wrongly disables AVX #5077

Closed
rolicot opened this issue Dec 9, 2013 · 10 comments
Labels: performance (Must go faster), upstream (The issue is with an upstream dependency, e.g. LLVM)

rolicot commented Dec 9, 2013

I have an Ivy Bridge CPU and Linux 3.2 on Debian wheezy (3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux), so I should have full AVX support. Nevertheless, the OpenBLAS build (USE_SYSTEM_BLAS=0) falls back to NEHALEM. If I build with OPENBLAS_TARGET_ARCH unset, NEHALEM is detected, and if I set OPENBLAS_TARGET_ARCH=SANDYBRIDGE, I get:

deps/openblas-v0.2.8/Makefile.conf_last (selected lines):

CORE=SANDYBRIDGE
LIBCORE=sandybridge
NUM_CORES=2
SANDYBRIDGE=1

but any time I run julia, I get this warning:
OpenBLAS : Your OS does not support AVX instructions. OpenBLAS is using Nehalem kernels as a fallback, which may give poorer performance.

This is my CPU: cat /proc/cpuinfo

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 58
model name  : Intel(R) Pentium(R) CPU 2030M @ 2.50GHz
stepping    : 9
microcode   : 0x19
cpu MHz     : 1200.000
cache size  : 2048 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave lahf_lm arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips    : 4987.81
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Julia version: git label 'v0.2.0' (05c6461)

pao (Member) commented Dec 9, 2013

cc @xianyi

staticfloat (Member) commented

@ayehow if you go into the directory deps/openblas-v0.2.8 and run ./getarch 1, what does it print out?
This is what it prints on my OS X 10.9 MacBook Pro:

$ ./getarch 1
#define SANDYBRIDGE
#define L1_DATA_SIZE 32768
#define L1_DATA_LINESIZE 64
#define L2_SIZE 262144
#define L2_LINESIZE 64
#define DTB_DEFAULT_ENTRIES 64
#define DTB_SIZE 4096
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4_1
#define HAVE_SSE4_2
#define HAVE_AVX
#define CORE_SANDYBRIDGE

rolicot (Author) commented Dec 9, 2013

@staticfloat I get exactly the same:

deps/openblas-v0.2.8$ ./getarch 1
#define SANDYBRIDGE
#define L1_DATA_SIZE 32768
#define L1_DATA_LINESIZE 64
#define L2_SIZE 262144
#define L2_LINESIZE 64
#define DTB_DEFAULT_ENTRIES 64
#define DTB_SIZE 4096
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4_1
#define HAVE_SSE4_2
#define HAVE_AVX
#define CORE_SANDYBRIDGE

xianyi commented Dec 9, 2013

@ayehow,
Could you try the OpenBLAS develop branch? It supports the Haswell microarchitecture.

rolicot (Author) commented Dec 10, 2013

Interesting. I tried putting OPENBLAS_VER = develop into deps/Versions.make, and the build succeeded with Library Name ... libopenblasp-r0.2.8.a (Multi threaded; Max num-threads is 128). I got the same Makefile.conf_last and a similar warning about the NEHALEM fallback when starting julia. But ./getarch 1 gives me this:

#define HASWELL
#define L1_DATA_SIZE 32768
#define L1_DATA_LINESIZE 64
#define L2_SIZE 262144
#define L2_LINESIZE 64
#define DTB_DEFAULT_ENTRIES 64
#define DTB_SIZE 4096
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4_1
#define HAVE_SSE4_2
#define HAVE_AVX
#define FMA3
#define CORE_HASWELL

I thought that the Pentium 2030M was Ivy Bridge, i.e. the Sandy Bridge architecture shrunk to 22 nm, while Haswell is the next architecture on the same 22 nm process. Where am I wrong?
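One way to spot this kind of mismatch is to compare the HAVE_* macros that getarch prints against the flags the kernel reports. The following is a minimal sketch, not part of OpenBLAS: the helper names `parse_defines` and `claimed_but_missing` are hypothetical, the getarch text is an abbreviated copy of the output above, and the flag set is a shortened stand-in for the flags line in /proc/cpuinfo.

```python
import re


def parse_defines(text):
    """Return the set of macro names found on '#define NAME' lines."""
    return set(re.findall(r"^#define\s+(\w+)", text, flags=re.MULTILINE))


def claimed_but_missing(getarch_output, cpu_flags):
    """List HAVE_* features that getarch claims but the kernel flag set lacks."""
    defines = parse_defines(getarch_output)
    claimed = {d[len("HAVE_"):].lower() for d in defines if d.startswith("HAVE_")}
    # /proc/cpuinfo reports SSE3 as "pni", so map that one name explicitly.
    aliases = {"sse3": "pni"}
    return sorted(f for f in claimed if aliases.get(f, f) not in cpu_flags)


# Abbreviated getarch output (assumption: shortened for the example).
getarch_output = """#define HASWELL
#define HAVE_SSE3
#define HAVE_SSE4_2
#define HAVE_AVX
#define FMA3
"""
# Shortened stand-in for the flags line of /proc/cpuinfo.
cpu_flags = {"sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2"}

mismatches = claimed_but_missing(getarch_output, cpu_flags)
print(mismatches)
```

Anything in the printed list is a feature that getarch assumes from the CPU model but that the kernel does not advertise.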

pao (Member) commented Dec 10, 2013

According to http://ark.intel.com/products/72059/Intel-Pentium-Processor-2030M-2M-Cache-2_50-GHz, this processor doesn't support AVX. Looking closely at the flags in your /proc/cpuinfo, AVX doesn't appear there, either.
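The /proc/cpuinfo check can also be scripted. This is a rough sketch under the assumption that the flags line has the usual `flags : a b c` layout; the helper name `cpu_flags` is made up for illustration and the sample text is a shortened version of the output posted above.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first processor entry in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


# Shortened sample; on a real Linux machine read the file instead.
sample = (
    "model name  : Intel(R) Pentium(R) CPU 2030M @ 2.50GHz\n"
    "flags       : fpu sse sse2 pni ssse3 sse4_1 sse4_2 popcnt xsave\n"
)
flags = cpu_flags(sample)
print("avx" in flags)
```

On Linux the same function can be fed the real file with `cpu_flags(open("/proc/cpuinfo").read())`.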

rolicot (Author) commented Dec 10, 2013

All right, got it. This budget processor, even though it is Ivy Bridge, simply doesn't support AVX instructions (I just found that in the Intel datasheet, e.g. http://ark.intel.com/compare/72056,72059). I guess in this case it is best to compile for Nehalem, unless there's some hidden switch to compile for the Ivy Bridge instruction set minus AVX.
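The choice between the two targets named in this thread could be derived from the kernel flags rather than from the CPU model. The sketch below is only illustrative: `pick_target` is a hypothetical helper, and the mapping from flags to OPENBLAS_TARGET_ARCH values is an assumption limited to SANDYBRIDGE and NEHALEM, not a list of everything OpenBLAS supports.

```python
def pick_target(flags):
    """Suggest an OPENBLAS_TARGET_ARCH value from a set of /proc/cpuinfo flags.

    Illustrative mapping only: it knows just the two targets from this thread.
    """
    if "avx" in flags:
        return "SANDYBRIDGE"
    if "sse4_2" in flags:
        return "NEHALEM"
    return None  # leave the variable unset and let the build autodetect


# Flags similar to the Pentium 2030M above: SSE4.2 present, AVX absent.
target = pick_target({"sse2", "ssse3", "sse4_1", "sse4_2", "xsave"})
print(target)
```

The result could then be passed to the build, for example as OPENBLAS_TARGET_ARCH=NEHALEM.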

rolicot closed this as completed Dec 10, 2013

rolicot (Author) commented Dec 10, 2013

@pao Sorry, I didn't notice your post before finding out myself. I did notice the missing AVX flag in cpuinfo earlier, but I thought it could be some kernel issue.

@xianyi My problem is solved now, but the fact that getarch detects the processor as Haswell (including AVX and FMA3, both missing in reality) could still be a bug in the develop version.
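For context, AVX availability is normally decided from CPUID feature bits plus the OS-enabled register state, not from the family and model numbers. The sketch below decodes those bits from raw register values. It is not OpenBLAS's detection code: the function names are hypothetical and the register values in the example are invented. The bit positions (CPUID leaf 1, ECX bits 12, 27 and 28, and XCR0 bits 1 and 2) follow the Intel Software Developer's Manual.

```python
# Bit positions in CPUID leaf 1, register ECX.
FMA_BIT = 1 << 12
OSXSAVE_BIT = 1 << 27
AVX_BIT = 1 << 28
# XCR0 bits 1 (SSE state) and 2 (AVX state) must both be enabled by the OS.
XCR0_AVX_STATE = 0b110


def avx_usable(ecx, xcr0):
    """True only if the CPU reports AVX and the OS saves the AVX register state."""
    if not (ecx & AVX_BIT):
        return False  # the CPU itself lacks AVX, as on the Pentium 2030M
    if not (ecx & OSXSAVE_BIT):
        return False  # XGETBV is not available, so XCR0 cannot be trusted
    return (xcr0 & XCR0_AVX_STATE) == XCR0_AVX_STATE


def fma_usable(ecx, xcr0):
    """FMA3 additionally needs its own CPUID bit on top of usable AVX."""
    return avx_usable(ecx, xcr0) and bool(ecx & FMA_BIT)


# Invented register values: OSXSAVE set but AVX clear.
no_avx = avx_usable(OSXSAVE_BIT, 0b011)
# Invented register values: AVX, OSXSAVE and FMA set, OS enables AVX state.
with_fma = fma_usable(AVX_BIT | OSXSAVE_BIT | FMA_BIT, 0b111)
print(no_avx, with_fma)
```

A check along these lines reports no AVX and no FMA3 for a CPU whose feature bits are clear, whatever its model number says.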

xianyi commented Dec 11, 2013

@ayehow, please try the following commands.

cd /your/openblas
make clean
make

Then please post the resulting config_last.h.

pao (Member) commented Dec 11, 2013

@xianyi @ayehow You may want to open an issue back on the OpenBLAS tracker for the processor detection stuff.
