Build failure on Linux ARM64 #67

Open
martin-g opened this issue Apr 23, 2024 · 20 comments · May be fixed by #69

Comments

@martin-g

Hello,

I am trying to build Spaln on Linux ARM64/aarch64 and it fails with:

$BUILD_PREFIX/bin/aarch64-conda-linux-gnu-c++ -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/spaln-3.0.4 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -O3 -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib -L$PREFIX/lib -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem $PREFIX/include -I$PREFIX/include -DM_THREAD=1 -c aln2.cc
08:09:15 BIOCONDA INFO (OUT) In file included from seq.h:65,
08:09:15 BIOCONDA INFO (OUT)                  from aln.h:27,
08:09:15 BIOCONDA INFO (OUT)                  from blksrc.cc:23:
08:09:15 BIOCONDA INFO (OUT) codepot.h:45:44: error: narrowing conversion of '-2' from 'int' to 'char' [-Wnarrowing]
08:09:15 BIOCONDA INFO (OUT)    45 | static  const   SGPT2   ZeroSGPT2 = {0, 0, -2, -2};
08:09:15 BIOCONDA INFO (OUT)       |                                            ^~
08:09:15 BIOCONDA INFO (OUT) codepot.h:46:56: error: narrowing conversion of '-2' from 'int' to 'char' [-Wnarrowing]
08:09:15 BIOCONDA INFO (OUT)    46 | static  const   SGPT6   ZeroSGPT6 = {0, 0, 0, 0, 0, 0, -2, -2};
08:09:15 BIOCONDA INFO (OUT)       |                                                        ^~
08:09:15 BIOCONDA INFO (OUT) In file included from seq.h:65,
08:09:15 BIOCONDA INFO (OUT)                  from aln.h:27,
08:09:15 BIOCONDA INFO (OUT)                  from aln2.cc:24:
08:09:15 BIOCONDA INFO (OUT) codepot.h:45:44: error: narrowing conversion of '-2' from 'int' to 'char' [-Wnarrowing]
08:09:15 BIOCONDA INFO (OUT)    45 | static  const   SGPT2   ZeroSGPT2 = {0, 0, -2, -2};
08:09:15 BIOCONDA INFO (OUT)       |                                            ^~
08:09:15 BIOCONDA INFO (OUT) codepot.h:46:56: error: narrowing conversion of '-2' from 'int' to 'char' [-Wnarrowing]
08:09:15 BIOCONDA INFO (OUT)    46 | static  const   SGPT6   ZeroSGPT6 = {0, 0, 0, 0, 0, 0, -2, -2};
08:09:15 BIOCONDA INFO (OUT)       |                                                        ^~
08:09:15 BIOCONDA INFO (OUT) make: *** [Makefile:36: aln2.o] Error 1
08:09:15 BIOCONDA INFO (OUT) make: *** Waiting for unfinished jobs....
08:09:16 BIOCONDA INFO (OUT) make: *** [Makefile:36: blksrc.o] Error 1
08:09:16 BIOCONDA INFO (OUT) Traceback (most recent call last):

The reason is that plain char is unsigned on Linux ARM64, so -2 cannot be stored in these fields without a narrowing conversion:

static const SGPT2 ZeroSGPT2 = {0, 0, -2, -2};

The simplest solution is to change char to int for the fields that can hold negative values.
Any other or better ideas?
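
One alternative to widening the fields, offered as a minimal sketch rather than the project's actual fix: declare them as signed char, which can hold -2 on every target regardless of how plain char behaves. The struct PhasePair and the constant ZeroPhasePair below are hypothetical stand-ins, not names from codepot.h; the field names follow those mentioned later in this thread.

```cpp
#include <cassert>
#include <limits>

// True where plain `char` is signed (typical for x86_64 Linux),
// false where it is unsigned (typical for AArch64 Linux).
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical stand-in for the two fields that are initialized with -2.
struct PhasePair {
    signed char phs5;
    signed char phs3;
};

// -2 is representable in signed char on every target, so this
// brace initialization is not a narrowing conversion.
static const PhasePair ZeroPhasePair = {-2, -2};

static_assert(std::numeric_limits<signed char>::min() <= -2,
              "signed char must be able to hold -2");
```

Compared with int, this keeps the struct size unchanged, which may matter if many instances are stored.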

Thank you!

@ogotoh
Owner

ogotoh commented Apr 25, 2024

Dear Martin,

Thank you for your report.

I had assumed that a char c can always take a value in the range -128 <= c <= 127, so the compiler error you reported was unexpected. As you suggested, the simplest way would be to change the data type of phs5 and phs3 in struct SGPT2 (codepot.h) from char to short or int.

Alternatively, although I am entirely ignorant about Linux ARM64/aarch64, I wonder whether there might be a way to assign an 8-bit constant to initialize a char variable. Is a cast like char(-2) illegal?

I am trying to port the vectorized version of spaln from Intel to other architectures. However, that project is still in progress.

Osamu,

@martin-g
Author

Dear Osamu,

mgrigorov in 🌐 euler-arm-22 in spaln/src on master via C v10.3.1-gcc took 10s 
❯ git diff
diff --git i/src/codepot.h w/src/codepot.h
index c2e7527..54bbb81 100644
--- i/src/codepot.h
+++ w/src/codepot.h
@@ -42,8 +42,8 @@ struct SGPT6 {
        char   phs3;
 };
 
-static const   SGPT2   ZeroSGPT2 = {0, 0, -2, -2};
-static const   SGPT6   ZeroSGPT6 = {0, 0, 0, 0, 0, 0, -2, -2};
+static const   SGPT2   ZeroSGPT2 = {0, 0, char(-2), char(-2)};
+static const   SGPT6   ZeroSGPT6 = {0, 0, 0, 0, 0, 0, char(-2), char(-2)};
 static const   float   rlmt_quant = 0.8;
 

The char(-2) cast indeed fixed the problem!
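
A caveat worth checking, sketched under the assumption that the code compares these fields against -2 somewhere (not verified here): the cast silences the diagnostic, but where plain char is unsigned the stored value reads back as 254 once promoted to int. The helper names below are hypothetical.

```cpp
#include <cassert>

// Value seen after integer promotion when v is stored in each char flavour.
constexpr int stored_as_signed(int v)   { return static_cast<signed char>(v); }
constexpr int stored_as_unsigned(int v) { return static_cast<unsigned char>(v); }

// With 8-bit chars, -2 survives in signed char but wraps to 254 in unsigned char.
static_assert(stored_as_signed(-2) == -2, "signed char keeps -2");
static_assert(stored_as_unsigned(-2) == 254, "unsigned char wraps to 254");
```

So a check such as phs3 == -2 would be false on a target where char is unsigned, unless the field is declared as signed char, short or int.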

But now it failed with several errors like:

fwd2b1.cc:1138:23:   required from here
simd_functions.h:477:61: error: cannot convert ‘__Int8x16_t’ to ‘int16x8_t’
  477 |  regist_v bit_and(regist_v u, regist_v v) {return vandq_s16(u, v);}
      |                                                             ^
      |                                                             |
      |                                                             __Int8x16_t
In file included from simd_functions.h:32,
                 from fwd2s1_simd.h:39,
                 from fwd2b1.cc:34:
/usr/lib/gcc/aarch64-linux-gnu/10.3.1/include/arm_neon.h:1591:22: note:   initializing argument 1 of ‘int16x8_t vandq_s16(int16x8_t, int16x8_t)’
 1591 | vandq_s16 (int16x8_t __a, int16x8_t __b)
      |            ~~~~~~~~~~^~~
In file included from fwd2s1_simd.h:39,
                 from fwd2b1.cc:34:
simd_functions.h: In instantiation of ‘regist_v Simd_functions<short int, 8, regist_v>::bit_andnot(regist_v, regist_v) [with regist_v = __Int8x16_t]’:
fwd2s1_simd.h:1061:10:   required from ‘VTYPE SimdAln2s1<var_t, Nelem, regist_v, regist_m>::forwardS1(int*) [with var_t = short int; int Nelem = 8; regist_v = __Int8x16_t; regist_m = __Int8x16_t; VTYPE = int]’
fwd2b1.cc:1138:23:   required from here
simd_functions.h:481:36: error: cannot convert ‘__Int8x16_t’ to ‘int16x8_t’
  481 |      return vandq_s16(u, vmvnq_s16(v));
      |                                    ^
      |                                    |
      |                                    __Int8x16_t

I see that it is supposed to use #include <arm_neon.h> for ARM64. Let me take a deeper look!
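
For reference, the mismatch in the log is between int8x16_t (what regist_v was instantiated with) and int16x8_t (what vandq_s16 takes). Below is a minimal sketch of one way to bridge the two with the vreinterpretq_* intrinsics from arm_neon.h. The function and8_s16 is hypothetical, and the scalar branch exists only so that the snippet also builds on non-ARM machines. It is not the change that was made in simd_functions.h.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Bitwise AND of eight 16-bit lanes: out[i] = a[i] & b[i].
void and8_s16(const std::int16_t* a, const std::int16_t* b, std::int16_t* out) {
#if defined(__ARM_NEON)
    // Simulate registers that are typed as int8x16_t, as in the error log.
    int8x16_t u = vreinterpretq_s8_s16(vld1q_s16(a));
    int8x16_t v = vreinterpretq_s8_s16(vld1q_s16(b));
    // Reinterpret the bits as int16x8_t before calling the 16-bit intrinsic.
    int16x8_t r = vandq_s16(vreinterpretq_s16_s8(u), vreinterpretq_s16_s8(v));
    vst1q_s16(out, r);
#else
    // Scalar fallback with the same result, for targets without NEON.
    for (int i = 0; i < 8; ++i) out[i] = static_cast<std::int16_t>(a[i] & b[i]);
#endif
}
```

The reinterpret intrinsics only change the static type of the register and leave the bits untouched, which is why they suit a bitwise operation like AND.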

@martin-g
Author

For some reason the build also fails on my x86_64 dev machine:

fwd2h1.cc:2228:47: warning: ignoring attributes on template argument ‘__m512i’ [-Wignored-attributes]
In file included from fwd2h1.cc:37:
fwd2h1_wip_simd.h: In instantiation of ‘VTYPE SimdAln2h1<var_t, Nelem, regist_v, regist_m>::forwardH1_wip(Mfile*) [with var_t = short int; int Nelem = 32; regist_v = __vector(8) long long int; regist_m = __vector(8) long long int; VTYPE = int]’:
fwd2h1.cc:2032:43:   required from here
fwd2h1_simd.h:218:31: error: ‘class SimdAln2h1<short int, 32, __vector(8) long long int, __vector(8) long long int>’ has no member named ‘all_zero’
  218 | #define AllZero(a)      this->all_zero(a)
      |                         ~~~~~~^~~~~~~~
fwd2h1_wip_simd.h:228:25: note: in expansion of macro ‘AllZero’
  228 |                     if (AllZero(ph_v)) continue;
      |                         ^~~~~~~
fwd2h1_simd.h:218:31: error: ‘class SimdAln2h1<short int, 32, __vector(8) long long int, __vector(8) long long int>’ has no member named ‘all_zero’
  218 | #define AllZero(a)      this->all_zero(a)
      |                         ~~~~~~^~~~~~~~
fwd2h1_wip_simd.h:286:25: note: in expansion of macro ‘AllZero’
  286 |                     if (AllZero(ph_v)) continue;
      |                         ^~~~~~~
fwd2h1_wip_simd.h: In instantiation of ‘VTYPE SimdAln2h1<var_t, Nelem, regist_v, regist_m>::hirschbergH1_wip(int (*)[10], const int&) [with var_t = short int; int Nelem = 32; regist_v = __vector(8) long long int; regist_m = __vector(8) long long int; VTYPE = int; Dim10 = int [10]]’:
fwd2h1.cc:2240:29:   required from here
fwd2h1_simd.h:218:31: error: ‘class SimdAln2h1<short int, 32, __vector(8) long long int, __vector(8) long long int>’ has no member named ‘all_zero’
  218 | #define AllZero(a)      this->all_zero(a)
      |                         ~~~~~~^~~~~~~~
fwd2h1_wip_simd.h:579:25: note: in expansion of macro ‘AllZero’
  579 |                     if (AllZero(ph_v)) continue;
      |                         ^~~~~~~
fwd2h1_simd.h:218:31: error: ‘class SimdAln2h1<short int, 32, __vector(8) long long int, __vector(8) long long int>’ has no member named ‘all_zero’
  218 | #define AllZero(a)      this->all_zero(a)
      |                         ~~~~~~^~~~~~~~
fwd2h1_wip_simd.h:660:25: note: in expansion of macro ‘AllZero’
  660 |                     if (AllZero(ph_v)) continue;
      |                         ^~~~~~~
make: *** [Makefile:35: fwd2h1.o] Error 1
lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            11th Gen Intel(R) Core(TM) i7-11390H @ 3.40GHz
    CPU family:          6
    Model:               140
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            2
    CPU max MHz:         5000.0000
    CPU min MHz:         400.0000
    BogoMIPS:            6835.20
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts 
                         rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc
                         _deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad 
                         fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xs
                         aves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg 
                         avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear ibt flush_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   192 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    5 MiB (4 instances)
  L3:                    12 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7
Vulnerabilities:         
...

@martin-g
Author

With #68 I've added CI for testing the build on x86_64 and aarch64.
There the build passes on x86_64! The failure on my dev machine must be hardware-specific?!
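
One possible explanation, stated as an assumption rather than something confirmed in this thread: the lscpu output above lists avx512bw, so a build that targets the host CPU would take an AVX-512 code path that a CI runner without AVX-512 never compiles. Below is a small sketch for checking which instruction set the compiler is targeting, based on macros that GCC and Clang predefine; simd_target is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Name of the widest SIMD instruction set the compiler was told to target,
// judged only from predefined macros (checked from widest to narrowest).
std::string simd_target() {
#if defined(__AVX512BW__)
    return "avx512bw";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__SSE2__)
    return "sse2";
#elif defined(__ARM_NEON)
    return "neon";
#else
    return "scalar";
#endif
}
```

Compiling this with and without -march=native and comparing the results shows whether the two builds would take different paths.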

@martin-g martin-g linked a pull request Apr 25, 2024 that will close this issue
@ogotoh
Owner

ogotoh commented May 15, 2024

Dear Martin,

Thank you for your suggestions. By using sse2neon.h, I could successfully install and run spaln on my Mac Note Pro (actually, I am not a heavy user of Mac). While reviewing my code in simd_functions.h, I found a few elementary errors. After fixing them, spaln now runs normally on my Mac without the help of sse2neon.h. The new version (version 3.0.5) has just been uploaded. Although I am not familiar with Linux ARM64/aarch64, I hope the new version will work on that architecture, potentially after minor revisions.

As macOS does not seem to support the zlib library by default, I installed it manually from source. Likewise, I could not figure out how to install GSL-dependent programs on the Mac. More generally, I would like to know how to use GNU facilities on Mac or other OSs based on the ARM64/aarch64 architecture. If you know something about this issue, please let me know.

Osamu,

@martin-g
Author

Thanks for the update, @ogotoh !
Let me try 3.0.5 on Linux ARM64!

@martin-g
Author

Here are the errors I face with 3.0.5 on Linux ARM64:

mgrigorov in 🌐 euler-arm-22 in /tmp/spaln/spaln-ver.3.0.5 
❯ cd src/

mgrigorov in 🌐 euler-arm-22 in spaln/spaln-ver.3.0.5/src via C v10.3.1-gcc 
❯ ./configure 
Checking for 64 bits compiler...
use CXX=g++ CFLAGS=-O3 -march=native
binaries install to /home/mgrigorov/bin
tables install to /home/mgrigorov/table
alndbs install to /home/mgrigorov/seqdb
wrote Makefile
wrote files
OK. try make and make install. good luck!

mgrigorov in 🌐 euler-arm-22 in spaln/spaln-ver.3.0.5/src via C v10.3.1-gcc 
❯ make -j
g++ -O3 -march=native -DM_THREAD=1 -c blksrc.cc
g++ -O3 -march=native -DM_THREAD=1 -c aln2.cc
g++ -O3 -march=native -DM_THREAD=1 -c dbs.cc
g++ -O3 -march=native -DM_THREAD=1 -c gaps.cc
g++ -O3 -march=native -DM_THREAD=1 -c codepot.cc
g++ -O3 -march=native -DM_THREAD=1 -c divseq.cc
g++ -O3 -march=native -DM_THREAD=1 -c gsinfo.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2b1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2d1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2h1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2s1.cc
g++ -O3 -march=native -DM_THREAD=1 -c bitpat.cc
g++ -O3 -march=native -DM_THREAD=1 -c eijunc.cc
g++ -O3 -march=native -DM_THREAD=1 -c seq.cc
g++ -O3 -march=native -DM_THREAD=1 -c simmtx.cc
g++ -O3 -march=native -DM_THREAD=1 -c sqpr.cc
g++ -O3 -march=native -DM_THREAD=1 -c utilseq.cc
g++ -O3 -march=native -DM_THREAD=1 -c vmf.cc
g++ -O3 -march=native -DM_THREAD=1 -c wln.cc
g++ -O3 -march=native -DM_THREAD=1 -c boyer_moore.cc
g++ -O3 -march=native -DM_THREAD=1 -c clib.cc
g++ -O3 -march=native -DM_THREAD=1 -c iolib.cc
g++ -O3 -march=native -DM_THREAD=1 -c mfile.cc
g++ -O3 -march=native -DM_THREAD=1 -c sets.cc
g++ -O3 -march=native -DM_THREAD=1 -c supprime.cc
make: *** No rule to make target '-lm', needed by 'spaln'.  Stop.
make: *** Waiting for unfinished jobs....
In file included from fwd2s1_simd.h:39,
                 from fwd2b1.cc:34:
simd_functions.h: In member function ‘Simd_functions<unsigned char>::int_v Simd_functions<unsigned char>::add(Simd_functions<unsigned char>::int_v, Simd_functions<unsigned char>::int_v)’:
simd_functions.h:1507:52: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
 1507 |  int_v add(int_v u, int_v v) {return vqaddq_s8(u, v);}
      |                                                    ^
simd_functions.h:1507:48: error: cannot convert ‘Simd_functions<unsigned char>::int_v’ {aka ‘uint8x16_t’} to ‘int8x16_t’
 1507 |  int_v add(int_v u, int_v v) {return vqaddq_s8(u, v);}
      |                                                ^
      |                                                |
      |                                                Simd_functions<unsigned char>::int_v {aka uint8x16_t}
In file included from simd_functions.h:32,
                 from fwd2s1_simd.h:39,
                 from fwd2b1.cc:34:
/usr/lib/gcc/aarch64-linux-gnu/10.3.1/include/arm_neon.h:2740:22: note:   initializing argument 1 of ‘int8x16_t vqaddq_s8(int8x16_t, int8x16_t)’
 2740 | vqaddq_s8 (int8x16_t __a, int8x16_t __b)
      |            ~~~~~~~~~~^~~
In file included from fwd2s1_simd.h:39,
                 from fwd2b1.cc:34:
simd_functions.h: In member function ‘Simd_functions<short unsigned int>::int_v Simd_functions<short unsigned int>::cast16to8(Simd_functions<short unsigned int>::int_v)’:
simd_functions.h:1624:44: error: cannot convert ‘uint8x16_t’ to ‘int8x16_t’
 1624 |      return vreinterpretq_u16_s8(vqtbl1q_u8(w, b_v));
      |                                  ~~~~~~~~~~^~~~~~~~
      |                                            |
      |                                            uint8x16_t
In file included from simd_functions.h:32,
                 from fwd2s1_simd.h:39,
                 from fwd2b1.cc:34:
/usr/lib/gcc/aarch64-linux-gnu/10.3.1/include/arm_neon.h:5813:33: note:   initializing argument 1 of ‘uint16x8_t vreinterpretq_u16_s8(int8x16_t)’
 5813 | vreinterpretq_u16_s8 (int8x16_t __a)
      |                       ~~~~~~~~~~^~~
In file included from fwd2s1_simd.h:39,
                 from fwd2b1.cc:34:
simd_functions.h: In member function ‘Simd_functions<int>::int_v Simd_functions<int>::cast32to16(Simd_functions<int>::int_v)’:
simd_functions.h:1674:33: error: cannot convert ‘uint32x4_t’ to ‘Simd_functions<int>::int_v’ {aka ‘int32x4_t’} in return
 1674 |      return vreinterpretq_u32_s8(vqtbl1q_s8(w, b_v));
      |             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
      |                                 |
      |                                 uint32x4_t
In file included from fwd2h1_simd.h:39,
                 from fwd2h1.cc:37:
simd_functions.h: In member function ‘Simd_functions<unsigned char>::int_v Simd_functions<unsigned char>::add(Simd_functions<unsigned char>::int_v, Simd_functions<unsigned char>::int_v)’:
simd_functions.h:1507:52: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
 1507 |  int_v add(int_v u, int_v v) {return vqaddq_s8(u, v);}
      |                                                    ^
simd_functions.h:1507:48: error: cannot convert ‘Simd_functions<unsigned char>::int_v’ {aka ‘uint8x16_t’} to ‘int8x16_t’
 1507 |  int_v add(int_v u, int_v v) {return vqaddq_s8(u, v);}
      |                                                ^
      |                                                |
      |                                                Simd_functions<unsigned char>::int_v {aka uint8x16_t}
In file included from simd_functions.h:32,
                 from fwd2h1_simd.h:39,
                 from fwd2h1.cc:37:
/usr/lib/gcc/aarch64-linux-gnu/10.3.1/include/arm_neon.h:2740:22: note:   initializing argument 1 of ‘int8x16_t vqaddq_s8(int8x16_t, int8x16_t)’
 2740 | vqaddq_s8 (int8x16_t __a, int8x16_t __b)
      |            ~~~~~~~~~~^~~
In file included from fwd2h1_simd.h:39,
                 from fwd2h1.cc:37:
simd_functions.h: In member function ‘Simd_functions<short unsigned int>::int_v Simd_functions<short unsigned int>::cast16to8(Simd_functions<short unsigned int>::int_v)’:
simd_functions.h:1624:44: error: cannot convert ‘uint8x16_t’ to ‘int8x16_t’
 1624 |      return vreinterpretq_u16_s8(vqtbl1q_u8(w, b_v));
      |                                  ~~~~~~~~~~^~~~~~~~
      |                                            |
      |                                            uint8x16_t
In file included from simd_functions.h:32,
                 from fwd2h1_simd.h:39,
                 from fwd2h1.cc:37:
/usr/lib/gcc/aarch64-linux-gnu/10.3.1/include/arm_neon.h:5813:33: note:   initializing argument 1 of ‘uint16x8_t vreinterpretq_u16_s8(int8x16_t)’
 5813 | vreinterpretq_u16_s8 (int8x16_t __a)
      |                       ~~~~~~~~~~^~~
In file included from fwd2s1_simd.h:39,
                 from fwd2s1.cc:36:
simd_functions.h: In member function ‘Simd_functions<unsigned char>::int_v Simd_functions<unsigned char>::add(Simd_functions<unsigned char>::int_v, Simd_functions<unsigned char>::int_v)’:
simd_functions.h:1507:52: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
 1507 |  int_v add(int_v u, int_v v) {return vqaddq_s8(u, v);}
      |                                                    ^
In file included from fwd2h1_simd.h:39,
                 from fwd2h1.cc:37:
simd_functions.h: In member function ‘Simd_functions<int>::int_v Simd_functions<int>::cast32to16(Simd_functions<int>::int_v)’:
simd_functions.h:1674:33: error: cannot convert ‘uint32x4_t’ to ‘Simd_functions<int>::int_v’ {aka ‘int32x4_t’} in return
 1674 |      return vreinterpretq_u32_s8(vqtbl1q_s8(w, b_v));
      |             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
      |                                 |
      |                                 uint32x4_t
simd_functions.h:1507:48: error: cannot convert ‘Simd_functions<unsigned char>::int_v’ {aka ‘uint8x16_t’} to ‘int8x16_t’
 1507 |  int_v add(int_v u, int_v v) {return vqaddq_s8(u, v);}
      |                                                ^
      |                                                |
      |                                                Simd_functions<unsigned char>::int_v {aka uint8x16_t}
In file included from simd_functions.h:32,
                 from fwd2s1_simd.h:39,
                 from fwd2s1.cc:36:
/usr/lib/gcc/aarch64-linux-gnu/10.3.1/include/arm_neon.h:2740:22: note:   initializing argument 1 of ‘int8x16_t vqaddq_s8(int8x16_t, int8x16_t)’
 2740 | vqaddq_s8 (int8x16_t __a, int8x16_t __b)
      |            ~~~~~~~~~~^~~
In file included from fwd2s1_simd.h:39,
                 from fwd2s1.cc:36:
simd_functions.h: In member function ‘Simd_functions<short unsigned int>::int_v Simd_functions<short unsigned int>::cast16to8(Simd_functions<short unsigned int>::int_v)’:
simd_functions.h:1624:44: error: cannot convert ‘uint8x16_t’ to ‘int8x16_t’
 1624 |      return vreinterpretq_u16_s8(vqtbl1q_u8(w, b_v));
      |                                  ~~~~~~~~~~^~~~~~~~
      |                                            |
      |                                            uint8x16_t
In file included from simd_functions.h:32,
                 from fwd2s1_simd.h:39,
                 from fwd2s1.cc:36:
/usr/lib/gcc/aarch64-linux-gnu/10.3.1/include/arm_neon.h:5813:33: note:   initializing argument 1 of ‘uint16x8_t vreinterpretq_u16_s8(int8x16_t)’
 5813 | vreinterpretq_u16_s8 (int8x16_t __a)
      |                       ~~~~~~~~~~^~~
In file included from fwd2s1_simd.h:39,
                 from fwd2s1.cc:36:
simd_functions.h: In member function ‘Simd_functions<int>::int_v Simd_functions<int>::cast32to16(Simd_functions<int>::int_v)’:
simd_functions.h:1674:33: error: cannot convert ‘uint32x4_t’ to ‘Simd_functions<int>::int_v’ {aka ‘int32x4_t’} in return
 1674 |      return vreinterpretq_u32_s8(vqtbl1q_s8(w, b_v));
      |             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
      |                                 |
      |                                 uint32x4_t
make: *** [Makefile:40: fwd2b1.o] Error 1
In file included from fwd2s1.cc:36:
fwd2s1_simd.cc: In member function ‘VTYPE SimdAln2s1::forwardS1(int*)’:
fwd2s1_simd.h:216:30: error: cannot convert ‘Simd_functions<short int>::int_v’ {aka ‘int16x8_t’} to ‘SimdAln2s1::regist_m’ {aka ‘uint16x8_t’} in assignment
  216 | #define To_mask(a) this->load(a)
      |                    ~~~~~~~~~~^~~
      |                              |
      |                              Simd_functions<short int>::int_v {aka int16x8_t}
fwd2s1_simd.cc:683:15: note: in expansion of macro ‘To_mask’
  683 |       msk_m = To_mask(pb_a);
      |               ^~~~~~~
make: *** [Makefile:40: fwd2s1.o] Error 1
make: *** [Makefile:40: fwd2h1.o] Error 1

@martin-g
Author

As you can see I needed to make some changes to Makefile.in to fix issues like make: *** No rule to make target '-lm', needed by 'spaln'. Stop.

@martin-g
Author

#68 introduces CI (i.e. automated testing) that could be used to verify changes on Linux x86_64 and Linux aarch64. I could add jobs for Mac x86_64 and arm64 too if you are interested!

@ogotoh
Owner

ogotoh commented May 16, 2024

Dear Martin,

Looking into the error messages you kindly sent me, all of the errors appear to derive from type mismatches. Although I tried to fix them, I cannot confirm the fixes myself, as clang++ on my Mac doesn’t report the type mismatches you reported. Could you examine the new version 3.0.5a on your system?
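
To illustrate why the signed and unsigned variants cannot be swapped, here is a minimal sketch of saturating byte addition with one wrapper per element type. On NEON it calls vqaddq_u8 and vqaddq_s8; elsewhere it falls back to scalar code so that it can be tried on any machine. The wrapper names are hypothetical, not those used in simd_functions.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Saturating add of 16 unsigned bytes: results are clamped to [0, 255].
void sat_add_u8(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out) {
#if defined(__ARM_NEON)
    vst1q_u8(out, vqaddq_u8(vld1q_u8(a), vld1q_u8(b)));
#else
    for (int i = 0; i < 16; ++i)
        out[i] = static_cast<std::uint8_t>(std::min(255, a[i] + b[i]));
#endif
}

// Saturating add of 16 signed bytes: results are clamped to [-128, 127].
void sat_add_s8(const std::int8_t* a, const std::int8_t* b, std::int8_t* out) {
#if defined(__ARM_NEON)
    vst1q_s8(out, vqaddq_s8(vld1q_s8(a), vld1q_s8(b)));
#else
    for (int i = 0; i < 16; ++i) {
        int s = a[i] + b[i];
        out[i] = static_cast<std::int8_t>(std::max(-128, std::min(127, s)));
    }
#endif
}
```

The same bit pattern saturates at different limits depending on signedness, so passing a uint8x16_t to vqaddq_s8 would be wrong even if the compiler allowed it. That is presumably why GCC rejects the implicit conversion unless -flax-vector-conversions is given.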

I will consider your changes in Makefile.in later.

Osamu,

@martin-g
Author

martin-g commented May 16, 2024

3.0.5a looks much better:


mgrigorov in 🌐 euler-arm-22 in spaln/spaln-ver.3.0.5a/src via C v10.3.1-gcc 
❯ ./configure 
Checking for 64 bits compiler...
use CXX=g++ CFLAGS=-O3 -march=native
binaries install to /home/mgrigorov/bin
tables install to /home/mgrigorov/table
alndbs install to /home/mgrigorov/seqdb
wrote Makefile
wrote files
OK. try make and make install. good luck!

mgrigorov in 🌐 euler-arm-22 in spaln/spaln-ver.3.0.5a/src via C v10.3.1-gcc 
❯ make
g++ -O3 -march=native -DM_THREAD=1 -c blksrc.cc
g++ -O3 -march=native -DM_THREAD=1 -c aln2.cc
g++ -O3 -march=native -DM_THREAD=1 -c dbs.cc
g++ -O3 -march=native -DM_THREAD=1 -c gaps.cc
g++ -O3 -march=native -DM_THREAD=1 -c codepot.cc
g++ -O3 -march=native -DM_THREAD=1 -c divseq.cc
g++ -O3 -march=native -DM_THREAD=1 -c gsinfo.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2b1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2d1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2h1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2s1.cc
g++ -O3 -march=native -DM_THREAD=1 -c bitpat.cc
g++ -O3 -march=native -DM_THREAD=1 -c eijunc.cc
g++ -O3 -march=native -DM_THREAD=1 -c seq.cc
g++ -O3 -march=native -DM_THREAD=1 -c simmtx.cc
g++ -O3 -march=native -DM_THREAD=1 -c sqpr.cc
g++ -O3 -march=native -DM_THREAD=1 -c utilseq.cc
g++ -O3 -march=native -DM_THREAD=1 -c vmf.cc
g++ -O3 -march=native -DM_THREAD=1 -c wln.cc
g++ -O3 -march=native -DM_THREAD=1 -c boyer_moore.cc
ar rc sblib.a aln2.o dbs.o gaps.o  codepot.o divseq.o gsinfo.o \
fwd2b1.o fwd2d1.o fwd2h1.o fwd2s1.o bitpat.o eijunc.o \
seq.o simmtx.o sqpr.o utilseq.o vmf.o wln.o boyer_moore.o
ranlib sblib.a
g++ -O3 -march=native -DM_THREAD=1 -c clib.cc
g++ -O3 -march=native -DM_THREAD=1 -c iolib.cc
g++ -O3 -march=native -DM_THREAD=1 -c mfile.cc
g++ -O3 -march=native -DM_THREAD=1 -c sets.cc
g++ -O3 -march=native -DM_THREAD=1 -c supprime.cc
ar rc clib.a clib.o iolib.o mfile.o sets.o supprime.o 
ranlib clib.a
make: *** No rule to make target '-lm', needed by 'spaln'.  Stop.

Good job!
It seems only the Makefile improvements are needed.

@ogotoh
Owner

ogotoh commented May 17, 2024

Dear Martin,

Thank you very much for your quick response. It seems that the type mismatches in Ver.3.0.5 have been remedied.

I have a few questions with regard to your suggestions on Makefile.in.

  1. Why did you suggest changing $(SLIB) to sblib.a in the ‘spaln:’ line, etc.?
  2. How did you circumvent the error “make: *** No rule to make target '-lm', needed by 'spaln'. Stop.”?

Osamu,

@martin-g
Author

See https://github.com/ogotoh/spaln/pull/69/files#diff-6cdcfc8601359a7023cb8d31281d057a3f8c370cb6f3112d1b9c8e3b408ce8bbR26-R28

CLIB = clib.a -lpthread -lm -lz
SLIB = sblib.a $(CLIB)
ULIB = ublib.a $(SLIB)

Here you specify both the file names, which act as Make targets, and the linker flags.
If you keep using $(xLIB) as a Make prerequisite, Make will try to find targets/files named -lm and -lz.
So I just replaced the $(xLIB) usages in the dependent targets with the respective file names (xlib.a) and kept using $(xLIB) as linker flags.

While explaining this I realize that I have a bug in my PR!
At https://github.com/ogotoh/spaln/pull/69/files#diff-6cdcfc8601359a7023cb8d31281d057a3f8c370cb6f3112d1b9c8e3b408ce8bbR90 $(SLIB) should be replaced with sblib.a clib.a, not just sblib.a.
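
The distinction can be sketched in code. The snippet below only illustrates the idea (it is not part of the PR, and split_libs is a hypothetical helper): it separates the file names that can serve as Make prerequisites from the -l tokens that only make sense on the linker command line.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Result of splitting a Makefile-style library variable.
struct LibSplit {
    std::vector<std::string> files;  // usable as Make prerequisites, e.g. "clib.a"
    std::vector<std::string> flags;  // linker-only tokens, e.g. "-lm"
};

// Split a whitespace-separated value such as "clib.a -lpthread -lm -lz".
// Tokens beginning with '-' are treated as linker flags, the rest as files.
LibSplit split_libs(const std::string& value) {
    LibSplit out;
    std::istringstream in(value);
    std::string tok;
    while (in >> tok) {  // operator>> never yields an empty token
        if (tok[0] == '-') out.flags.push_back(tok);
        else out.files.push_back(tok);
    }
    return out;
}
```

For "clib.a -lpthread -lm -lz" this yields one file (clib.a) and three flags, which mirrors the split between the prerequisite list and the link line described above.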

@ogotoh
Owner

ogotoh commented May 20, 2024

Dear Martin,

I have modified Makefile.in. If it is OK, I will rearrange the tags, which are currently somewhat messy.

Osamu,

@martin-g
Author

mgrigorov in 🌐 euler-arm-22 in /tmp/spaln 
❯ cd spaln-ver.3.0.5b/src/

mgrigorov in 🌐 euler-arm-22 in spaln/spaln-ver.3.0.5b/src via C v10.3.1-gcc 
❯ ./configure 
Checking for 64 bits compiler...
use CXX=g++ CFLAGS=-O3 -march=native
binaries install to /home/mgrigorov/bin
tables install to /home/mgrigorov/table
alndbs install to /home/mgrigorov/seqdb
wrote Makefile
wrote files
OK. try make and make install. good luck!

mgrigorov in 🌐 euler-arm-22 in spaln/spaln-ver.3.0.5b/src via C v10.3.1-gcc 
❯ make 
g++ -O3 -march=native -DM_THREAD=1 -c blksrc.cc
g++ -O3 -march=native -DM_THREAD=1 -c aln2.cc
g++ -O3 -march=native -DM_THREAD=1 -c dbs.cc
g++ -O3 -march=native -DM_THREAD=1 -c gaps.cc
g++ -O3 -march=native -DM_THREAD=1 -c codepot.cc
g++ -O3 -march=native -DM_THREAD=1 -c divseq.cc
g++ -O3 -march=native -DM_THREAD=1 -c gsinfo.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2b1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2d1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2h1.cc
g++ -O3 -march=native -DM_THREAD=1 -c fwd2s1.cc
g++ -O3 -march=native -DM_THREAD=1 -c bitpat.cc
g++ -O3 -march=native -DM_THREAD=1 -c eijunc.cc
g++ -O3 -march=native -DM_THREAD=1 -c seq.cc
g++ -O3 -march=native -DM_THREAD=1 -c simmtx.cc
g++ -O3 -march=native -DM_THREAD=1 -c sqpr.cc
g++ -O3 -march=native -DM_THREAD=1 -c utilseq.cc
g++ -O3 -march=native -DM_THREAD=1 -c vmf.cc
g++ -O3 -march=native -DM_THREAD=1 -c wln.cc
g++ -O3 -march=native -DM_THREAD=1 -c boyer_moore.cc
ar rc sblib.a aln2.o dbs.o gaps.o  codepot.o divseq.o gsinfo.o \
fwd2b1.o fwd2d1.o fwd2h1.o fwd2s1.o bitpat.o eijunc.o \
seq.o simmtx.o sqpr.o utilseq.o vmf.o wln.o boyer_moore.o
ranlib sblib.a
g++ -O3 -march=native -DM_THREAD=1 -c clib.cc
g++ -O3 -march=native -DM_THREAD=1 -c iolib.cc
g++ -O3 -march=native -DM_THREAD=1 -c mfile.cc
g++ -O3 -march=native -DM_THREAD=1 -c sets.cc
g++ -O3 -march=native -DM_THREAD=1 -c supprime.cc
ar rc clib.a clib.o iolib.o mfile.o sets.o supprime.o 
ranlib clib.a
g++ -O3 -march=native -DM_THREAD=1 -o spaln spaln.cc blksrc.o sblib.a clib.a -lpthread -lm -lz
g++ -O3 -march=native -DM_THREAD=1 -o sortgrcd sortgrcd.cc sblib.a clib.a -lpthread -lm -lz
g++ -O3 -march=native -DM_THREAD=1 -o makmdm makmdm.cc clib.a -lpthread -lm -lz
g++ -O3 -march=native -DM_THREAD=1 -o makdbs makdbs.cc sblib.a clib.a -lpthread -lm -lz

mgrigorov in 🌐 euler-arm-22 in spaln/spaln-ver.3.0.5b/src via C v10.3.1-gcc took 50s 
❯ file spaln sortgrcd makmdm makdbs 
spaln:    ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=b298de082863a601bfe3572e86d7646992e9c0da, for GNU/Linux 3.7.0, not stripped
sortgrcd: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=da6a5fd23307f7c8ee57a2d1095d5d8d06568300, for GNU/Linux 3.7.0, not stripped
makmdm:   ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=080bfb2e243640002fcefab41ddb9e889be21f97, for GNU/Linux 3.7.0, not stripped
makdbs:   ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=bae38ddae3ec6ef3f96fa7578cc88db567c2ce40, for GNU/Linux 3.7.0, not stripped

mgrigorov in 🌐 euler-arm-22 in spaln/spaln-ver.3.0.5b/src via C v10.3.1-gcc 
❯ ./spaln 
No input seq file !

*** SPALN_ARM_NEON version 3.0.5 <240520> ***

Usage:
spaln -W[Genome.bkn] -KD [W_Options] Genome.mfa	(to write block inf.)
spaln -W[Genome.bkp] -KP [W_Options] Genome.mfa	(to write block inf.)
spaln -W[AAdb.bka] -KA [W_Options] AAdb.faa	(to write aa db inf.)
spaln -W [Genome.mfa|AAdb.faa]	(alternative to makdbs.)
spaln [R_options] genomic_segment cDNA.fa	(to align)
spaln [R_options] genomic_segment protein.fa	(to align)
spaln [R_options] -dGenome cDNA.fa	(to map & align)
spaln [R_options] -dGenome protein.fa	(to map & align)
spaln [R_options] -aAAdb genomic_segment.fa	(to search aa database & align)
spaln [R_options] -aAAdb protein.fa	(to search aa database)

in the following, # = integer or real number; $ = string; default in ()

W_Options:
	-XC#	number of bit patterns < 6 (1)
	-XG#	Maximum expected gene size (inferred from genome|db size)
	-Xk#	Word size (inferred from genome|db size)
	-Xb#	Block size (inferred from genome|db size)
	-Xa#	Abundance factor (10)
	-Xr#	Minimum ORF length with -KP (30))
	-g	gzipped output
	-t#	Mutli-thread operation with # threads

R_Options (representatives):
	-A[0-3]	0: scalar, 1..3: simd; 1: rigorous, 2: intermediate, 3: fast
	-H#	Minimum score for report (35)
	-L or -LS or -L#	semi-global or local alignment (-L)
	-M#[,#2]	Number of outputs per query (1) (4 if # is omitted)
		#2 (4) specifies the max number of candidate loci
		This option is effective only for map-and-align modes
	-O#[,#2,..] (GvsA|C)	0:Gff3_gene; 1:alignment; 2:Gff3_match; 3:Bed; 4:exon-inf;
			5:intron-inf; 6:cDNA; 7:translated; 8:block-only;
			10:SAM; 12:binary; 15:query+GS (4)
	-O#[,#2,..] (AvsA)	0:statistics; 1:alignment; 2:Sugar; 3:Psl; 4:XYL;
			5:srat+XYL; 8:Cigar; 9:Vulgar; 10:SAM; (4)
	-Q#	0:DP; 1-3:HSP-Search; 4-7; Block-Search (3)
	-R$	Read block information file *.bkn, *.bkp or *.bka
	-S#	Orientation. 0:annotation; 1:forward; 2:reverse; 3:both (3)
	-T$	Subdirectory where species-specific parameters reside
	-a$	Specify AAdb. Must run `makeidx.pl -ia' breforehand
	-A$	Same as -a but db sequences are stored in memory
	-d$	Specify genome. Must run `makeidx.pl -i[n|p]' breforehand
	-D$	Same as -d but db sequences are stored in memory
	-g	gzipped output in combination with -O12
	-l#	Number of characters per line in alignment (60)
	-o$	File/directory/prefix where results are written (stdout)
	-pa#	Remove 3' poly A >= # (0: don't remove)
	-pF	Output full Fasta entry name
	-pj	Suppress splice junction information with -O[6|7]
	-pn	Retain existing output file
	-po	Overwrite existing output file
	-pw	Report results even if the score is below the threshold
	-pT	Exclude termination codon from CDS
	-r$	Report information about block data file
	-u#	Gap-extension penalty (3)
	-v#	Gap-open penalty (8)
	-w#	Band width for DP matrix scan (100)
	-t[#]	Mutli-thread operation with # threads
	-ya#	Stringency of splice site. 0->3:strong->weak
	-yl3	Ddouble affine gap penalty
	-ym#	Nucleotide match score (2)
	-yn#	Nucleotide mismatch score (-6)
	-yo#	Penalty for a premature termination codon (100)
	-yx#	Penalty for a frame shift error (100)
	-yy#	Weight for splice site signal (8)
	-yz#	Weight for coding potential (2)
	-yB#	Weight for branch point signal (0)
	-yI$	Intron length distribution
	-yL#	Minimum expected length of intron (30)
	-yS[#]	Use species-specific parameter set (0.0/0.5)
	-yX0	Don't use parameter set for cross-species comparison
	-yZ#	Weight for intron potential (0)
	-XG#	Reset maximum expected gene size, suffix k or M is effective

Examples:
	spaln -W -KP -E -t4 dictdisc_g.gf
	spaln -W -KA -Xk5 Swiss.faa
	spaln -O -LS 'chr1.fa 10001 40000' cdna.nfa
	spaln -Q0,1,7 -t10 -TTetrapod -XG2M -ommu/ -dmus_musc_g hspcdna.nfa
	spaln -Q7 -O5 -t10 -Tdictdics -ddictdisc_g [-E] 'dictdisc.faa (101 200)' > ddi.intron
	spaln -Q7 -O0 -t10 -Tdictdics -aSwiss 'chr1.nfa 200001 210000' > Chr1_200-210K.gff
	spaln -Q4 -O0 -t10 -M10 -aSwiss dictdisc.faa > dictdisc.alignment_score

Looks very good to me! 👍🏻

@ogotoh
Owner

ogotoh commented May 21, 2024

Dear Martin,

Thank you so much for your kind cooperation! Without your help, it would have taken much longer to get spaln running on Linux ARM64 and macOS.

Osamu,

@martin-g
Author

@ogotoh Could you please make a new release/tag with the latest improvements?
In bioconda/bioconda-recipes#48017 we are seeing build problems on macOS (Intel) due to the custom patches in the Makefile.
Thank you!

@ogotoh
Owner

ogotoh commented May 27, 2024

Dear Martin,

As described in “Changes in version 3.0.5”, I manually installed libz.a from source. At present, I have no idea how to automate this procedure in the spaln installer.

Osamu,

@martin-g
Author

Dear Osamu,

In the PR that I proposed for testing on Ubuntu x86_64 and aarch64, I use apt-get install zlib1g-dev to install the dependency.
For Conda, the dependency is listed at https://github.com/bioconda/bioconda-recipes/pull/48017/files#diff-371bea2c633f595f7fb0d4b6f3f1526de941f009a951ed7921db57fb5ef2e8b7R27

So, in my opinion you don't need to do anything about it - spaln should just assume that libz is already provided by other means.
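
For illustration only, here is a minimal sketch of how a build script could verify that libz is already provided before compiling. The function name have_zlib and the CXX override are my own assumptions and are not part of the spaln Makefile or configure script; the check simply tries to compile and link a tiny program against -lz.

```shell
# Hypothetical pre-build check (not part of spaln): can the compiler find zlib?
have_zlib() {
    cxx="${CXX:-g++}"                 # assumed default compiler; override with CXX=...
    tmp="$(mktemp -d)" || return 1
    # Tiny program that needs both zlib.h and libz to build.
    printf '#include <zlib.h>\nint main() { return zlibVersion() == 0; }\n' > "$tmp/check.cc"
    if "$cxx" "$tmp/check.cc" -o "$tmp/check" -lz >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmp"
    return "$status"
}

if have_zlib; then
    echo "zlib found"
else
    echo "zlib missing: install it first, e.g. apt-get install zlib1g-dev" >&2
fi
```

With a check like this, the installer only needs to report the missing dependency and can leave the actual installation to apt-get, Conda, or whatever the platform uses.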

@ogotoh
Owner

ogotoh commented May 27, 2024

Dear Martin,

I see. Thank you again for your persistent efforts to make Spaln easier to install on various architectures.

Osamu,
