NEON: implement all intrinsics supported by architecture A64 #1080
vmulh_lane_f16, vmulh_laneq_f16, vmul_lane_f16, vmul_laneq_f16, vmulq_laneq_f16.
Replaced the incorrect "Ties to Away" implementation with "round to nearest, ties to away".
add.h: Remove redundant code.
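For readers unfamiliar with the rounding modes named in the commit above, here is a minimal sketch of how "round to nearest, ties away from zero" differs from "round to nearest, ties to even". The helper names are hypothetical and this is not the code from the PR; both helpers assume a finite input well inside the `int32_t` range.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not SIMDe's actual code.
 * Both assume x is finite and fits comfortably in int32_t. */

/* Round to nearest, ties away from zero: 2.5 -> 3, -2.5 -> -3. */
static int32_t round_ties_away_sketch(float x) {
  int32_t t = (int32_t) x;     /* C conversion truncates toward zero */
  float frac = x - (float) t;  /* signed fractional part */
  if (frac >= 0.5f)  return t + 1;
  if (frac <= -0.5f) return t - 1;
  return t;
}

/* Round to nearest, ties to even: 2.5 -> 2, 3.5 -> 4. */
static int32_t round_ties_even_sketch(float x) {
  int32_t t = (int32_t) x;
  float frac = x - (float) t;
  int odd = (t & 1) != 0;      /* on a tie, move only if t is odd */
  if (frac > 0.5f  || (frac == 0.5f  && odd)) return t + 1;
  if (frac < -0.5f || (frac == -0.5f && odd)) return t - 1;
  return t;
}
```

The two helpers agree everywhere except on exact halves, which is why a mix-up between them tends to show up only in tie-case tests.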
one ld2_f16, twenty-two ld2_lane series, and twenty-two ld2_dup series.
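As a rough picture of what the three `ld2` flavours in this commit do, the sketch below models them on plain arrays of eight `uint8_t` lanes. The struct and function names are hypothetical stand-ins, not SIMDe types, and the real intrinsics cover many element types (including `f16`) that are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pair of 8-lane vectors. */
typedef struct { uint8_t val[2][8]; } u8x8x2_sketch;

/* ld2: read 16 interleaved bytes, split even/odd elements. */
static u8x8x2_sketch ld2_u8_sketch(const uint8_t src[16]) {
  u8x8x2_sketch r;
  for (size_t i = 0; i < 8; i++) {
    r.val[0][i] = src[2 * i];
    r.val[1][i] = src[2 * i + 1];
  }
  return r;
}

/* ld2_dup: read one pair and broadcast it to every lane. */
static u8x8x2_sketch ld2_dup_u8_sketch(const uint8_t src[2]) {
  u8x8x2_sketch r;
  for (size_t i = 0; i < 8; i++) {
    r.val[0][i] = src[0];
    r.val[1][i] = src[1];
  }
  return r;
}

/* ld2_lane: read one pair into a single lane, keep the other lanes. */
static u8x8x2_sketch ld2_lane_u8_sketch(const uint8_t src[2],
                                        u8x8x2_sketch old, size_t lane) {
  old.val[0][lane] = src[0];
  old.val[1][lane] = src[1];
  return old;
}
```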
cmla_lane, cmla_rot180_lane, cmla_rot270_lane, cmla_rot90_lane, recpx.
- 8 cvmla{q}_lane{q}_f{16/32}
- 2 cvmla{q}_rot90_f16
- 8 cvmla{q}_rot90_lane{q}_f{16/32}
- 2 cvmla{q}_rot180_f16
- 8 cvmla{q}_rot180_lane{q}_f{16/32}
- 2 cvmla{q}_rot270_f16
- 8 cvmla{q}_rot270_lane{q}_f{16/32}
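For context on the `cmla` rotations listed above, here is a scalar sketch of the per-element arithmetic as I understand it from the Arm FCMLA description, on a single complex number stored as (real, imaginary). The type and function names are hypothetical, and the hardware uses fused multiply-add while this sketch uses separate multiplies and adds, so rounding can differ in the last bit.

```c
/* Hypothetical scalar model of one complex lane pair: v[0] = real, v[1] = imag. */
typedef struct { float v[2]; } f32x2_sketch;

/* Rotation 0: accumulate a.re * b. */
static f32x2_sketch cmla_sketch(f32x2_sketch r, f32x2_sketch a, f32x2_sketch b) {
  r.v[0] += a.v[0] * b.v[0];
  r.v[1] += a.v[0] * b.v[1];
  return r;
}

/* Rotation 90: accumulate a.im * (i * b). */
static f32x2_sketch cmla_rot90_sketch(f32x2_sketch r, f32x2_sketch a, f32x2_sketch b) {
  r.v[0] -= a.v[1] * b.v[1];
  r.v[1] += a.v[1] * b.v[0];
  return r;
}

/* Rotation 180: accumulate a.re * (-b). */
static f32x2_sketch cmla_rot180_sketch(f32x2_sketch r, f32x2_sketch a, f32x2_sketch b) {
  r.v[0] -= a.v[0] * b.v[0];
  r.v[1] -= a.v[0] * b.v[1];
  return r;
}

/* Rotation 270: accumulate a.im * (-i * b). */
static f32x2_sketch cmla_rot270_sketch(f32x2_sketch r, f32x2_sketch a, f32x2_sketch b) {
  r.v[0] += a.v[1] * b.v[1];
  r.v[1] -= a.v[1] * b.v[0];
  return r;
}
```

Chaining rotation 0 and rotation 90 on the same inputs yields a full complex multiply-accumulate, and chaining 180 and 270 yields its negation.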
__crc32, aes, ras, sha1, sha256, sha512, sm3, sm4
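To give a sense of what a portable fallback for the `__crc32` family has to compute, below is a bit-at-a-time sketch of a single-byte update in the style of `__crc32b`, assuming the usual reflected CRC-32 polynomial `0xEDB88320` and no implicit inversion of the accumulator. The function names are hypothetical and this is not the code added in the PR, which may well use a table or a different structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-byte CRC-32 update (reflected polynomial 0xEDB88320).
 * No pre- or post-inversion is applied here; the caller handles that. */
static uint32_t crc32b_sketch(uint32_t crc, uint8_t data) {
  crc ^= data;
  for (int bit = 0; bit < 8; bit++) {
    uint32_t mask = 0u - (crc & 1u);  /* all ones if the low bit is set */
    crc = (crc >> 1) ^ (UINT32_C(0xEDB88320) & mask);
  }
  return crc;
}

/* Conventional CRC-32 of a buffer: start from all ones, invert at the end. */
static uint32_t crc32_buffer_sketch(const uint8_t *buf, size_t len) {
  uint32_t crc = UINT32_C(0xFFFFFFFF);
  for (size_t i = 0; i < len; i++) {
    crc = crc32b_sketch(crc, buf[i]);
  }
  return crc ^ UINT32_C(0xFFFFFFFF);
}
```

Running `crc32_buffer_sketch` over the ASCII string `"123456789"` should give the standard CRC-32 check value `0xCBF43926`, which is a convenient sanity test for any fallback.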
Wow!
This will take me a while to review, so I will wait for you to finish passing the CI first.
@@ -133,6 +133,19 @@ typedef union {
#endif
} simde_float64x1_private;

// [Eric] Add 64bits poly type
// [Eric] Add 64bits poly type
Hey Eric, I appreciate your massive PR! You will receive credit, but I don't think these comments add value for the reader.
Ok, I'll remove it.
@yyctw Can this be closed now?
Hi all, this is Eric from Andes Technology Corporation. This PR includes:
- Implement all poly-related types using `uint`.
- Implement all functions related to the `poly` type (with test cases).
- Implement all functions related to the `bf16` type (without test cases).
- Add 1403 initial implementations and corresponding test cases in 166 families, which are listed below:
`__crc32`, `abd`, `abdl_high`, `add`, `addhn_high`, `aes`, `bfmlal`, `bsl`,
`cadd_rot270`, `cadd_rot90`, `ceq`, `ceqz`, `cgez`, `cgtz`, `cle`, `cltz`,
`cmla`, `cmla_lane`, `cmla_rot180`, `cmla_rot180_lane`, `cmla_rot270`, `cmla_rot270_lane`, `cmla_rot90`, `cmla_rot90_lane`,
`cnt`, `combine`, `copy_lane`, `create`, `cvt`, `cvt_n`, `cvtm`, `cvtp`,
`div`, `dot`, `dot_lane`, `dup_lane`, `dup_n`, `eor`, `ext`, `fmlal`, `fmlsl`,
`get_high`, `get_lane`, `get_low`,
`ld1`, `ld1_dup`, `ld1_lane`, `ld1_x2`, `ld1_x3`, `ld1_x4`, `ld1q_x2`, `ld1q_x3`, `ld1q_x4`,
`ld2`, `ld2_dup`, `ld2_lane`, `ld3`, `ld3_dup`, `ld3_lane`, `ld4`, `ld4_dup`, `ld4_lane`, `ldr`,
`maxnm`, `maxnmv`, `maxv`, `minnm`, `minnmv`, `minv`, `mmlaq`,
`mul`, `mull`, `mull_high`, `mull_high_lane`, `mull_high_n`, `mulx`, `mulx_lane`, `mulx_n`, `mvn`,
`padd`, `pmax`, `pmaxnm`, `pmin`, `pminnm`,
`qmovun_high`, `qrdmlah`, `qrdmlah_lane`, `qrdmlsh`, `qrdmlsh_lane`, `qrdmulh_lane`, `qrshl`,
`qrshrn_high_n`, `qrshrun_high_n`, `qshl_n`, `qshlu_n`, `qshrn_high_n`, `qshrn_n`, `qshrun_high_n`, `qshrun_n`,
`qtbl`, `qtbx`, `raddhn`, `raddhn_high`, `rax`, `rbit`, `recps`, `reinterpret`,
`rev16`, `rev32`, `rev64`,
`rnd`, `rnd32x`, `rnd32z`, `rnd64x`, `rnd64z`, `rnda`, `rndi`, `rndm`, `rndp`, `rndx`,
`rshrn_high_n`, `rsubhn`, `rsubhn_high`, `set_lane`, `sha1`, `sha256`, `sha512`,
`shll_high_n`, `shr_n`, `shrn_high_n`, `shrn_n`, `sli_n`, `sm3`, `sm4`, `sri_n`,
`st1`, `st1_lane`, `st1_x2`, `st1_x3`, `st1_x4`, `st1q_x2`, `st1q_x3`, `st1q_x4`,
`st2`, `st2_lane`, `st3`, `st3_lane`, `st4`, `st4_lane`, `str`,
`subhn_high`, `sudot_lane`, `tbl`, `tbx`, `trn`, `trn1`, `trn2`, `tst`,
`usdot`, `usdot_lane`, `uzp`, `uzp1`, `uzp2`, `zip`, `zip1`, `zip2`
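As a rough illustration of the poly-on-`uint` approach mentioned at the top of this description, the sketch below backs 8-bit and 16-bit polynomial types with unsigned integers and implements carry-less (GF(2)) multiplication on them, which is the arithmetic that polynomial multiplies such as `vmul_p8` and `vmull_p8` are built on. All names ending in `_sketch` are hypothetical and are not the typedefs or functions from this PR.

```c
#include <stdint.h>

/* Hypothetical poly types backed by unsigned integers of the same width. */
typedef uint8_t  poly8_sketch_t;
typedef uint16_t poly16_sketch_t;

/* Widening carry-less multiply: partial products are combined with XOR
 * instead of addition, so no carries propagate between bit positions. */
static poly16_sketch_t mull_p8_sketch(poly8_sketch_t a, poly8_sketch_t b) {
  uint16_t acc = 0;
  for (int i = 0; i < 8; i++) {
    if ((b >> i) & 1u) {
      acc ^= (uint16_t) ((uint16_t) a << i);
    }
  }
  return acc;
}

/* Non-widening variant: keep only the low 8 bits of the product. */
static poly8_sketch_t mul_p8_sketch(poly8_sketch_t a, poly8_sketch_t b) {
  return (poly8_sketch_t) (mull_p8_sketch(a, b) & 0xFFu);
}
```

For example, `mull_p8_sketch(3, 3)` is 5 rather than 9, because (x + 1)^2 = x^2 + 1 over GF(2).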
Thanks for reading and any recommendations are welcome!