NEON: implement all intrinsics supported by architecture A64 #1080

Closed
yyctw wants to merge 136 commits

@yyctw (Contributor) commented Oct 17, 2023

Hi all, this is Eric from Andes Technology Corporation. This PR does the following:

- Implements all poly-related types using uint.
- Implements all functions related to the poly type (with test cases).
- Implements all functions related to the bf16 type (without test cases).
- Adds 1403 initial implementations and their corresponding test cases, spread over the 166 families listed below:

  • __crc32, abd, abdl_high, add, addhn_high, aes, bfmlal, bsl, cadd_rot270, cadd_rot90,
  • ceq, ceqz, cgez, cgtz, cle, cltz, cmla, cmla_lane, cmla_rot180, cmla_rot180_lane,
  • cmla_rot270, cmla_rot270_lane, cmla_rot90, cmla_rot90_lane, cnt, combine, copy_lane, create, cvt, cvt_n,
  • cvtm, cvtp, div, dot, dot_lane, dup_lane, dup_n, eor, ext, fmlal,
  • fmlsl, get_high, get_lane, get_low, ld1, ld1_dup, ld1_lane, ld1_x2, ld1_x3, ld1_x4,
  • ld1q_x2, ld1q_x3, ld1q_x4, ld2, ld2_dup, ld2_lane, ld3, ld3_dup, ld3_lane, ld4,
  • ld4_dup, ld4_lane, ldr, maxnm, maxnmv, maxv, minnm, minnmv, minv, mmlaq,
  • mul, mull, mull_high, mull_high_lane, mull_high_n, mulx, mulx_lane, mulx_n, mvn, padd,
  • pmax, pmaxnm, pmin, pminnm, qmovun_high, qrdmlah, qrdmlah_lane, qrdmlsh, qrdmlsh_lane, qrdmulh_lane,
  • qrshl, qrshrn_high_n, qrshrun_high_n, qshl_n, qshlu_n, qshrn_high_n, qshrn_n, qshrun_high_n, qshrun_n, qtbl,
  • qtbx, raddhn, raddhn_high, rax, rbit, recps, reinterpret, rev16, rev32, rev64,
  • rnd, rnd32x, rnd32z, rnd64x, rnd64z, rnda, rndi, rndm, rndp, rndx,
  • rshrn_high_n, rsubhn, rsubhn_high, set_lane, sha1, sha256, sha512, shll_high_n, shr_n, shrn_high_n,
  • shrn_n, sli_n, sm3, sm4, sri_n, st1, st1_lane, st1_x2, st1_x3, st1_x4,
  • st1q_x2, st1q_x3, st1q_x4, st2, st2_lane, st3, st3_lane, st4, st4_lane, str,
  • subhn_high, sudot_lane, tbl, tbx, trn, trn1, trn2, tst, usdot, usdot_lane,
  • uzp, uzp1, uzp2, zip, zip1, zip2
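
To illustrate the first two items above (poly types backed by unsigned integers), here is a minimal scalar sketch of a polynomial (carry-less) multiply on 8-bit values. The names `example_poly8_t`, `example_poly16_t`, and `example_pmull_p8` are hypothetical and chosen for this illustration only; they are not the definitions used in this PR.

```c
#include <stdint.h>

/* Hypothetical typedefs for illustration: a poly type stored as a plain
 * unsigned integer of the same width. */
typedef uint8_t  example_poly8_t;
typedef uint16_t example_poly16_t;

/* Carry-less multiply of two 8-bit polynomials over GF(2).
 * Each set bit i of b contributes (a << i); partial products are
 * combined with XOR instead of addition, so no carries propagate. */
static example_poly16_t example_pmull_p8(example_poly8_t a, example_poly8_t b) {
  example_poly16_t result = 0;
  for (int i = 0; i < 8; i++) {
    if ((b >> i) & 1) {
      result ^= (example_poly16_t)((example_poly16_t)a << i);
    }
  }
  return result;
}
```

For example, `example_pmull_p8(3, 3)` yields 5, because (x + 1)^2 = x^2 + 1 over GF(2), whereas the ordinary integer product would be 9.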

Thanks for reading; any recommendations are welcome!

vmulh_lane_f16, vmulh_laneq_f16, vmul_lane_f16,
vmul_laneq_f16, vmulq_laneq_f16.
Fixed the incorrect "Ties to Away" implementation; it now rounds to nearest with ties to away.
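
As a rough illustration of what "round to nearest with ties to away" means, and how it differs from the more common ties-to-even mode, here is a small scalar sketch. Both helper names are hypothetical, and the code assumes inputs with magnitude below 2^31; it is not the implementation from this PR.

```c
#include <stdint.h>

/* Round to nearest, ties away from zero (hypothetical helper).
 * Assumes |x| < 2^31. The sign of a zero result is not preserved. */
static float example_round_ties_away(float x) {
  float t = (float)(int32_t)x;   /* truncate toward zero */
  float frac = x - t;            /* fractional part, computed exactly */
  if (frac >= 0.5f)  return t + 1.0f;
  if (frac <= -0.5f) return t - 1.0f;
  return t;
}

/* Round to nearest, ties to even (hypothetical helper), for comparison.
 * Same assumptions as above. */
static float example_round_ties_even(float x) {
  int32_t ti = (int32_t)x;       /* truncate toward zero */
  float t = (float)ti;
  float frac = x - t;
  float afrac = (frac < 0.0f) ? -frac : frac;
  /* Move away from zero when past the halfway point, or exactly
   * halfway with an odd truncated value. */
  if (afrac > 0.5f || (afrac == 0.5f && (ti & 1))) {
    return t + ((x < 0.0f) ? -1.0f : 1.0f);
  }
  return t;
}
```

The two modes only disagree on exact halves: 2.5 rounds to 3 with ties to away but to 2 with ties to even.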
add.h: Remove redundant code.
one ld2_f16, twenty-two ld2_lane series, and twenty-two ld2_dup series.
cmla_lane, cmla_rot180_lane, cmla_rot270_lane, cmla_rot90_lane, recpx.
- 8 vcmla{q}_lane{q}_f{16/32}
- 2 vcmla{q}_rot90_f16
- 8 vcmla{q}_rot90_lane{q}_f{16/32}
- 2 vcmla{q}_rot180_f16
- 8 vcmla{q}_rot180_lane{q}_f{16/32}
- 2 vcmla{q}_rot270_f16
- 8 vcmla{q}_rot270_lane{q}_f{16/32}
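
For readers unfamiliar with the cmla rotation families listed above, the following scalar sketch models one complex element of a complex multiply-accumulate with rotation, as I understand the A64 semantics. The struct and function names are hypothetical, and the code is a simplified model for illustration rather than the vectorized implementation in this PR.

```c
/* One complex number stored as a (real, imaginary) pair, mirroring how
 * adjacent vector lanes are paired. Hypothetical type for illustration. */
typedef struct {
  float re;
  float im;
} example_complex_t;

/* Scalar model of a complex multiply-accumulate with rotation.
 * rot is one of 0, 90, 180, 270; other values leave acc unchanged. */
static example_complex_t example_cmla(example_complex_t acc,
                                      example_complex_t a,
                                      example_complex_t b,
                                      int rot) {
  switch (rot) {
    case 0:   acc.re += a.re * b.re; acc.im += a.re * b.im; break;
    case 90:  acc.re -= a.im * b.im; acc.im += a.im * b.re; break;
    case 180: acc.re -= a.re * b.re; acc.im -= a.re * b.im; break;
    case 270: acc.re += a.im * b.im; acc.im -= a.im * b.re; break;
    default:  break;
  }
  return acc;
}

/* A full complex multiply-accumulate, acc + a * b, is the rot0 step
 * followed by the rot90 step. */
static example_complex_t example_complex_mla(example_complex_t acc,
                                             example_complex_t a,
                                             example_complex_t b) {
  return example_cmla(example_cmla(acc, a, b, 0), a, b, 90);
}
```

With a zero accumulator, a = 1 + 2i and b = 3 + 4i, `example_complex_mla` produces -5 + 10i, which matches the ordinary complex product.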

@mr-c (Collaborator) left a comment

Wow!

This will take me a while to review, so I will wait for you to finish passing the CI first.

@@ -133,6 +133,19 @@ typedef union {
#endif
} simde_float64x1_private;

// [Eric] Add 64bits poly type

@mr-c (Collaborator) commented on this line:

Suggested change
// [Eric] Add 64bits poly type

Hey Eric, I appreciate your massive PR! You will receive credit, but I don't think these comments add value for the reader.

@yyctw (Contributor, Author) replied:

Ok, I'll remove it.

@mr-c (Collaborator) commented Nov 16, 2023

@yyctw Can this be closed now?

@yyctw yyctw closed this Nov 16, 2023