SVE #609

Open
nemequ opened this issue Sep 3, 2020 · 0 comments
Labels
instruction-set-support: Implementing new SIMD ISA extensions portably

nemequ commented Sep 3, 2020

This list is chapter 6 of the Arm C Language Extensions for SVE document, which covers the required functions for SVE. There are also optional functions for SVE, and required and optional functions for SVE2, so eventually there will be 4 issues in total.

  • 6.1 Introduction
  • 6.2 Loads
    • 6.2.1 LD1: Unextended load
    • 6.2.2 LD1SB: Load 8-bit data and sign-extend
    • 6.2.3 LD1UB: Load 8-bit data and zero-extend
    • 6.2.4 LD1SH: Load 16-bit data and sign-extend
    • 6.2.5 LD1UH: Load 16-bit data and zero-extend
    • 6.2.6 LD1SW: Load 32-bit data and sign-extend
    • 6.2.7 LD1UW: Load 32-bit data and zero-extend
    • 6.2.8 LD1RQ: Unextended load and replicate to quadword
    • 6.2.9 LDFF1: Unextended load, first-faulting
    • 6.2.10 LDFF1SB: Load 8-bit data and sign-extend, first-faulting
    • 6.2.11 LDFF1UB: Load 8-bit data and zero-extend, first-faulting
    • 6.2.12 LDFF1SH: Load 16-bit data and sign-extend, first-faulting
    • 6.2.13 LDFF1UH: Load 16-bit data and zero-extend, first-faulting
    • 6.2.14 LDFF1SW: Load 32-bit data and sign-extend, first-faulting
    • 6.2.15 LDFF1UW: Load 32-bit data and zero-extend, first-faulting
    • 6.2.16 LDNF1: Unextended load, non-faulting
    • 6.2.17 LDNF1SB: Load 8-bit data and sign-extend, non-faulting
    • 6.2.18 LDNF1UB: Load 8-bit data and zero-extend, non-faulting
    • 6.2.19 LDNF1SH: Load 16-bit data and sign-extend, non-faulting
    • 6.2.20 LDNF1UH: Load 16-bit data and zero-extend, non-faulting
    • 6.2.21 LDNF1SW: Load 32-bit data and sign-extend, non-faulting
    • 6.2.22 LDNF1UW: Load 32-bit data and zero-extend, non-faulting
    • 6.2.23 LDNT1: Unextended load, non-temporal
    • 6.2.24 LD2: Load two-element structures into two vectors
    • 6.2.25 LD3: Load three-element structures into three vectors
    • 6.2.26 LD4: Load four-element structures into four vectors
  • 6.3 Stores
    • 6.3.1 ST1: Store one vector, with no truncation
    • 6.3.2 ST1B: Store one vector, truncating to 8 bits
    • 6.3.3 ST1H: Store one vector, truncating to 16 bits
    • 6.3.4 ST1W: Store one vector, truncating to 32 bits
    • 6.3.5 STNT1: Store one vector, with no truncation, non-temporal
    • 6.3.6 ST2: Store two vectors into two-element structures
    • 6.3.7 ST3: Store three vectors into three-element structures
    • 6.3.8 ST4: Store four vectors into four-element structures
  • 6.4 Prefetches
    • 6.4.1 PRFB: Prefetch 8-bit data
    • 6.4.2 PRFH: Prefetch 16-bit data
    • 6.4.3 PRFW: Prefetch 32-bit data
    • 6.4.4 PRFD: Prefetch 64-bit data
  • 6.5 Address calculations
    • 6.5.1 ADRB: Compute vector address for 8-bit data
    • 6.5.2 ADRH: Compute vector address for 16-bit data
    • 6.5.3 ADRW: Compute vector address for 32-bit data
    • 6.5.4 ADRD: Compute vector address for 64-bit data
  • 6.6 Scalar to vector operations
    • 6.6.1 DUP: Duplicate scalar value (all done except _bf16 and _x (setting inactive to unknown) versions)
    • 6.6.2 DUPQ: Duplicate scalars to every quadword of a vector
    • 6.6.3 INDEX: Create index series
  • 6.7 Integer arithmetic
    • 6.7.1 ADD: Modular integer addition
    • 6.7.2 QADD: Saturating integer addition
    • 6.7.3 SUB: Modular integer subtraction (in progress, 40 of 60)
    • 6.7.4 SUBR: Modular integer subtraction, reversed
    • 6.7.5 QSUB: Saturating integer subtraction
    • 6.7.6 ABD: Integer absolute difference
    • 6.7.7 MUL: Integer multiplication, returning low half
    • 6.7.8 MULH: Integer multiplication, returning high half
    • 6.7.9 MAD: Integer addition of product (multiplicand first)
    • 6.7.10 MLA: Integer addition of product (addend first)
    • 6.7.11 MSB: Integer subtraction of product (multiplicand first)
    • 6.7.12 MLS: Integer subtraction of product (minuend first)
    • 6.7.13 DOT: Integer addition of dot product
    • 6.7.14 DIV: Integer division
    • 6.7.15 DIVR: Integer division, reversed
    • 6.7.16 MAX: Integer maximum
    • 6.7.17 MIN: Integer minimum
    • 6.7.18 NEG: Integer negation
    • 6.7.19 ABS: Integer absolute
  • 6.8 Logical operations
    • 6.8.1 AND: Bitwise AND
    • 6.8.2 BIC: Bitwise AND NOT
    • 6.8.3 ORR: Bitwise OR
    • 6.8.4 EOR: Bitwise exclusive OR
    • 6.8.5 NOT: Bitwise inverse
    • 6.8.6 CNOT: Logical inverse
  • 6.9 Shifts
    • 6.9.1 LSL: Shift left
    • 6.9.2 LSR: Logical shift right
    • 6.9.3 ASR: Arithmetic shift right, rounding towards -Inf
    • 6.9.4 ASRD: Arithmetic shift right, rounding towards zero
    • 6.9.5 INSR: Shift vector and insert scalar
  • 6.10 Integer reductions
    • 6.10.1 ADDV: Integer addition reduction
    • 6.10.2 MAXV: Integer maximum reduction
    • 6.10.3 MINV: Integer minimum reduction
    • 6.10.4 ANDV: Integer AND reduction
    • 6.10.5 ORV: Integer OR reduction
    • 6.10.6 EORV: Integer exclusive OR reduction
  • 6.11 Integer comparisons
    • 6.11.1 CMPEQ: Integer compare equal
    • 6.11.2 CMPNE: Integer compare not equal
    • 6.11.3 CMPLT: Integer compare less than
    • 6.11.4 CMPLE: Integer compare less than or equal to
    • 6.11.5 CMPGE: Integer compare greater than or equal to
    • 6.11.6 CMPGT: Integer compare greater than
  • 6.12 While comparisons
    • 6.12.1 WHILELT: While incrementing variable is less than
    • 6.12.2 WHILELE: While incrementing variable is less than or equal to
  • 6.13 Counting bits
    • 6.13.1 CLS: Count leading sign bits
    • 6.13.2 CLZ: Count leading zero bits
    • 6.13.3 CNT: Count nonzero bits
  • 6.14 Conversion
    • 6.14.1 EXTB: Extend from low 8 bits
    • 6.14.2 EXTH: Extend from low 16 bits
    • 6.14.3 EXTW: Extend from low 32 bits
  • 6.15 Reversal
    • 6.15.1 RBIT: Reverse bits within elements
    • 6.15.2 REVB: Reverse bytes within elements
    • 6.15.3 REVH: Reverse halfwords within elements
    • 6.15.4 REVW: Reverse words within elements
  • 6.16 Floating-point arithmetic
    • 6.16.1 ADD: Floating-point addition
    • 6.16.2 CADD: Floating-point complex addition with rotation
    • 6.16.3 SUB: Floating-point subtraction
    • 6.16.4 SUBR: Floating-point subtraction, reversed
    • 6.16.5 ABD: Floating-point absolute difference
    • 6.16.6 MUL: Floating-point multiplication
    • 6.16.7 MULX: Floating-point multiplication extended
    • 6.16.8 MAD: Fused floating-point addition of product (multiplicand first)
    • 6.16.9 MLA: Fused floating-point addition of product (addend first)
    • 6.16.10 CMLA: Fused floating-point complex addition of product with rotation
    • 6.16.11 MSB: Fused floating-point subtraction of product (multiplicand first)
    • 6.16.12 MLS: Fused floating-point subtraction of product (minuend first)
    • 6.16.13 NMAD: Fused floating-point addition of product, negated (multiplicand first)
    • 6.16.14 NMLA: Fused floating-point addition of product, negated (addend first)
    • 6.16.15 NMSB: Fused floating-point subtraction of product, negated (multiplicand first)
    • 6.16.16 NMLS: Fused floating-point subtraction of product, negated (minuend first)
    • 6.16.17 DIV: Floating-point division
    • 6.16.18 DIVR: Floating-point division, reversed
    • 6.16.19 MAX: Floating-point maximum
    • 6.16.20 MAXNM: Floating-point maximum number
    • 6.16.21 MIN: Floating-point minimum
    • 6.16.22 MINNM: Floating-point minimum number
    • 6.16.23 SCALE: Floating-point adjust exponent
    • 6.16.24 TSMUL: Floating-point trigonometric starting value
    • 6.16.25 TMAD: Floating-point trigonometric multiply-add coefficient
    • 6.16.26 TSSEL: Floating-point trigonometric select coefficient
    • 6.16.27 ABS: Floating-point absolute
    • 6.16.28 NEG: Floating-point negation
    • 6.16.29 SQRT: Floating-point square root
    • 6.16.30 EXPA: Floating-point exponent accelerator
    • 6.16.31 RECPE: Floating-point reciprocal estimate
    • 6.16.32 RECPS: Floating-point reciprocal step
    • 6.16.33 RECPX: Floating-point reciprocal exponent
    • 6.16.34 RSQRTE: Floating-point reciprocal square root estimate
    • 6.16.35 RSQRTS: Floating-point reciprocal square root step
    • 6.16.36 RINTA: Floating-point round to nearest, ties away from zero
    • 6.16.37 RINTI: Floating-point round using current rounding mode (inexact)
    • 6.16.38 RINTM: Floating-point round towards -Inf
    • 6.16.39 RINTN: Floating-point round to nearest, ties to even
    • 6.16.40 RINTP: Floating-point round towards +Inf
    • 6.16.41 RINTX: Floating-point round using current rounding mode (exact)
    • 6.16.42 RINTZ: Floating-point round towards zero
  • 6.17 Floating-point reductions
    • 6.17.1 ADDA: Left-to-right floating-point addition reduction
    • 6.17.2 ADDV: Tree-based floating-point addition reduction
    • 6.17.3 MAXV: Floating-point maximum reduction
    • 6.17.4 MAXNMV: Floating-point maximum number reduction
    • 6.17.5 MINV: Floating-point minimum reduction
    • 6.17.6 MINNMV: Floating-point minimum number reduction
  • 6.18 Floating-point comparisons
    • 6.18.1 CMPEQ: Floating-point compare equal
    • 6.18.2 CMPNE: Floating-point compare not equal
    • 6.18.3 CMPLT: Floating-point compare less than
    • 6.18.4 CMPLE: Floating-point compare less than or equal to
    • 6.18.5 CMPGE: Floating-point compare greater than or equal to
    • 6.18.6 CMPGT: Floating-point compare greater than
    • 6.18.7 CMPUO: Floating-point compare unordered
    • 6.18.8 ACLT: Floating-point absolute compare less than
    • 6.18.9 ACLE: Floating-point absolute compare less than or equal to
    • 6.18.10 ACGE: Floating-point absolute compare greater than or equal to
    • 6.18.11 ACGT: Floating-point absolute compare greater than
  • 6.19 Floating-point conversions
    • 6.19.1 CVT: Convert floating-point value to integer
    • 6.19.2 CVT: Convert integer value to floating-point
    • 6.19.3 CVT: Convert floating-point value to wider type
    • 6.19.4 CVT: Convert floating-point value to narrower type
  • 6.20 Permutation and selection
    • 6.20.1 LASTA: Extract element after last active
    • 6.20.2 LASTB: Extract last active element
    • 6.20.3 CLASTA: Extract element after last active with fallback
    • 6.20.4 CLASTB: Extract last active element with fallback
    • 6.20.5 COMPACT: Compact vector and fill with zero
    • 6.20.6 SPLICE: Splice two vectors under predicate control
    • 6.20.7 EXT: Extract vector from pair of vectors
    • 6.20.8 SEL: Conditionally select elements from two inputs (all done except _bf16 and _b versions)
    • 6.20.9 DUP: Duplicate one element of a vector
    • 6.20.10 DUPQ: Duplicate one quadword of a vector
    • 6.20.11 TBL: Table lookup/permute using vector of indices
    • 6.20.12 REV: Reverse the elements in a single input
    • 6.20.13 TRN1: Interleave even elements from two inputs
    • 6.20.14 TRN2: Interleave odd elements from two inputs
    • 6.20.15 UNPKHI: Unpack and extend high half of an input
    • 6.20.16 UNPKLO: Unpack and extend low half of an input
    • 6.20.17 UZP1: Select even elements from two inputs
    • 6.20.18 UZP2: Select odd elements from two inputs
    • 6.20.19 ZIP1: Interleave elements from low halves of two inputs
    • 6.20.20 ZIP2: Interleave elements from high halves of two inputs
  • 6.21 Vector creation
    • 6.21.1 CREATE2: Create a tuple of two vectors
    • 6.21.2 CREATE3: Create a tuple of three vectors
    • 6.21.3 CREATE4: Create a tuple of four vectors
    • 6.21.4 UNDEF: Create an uninitialized vector
    • 6.21.5 UNDEF2: Create an uninitialized tuple of two vectors
    • 6.21.6 UNDEF3: Create an uninitialized tuple of three vectors
    • 6.21.7 UNDEF4: Create an uninitialized tuple of four vectors
  • 6.22 Vector insertion and extraction
    • 6.22.1 SET2: Change one vector in a tuple of two vectors
    • 6.22.2 SET3: Change one vector in a tuple of three vectors
    • 6.22.3 SET4: Change one vector in a tuple of four vectors
    • 6.22.4 GET2: Extract one vector from a tuple of two vectors
    • 6.22.5 GET3: Extract one vector from a tuple of three vectors
    • 6.22.6 GET4: Extract one vector from a tuple of four vectors
  • 6.23 Predicate creation
    • 6.23.1 PTRUE: Return an all-true predicate for a given pattern (inherent versions done, no direct tests)
    • 6.23.2 PFALSE: Return an all-false predicate
    • 6.23.3 DUP: Duplicate boolean value
    • 6.23.4 DUPQ: Duplicate boolean values to fill a predicate
  • 6.24 Predicate operations
    • 6.24.1 MOV: Copy predicate
    • 6.24.2 AND: Predicate AND
    • 6.24.3 BIC: Predicate AND NOT
    • 6.24.4 NAND: Predicate NAND
    • 6.24.5 ORR: Predicate OR
    • 6.24.6 ORN: Predicate OR NOT
    • 6.24.7 NOR: Predicate NOR
    • 6.24.8 EOR: Predicate exclusive OR
    • 6.24.9 NOT: Predicate NOT
    • 6.24.10 BRKA: Break after first true condition
    • 6.24.11 BRKB: Break before first true condition
    • 6.24.12 BRKN: Propagate break to next partition
    • 6.24.13 BRKPA: Propagate and break after first true condition
    • 6.24.14 BRKPB: Propagate and break before first true condition
    • 6.24.15 PFIRST: Set first active predicate element to true
    • 6.24.16 PNEXT: Set next active predicate element to true
  • 6.25 Testing predicates
    • 6.25.1 PTEST: Test active elements (svptest_first done, no direct test)
  • 6.26 FFR manipulation
    • 6.26.1 RDFFR: Read the first-fault register
    • 6.26.2 SETFFR: Set the first-fault register
    • 6.26.3 WRFFR: Write to the first-fault register
  • 6.27 Counting elements
    • 6.27.1 CNTP: Count active elements
    • 6.27.2 CNTB: Count the number of 8-bit elements in a pattern (inherent version done, no direct tests)
    • 6.27.3 CNTH: Count the number of 16-bit elements in a pattern (inherent version done, no direct tests)
    • 6.27.4 CNTW: Count the number of 32-bit elements in a pattern (inherent version done, no direct tests)
    • 6.27.5 CNTD: Count the number of 64-bit elements in a pattern (inherent version done, no direct tests)
    • 6.27.6 LEN: Return the number of elements in a vector
  • 6.28 Saturating scalar arithmetic
    • 6.28.1 QINCB: Saturating increment by a multiple of svcntb
    • 6.28.2 QINCH: Saturating increment by a multiple of svcnth
    • 6.28.3 QINCW: Saturating increment by a multiple of svcntw
    • 6.28.4 QINCD: Saturating increment by a multiple of svcntd
    • 6.28.5 QINCP: Saturating increment by a multiple of svcntp
    • 6.28.6 QDECB: Saturating decrement by a multiple of svcntb
    • 6.28.7 QDECH: Saturating decrement by a multiple of svcnth
    • 6.28.8 QDECW: Saturating decrement by a multiple of svcntw
    • 6.28.9 QDECD: Saturating decrement by a multiple of svcntd
    • 6.28.10 QDECP: Saturating decrement by a multiple of svcntp
  • 6.29 Reinterpreting data
    • 6.29.1 REINTERPRET: Reinterpret vector contents (all done except _bf16 versions; no direct tests)
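
Several entries above mention predicated variants, such as the `_x` versions that leave inactive lanes unknown. As a rough illustration of what those forms mean, here is a minimal scalar sketch in C. It is not this project's implementation and not the ACLE API: the `sketch_` names, the fixed lane count of 4, and the struct layout are all hypothetical, chosen only to model WHILELT (6.12.1) and the merging (`_m`) and zeroing (`_z`) forms of integer ADD (6.7.1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed lane count; real SVE vectors are length-agnostic. */
#define SKETCH_LANES 4

typedef struct { int32_t lane[SKETCH_LANES]; } sketch_s32;
typedef struct { bool lane[SKETCH_LANES]; } sketch_pred;

/* Models WHILELT: lane i is active while (start + i) < end.
 * Assumes start + i does not overflow int64_t. */
static sketch_pred sketch_whilelt(int64_t start, int64_t end) {
  sketch_pred p;
  for (size_t i = 0; i < SKETCH_LANES; i++) {
    p.lane[i] = (start + (int64_t) i) < end;
  }
  return p;
}

/* Modular 32-bit addition done in unsigned arithmetic to avoid
 * signed-overflow undefined behaviour. Converting back to int32_t is
 * implementation-defined for out-of-range values; common compilers wrap. */
static int32_t sketch_wrapping_add(int32_t a, int32_t b) {
  return (int32_t) ((uint32_t) a + (uint32_t) b);
}

/* Merging form (_m): inactive lanes keep the first operand's value. */
static sketch_s32 sketch_add_m(sketch_pred pg, sketch_s32 a, sketch_s32 b) {
  sketch_s32 r;
  for (size_t i = 0; i < SKETCH_LANES; i++) {
    r.lane[i] = pg.lane[i] ? sketch_wrapping_add(a.lane[i], b.lane[i])
                           : a.lane[i];
  }
  return r;
}

/* Zeroing form (_z): inactive lanes are set to zero. */
static sketch_s32 sketch_add_z(sketch_pred pg, sketch_s32 a, sketch_s32 b) {
  sketch_s32 r;
  for (size_t i = 0; i < SKETCH_LANES; i++) {
    r.lane[i] = pg.lane[i] ? sketch_wrapping_add(a.lane[i], b.lane[i]) : 0;
  }
  return r;
}

/* Small self-check showing how the pieces fit together. */
static void sketch_demo(void) {
  sketch_pred pg = sketch_whilelt(2, 5); /* first three lanes active */
  sketch_s32 a = {{1, 2, 3, 4}};
  sketch_s32 b = {{10, 20, 30, 40}};
  sketch_s32 m = sketch_add_m(pg, a, b);
  sketch_s32 z = sketch_add_z(pg, a, b);
  assert(m.lane[0] == 11 && m.lane[3] == 4); /* inactive lane keeps a */
  assert(z.lane[0] == 11 && z.lane[3] == 0); /* inactive lane zeroed */
}
```

Under this model, an `_x` form could return either of the two results above (or anything else in the inactive lanes), since callers may not rely on those values. That freedom is presumably why the `_x` versions are tracked separately in the list.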
nemequ added the instruction-set-support (Implementing new SIMD ISA extensions portably) label on Sep 3, 2020