Skip to content

Latest commit

 

History

History
207 lines (171 loc) · 8.8 KB

rv128.md

File metadata and controls

207 lines (171 loc) · 8.8 KB

RV128

work in progress

Notes on candidate instructions for RV128 (unofficial).

A ‘candidate’ RISC-V 128-bit ISA is proposed for an ABI that can choose to use either 64-bit or 128-bit pointers. A 64-bit pointer model (offsets from a base register) will provide the benefits of the larger 128-bit integrals while controlling the size of application binaries.

A 64-bit near model is introduced and referred to as LP64FP128 (Long and Pointer 64-bit, Far Pointer 128-bit).

Present day code is already using 128-bit, 256-bit, and 512-bit registers for scientific code (DNA string search), multimedia, device security and many other applications. 2048-bit big integers are in every day use in public key cryptography and 256-bit Eliptic Curve encryption along with the GF(2^128) Galois Field algorithms are in widespread use. The popular examples are ECDHE and AES128-GCM AEAD algorithms employed by Mozilla Firefox, Microsoft Edge, Internet Exporer, Apple Safari and Google Chrome. There are numerous applications that can benefit from the increased bandwidth offered by 128-bit integrals.

However while large integrals are available today, the maximum physical addressable bits on typical processors remains in the order of 32-48 bits (virtual) and 32-44 bits (physical). Future processors at the very high end may soon reach 57-bits although it is likely that for the next two decades, a compact code model that uses 64-bit pointers, in the common case is likely to be more practical on 128-bit CPUs, while at the same time not precluding a pure 128-bit pointer model.

With this in mind an LP64FP128 model is proposed alongside a P128 model that uses the full 128-bit address space for pointers.

Instead of redefining existing C type sizes as was done during the transtion from ILP32 to LP64 and LLP64, these two proposals retain the current meaning of long, long long and introduce a new C type: long long long that requires no new language keywords.

C / C++ types ILP32 LP64 P128 LP64FP128
int 32 32 32 32
long 32 64 64 64
long long 64 64 64 64
__int128 (GCC) - - 128 128
long long long - - 128 128
void* 32 64 128 64
void* attribute ((near)) 32 64 128 64
void* attribute ((far)) 32 64 128 128

Notes

  • riscv32 is ILP32
  • riscv64 is LP64 (code model is 32-bits, data model is 64-bits)
  • riscv128 is LP64FP128 (code model is 32-bits, data model is 128-bits with 64-bit base register offsets)
  • int128_t and uint128_t types for <cstdint> and <stdint.h>
  • With P128: sizeof(uintptr_t) == sizeof(uintmax_t) (*2)
  • With LP64FP128: sizeof(uintptr_t) != sizeof(uintmax_t) (*2)

RV128I

The majority of the RV128 instruction set is a simple translation from the RV32 and RV64 bit model however the RV128I load and store instructions needs special consideration due to the OP-LOAD encoding space and other issues such as the RISC-V atomics model not being strictly compatible with the C11 atomics model. Also, the current OP-LOAD major opcode has one free instruction based on the I-Type encoding (12-bit offset), so an alternative encoding is considered.

Instruction Opcode
ADDD OP-64
SUBD OP-64
SLLD OP-64
SRLD OP-64
SUBD OP-64
SRAD OP-64
ADDID OP-IMM-64
SLLID OP-IMM-64
SRLID OP-IMM-64
SRAID OP-IMM-64

RV128M

Instruction Opcode
MULD OP-64
DIVD OP-64
DIVUD OP-64
REMD OP-64
REMUD OP-64

RV128A

Instruction Opcode Encoding
LR.D OP-AMO funct3[14:12]=Q.width
SC.D OP-AMO funct3[14:12]=Q.width
AMO{*}.D OP-AMO funct3[14:12]=Q.width

RV32Q, RV64Q

Instruction Opcode Encoding
FLQ OP-LOAD-FP funct3[14:12]=Q.width
FSQ OP-STORE-FP funct3[14:12]=Q.width
F{*}.Q OP-FP fmt[26:25]=Q.fmt

RV128FDQ

Opcode Opcode Encoding
FCVT.{F,D,Q}.C OP-FP funct5[31:27]={{F,D,Q}.C}, fmt[26:25]={S,D,Q}
FCVT.{F,D,Q}.CU OP-FP funct5[31:27]={{F,D,Q}.CU}, fmt[26:25]={S,D,Q}
FCVT.C.{F,D,Q} OP-FP funct5[31:27]={C.{F,D,Q}}, fmt[26:25]={S,D,Q}
FCVT.CU.{F,D,Q} OP-FP funct5[31:27]={CU.{F,D,Q}}, fmt[26:25]={S,D,Q}

OP-LOAD-STORE-128

Given the proposed 128-bit ABI will make large use of register register loads and stores, a register register encoding is proposed that is similar to the AMO encoding and contains slots for 3 registers and 7 function bits instead of 2 registers and 12 immediate bits.

One of the main concerns raised about a potential register register encoding for loads and stores is the requirement for 3 register read ports for stores. This is also required for Fused Multiply Add, and it may be that the additional read port can be justified on larger RV128 designs.

Word, Double Word and Quad Word instructions

The following table shows Word, Double Word and Quad Word register relative load and store instructions.

Opcode Operands Notes
LW flags rd,rs2(rs1)
LWU flags rd,rs2(rs1)
LD flags rd,rs2(rs1)
LDU flags rd,rs2(rs1)
LQ flags rd,rs2(rs1)
LQU flags rd,rs2(rs1)
LX flags rd,rs2(rs1) can set metadata address bit
SW flags rs3,rs2(rs1)
SD flags rs3,rs2(rs1)
SQ flags rs3,rs2(rs1)
SX flags rs3,rs2(rs1) can set metadata address bit
AMOCMPSWAP.{w,d,q} flags rd,rs2,(rs1)

Byte and Half

The following table shows Byte and Half register relative load and store instructions.

  • Byte and Half instructions may or may not be supported due to the AMO width encoding.
  • Consider far strings (byte and half register relative instructions).
Opcode Operands
LB flags rd,rs2(rs1)
LBU flags rd,rs2(rs1)
LH flags rd,rs2(rs1)
LHU flags rd,rs2(rs1)
SB flags rs3,rs2(rs1)
SH flags rs3,rs2(rs1)

Encoding

  • Layout equivalent to AMO; funct5 encoding space and 2-bit width is encoding available for the opcode
  • Represent C11 atomics using memory flags from fence pred,succ to allow consume, not .aq.rl
  • Load prefetch temporal hint to preload data into cache, rd = x0
  • Load and Store read around and write around temporal hints (cache bybass)
  • LX and SX are duplicately defined in addition to the XLEN length loads and stores. LX and SX are XLEN loads and stores of addresses versus integrals (LW,LD,LQ) while functionally equivalent differ in that they do not specify word length being loaded and instead load the native width of the operating mode. This distinction also allows static analysis to distinguish pointer word sized loads.
  • Potentiall use two major opcodes to un-alias register slots in 31:27 and 11:7 and increase encoding space
  • Decide between using OP-LOAD-STORE-128 or OP-LOAD-128 and OP-STORE-128
  • Increased encoding space would allow code and data pointer metadata bit for LX and SX
  • There is precedent for aliasing register slots. The RVC compressed encodings alias rd/rs1

Address Groups

  • Flags could optionally be able to indicate metadata. e.g. the "address group bit" that is similar in concept to extended internal state in the FPU register file, or they could be implicitly set by LX and SX.
  • Separate instruction for forming address allows an implementation to treat addresses as a separate group and invalidate metadata address bit based on operations that are not valid on the address group e.g. address + offset is valid while address + address is invalid. Implementations can load offsets into the lower bits of generel purpose registers and use offsets with register register loads and stores against a base register such as gp, tp or sp.
  • X-Only code can use immediates to form addresses with far offsets with the address prefix stored in code that is not accessible to memory read primitives.