RV128

work in progress

Notes on candidate instructions for RV128 (unofficial).

A ‘candidate’ RISC-V 128-bit ISA is proposed for an ABI that can choose to use either 64-bit or 128-bit pointers. A 64-bit pointer model (offsets from a base register) will provide the benefits of the larger 128-bit integrals while controlling the size of application binaries.

A 64-bit near model is introduced and referred to as LP64FP128 (Long and Pointer 64-bit, Far Pointer 128-bit).

Present-day code already uses 128-bit, 256-bit, and 512-bit registers for scientific code (DNA string search), multimedia, device security and many other applications. 2048-bit big integers are in everyday use in public key cryptography, and 256-bit Elliptic Curve encryption along with the GF(2^128) Galois Field algorithms are in widespread use. Popular examples are the ECDHE and AES128-GCM AEAD algorithms employed by Mozilla Firefox, Microsoft Edge, Internet Explorer, Apple Safari and Google Chrome. Numerous applications can benefit from the increased bandwidth offered by 128-bit integrals.

However, while large integrals are available today, the number of addressable bits on typical processors remains on the order of 32-48 bits (virtual) and 32-44 bits (physical). Future processors at the very high end may soon reach 57 bits. For the next two decades, though, a compact code model that uses 64-bit pointers in the common case is likely to be more practical on 128-bit CPUs, while not precluding a pure 128-bit pointer model.

With this in mind an LP64FP128 model is proposed alongside a P128 model that uses the full 128-bit address space for pointers.

Instead of redefining existing C type sizes, as was done during the transition from ILP32 to LP64 and LLP64, these two proposals retain the current meaning of long and long long and introduce a new C type, long long long, that requires no new language keywords.

C / C++ types	ILP32	LP64	P128	LP64FP128
int	32	32	32	32
long	32	64	64	64
long long	64	64	64	64
__int128 (GCC)	-	-	128	128
long long long	-	-	128	128
void*	32	64	128	64
void* __attribute__((near))	32	64	128	64
void* __attribute__((far))	32	64	128	128

Notes

riscv32 is ILP32
riscv64 is LP64 (code model is 32-bit, data model is 64-bit)
riscv128 is LP64FP128 (code model is 32-bit, data model is 128-bit with 64-bit base register offsets)
int128_t and uint128_t types for <cstdint> and <stdint.h>
With P128: sizeof(uintptr_t) == sizeof(uintmax_t) (*2)
With LP64FP128: sizeof(uintptr_t) != sizeof(uintmax_t) (*2)
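
The near and far pointer distinction in the table above can be illustrated with a minimal C sketch. It assumes a near pointer is an unsigned 64-bit offset that is added to a 128-bit base to form a far pointer; the notes do not say whether offsets are zero-extended or sign-extended, so that choice is an assumption. The names far_ptr, near_ptr and near_to_far are hypothetical and are not part of any proposed ABI. The 128-bit value is modeled as two 64-bit halves so the sketch compiles on current 64-bit hosts.

```c
#include <stdint.h>

/* Hypothetical 128-bit far pointer, modeled as two 64-bit halves. */
typedef struct {
    uint64_t lo; /* bits 63:0   */
    uint64_t hi; /* bits 127:64 */
} far_ptr;

/* Hypothetical near pointer: a 64-bit offset from a 128-bit base.
   Assumption: the offset is unsigned (zero-extended). */
typedef uint64_t near_ptr;

/* Resolve a near pointer against a base register value.
   This is a 128-bit addition with carry between the halves. */
static far_ptr near_to_far(far_ptr base, near_ptr off)
{
    far_ptr r;
    r.lo = base.lo + off;
    r.hi = base.hi + (r.lo < base.lo); /* carry out of the low half */
    return r;
}
```

With this model a near pointer costs 8 bytes of storage and a far pointer 16 bytes, which is the binary size trade-off that motivates LP64FP128.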

RV128I

The majority of the RV128 instruction set is a simple translation of the RV32 and RV64 models. However, the RV128I load and store instructions need special consideration due to the limited OP-LOAD encoding space and other issues, such as the RISC-V atomics model not being strictly compatible with the C11 atomics model. The current OP-LOAD major opcode has only one free instruction slot in the I-Type encoding (12-bit offset), so an alternative encoding is considered.

Instruction	Opcode
ADDD	OP-64
SUBD	OP-64
SLLD	OP-64
SRLD	OP-64
SRAD	OP-64
ADDID	OP-IMM-64
SLLID	OP-IMM-64
SRLID	OP-IMM-64
SRAID	OP-IMM-64
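
The notes do not spell out the semantics of the D-suffixed instructions. The following C sketch assumes they follow the pattern of the RV64 W-suffixed instructions: operate on the low 64 bits of the 128-bit registers and sign-extend the 64-bit result to 128 bits, with shifts taking a 6-bit shift amount. The type xreg128 and the function names are hypothetical; the 128-bit register is modeled as two 64-bit halves.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit integer register. */
typedef struct {
    uint64_t lo; /* bits 63:0   */
    uint64_t hi; /* bits 127:64 */
} xreg128;

/* Sign-extend a 64-bit result into a 128-bit register. */
static xreg128 sext64(uint64_t v)
{
    xreg128 r;
    r.lo = v;
    r.hi = (v >> 63) ? UINT64_MAX : 0;
    return r;
}

/* Assumed semantics, by analogy with ADDW, SUBW, SLLW and SRLW. */
static xreg128 addd(xreg128 a, xreg128 b) { return sext64(a.lo + b.lo); }
static xreg128 subd(xreg128 a, xreg128 b) { return sext64(a.lo - b.lo); }
static xreg128 slld(xreg128 a, xreg128 b) { return sext64(a.lo << (b.lo & 63)); }
static xreg128 srld(xreg128 a, xreg128 b) { return sext64(a.lo >> (b.lo & 63)); }
```

Under these assumptions, adding 1 to 0x7FFFFFFFFFFFFFFF with addd wraps to the most negative 64-bit value and fills the upper half with ones.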

RV128M

Instruction	Opcode
MULD	OP-64
DIVD	OP-64
DIVUD	OP-64
REMD	OP-64
REMUD	OP-64

RV128A

Instruction	Opcode	Encoding
LR.Q	OP-AMO	funct3[14:12]=Q.width
SC.Q	OP-AMO	funct3[14:12]=Q.width
AMO{*}.Q	OP-AMO	funct3[14:12]=Q.width

RV32Q, RV64Q

Instruction	Opcode	Encoding
FLQ	OP-LOAD-FP	funct3[14:12]=Q.width
FSQ	OP-STORE-FP	funct3[14:12]=Q.width
F{*}.Q	OP-FP	fmt[26:25]=Q.fmt

RV128FDQ

Instruction	Opcode	Encoding
FCVT.{F,D,Q}.C	OP-FP	funct5[31:27]={{F,D,Q}.C}, fmt[26:25]={S,D,Q}
FCVT.{F,D,Q}.CU	OP-FP	funct5[31:27]={{F,D,Q}.CU}, fmt[26:25]={S,D,Q}
FCVT.C.{F,D,Q}	OP-FP	funct5[31:27]={C.{F,D,Q}}, fmt[26:25]={S,D,Q}
FCVT.CU.{F,D,Q}	OP-FP	funct5[31:27]={CU.{F,D,Q}}, fmt[26:25]={S,D,Q}

OP-LOAD-STORE-128

Given that the proposed 128-bit ABI will make heavy use of register-register loads and stores, a register-register encoding is proposed that is similar to the AMO encoding and contains slots for 3 registers and 7 function bits instead of 2 registers and 12 immediate bits.

One of the main concerns raised about a potential register-register encoding for loads and stores is the requirement for 3 register read ports for stores. This is also required for Fused Multiply Add, and it may be that the additional read port can be justified on larger RV128 designs.

Word, Double Word and Quad Word instructions

The following table shows Word, Double Word and Quad Word register relative load and store instructions.

Opcode	Operands	Notes
LW	flags rd,rs2(rs1)
LWU	flags rd,rs2(rs1)
LD	flags rd,rs2(rs1)
LDU	flags rd,rs2(rs1)
LQ	flags rd,rs2(rs1)
LQU	flags rd,rs2(rs1)
LX	flags rd,rs2(rs1)	can set metadata address bit
SW	flags rs3,rs2(rs1)
SD	flags rs3,rs2(rs1)
SQ	flags rs3,rs2(rs1)
SX	flags rs3,rs2(rs1)	can set metadata address bit
AMOCMPSWAP.{w,d,q}	flags rd,rs2,(rs1)

Byte and Half

The following table shows Byte and Half register relative load and store instructions.

Byte and Half instructions may or may not be supported due to the AMO width encoding.
Consider far strings (byte and half register relative instructions).

Opcode	Operands
LB	flags rd,rs2(rs1)
LBU	flags rd,rs2(rs1)
LH	flags rd,rs2(rs1)
LHU	flags rd,rs2(rs1)
SB	flags rs3,rs2(rs1)
SH	flags rs3,rs2(rs1)

Encoding

Layout equivalent to AMO; the funct5 encoding space and a 2-bit width encoding are available for the opcode
Represent C11 atomics using memory flags from fence pred,succ to allow consume, not .aq.rl
Load prefetch temporal hint to preload data into cache, rd = x0
Load and Store read-around and write-around temporal hints (cache bypass)
LX and SX are defined in addition to the fixed-width loads and stores. LX and SX are XLEN-wide loads and stores of addresses rather than integrals (LW, LD, LQ). While functionally equivalent, they differ in that they do not specify the word length being loaded and instead use the native width of the operating mode. This distinction also allows static analysis to identify pointer-sized loads.
Potentially use two major opcodes to un-alias register slots in 31:27 and 11:7 and increase encoding space
Decide between using OP-LOAD-STORE-128 or OP-LOAD-128 and OP-STORE-128
Increased encoding space would allow code and data pointer metadata bit for LX and SX
There is precedent for aliasing register slots. The RVC compressed encodings alias rd/rs1
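
As a rough illustration of the AMO-like layout, the sketch below packs a register-register load into a 32-bit word. It assumes the standard AMO field positions: funct5 in 31:27, two flag bits in 26:25 (where AMO places aq and rl), rs2 in 24:20, rs1 in 19:15, the width in funct3 14:12, and rd in 11:7. The major opcode value is a placeholder (the custom-0 opcode), since the notes do not assign one, and the funct5, flags and width values passed in are hypothetical. Stores are left out because the slot for rs3 is still undecided.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder major opcode: custom-0 (0001011). Not an assigned value. */
#define OP_LOAD_STORE_128 0x0Bu

/* Pack a register-register load "rd, rs2(rs1)" using the AMO field layout.
   All field meanings beyond the bit positions are assumptions. */
static uint32_t encode_rr_load(uint32_t funct5, uint32_t flags,
                               uint32_t rs2, uint32_t rs1,
                               uint32_t width, uint32_t rd)
{
    assert(funct5 < 32 && flags < 4 && width < 8);
    assert(rs2 < 32 && rs1 < 32 && rd < 32);
    return (funct5 << 27) | (flags << 25) | (rs2 << 20) | (rs1 << 15) |
           (width << 12) | (rd << 7) | OP_LOAD_STORE_128;
}
```

For example, encode_rr_load(1, 3, 11, 10, 4, 12) places x11 in the rs2 slot, x10 in the rs1 slot and x12 in the rd slot, leaving 7 function bits (funct5 plus the two flag bits) to select the operation and its memory ordering.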

Address Groups

Flags could optionally be able to indicate metadata. e.g. the "address group bit" that is similar in concept to extended internal state in the FPU register file, or they could be implicitly set by LX and SX.
A separate instruction for forming addresses allows an implementation to treat addresses as a separate group and to invalidate the metadata address bit on operations that are not valid for the address group, e.g. address + offset is valid while address + address is invalid. Implementations can load offsets into the lower bits of general purpose registers and use those offsets with register-register loads and stores against a base register such as gp, tp or sp.
X-Only code can use immediates to form addresses with far offsets with the address prefix stored in code that is not accessible to memory read primitives.
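
The address group rule can be sketched as a one-bit tag that travels with each register value. The sketch assumes that LX sets the tag, that ordinary integer loads clear it, and that an add keeps the tag only when exactly one operand is an address. The names tagged_reg, load_x, load_int and tagged_add are hypothetical, and values are kept to 64 bits for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register with a metadata address bit. */
typedef struct {
    uint64_t value;
    bool is_addr; /* the "address group bit" */
} tagged_reg;

/* Assumption: LX implicitly marks the loaded value as an address. */
static tagged_reg load_x(uint64_t v)
{
    tagged_reg r = { v, true };
    return r;
}

/* Assumption: integer loads (LW, LD, LQ) leave the bit clear. */
static tagged_reg load_int(uint64_t v)
{
    tagged_reg r = { v, false };
    return r;
}

/* address + offset stays an address; address + address is not a valid
   address operation, so the bit is cleared; offset + offset is an offset. */
static tagged_reg tagged_add(tagged_reg a, tagged_reg b)
{
    tagged_reg r;
    r.value = a.value + b.value;
    r.is_addr = (a.is_addr != b.is_addr);
    return r;
}
```

In this model an implementation or a static analysis tool could refuse to dereference a value whose address bit has been cleared.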
