GTE Overview
GTE Registers
GTE Saturation
GTE Opcode Summary
GTE Coordinate Calculation Commands
GTE General Purpose Calculation Commands
GTE Color Calculation Commands
GTE Division Inaccuracy
The GTE doesn't have any memory or I/O ports mapped to the CPU memory bus;
instead, it's solely accessed via coprocessor opcodes:
mov cop0r12,rt ;-enable/disable COP2 (GTE) via COP0 status register
mov cop2r0-63,rt ;\write parameters to GTE registers
mov cop2r0-31,[rs+imm] ;/
mov cop2cmd,imm25 ;-issue GTE command
mov rt,cop2r0-63 ;\read results from GTE registers
mov [rs+imm],cop2r0-31 ;/
jt cop2flg,dest ;-jump never ;\implemented (no exception), but,
jf cop2flg,dest ;-jump always ;/flag seems to be always "false"
GTE (memory-?) load and store instructions have a delay of 2 instructions, for
any GTE commands or operations accessing that register. Any? That's wrong!
GTE instructions and functions should not be used in
- Delay slots of jumps and branches
- Event handlers or interrupts (sounds like nonsense?) (need push/pop though)
If an instruction that reads a GTE register, or another GTE command, is
executed before the current GTE command has finished, the CPU stalls until
that command has finished. The number of cycles each GTE command takes is
shown in the command list.
31-25 Must be 0100101b for "COP2 imm25" instructions
20-24 Fake GTE Command Number (00h..1Fh) (ignored by hardware)
19 sf - Shift Fraction in IR registers (0=No fraction, 1=12bit fraction)
17-18 MVMVA Multiply Matrix (0=Rotation. 1=Light, 2=Color, 3=Reserved)
15-16 MVMVA Multiply Vector (0=V0, 1=V1, 2=V2, 3=IR/long)
13-14 MVMVA Translation Vector (0=TR, 1=BK, 2=FC/Bugged, 3=None)
11-12 Always zero (ignored by hardware)
10 lm - Saturate IR1,IR2,IR3 result (0=To -8000h..+7FFFh, 1=To 0..+7FFFh)
6-9 Always zero (ignored by hardware)
0-5 Real GTE Command Number (00h..3Fh) (used by hardware)
The MVMVA bits are used only by the MVMVA opcode (the bits are zero for all
other opcodes).
The "sf" and "lm" bits are usually fixed (either set or cleared, depending on
the command). For MVMVA the bits are variable, and "sf" can also be changed for
some commands like SQR. Although the bits are usually fixed for most other
opcodes, changing them might have some effect on some or all opcodes
(unconfirmed).
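The bit layout above can be illustrated with a small decoder. This is a minimal sketch; the function name and the dictionary keys are made up for illustration (mx/v/cv follow the MVMVA field names used later in this document).

```python
COP2_IMM25 = 0b0100101 << 25   # fixed upper bits (31-25) of every GTE command

def decode_gte_command(opcode):
    """Split a 32-bit "COP2 imm25" opcode into the fields listed above."""
    if (opcode >> 25) != 0b0100101:
        raise ValueError("not a COP2 imm25 instruction")
    return {
        "fake": (opcode >> 20) & 0x1F,  # fake command number (ignored by hardware)
        "sf":   (opcode >> 19) & 1,     # shift fraction
        "mx":   (opcode >> 17) & 3,     # MVMVA multiply matrix
        "v":    (opcode >> 15) & 3,     # MVMVA multiply vector
        "cv":   (opcode >> 13) & 3,     # MVMVA translation vector
        "lm":   (opcode >> 10) & 1,     # IR1,IR2,IR3 saturation mode
        "cmd":  opcode & 0x3F,          # real command number (used by hardware)
    }

nclip = decode_gte_command(COP2_IMM25 | 0x1400006)
print(hex(nclip["cmd"]), hex(nclip["fake"]))
```

Since only bits 0-5 select the command, two opcodes that differ in the fake number still decode to the same "cmd" value.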
cop2r0-1 3xS16 VXY0,VZ0 Vector 0 (X,Y,Z)
cop2r2-3 3xS16 VXY1,VZ1 Vector 1 (X,Y,Z)
cop2r4-5 3xS16 VXY2,VZ2 Vector 2 (X,Y,Z)
cop2r6 4xU8 RGBC Color/code value
cop2r7 1xU16 OTZ Average Z value (for Ordering Table)
cop2r8 1xS16 IR0 16bit Accumulator (Interpolate)
cop2r9-11 3xS16 IR1,IR2,IR3 16bit Accumulator (Vector)
cop2r12-15 6xS16 SXY0,SXY1,SXY2,SXYP Screen XY-coordinate FIFO (3 stages)
cop2r16-19 4xU16 SZ0,SZ1,SZ2,SZ3 Screen Z-coordinate FIFO (4 stages)
cop2r20-22 12xU8 RGB0,RGB1,RGB2 Color CRGB-code/color FIFO (3 stages)
cop2r23 4xU8 (RES1) Prohibited
cop2r24 1xS32 MAC0 32bit Maths Accumulators (Value)
cop2r25-27 3xS32 MAC1,MAC2,MAC3 32bit Maths Accumulators (Vector)
cop2r28-29 1xU15 IRGB,ORGB Convert RGB Color (48bit vs 15bit)
cop2r30-31 2xS32 LZCS,LZCR Count Leading-Zeroes/Ones (sign bits)
cop2r32-36 9xS16 RT11RT12,..,RT33 Rotation matrix (3x3) ;cnt0-4
cop2r37-39 3x 32 TRX,TRY,TRZ Translation vector (X,Y,Z) ;cnt5-7
cop2r40-44 9xS16 L11L12,..,L33 Light source matrix (3x3) ;cnt8-12
cop2r45-47 3x 32 RBK,GBK,BBK Background color (R,G,B) ;cnt13-15
cop2r48-52 9xS16 LR1LR2,..,LB3 Light color matrix source (3x3) ;cnt16-20
cop2r53-55 3x 32 RFC,GFC,BFC Far color (R,G,B) ;cnt21-23
cop2r56-57 2x 32 OFX,OFY Screen offset (X,Y) ;cnt24-25
cop2r58 BuggyU16 H Projection plane distance. ;cnt26
cop2r59 S16 DQA Depth cueing parameter A (coeff) ;cnt27
cop2r60 32 DQB Depth cueing parameter B (offset);cnt28
cop2r61-62 2xS16 ZSF3,ZSF4 Average Z scale factors ;cnt29-30
cop2r63 U20 FLAG Returns any calculation errors ;cnt31
Note: In some functions, the format differs from the one given here.
Rotation matrix (RT) Light matrix (LLM) Light Color matrix (LCM)
cop2r32.lsbs=RT11 cop2r40.lsbs=L11 cop2r48.lsbs=LR1
cop2r32.msbs=RT12 cop2r40.msbs=L12 cop2r48.msbs=LR2
cop2r33.lsbs=RT13 cop2r41.lsbs=L13 cop2r49.lsbs=LR3
cop2r33.msbs=RT21 cop2r41.msbs=L21 cop2r49.msbs=LG1
cop2r34.lsbs=RT22 cop2r42.lsbs=L22 cop2r50.lsbs=LG2
cop2r34.msbs=RT23 cop2r42.msbs=L23 cop2r50.msbs=LG3
cop2r35.lsbs=RT31 cop2r43.lsbs=L31 cop2r51.lsbs=LB1
cop2r35.msbs=RT32 cop2r43.msbs=L32 cop2r51.msbs=LB2
cop2r36 =RT33 cop2r44 =L33 cop2r52 =LB3
Each element is 16bit (1bit sign, 3bit integer, 12bit fraction). Reading the
last elements (RT33,L33,LB3) returns the 16bit value sign-expanded to 32bit.
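As an illustration of the packing described above, the following sketch converts values to the 1.3.12 format, packs two elements into one 32bit register word, and models the sign-expanded read of the last element. All helper names are hypothetical.

```python
def to_fixed_1_3_12(value):
    """Convert a float to a signed 16bit value with 12 fraction bits."""
    raw = int(round(value * 0x1000))
    if not -0x8000 <= raw <= 0x7FFF:
        raise ValueError("value does not fit in 1.3.12 format")
    return raw

def pack_pair(lsbs, msbs):
    """Pack two signed 16bit elements into one register word (eg. RT11/RT12)."""
    return ((msbs & 0xFFFF) << 16) | (lsbs & 0xFFFF)

def read_last_element(stored):
    """Model reading RT33/L33/LB3: the 16bit value sign-expanded to 32bit."""
    value = stored & 0xFFFF
    if value & 0x8000:
        value |= 0xFFFF0000
    return value

word = pack_pair(to_fixed_1_3_12(1.0), to_fixed_1_3_12(-1.0))
print(hex(word))
```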
cop2r37 (cnt5) - TRX - Translation vector X (R/W?)
cop2r38 (cnt6) - TRY - Translation vector Y (R/W?)
cop2r39 (cnt7) - TRZ - Translation vector Z (R/W?)
Each element is 32bit (1bit sign, 31bit integer).
Used only for MVMVA, RTPS, RTPT commands.
cop2r45 (cnt13) - RBK - Background color red component
cop2r46 (cnt14) - GBK - Background color green component
cop2r47 (cnt15) - BBK - Background color blue component
Each element is 32bit (1bit sign, 19bit integer, 12bit fraction).
cop2r53 (cnt21) - RFC - Far color red component
cop2r54 (cnt22) - GFC - Far color green component
cop2r55 (cnt23) - BFC - Far color blue component
Each element is 32bit (1bit sign, 27bit integer, 4bit fraction).
cop2r56 (cnt24) - OFX - Screen offset X
cop2r57 (cnt25) - OFY - Screen offset Y
cop2r58 (cnt26) - H - Projection plane distance
cop2r59 (cnt27) - DQA - Depth cueing parameter A.(coeff.)
cop2r60 (cnt28) - DQB - Depth cueing parameter B.(offset.)
The X and Y values are each 32bit (1bit sign, 15bit integer, 16bit fraction).
The H value is 16bit unsigned (0bit sign, 16bit integer, 0bit fraction). BUG:
When reading the H register, the hardware accidentally <sign-expands>
the <unsigned> 16bit value (ie. values +8000h..+FFFFh are returned as
FFFF8000h..FFFFFFFFh) (this bug applies only to "mov rd,cop2r58" opcodes; the
actual calculations via RTPS/RTPT opcodes are working okay).
The DQA value is only 16bit (1bit sign, 7bit integer, 8bit fraction).
The DQB value is 32bit (1bit sign, 7bit integer, 24bit? fraction).
Used only for RTPS/RTPT commands.
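The H read bug can be modelled as below (an emulator-style sketch, the function name is hypothetical): the stored value stays an unsigned 16bit number, only the value returned to the CPU is sign-expanded.

```python
def read_h_register(h):
    """Value returned by "mov rd,cop2r58" for a stored unsigned 16bit H."""
    h &= 0xFFFF
    # Bug: bit15 is treated as a sign bit when reading.
    return (h | 0xFFFF0000) if (h & 0x8000) else h

print(hex(read_h_register(0x8000)))
```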
cop2r61 (cnt29) ZSF3 | 0|ZSF3 1,3,12| Z3 average scale factor (normally 1/3)
cop2r62 (cnt30) ZSF4 | 0|ZSF4 1,3,12| Z4 average scale factor (normally 1/4)
cop2r7 OTZ (R) | |OTZ 0,15, 0| Average Z value (for Ordering Table)
Used only for AVSZ3/AVSZ4 commands.
cop2r12 - SXY0 rw|SY0 1,15, 0|SX0 1,15, 0| Screen XY fifo (older)
cop2r13 - SXY1 rw|SY1 1,15, 0|SX1 1,15, 0| Screen XY fifo (old)
cop2r14 - SXY2 rw|SY2 1,15, 0|SX2 1,15, 0| Screen XY fifo (new)
cop2r15 - SXYP rw|SYP 1,15, 0|SXP 1,15, 0| SXY2-mirror with move-on-write
cop2r16 - SZ0 rw| 0|SZ0 0,16, 0| Screen Z fifo (oldest)
cop2r17 - SZ1 rw| 0|SZ1 0,16, 0| Screen Z fifo (older)
cop2r18 - SZ2 rw| 0|SZ2 0,16, 0| Screen Z fifo (old)
cop2r19 - SZ3 rw| 0|SZ3 0,16, 0| Screen Z fifo (new)
SX,SY,SZ are used as Output for RTPS/RTPT. Additionally, SX,SY are used as
Input for NCLIP, and SZ is used as Input for AVSZ3/AVSZ4.
The SZn Fifo has 4 stages (required for AVSZ4 command), the SXYn Fifo has only
3 stages, and a special mirrored register: SXYP is a mirror of SXY2, the
difference is that writing to SXYP moves SXY2/SXY1 to SXY1/SXY0, whilst writing
to SXY2 (or any other SXYn or SZn registers) changes only the written register,
but doesn't move any other Fifo entries.
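The difference between writing SXYP and writing SXY2 might be modelled as follows. This is a sketch with a hypothetical class name; it assumes that the value written to SXYP itself lands in SXY2, as the "mirror" wording suggests.

```python
class ScreenXYFifo:
    """Models cop2r12-15 (SXY0, SXY1, SXY2, SXYP)."""

    def __init__(self):
        self.sxy = [0, 0, 0]          # SXY0 (older), SXY1 (old), SXY2 (new)

    def write(self, reg, value):
        value &= 0xFFFFFFFF
        if reg == 15:                 # SXYP: move-on-write
            self.sxy = [self.sxy[1], self.sxy[2], value]
        elif 12 <= reg <= 14:         # SXY0..SXY2: plain write, nothing moves
            self.sxy[reg - 12] = value
        else:
            raise ValueError("not an SXY register")

    def read(self, reg):
        # SXYP reads back as a mirror of SXY2.
        return self.sxy[2] if reg == 15 else self.sxy[reg - 12]

fifo = ScreenXYFifo()
for reg, value in ((12, 1), (13, 2), (14, 3)):
    fifo.write(reg, value)
fifo.write(15, 4)                     # pushes 4, the oldest entry is dropped
print(fifo.sxy)
```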
Vector 0 (V0) Vector 1 (V1) Vector 2 (V2) Vector 3 (IR)
cop2r0.lsbs - VX0 cop2r2.lsbs - VX1 cop2r4.lsbs - VX2 cop2r9 - IR1
cop2r0.msbs - VY0 cop2r2.msbs - VY1 cop2r4.msbs - VY2 cop2r10 - IR2
cop2r1 - VZ0 cop2r3 - VZ1 cop2r5 - VZ2 cop2r11 - IR3
All elements are signed 16bit. The IRn and VZn elements occupy a whole 32bit
register, reading these registers returns the 16bit value sign-expanded to
32bit. Note: IRn can be also indirectly accessed via IRGB/ORGB registers.
cop2r6 - RGBC rw|CODE |B |G |R | Color/code
cop2r20 - RGB0 rw|CD0 |B0 |G0 |R0 | Characteristic color fifo.
cop2r21 - RGB1 rw|CD1 |B1 |G1 |R1 |
cop2r22 - RGB2 rw|CD2 |B2 |G2 |R2 |
cop2r23 - (RES1) | | Prohibited
RES1 seems to be unused... looks like an unused Fifo stage... RES1 is
read/write-able... unlike SXYP (for SXYn Fifo) it does not mirror to RGB2, nor
does it have a move-on-write function...
cop2r8 IR0 rw|Sign |IR0 1, 3,12| Intermediate value 0.
Used as Output for RTPS/RTPT, and as Input for various commands.
cop2r24 MAC0 rw|MAC0 1,31,0 | Sum of products value 0
cop2r25 MAC1 rw|MAC1 1,31,0 | Sum of products value 1
cop2r26 MAC2 rw|MAC2 1,31,0 | Sum of products value 2
cop2r27 MAC3 rw|MAC3 1,31,0 | Sum of products value 3
Expands 5:5:5 bit RGB (range 0..1Fh) to 16:16:16 bit RGB (range 0000h..0F80h).
0-4 Red (0..1Fh) (R/W) ;multiplied by 80h, and written to IR1
5-9 Green (0..1Fh) (R/W) ;multiplied by 80h, and written to IR2
10-14 Blue (0..1Fh) (R/W) ;multiplied by 80h, and written to IR3
15-31 Not used (always zero) (Read only)
After writing to IRGB, the result can be read from IR3 after TWO nop's, and
from IR1,IR2 after THREE nop's (for uncached code, ONE nop would work). When
using IR1,IR2,IR3 as parameters for GTE commands, similar timing restrictions
might apply... depending on when the specific commands use the parameters?
Collapses 16:16:16 bit RGB (range 0000h..0F80h) to 5:5:5 bit RGB (range
0..1Fh). Negative values (8000h..FFFFh/80h) are saturated to 00h, large
positive values (1000h..7FFFh/80h) are saturated to 1Fh, there are no overflow
or saturation flags set in cop2r63 though.
0-4 Red (0..1Fh) (R) ;IR1 divided by 80h, saturated to +00h..+1Fh
5-9 Green (0..1Fh) (R) ;IR2 divided by 80h, saturated to +00h..+1Fh
10-14 Blue (0..1Fh) (R) ;IR3 divided by 80h, saturated to +00h..+1Fh
15-31 Not used (always zero) (Read only)
Any changes to IR1,IR2,IR3 are reflected to this register (and, actually also
to IRGB) (ie. ORGB is simply a read-only mirror of IRGB).
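A sketch of both conversions, using hypothetical function names. The division by 80h is modelled as an arithmetic shift right by 7, which is an assumption; it gives the same result after saturation.

```python
def write_irgb(value):
    """Return (IR1, IR2, IR3) after writing a 15bit color to IRGB."""
    return tuple(((value >> shift) & 0x1F) * 0x80 for shift in (0, 5, 10))

def read_orgb(ir1, ir2, ir3):
    """Return the 15bit color read from ORGB for signed 16bit IR values."""
    def channel(ir):
        return max(0x00, min(0x1F, ir >> 7))   # divide by 80h, saturate
    return channel(ir1) | (channel(ir2) << 5) | (channel(ir3) << 10)

print(write_irgb(0x7FFF))
print(hex(read_orgb(-1, 0x1000, 0x7FFF)))
```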
Reading LZCR returns the leading 0 count of LZCS if LZCS is positive and the
leading 1 count of LZCS if LZCS is negative. The results are in range 1..32.
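The count can be expressed compactly by inverting negative inputs, so that leading ones become leading zeroes. A minimal sketch (the function name is hypothetical):

```python
def lzcr(lzcs):
    """Leading zero count (positive LZCS) or leading one count (negative)."""
    lzcs &= 0xFFFFFFFF
    if lzcs & 0x80000000:
        lzcs ^= 0xFFFFFFFF            # negative: count leading ones instead
    return 32 - lzcs.bit_length()

print(lzcr(0x00000001), lzcr(0xFFFF0000))
```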
See GTE Saturation chapter.
Maths overflows are indicated in FLAG register. In most cases, the result is
saturated to MIN/MAX values (except MAC0,MAC1,MAC2,MAC3 which aren't
saturated). For IR1,IR2,IR3 many commands allow to select the MIN value via
"lm" bit of the GTE opcode (though not all commands, RTPS/RTPT always act as if
lm=0).
31 Error Flag (Bit30..23, and 18..13 ORed together) (Read only)
30 MAC1 Result larger than 43 bits and positive
29 MAC2 Result larger than 43 bits and positive
28 MAC3 Result larger than 43 bits and positive
27 MAC1 Result larger than 43 bits and negative
26 MAC2 Result larger than 43 bits and negative
25 MAC3 Result larger than 43 bits and negative
24 IR1 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
23 IR2 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
22 IR3 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
21 Color-FIFO-R saturated to +00h..+FFh
20 Color-FIFO-G saturated to +00h..+FFh
19 Color-FIFO-B saturated to +00h..+FFh
18 SZ3 or OTZ saturated to +0000h..+FFFFh
17 Divide overflow. RTPS/RTPT division result saturated to max=1FFFFh
16 MAC0 Result larger than 31 bits and positive
15 MAC0 Result larger than 31 bits and negative
14 SX2 saturated to -0400h..+03FFh
13 SY2 saturated to -0400h..+03FFh
12 IR0 saturated to +0000h..+1000h
0-11 Not used (always zero) (Read only)
Bit30-12 are read/write-able, ie. they can be set/reset by software, however,
that's normally not required - all bits are automatically reset at the
beginning of a new GTE command.
Bit31 is apparently intended for RTPS/RTPT commands, since it triggers only on
flags that are affected by these two commands, but even for those commands it's
totally useless since one could as well check if FLAG is nonzero.
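The relation between Bit31 and the other flag bits can be sketched like so (names are hypothetical):

```python
ERROR_MASK = (0xFF << 23) | (0x3F << 13)   # Bit30..23 and Bit18..13

def with_error_flag(flag):
    """Recompute Bit31 from the writable FLAG bits (Bit30..12)."""
    flag &= 0x7FFFF000                      # Bit0-11 are always zero
    if flag & ERROR_MASK:
        flag |= 0x80000000
    return flag

print(hex(ERROR_MASK), hex(with_error_flag(1 << 17)))
```

Under this mask, Bit22..19 and Bit12 leave Bit31 unchanged.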
Note: Writing 32bit values to 16bit GTE registers by software does not trigger
any overflow/saturation flags (and does not do any saturation), eg. writing
12008900h (positive 32bit) to a signed 16bit register sets that register to
FFFF8900h (negative 16bit).
Opc Name Clk Expl.
00h - N/A (modifies similar registers as RTPS...)
01h RTPS 15 Perspective Transformation single
0xh - N/A
06h NCLIP 8 Normal clipping
0xh - N/A
0Ch OP(sf) 6 Outer product of 2 vectors
0xh - N/A
10h DPCS 8 Depth Cueing single
11h INTPL 8 Interpolation of a vector and far color vector
12h MVMVA 8 Multiply vector by matrix and add vector (see below)
13h NCDS 19 Normal color depth cue single vector
14h CDP 13 Color Depth Cue
15h - N/A
16h NCDT 44 Normal color depth cue triple vectors
1xh - N/A
1Bh NCCS 17 Normal Color Color single vector
1Ch CC 11 Color Color
1Dh - N/A
1Eh NCS 14 Normal color single
1Fh - N/A
20h NCT 30 Normal color triple
2xh - N/A
28h SQR(sf)5 Square of vector IR
29h DCPL 8 Depth Cue Color light
2Ah DPCT 17 Depth Cueing triple (should be fake=08h, but isn't)
2xh - N/A
2Dh AVSZ3 5 Average of three Z values
2Eh AVSZ4 6 Average of four Z values
2Fh - N/A
30h RTPT 23 Perspective Transformation triple
3xh - N/A
3Dh GPF(sf)5 General purpose interpolation
3Eh GPL(sf)5 General purpose interpolation with base
3Fh NCCT 39 Normal Color Color triple vector
Unknown if/what happens when using the "N/A" opcodes?
The fake opcode number in bit20-24 has absolutely no effect on the hardware, it
seems to be solely used to (or not to) confuse developers. Having the opcodes
sorted by their fake numbers gives a more or less well arranged list:
Fake Name Clk Expl.
00h - N/A
01h RTPS 15 Perspective Transformation single
02h RTPT 23 Perspective Transformation triple
03h - N/A
04h MVMVA 8 Multiply vector by matrix and add vector (see below)
05h - N/A
06h DCPL 8 Depth Cue Color light
07h DPCS 8 Depth Cueing single
08h DPCT 17 Depth Cueing triple (should be fake=08h, but isn't)
09h INTPL 8 Interpolation of a vector and far color vector
0Ah SQR(sf)5 Square of vector IR
0Bh - N/A
0Ch NCS 14 Normal color single
0Dh NCT 30 Normal color triple
0Eh NCDS 19 Normal color depth cue single vector
0Fh NCDT 44 Normal color depth cue triple vectors
10h NCCS 17 Normal Color Color single vector
11h NCCT 39 Normal Color Color triple vector
12h CDP 13 Color Depth Cue
13h CC 11 Color Color
14h NCLIP 8 Normal clipping
15h AVSZ3 5 Average of three Z values
16h AVSZ4 6 Average of four Z values
17h OP(sf) 6 Outer product of 2 vectors
18h - N/A
19h GPF(sf)5 General purpose interpolation
1Ah GPL(sf)5 General purpose interpolation with base
1Bh - N/A
1Ch - N/A
1Dh - N/A
1Eh - N/A
1Fh - N/A
For the sort-effect, DPCT should use fake=08h, but Sony seems to have
accidentally numbered it fake=0Fh in their devkit (giving it the same fake
number as for NCDT). Also, "Wipeout 2097" accidentally uses 0140006h (fake=01h
and distorted bit18) instead of 1400006h (fake=14h) for NCLIP.
The LZCS/LZCR registers offer a Count-Leading-Zeroes/Leading-Ones function.
The IRGB/ORGB registers allow to convert between 48bit and 15bit RGB colors.
These registers work without needing to send any COP2 commands. However, unlike
for commands (which do automatically halt the CPU when needed), one must insert
dummy opcodes between writing and reading the registers.
RTPS performs final Rotate, translate and perspective transformation on vertex
V0. Before writing to the FIFOs, the older entries are moved one stage down.
RTPT is same as RTPS, but repeats for V1 and V2. The "sf" bit should be usually
set.
IR1 = MAC1 = (TRX*1000h + RT11*VX0 + RT12*VY0 + RT13*VZ0) SAR (sf*12)
IR2 = MAC2 = (TRY*1000h + RT21*VX0 + RT22*VY0 + RT23*VZ0) SAR (sf*12)
IR3 = MAC3 = (TRZ*1000h + RT31*VX0 + RT32*VY0 + RT33*VZ0) SAR (sf*12)
SZ3 = MAC3 SAR ((1-sf)*12) ;ScreenZ FIFO 0..+FFFFh
MAC0=(((H*20000h/SZ3)+1)/2)*IR1+OFX, SX2=MAC0/10000h ;ScrX FIFO -400h..+3FFh
MAC0=(((H*20000h/SZ3)+1)/2)*IR2+OFY, SY2=MAC0/10000h ;ScrY FIFO -400h..+3FFh
MAC0=(((H*20000h/SZ3)+1)/2)*DQA+DQB, IR0=MAC0/1000h ;Depth cueing 0..+1000h
If the result of the "(((H*20000h/SZ3)+1)/2)" division is greater than 1FFFFh,
then the division result is saturated to +1FFFFh, and the divide overflow bit
in the FLAG register gets set; that happens if the vertex is exceeding the
"near clip plane", ie. if it is very close to the camera (SZ3<=H/2), exactly
at the camera position (SZ3=0), or behind the camera (negative Z coordinates
are saturated to SZ3=0). For details on the division, see:
GTE Division Inaccuracy
For "far plane clipping", one can use the SZ3 saturation flag (MaxZ=FFFFh), or
the IR3 saturation flag (MaxZ=7FFFh) (eg. used by Wipeout 2097), or one can
compare the SZ3 value with any desired MaxZ value by software.
Note: The command does saturate IR1,IR2,IR3 to -8000h..+7FFFh (regardless of lm
bit). When using RTP with sf=0, then the IR3 saturation flag (FLAG.22) gets set
<only> if "MAC3 SAR 12" exceeds -8000h..+7FFFh (although IR3 is saturated
when "MAC3" exceeds -8000h..+7FFFh).
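The RTPS formulas can be sketched as below. This is a simplified model under several assumptions: the division uses the exact 32bit formula from the GTE Division Inaccuracy chapter rather than the hardware's UNR mechanism, the "/10000h" steps are modelled as arithmetic shifts, and FLAG bits, the IR0 depth cueing result, and the FIFO shifting are left out. Function and parameter names are hypothetical.

```python
def clamp(value, lo, hi):
    return max(lo, min(hi, value))

def rtps(rt, tr, v, h, ofx, ofy, sf=1):
    """Return (SX2, SY2, SZ3, divide_overflow) for one vertex."""
    mac = [(tr[i] * 0x1000 + sum(rt[i][j] * v[j] for j in range(3))) >> (sf * 12)
           for i in range(3)]
    ir = [clamp(m, -0x8000, 0x7FFF) for m in mac]      # as if lm=0
    sz3 = clamp(mac[2] >> ((1 - sf) * 12), 0, 0xFFFF)
    if h < sz3 * 2:
        n = min(0x1FFFF, (h * 0x10000 + sz3 // 2) // sz3)
        overflow = False
    else:                                              # too close, or behind camera
        n, overflow = 0x1FFFF, True
    sx2 = clamp((n * ir[0] + ofx) >> 16, -0x400, 0x3FF)
    sy2 = clamp((n * ir[1] + ofy) >> 16, -0x400, 0x3FF)
    return sx2, sy2, sz3, overflow

IDENTITY = [[0x1000, 0, 0], [0, 0x1000, 0], [0, 0, 0x1000]]
print(rtps(IDENTITY, (0, 0, 0x200), (0x100, 0x80, 0), 0x100, 160 << 16, 120 << 16))
```

In the usage line, the vertex sits at Z=200h with H=100h, so X and Y are halved before the screen offsets (160 and 120, in 16.16 format) are added.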
MAC0 = SX0*SY1 + SX1*SY2 + SX2*SY0 - SX0*SY2 - SX1*SY0 - SX2*SY1
The sign of the result indicates whether the polygon coordinates are arranged
clockwise or anticlockwise (ie. whether the front side or backside is visible).
If the result is zero, then it's neither one (ie. the vertices are all arranged
in a straight line). Note: The GPU probably renders straight lines as invisible
0 pixel width lines?
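The NCLIP formula in executable form (a sketch; the function name is hypothetical and MAC0 overflow handling is omitted):

```python
def nclip(sxy0, sxy1, sxy2):
    """Return MAC0 for three (SX, SY) screen coordinates."""
    (sx0, sy0), (sx1, sy1), (sx2, sy2) = sxy0, sxy1, sxy2
    return (sx0 * sy1 + sx1 * sy2 + sx2 * sy0
            - sx0 * sy2 - sx1 * sy0 - sx2 * sy1)

print(nclip((0, 0), (1, 0), (0, 1)))   # one winding order
print(nclip((0, 0), (0, 1), (1, 0)))   # opposite winding order
print(nclip((0, 0), (1, 1), (2, 2)))   # vertices on a straight line
```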
MAC0 = ZSF3*(SZ1+SZ2+SZ3) ;for AVSZ3
MAC0 = ZSF4*(SZ0+SZ1+SZ2+SZ3) ;for AVSZ4
OTZ = MAC0/1000h ;for both (saturated to 0..FFFFh)
Adds three or four Z values together and multiplies them by a fixed point value.
The result can be used as index in the GPU's Ordering Table (OT).
GPU Depth Ordering
The scaling factors would be usually ZSF3=N/30h and ZSF4=N/40h, where "N" is
the number of entries in the OT (max 10000h). SZn and OTZ are unsigned 16bit
values, for whatever reason ZSFn registers are signed 16bit values (negative
values would allow a negative result in MAC0, but would saturate OTZ to zero).
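A sketch of AVSZ3/AVSZ4 (hypothetical function name; the "/1000h" is modelled as an arithmetic shift, and the MAC0 overflow flags are ignored):

```python
def average_z(zsf, sz_values):
    """Return (MAC0, OTZ) for AVSZ3 (three SZ values) or AVSZ4 (four)."""
    mac0 = zsf * sum(sz_values)
    otz = max(0, min(0xFFFF, mac0 >> 12))
    return mac0, otz

ZSF3 = 0x10000 // 0x30      # N=10000h entries, ie. roughly 1/3 in 1.3.12 format
print(hex(ZSF3), average_z(ZSF3, [0x300, 0x300, 0x300])[1])
```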
Multiply vector by matrix and vector addition.
Mx = matrix specified by mx ;RT/LLM/LCM - Rotation, light or color matrix
Vx = vector specified by v ;V0, V1, V2, or [IR1,IR2,IR3]
Tx = translation vector specified by cv ;TR or BK or Bugged/FC, or None
Calculation:
MAC1 = (Tx1*1000h + Mx11*Vx1 + Mx12*Vx2 + Mx13*Vx3) SAR (sf*12)
MAC2 = (Tx2*1000h + Mx21*Vx1 + Mx22*Vx2 + Mx23*Vx3) SAR (sf*12)
MAC3 = (Tx3*1000h + Mx31*Vx1 + Mx32*Vx2 + Mx33*Vx3) SAR (sf*12)
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
Multiplies a vector with either the rotation matrix, the light matrix or the
color matrix and then adds the translation vector or background color vector.
The GTE also allows selection of the far color vector (FC), but this vector is
not added correctly by the hardware: The return values are reduced to the last
portion of the formula, ie. MAC1=(Mx13*Vx3) SAR (sf*12), and similar for MAC2
and MAC3, nevertheless, some bits in the FLAG register seem to be adjusted as
if the full operation would have been executed. Setting Mx=3 selects a garbage
matrix (with elements -60h, +60h, IR0, RT13, RT13, RT13, RT22, RT22, RT22).
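The regular (non-bugged) MVMVA calculation can be sketched as below. The function name is hypothetical, the caller passes the selected matrix and vectors directly, and the FC bug, the Mx=3 garbage matrix, the 44bit overflow checks and the FLAG bits are not modelled.

```python
def mvmva(mx, vx, tx, sf=1, lm=0):
    """Return ([MAC1,MAC2,MAC3], [IR1,IR2,IR3]) for a 3x3 matrix mx."""
    mac = [(tx[i] * 0x1000 + sum(mx[i][j] * vx[j] for j in range(3))) >> (sf * 12)
           for i in range(3)]
    lo = 0 if lm else -0x8000          # lm selects the lower saturation limit
    ir = [max(lo, min(0x7FFF, m)) for m in mac]
    return mac, ir

IDENTITY = [[0x1000, 0, 0], [0, 0x1000, 0], [0, 0, 0x1000]]
print(mvmva(IDENTITY, (1, 2, 3), (10, 20, 30), sf=1))
print(mvmva(IDENTITY, (1, 2, 3), (10, 20, 30), sf=0))
```

With sf=0 the results keep their 12 fraction bits, so the IR values in the second call saturate to +7FFFh.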
[MAC1,MAC2,MAC3] = [IR1*IR1,IR2*IR2,IR3*IR3] SHR (sf*12)
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] ;IR1,IR2,IR3 saturated to max 7FFFh
Calculates the square of a vector. The result is, of course, always positive,
so the "lm" flag for negative saturation has no effect.
[MAC1,MAC2,MAC3] = [IR3*D2-IR2*D3, IR1*D3-IR3*D1, IR2*D1-IR1*D2] SAR (sf*12)
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] ;copy result
Calculates the outer product of two signed 16bit vectors. Note: D1,D2,D3 are
meant to be the RT11,RT22,RT33 elements of the RT matrix "misused" as vector.
lm should be usually zero.
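A sketch of the OP calculation (hypothetical function name, FLAG bits omitted), taking D1,D2,D3 from the diagonal of the RT matrix as described above:

```python
def outer_product(rt, ir, sf=1, lm=0):
    """Return ([MAC1,MAC2,MAC3], [IR1,IR2,IR3]) for the OP command."""
    d1, d2, d3 = rt[0][0], rt[1][1], rt[2][2]
    ir1, ir2, ir3 = ir
    mac = [(ir3 * d2 - ir2 * d3) >> (sf * 12),
           (ir1 * d3 - ir3 * d1) >> (sf * 12),
           (ir2 * d1 - ir1 * d2) >> (sf * 12)]
    lo = 0 if lm else -0x8000
    return mac, [max(lo, min(0x7FFF, m)) for m in mac]

# D = (1.0, 0, 0) and IR = (0, 1.0, 0) in 1.3.12 format
rt = [[0x1000, 0, 0], [0, 0, 0], [0, 0, 0]]
print(outer_product(rt, (0, 0x1000, 0)))
```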
The LZCS/LZCR registers offer a Count-Leading-Zeroes/Leading-Ones function.
In: V0=Normal vector (for triple variants repeated with V1 and V2),
BK=Background color, RGBC=Primary color/code, LLM=Light matrix, LCM=Color
matrix, IR0=Interpolation value.
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (LLM*V0) SAR (sf*12)
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (BK*1000h + LCM*IR) SAR (sf*12)
[MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4 ;<--- for NCDx/NCCx
[MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0 ;<--- for NCDx only
[MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12) ;<--- for NCDx/NCCx
Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
In: [IR1,IR2,IR3]=Vector, RGBC=Primary color/code, LCM=Color matrix,
BK=Background color, and, for CDP, IR0=Interpolation value, FC=Far color.
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (BK*1000h + LCM*IR) SAR (sf*12)
[MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4
[MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0 ;<--- for CDP only
[MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12)
Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
In: [IR1,IR2,IR3]=Vector, FC=Far Color, IR0=Interpolation value, CODE=MSB of
RGBC, and, for DCPL, R,G,B=LSBs of RGBC.
[MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4 ;<--- for DCPL only
[MAC1,MAC2,MAC3] = [IR1,IR2,IR3] SHL 12 ;<--- for INTPL only
[MAC1,MAC2,MAC3] = [R,G,B] SHL 16 ;<--- for DPCS/DPCT
[MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0
[MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12)
Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
DPCT executes thrice, and reads the R,G,B values from RGB0 (ie. reads from the
Bottom of the Color FIFO, instead of from the RGBC register) (the CODE value is
still read from RGBC as usual), so, after DPCT execution, the RGB0,RGB1,RGB2
Fifo entries are modified.
[MAC1,MAC2,MAC3] = [0,0,0] ;<--- for GPF only
[MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SHL (sf*12) ;<--- for GPL only
[MAC1,MAC2,MAC3] = (([IR1,IR2,IR3] * IR0) + [MAC1,MAC2,MAC3]) SAR (sf*12)
Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
Note: Although the SHL in GPL is theoretically undone by the SAR, 44bit
overflows can occur internally when sf=1.
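GPF and GPL can be sketched like this (hypothetical function names; the "/16" is modelled as an arithmetic shift, and IR saturation, overflow checks and FLAG bits are omitted):

```python
def gpf(ir, ir0, sf=1):
    """MAC = (IR * IR0) SAR (sf*12)."""
    return [(c * ir0) >> (sf * 12) for c in ir]

def gpl(mac, ir, ir0, sf=1):
    """MAC = ((MAC SHL (sf*12)) + IR * IR0) SAR (sf*12)."""
    return [((m << (sf * 12)) + c * ir0) >> (sf * 12) for m, c in zip(mac, ir)]

def to_color_fifo(mac, code):
    """[R, G, B, CODE] entry written to the Color FIFO."""
    return [max(0x00, min(0xFF, m >> 4)) for m in mac] + [code]

scaled = gpf((0x100, 0x200, 0x300), 0x800)        # scale by 0.5
print(scaled, to_color_fifo(scaled, 0x30))
print(gpl((0x10, 0x20, 0x30), (0x100, 0x200, 0x300), 0x800))
```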
[IR1,IR2,IR3] = (([RFC,GFC,BFC] SHL 12) - [MAC1,MAC2,MAC3]) SAR (sf*12)
[MAC1,MAC2,MAC3] = (([IR1,IR2,IR3] * IR0) + [MAC1,MAC2,MAC3])
Note: Above "[IR1,IR2,IR3]=(FC-MAC)" is saturated to -8000h..+7FFFh (ie. as if
lm=0), anyways, further writes to [IR1,IR2,IR3] (within the same command) are
saturated as usual (ie. depending on lm setting).
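Combining the far color interpolation with the DPCS input step gives the following sketch. The function name is hypothetical, FC is passed as raw register values (4bit fraction), and the final IR saturation and FLAG bits are left out.

```python
def clamp(value, lo, hi):
    return max(lo, min(hi, value))

def dpcs(rgb, fc, ir0, sf=1):
    """Return the [R, G, B] values pushed to the Color FIFO by DPCS."""
    mac = [c << 16 for c in rgb]                            # [R,G,B] SHL 16
    ir = [clamp(((f << 12) - m) >> (sf * 12), -0x8000, 0x7FFF)
          for f, m in zip(fc, mac)]                         # (FC-MAC), as if lm=0
    mac = [(i * ir0 + m) >> (sf * 12) for i, m in zip(ir, mac)]
    return [clamp(m >> 4, 0x00, 0xFF) for m in mac]

GRAY, WHITE_FC = (0x80, 0x80, 0x80), (0xFF0, 0xFF0, 0xFF0)
print(dpcs(GRAY, WHITE_FC, 0x000))    # IR0=0: color unchanged
print(dpcs(GRAY, WHITE_FC, 0x800))    # IR0=0.5: halfway towards far color
```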
Works like MVMVA command (see there), but with fixed Tx/Vx/Mx parameters, the
sf/lm bits can be changed and do affect the results (although normally both
bits should be set for use with color matrices).
The 8bit RGB values written to the top of Color Fifo are the 32bit MACn values
divided by 16, and saturated to +00h..+FFh, and of course, the older Fifo
entries are moved downwards. Note that, at the GPU side, the meaning of the RGB
values depends on whether or not texture blending is used (for untextured
polygons FFh is max brightness) (for texture blending FFh is double brightness
and 80h is normal brightness).
The 8bit CODE value is intended to contain a GP0(20h..7Fh) Rendering command,
allowing to automatically merge the 8bit command number, with the 24bit color
value.
The IRGB/ORGB registers allow to convert between 48bit and 15bit RGB colors.
Although the result of the commands in this chapter is written to the Color
FIFO, some commands like GPF/GPL may be also used for other purposes (eg. to
scale or scale/translate single vertices).
Basically, the GTE division does (attempt to) work as so (using 33bit maths):
n = (((H*20000h/SZ3)+1)/2)
alternatively, below would give (almost) the same result (using 32bit maths):
n = ((H*10000h+SZ3/2)/SZ3)
in both cases, the result is saturated about as so:
if n>1FFFFh or division_by_zero then n=1FFFFh, FLAG.Bit17=1, FLAG.Bit31=1
However, the real GTE hardware is using a fast, but less accurate division
mechanism (based on Unsigned Newton-Raphson (UNR) algorithm):
if (H < SZ3*2) then ;check if overflow
z = count_leading_zeroes(SZ3) ;z=0..0Fh (for 16bit SZ3)
n = (H SHL z) ;n=0..7FFF8000h
d = (SZ3 SHL z) ;d=8000h..FFFFh
u = unr_table[(d-7FC0h) SHR 7] + 101h ;u=200h..101h
d = ((2000080h - (d * u)) SHR 8) ;d=10000h..0FF01h
d = ((0000080h + (d * u)) SHR 8) ;d=20000h..10000h
n = min(1FFFFh, (((n*d) + 8000h) SHR 16)) ;n=0..1FFFFh
else n = 1FFFFh, FLAG.Bit17=1, FLAG.Bit31=1 ;n=1FFFFh plus overflow flag
the GTE's unr_table[000h..100h] consists of following values:
FFh,FDh,FBh,F9h,F7h,F5h,F3h,F1h,EFh,EEh,ECh,EAh,E8h,E6h,E4h,E3h ;\
E1h,DFh,DDh,DCh,DAh,D8h,D6h,D5h,D3h,D1h,D0h,CEh,CDh,CBh,C9h,C8h ; 00h..3Fh
C6h,C5h,C3h,C1h,C0h,BEh,BDh,BBh,BAh,B8h,B7h,B5h,B4h,B2h,B1h,B0h ;
AEh,ADh,ABh,AAh,A9h,A7h,A6h,A4h,A3h,A2h,A0h,9Fh,9Eh,9Ch,9Bh,9Ah ;/
99h,97h,96h,95h,94h,92h,91h,90h,8Fh,8Dh,8Ch,8Bh,8Ah,89h,87h,86h ;\
85h,84h,83h,82h,81h,7Fh,7Eh,7Dh,7Ch,7Bh,7Ah,79h,78h,77h,75h,74h ; 40h..7Fh
73h,72h,71h,70h,6Fh,6Eh,6Dh,6Ch,6Bh,6Ah,69h,68h,67h,66h,65h,64h ;
63h,62h,61h,60h,5Fh,5Eh,5Dh,5Dh,5Ch,5Bh,5Ah,59h,58h,57h,56h,55h ;/
54h,53h,53h,52h,51h,50h,4Fh,4Eh,4Dh,4Dh,4Ch,4Bh,4Ah,49h,48h,48h ;\
47h,46h,45h,44h,43h,43h,42h,41h,40h,3Fh,3Fh,3Eh,3Dh,3Ch,3Ch,3Bh ; 80h..BFh
3Ah,39h,39h,38h,37h,36h,36h,35h,34h,33h,33h,32h,31h,31h,30h,2Fh ;
2Eh,2Eh,2Dh,2Ch,2Ch,2Bh,2Ah,2Ah,29h,28h,28h,27h,26h,26h,25h,24h ;/
24h,23h,22h,22h,21h,20h,20h,1Fh,1Eh,1Eh,1Dh,1Dh,1Ch,1Bh,1Bh,1Ah ;\
19h,19h,18h,18h,17h,16h,16h,15h,15h,14h,14h,13h,12h,12h,11h,11h ; C0h..FFh
10h,0Fh,0Fh,0Eh,0Eh,0Dh,0Dh,0Ch,0Ch,0Bh,0Ah,0Ah,09h,09h,08h,08h ;
07h,07h,06h,06h,05h,05h,04h,04h,03h,03h,02h,02h,01h,01h,00h,00h ;/
00h ;<-- one extra table entry (for "(d-7FC0h)/80h"=100h) ;-100h
Above can be generated as "unr_table[i]=max(0,(40000h/(i+100h)+1)/2-101h)".
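A sketch of the table generation (hypothetical function name). It clamps with max(0, ...): only the extra entry at index 100h would otherwise come out as -1, while all other entries are already in range 00h..FFh.

```python
def make_unr_table():
    """Generate the 101h-entry UNR table using integer division."""
    return [max(0, (0x40000 // (i + 0x100) + 1) // 2 - 0x101)
            for i in range(0x101)]

table = make_unr_table()
print(" ".join("%02Xh" % entry for entry in table[:16]))
```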
Some special cases: NNNNh/0001h uses a big multiplier (d=20000h), in practice,
this can occur only for 0000h/0001h and 0001h/0001h (due to the H<SZ3*2
overflow check).
The min(1FFFFh) limit is needed for cases like FE3Fh/7F20h, F015h/780Bh, etc.
(these do produce UNR result 20000h, and are saturated to 1FFFFh, but without
setting overflow FLAG bits).
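The UNR pseudocode above translates fairly directly into the following sketch. Names are hypothetical; the function returns the division result together with an overflow indication instead of touching a FLAG register, and the table is regenerated with max(0, ...) clamping to keep the example self-contained.

```python
UNR_TABLE = [max(0, (0x40000 // (i + 0x100) + 1) // 2 - 0x101)
             for i in range(0x101)]

def gte_divide(h, sz3):
    """Return (n, overflow) for unsigned 16bit H and SZ3."""
    if h < sz3 * 2:
        z = 16 - sz3.bit_length()                  # leading zeroes of 16bit SZ3
        n = h << z
        d = sz3 << z                               # normalized to 8000h..FFFFh
        u = UNR_TABLE[(d - 0x7FC0) >> 7] + 0x101
        d = (0x2000080 - d * u) >> 8
        d = (0x0000080 + d * u) >> 8
        return min(0x1FFFF, (n * d + 0x8000) >> 16), False
    return 0x1FFFF, True                           # overflow, incl. SZ3=0

print([hex(gte_divide(h, sz3)[0]) for h, sz3 in ((0x100, 0x200), (1, 1))])
print(gte_divide(0xFE3F, 0x7F20))                  # saturated, but no overflow
```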