VIP CHIP 8 Timing
(Version 1.1, 2025-01-06, by Steffen "Gulrak" Schümann)
Normally, CHIP-8 interpreters don't strive to be cycle accurate, as the timing depends heavily on the variant. Still, it is possible to make a cycle-accurate interpreter that behaves the same timing-wise as the original interpreter on a COSMAC VIP; that is, each opcode executes at the same point in time relative to the VIP frame timing as on a real system.
This document provides the opcode execution timing and information about the overall timing. It does not show the derivation of those values, but they were generated by following the execution paths of the original interpreter, adding up cycles on the way, and finding which paths have fixed times, which have variable times, and what those depend on. I created this document from notes I made while implementing this (possibly for the first time overall?) in an interpreter in June 2023, the CHIP-8-STRICT core inside Cadmium. I ran various programs on it and on a COSMAC VIP emulation executing the original interpreter side by side to verify that my implementation stays in sync with a VIP.
Disclaimer: While the keyboard input opcodes behave as close to the original as possible, they can't actually access the real keyboard. In theory, pressing a key at the same time within a frame on a real COSMAC VIP and on the emulation will therefore still not lead to the same recognition time, as the operating system's event system imposes its own granularity constraints.
Also, while I tried hard and checked with a bunch of programs, there can still be bugs, so please don't implement critical technology with it, or use it at your own risk. But if you find a bug, please let me know so it can be fixed here.
The interpreter starts executing the actual loaded program 3250 cycles in, as that is how long the monitor startup (checking for RAM size) and the initialization of the interpreter take. If one also wants to count CHIP-8 instructions, the count starts at two, as the four bytes before 512/0x200 are already executed opcodes that clear (00E0) and enable (004B) the display.
Each instruction takes some time, in machine cycles, to execute. The VIP timing is influenced by the interrupt/DMA timing, so the concept of frame cycles is important. A frame has 3668 machine cycles. Within it runs the interrupt routine that is responsible for the timers and the video display. The cycle time of the next interrupt is calculated by:
((machineCycles + 2572) / 3668) * 3668 + 1096
The division is an integer division. If the current machine cycle count is greater than or equal to the next interrupt time, an interrupt call is to be simulated. It takes
1832 + (soundTimer ? 4 : 0) + (delayTimer ? 8 : 0)
machine cycles to complete and in this "time" the timers need to be decremented and the screen updated.
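The two formulas above can be written down directly. The following is a minimal sketch, not the author's implementation; the function names `next_interrupt` and `interrupt_cycles` are made up for illustration, and it assumes the division in the formula is an integer division.

```python
FRAME_CYCLES = 3668  # machine cycles per video frame, as stated above


def next_interrupt(machine_cycles: int) -> int:
    """Cycle count of the next interrupt, using integer division (assumption)."""
    return ((machine_cycles + 2572) // FRAME_CYCLES) * FRAME_CYCLES + 1096


def interrupt_cycles(sound_timer: int, delay_timer: int) -> int:
    """Duration of the interrupt routine for the timer values it finds."""
    return 1832 + (4 if sound_timer else 0) + (8 if delay_timer else 0)
```

For example, at program start (3250 cycles) `next_interrupt(3250)` gives 3668 + 1096 = 4764, and an interrupt with both timers running takes 1832 + 4 + 8 = 1844 machine cycles.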
NOTE: Whenever the machine cycles are incremented, a check for a pending interrupt needs to be done and the interrupt inserted. The pseudo-code always uses a function addCycles() to increment machine cycles, and it is assumed that this function does the incrementing and handles interrupts. If your emulator needs an outer frame control (so you trigger it once per frame to execute one frame), you need additional logic to return to the outer emulation loop and resume on the next frame (typically by decrementing the PC to stay at the opcode and using some additional wait state to know where to continue inside the opcode). If you instead run the emulation in its own thread, for example, that thread would signal the frontend that a frame has ended and that the screen should be updated (and possibly audio data pushed); or, if you use a language that can yield, you might be able to do that from inside addCycles() instead.
Either logic is outside the scope of this document, as too many ways to implement it are possible.
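As an illustration of what addCycles() is assumed to do, here is a minimal sketch. The class name `VipClock`, its attributes and the `frames` counter are hypothetical. It follows the single check described above, so it handles at most one interrupt per call, and it assumes the timers are decremented after their old values have determined the interrupt duration.

```python
FRAME_CYCLES = 3668  # machine cycles per frame


class VipClock:
    """Hypothetical cycle counter that inserts the VIP interrupt when due."""

    def __init__(self, start_cycles: int = 3250):
        self.machine_cycles = start_cycles  # program start, per the text
        self.delay_timer = 0
        self.sound_timer = 0
        self.frames = 0  # finished frames; a frontend could redraw on change
        self.next_interrupt = self._calc_next_interrupt()

    def _calc_next_interrupt(self) -> int:
        return ((self.machine_cycles + 2572) // FRAME_CYCLES) * FRAME_CYCLES + 1096

    def add_cycles(self, cycles: int) -> None:
        self.machine_cycles += cycles
        if self.machine_cycles >= self.next_interrupt:
            self._interrupt()

    def _interrupt(self) -> None:
        # The duration depends on the timer values before decrementing (assumption).
        self.machine_cycles += (1832 + (4 if self.sound_timer else 0)
                                + (8 if self.delay_timer else 0))
        if self.delay_timer:
            self.delay_timer -= 1
        if self.sound_timer:
            self.sound_timer -= 1
        self.frames += 1
        self.next_interrupt = self._calc_next_interrupt()
```

An outer frame loop could, for instance, run opcodes until `frames` changes and then present the screen; how to suspend in the middle of an opcode is left open, as noted above.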
For this type of CHIP-8 emulation, what matters is the point in time at which each opcode starts, not the cycles inside the opcode. The exceptions are 00E0, Dxyn and Fx0A, which are described in more detail later.
NOTE on fetch and decode: All cycle numbers are in machine cycles. The Detailed Cycles column lists the fetch and decode time separately: the first summand is the common fetch and decode time, which is 40 machine cycles for every opcode in the 0nnn range and 68 machine cycles for all others. Fnnn opcodes use a second dispatch stage, which adds another 4 machine cycles. The last summand is the actual opcode execution time. The Total Cycles column gives the sum of all of them.
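The fetch and decode rule can be expressed as a small helper. This is only a sketch; the function name `fetch_decode_cycles` is an assumption for illustration.

```python
def fetch_decode_cycles(opcode: int) -> int:
    """Fetch and decode time in machine cycles for a 16-bit CHIP-8 opcode."""
    top_nibble = (opcode >> 12) & 0xF
    if top_nibble == 0x0:
        return 40        # 0nnn range
    if top_nibble == 0xF:
        return 68 + 4    # Fnnn opcodes use a second dispatch stage
    return 68            # everything else
```

Adding the execution time then gives the total, for example `fetch_decode_cycles(0x00EE) + 10` is 50 for 00EE and `fetch_decode_cycles(0xF107) + 6` is 78 for Fx07.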
Opcode | Detailed Cycles | Total Cycles | Notes |
---|---|---|---|
0nnn | - | - | undefined, as this depends on the machine code called, this needs backend emulation, only COSMAC VIP and DREAM6800 support this |
00E0 | 40 + 3078 | 3118 | see below for some additional hint |
00EE | 40 + 10 | 50 | |
1nnn | 68 + 12 | 80 | |
2nnn | 68 + 26 | 94 | |
3xnn | 68 + 10 | 78 | +4 if skipping |
4xnn | 68 + 10 | 78 | +4 if skipping |
5xy0 | 68 + 14 | 82 | +4 if skipping |
6xnn | 68 + 6 | 74 | |
7xnn | 68 + 10 | 78 | |
8xy0 | 68 + 12 | 80 | |
8xy1 | 68 + 44 | 112 | |
8xy2 | 68 + 44 | 112 | |
8xy3 | 68 + 44 | 112 | |
8xy4 | 68 + 44 | 112 | |
8xy5 | 68 + 44 | 112 | |
8xy6 | 68 + 44 | 112 | |
8xy7 | 68 + 44 | 112 | |
8xyE | 68 + 44 | 112 | |
9xy0 | 68 + 14 | 82 | +4 if skipping |
Annn | 68 + 12 | 80 | |
Bnnn | 68 + 22 | 90 | +2 on PC high byte change |
Cnnn | 68 + 36 | 104 | |
Dxyn | * | * | see below |
Ex9E | 68 + 14 | 82 | +4 if skipping |
ExA1 | 68 + 14 | 82 | +4 if skipping |
Fx07 | 68 + 4 + 6 | 78 | |
Fx0A | * | * | see below |
Fx15 | 68 + 4 + 6 | 78 | |
Fx18 | 68 + 4 + 6 | 78 | |
Fx1E | 68 + 4 + 12 | 84 | +6 on I high byte change |
Fx29 | 68 + 4 + 16 | 88 | |
Fx33 | 68 + 4 + 80 | 152 | +(digit sum) * 16 |
Fx55 | 68 + 4 + 14 | 86 | + 14 * (number of registers) |
Fx65 | 68 + 4 + 14 | 86 | + 14 * (number of registers) |
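Two of the notes in the table add a data-dependent amount to the total. The sketch below shows how they might be applied. Both function names are hypothetical, and it assumes that "number of registers" means the registers V0 through Vx that Fx55/Fx65 transfer, i.e. x + 1.

```python
def skip_opcode_cycles(base_total: int, skipping: bool) -> int:
    """Total for 3xnn, 4xnn, 5xy0, 9xy0, Ex9E, ExA1: +4 when the skip is taken."""
    return base_total + (4 if skipping else 0)


def load_store_cycles(x: int) -> int:
    """Total for Fx55/Fx65, assuming x + 1 registers (V0..Vx) are transferred."""
    return 86 + 14 * (x + 1)
```

Under that assumption, F055 would take 100 machine cycles and FF55 would take 310.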
If one follows the simple pattern of the other opcodes to emulate the clear screen opcode 00E0, the visible deletion of the screen typically happens at least one frame too early. So for this opcode it is important to first increment the cycles by calling addCycles(3118), which emits the current frame, and only then erase the screen content; otherwise flickering could be much worse than on the real machine.
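The ordering for the clear screen opcode can be sketched as follows. The function `clear_screen` and its parameters are assumptions for illustration: `add_cycles` stands for whatever addCycles() implementation the emulator uses, and `screen` for its display memory.

```python
from typing import Callable


def clear_screen(add_cycles: Callable[[int], None], screen: bytearray) -> None:
    """00E0: account for the full 3118 cycles first, then erase the display."""
    # Any frame that ends during these cycles is still emitted with the old content.
    add_cycles(3118)
    # Only afterwards clear the display memory in place.
    screen[:] = bytes(len(screen))
```

Clearing before the call would let a frame boundary inside those 3118 cycles present an already empty screen.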
The timing of Dxyn is quite complex. It is made up of preparation time, waiting time and drawing time. We look at all of these:
Dxyn first draws the sprite into a buffer that is two bytes wide and sixteen rows high. The time needed for this is 136 + lines * (46 + 20 * (x&7)), so it heavily depends on the amount of shifting needed.

    prepareCycles = 68 + 68 + lines * (46 + 20 * (x&7))
    while prepareCycles > 0:
        step = cycles left in frame
        addCycles(step)
        prepareCycles -= step

The first 68 cycles are the fetch and decode part for Dxyn. Because the loop always advances to the end of a frame, it also covers the waiting time: drawing only starts after the next interrupt.
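To get a feel for how much the shift amount matters, the preparation formula can be evaluated for a few cases. The function name `prepare_cycles` is an assumption; the formula itself is the one given above.

```python
def prepare_cycles(lines: int, x: int) -> int:
    """Dxyn fetch/decode plus preparation time in machine cycles."""
    return 68 + 68 + lines * (46 + 20 * (x & 7))
```

A 5-line sprite at a byte-aligned x costs 136 + 5 * 46 = 366 cycles, while the same sprite at x & 7 == 7 costs 136 + 5 * 186 = 1066 cycles. A 15-line sprite with maximum shift needs 2926 cycles, most of a 3668-cycle frame.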
The time needed to copy the sprite into the screen buffer is then calculated during drawing.
In pseudo-code:

    drawingCycles = 24
    for each line not clipped:
        col1 = col2 = 0
        if first byte of line collides:
            col1 = 1
        if second byte of line collides:
            col2 = 1
        drawingCycles += (34 + col1 + (x < 56 ? 16 : 0) + col2)
    addCycles(drawingCycles)
The collision indicators col1 and col2 depend on emulating the drawing as if the sprite bytes are actually shifted into a two-byte buffer and each byte is then XORed into the screen memory. This can still be done by keeping track of the pixel offset, but it might be easiest to actually implement the byte splitting.
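The byte splitting could look like the sketch below, which applies the drawing formula from the pseudo-code while XORing into a 64x32 screen stored as 8 bytes per row. The function `draw_sprite`, the screen layout, the wrapping of the start coordinates and the clipping at the bottom and right edges are assumptions for illustration, not taken from the text above.

```python
def draw_sprite(screen: bytearray, x: int, y: int, sprite: list[int]) -> tuple[int, bool]:
    """XOR a sprite into a 64x32 screen (8 bytes per row).

    Returns (drawing cycles, collision flag).
    """
    x &= 63                      # assumed: start coordinates wrap
    y &= 31
    shift = x & 7
    cycles = 24
    collided = False
    for row, data in enumerate(sprite):
        line = y + row
        if line >= 32:           # assumed: lines below the screen are clipped
            break
        word = (data << 8) >> shift          # sprite byte shifted into two bytes
        first, second = word >> 8, word & 0xFF
        offset = line * 8 + (x >> 3)
        col1 = 1 if screen[offset] & first else 0
        screen[offset] ^= first
        col2 = 0
        if x < 56:               # second byte only exists inside the screen
            col2 = 1 if screen[offset + 1] & second else 0
            screen[offset + 1] ^= second
        cycles += 34 + col1 + (16 if x < 56 else 0) + col2
        collided = collided or bool(col1 or col2)
    return cycles, collided
```

Drawing a one-line sprite 0xFF at (0, 0) on an empty screen gives 24 + 34 + 16 = 74 cycles. Drawing it again erases it and collides in the first byte, giving 75 cycles.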
The key wait opcode Fx0A depends on key input, so there is no fixed timing. It has, of course, a fetch and decode prefix of 68 + 4 machine cycles. After that it first loops waiting for a key to be pressed. The first key it sees as pressed is then the one whose release it waits for, while constantly setting the sound timer to 4 in that release wait loop. When the key is released, it waits for the sound timer to run down to 0 and then takes at most 10 machine cycles, after the interrupt that decrements the sound timer to 0, to continue. As the outside influence (key activity) hugely dominates and randomizes its timing, in practice it will not matter much whether one emulates those 10 cycles or not, but they are there.
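The three phases of the key wait can be modeled as a small state machine that is polled repeatedly. This is a sketch under assumptions: the names `KeyWait` and `step_key_wait` are hypothetical, the sound timer is decremented elsewhere by the interrupt, the key scan order is simplified to "lowest pressed key", and the final delay of up to 10 machine cycles is left out.

```python
from enum import Enum, auto


class KeyWait(Enum):
    PRESS = auto()    # waiting for any key to go down
    RELEASE = auto()  # waiting for that key to go up again
    SOUND = auto()    # waiting for the sound timer to reach 0
    DONE = auto()     # opcode can complete, key is the result


def step_key_wait(phase, key, keys_down, sound_timer):
    """Advance the wait by one poll; returns (phase, key, sound_timer)."""
    if phase is KeyWait.PRESS:
        if keys_down:
            # Scan order is an assumption: take the lowest pressed key.
            return KeyWait.RELEASE, min(keys_down), sound_timer
        return phase, key, sound_timer
    if phase is KeyWait.RELEASE:
        if key in keys_down:
            return phase, key, 4  # timer is held at 4 while the key is down
        return KeyWait.SOUND, key, sound_timer
    if phase is KeyWait.SOUND and sound_timer == 0:
        return KeyWait.DONE, key, sound_timer
    return phase, key, sound_timer
```

With an outer frame control, the phase and key would be the "additional waiting state" kept between frames while the PC stays on the opcode.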
The note for Fx33 in the table mentions (digit sum). This means the sum of the decimal digits of the conversion result, so if the number is 123 then the sum is 1 + 2 + 3 = 6.
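As a worked example of the Fx33 rule, the total can be computed from the register value. The function name `bcd_cycles` is an assumption for illustration.

```python
def bcd_cycles(value: int) -> int:
    """Total machine cycles for Fx33 when Vx holds value (0..255)."""
    digit_sum = sum(int(digit) for digit in f"{value:03d}")
    return 152 + 16 * digit_sum
```

For the value 123 this gives 152 + 16 * 6 = 248 machine cycles. The cheapest case is 0 with 152 cycles.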
This work, like a lot of my other work related to CHIP-8, builds on the work of others, and I want to thank them for the groundwork that made my life so much easier:
Gooitzen S. van der Wal and J. W. Wentworth, who analyzed and documented the workings of the CHIP-8 interpreter on the COSMAC VIP and of the operating system in its 512 byte ROM in VIPER Volume I, Issue 2, August 1978 and VIPER Volume I, Issue 3, September 1978. (And thanks to Matt Mikolay for putting up the scans for non-commercial use.)
Laurence Scotford for his work on Chip-8 on the COSMAC VIP, where he explains in detail the inner workings of the original CHIP-8 interpreter as published for the COSMAC VIP. He also did cycle analyses, but because of some inaccuracies, and because e.g. Dxyn was not detailed enough, I recalculated them for all opcodes myself. Still, I would not have started the endeavor of making a cycle-accurate, high-level emulated VIP CHIP-8 if it wasn't for his work.
And all the people I had fruitful discussions with on the Emulation Development Discord.
- Added fetch and decode time details.
- Initial Publish