Screen corruption on some AGA demos #11

Closed
apolkosnik opened this issue Jul 15, 2019 · 50 comments

@apolkosnik
Contributor

I was looking at Andromeda's Nexus7 AGA demo and the Goraud Pulse part has wrong colors on MiSTer, MiST and Vampire (Gold3 Alpha release from last year). On FS-UAE it looks fine. This YouTube link shows how it should look: https://youtu.be/0Jdi3I3Ep6k?t=103

It looks like the code is using BFINS to render the doughnut. Looking at some UAE code for BFINS, I think that the BFINS implementation might be wrong in tg68.

bf_set2 <= std_logic_vector(unsigned(bf_ext_in & OP2out) sll to_integer(unsigned(bf_loffset(2 downto 0))));

Since I wasn't able to set up a dev environment to test my theory, all I managed was to corrupt the textures some more by replacing BFINS with BFCLR and switching the register for width in HRTMON.
The loop was using a couple of BFINS that are referencing data registers for offset and width
e.g.
BFINS D3,($7000,A1){D4:D2}

I'm just guessing that the line in question should probably be something along the lines of:
bf_set2 <= std_logic_vector(unsigned(bf_ext_in & OP2out) sll to_integer(unsigned(bf_loff_dir)));

@sorgelig
Member

sorgelig commented Jul 16, 2019

For me this demo doesn't work.
I run nexus7.exe in WB3.1. It shows the titles in the beginning, and when the main demo is supposed to run, it just continues to spin the galaxy while the music is playing. So I cannot test it myself. Does it require something specific? I use CPU 68020, CHIP 2MB, FAST,SLOW: NO.

@sorgelig
Member

I forgot about the demo and left it running. The demo started to show after about 30 minutes!

@apolkosnik
Contributor Author

apolkosnik commented Jul 16, 2019

I actually used an .adf, CPU 020, 2MB CHIP, 24MB FAST, TURBO BOTH, Floppy turbo ON. It works best from the floppy. Another crazy bug happened when I copied it to the HDD: running the demo from the HDD caused it to play weird white noise, at least with Kickstart 3.1.4. On Kickstart 3.1 it didn't even run for me from the HDD, but it booted OK from the floppy.

@sorgelig
Member

Can you give a link to the floppy image? Pouet only has the exe file.

@sorgelig
Member

sorgelig commented Jul 16, 2019

As for the BFxxx commands: as far as I can see, it doesn't check the direction of the shift (when the offset is given in Dx).
I don't have a quick fix for this. It probably has other BFxxx bugs. I'm not sure whether the bit field is wrapped or not, so it must be either ROL/ROR or SLL/SRL.

@apolkosnik
Contributor Author

apolkosnik commented Jul 16, 2019 via email

@nretro
Contributor

nretro commented Jul 16, 2019

I did a quick test with 4 data registers. Negative offsets seem to work as expected:
move.l #$aa,d0
moveq.l #0,d1
move.l #-16,d2
moveq.l #6,d3
bfins d0,d1{d2:d3}
gives
d1=$0000a800

I did some more tests and did not find a bug. Also
lea data,a0
....
bfins d0,(4,a0){d2:d3}
sets data to the same value, as it probably should. Well, it was to be expected that it works most of the time. Maybe it fails when executed in some special state. Do you know the instruction before the bfins?
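
For reference, the register test above can be checked against a small software model. This is only a sketch in Python of how BFINS on a data register is documented to behave (offset taken modulo 32, bit offset 0 being the most significant bit, field wrapping around the register). The function name `bfins_reg` is made up for illustration and is not taken from tg68k or UAE.

```python
def bfins_reg(dest: int, src: int, offset: int, width: int) -> int:
    """Model of BFINS src,Dn{offset:width} for a 32-bit data register."""
    offset %= 32                  # register destination: offset is modulo 32
    width = (width % 32) or 32    # a width value of 0 encodes a 32-bit field
    field = src & ((1 << width) - 1)          # low `width` bits of the source

    def ror(value: int, count: int) -> int:
        """Rotate a 32-bit value right by `count` bits."""
        return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

    # Offset 0 is the most significant bit, so left-align the field first,
    # then rotate right by the offset. The rotate makes the field wrap around.
    aligned = field << (32 - width)
    mask = ((1 << width) - 1) << (32 - width)
    return (dest & ~ror(mask, offset) & 0xFFFFFFFF) | ror(aligned, offset)


print(hex(bfins_reg(0, 0xAA, -16, 6)))
```

With d0=$aa, d1=0, offset -16 and width 6 the model gives $0000a800, matching the result above. Under this model an offset of +16 selects the same bits, because -16 and +16 are equal modulo 32.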

@apolkosnik
Contributor Author

Would larger values that exceed 0xff in d3 make things go weird??

@sorgelig
Member

what result will be with move.l #16,d2 ?

@sorgelig
Member

BTW, a800 is shifted 8 times, not 16, isn't it?

@sorgelig
Member

I still cannot run this demo!
Neither from adf nor as exe from WB. I only see the rotating galaxy and hear the music. Nothing else.
What's wrong with this demo??

@nretro
Contributor

nretro commented Jul 17, 2019

It is shifted 16 times to the left, which is what an offset of -16 should be doing. +16 should result in $00008a00, I guess. d3 is the width... so $ff would be an interesting value. I can try it later. My guess would be that only the lowest 5 bits will be used.

BTW. If an indirect memory address is used, (4,A0){-16:6} actually writes to the byte at (2,A0).
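
The note about (2,A0) follows from how a memory bit field is addressed: the signed bit offset is split into a byte displacement (offset divided by 8, rounded down) and a bit position within that byte (offset modulo 8). A minimal Python sketch of that addressing is below. `bfins_mem` and the `bytearray` standing in for memory are illustrative assumptions, not code from any emulator or core.

```python
def bfins_mem(mem: bytearray, base: int, src: int, offset: int, width: int) -> None:
    """Model of BFINS src,(base){offset:width} on big-endian memory."""
    width = (width % 32) or 32          # a width value of 0 encodes 32 bits
    byte_addr = base + (offset >> 3)    # floor division by 8, also for negatives
    bit_off = offset & 7                # bit position inside the first byte
    nbytes = (bit_off + width + 7) // 8 # the field can span up to 5 bytes

    # Read the affected bytes as one big-endian integer, patch the field, write back.
    window = int.from_bytes(mem[byte_addr:byte_addr + nbytes], "big")
    shift = nbytes * 8 - bit_off - width
    mask = ((1 << width) - 1) << shift
    window = (window & ~mask) | ((src << shift) & mask)
    mem[byte_addr:byte_addr + nbytes] = window.to_bytes(nbytes, "big")


data = bytearray(8)
bfins_mem(data, 4, 0xAA, -16, 6)   # base 4 plays the role of (4,A0)
print(data.hex())
```

Here the offset of -16 moves the start of the field two bytes below the base, so only the byte at index 2 changes; it receives the low 6 bits of $aa in its top 6 bits.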

@sorgelig
Member

AA -> AA00 is 8 times shifted. 16 would give AA0000.

@nretro
Contributor

nretro commented Jul 17, 2019

Can't test right now, but I think 0 times shifted would be $a8000000. Then 16 times rol produces $0000a800. (width is set to 6 not 8!).

@apolkosnik
Contributor Author

I can share my minimig.cfg if that could help any to get it running.

@sorgelig
Member

yeah, share please

@sorgelig
Member

> Can't test right now, but I think 0 times shifted would be $a8000000. Then 16 times rol produces $0000a800. (width is set to 6 not 8!).

Width just cuts off the lowest 2 bits. It doesn't affect the amount of shift. 0 times shifted would be $a8000000? Then I don't understand the logic...

@nretro
Contributor

nretro commented Jul 17, 2019

I guess they had memory access in mind. if shift is zero, and width is 8 it should just write a byte to the given address... which corresponds to the highest byte in the register.

@sorgelig
Member

So the result matches real HW/UAE?

@apolkosnik
Contributor Author

> yeah, share please

minimig.zip

@apolkosnik
Contributor Author

I've transcribed some of the code that uses BFINS, there's a few more places but this piece mostly shows up when I activate HRTMON

$400C3C22:
	MOVE.L D1,D2
	CLR.W D2
	SWAP D2
	MOVE.L D0,D4
	CLR.W D4
	SWAP D4
	SUB.W D4,D2
	BEQ.B $400C3C98
	MOVE.L D6,D3
	ANDI.L #$FFE00000,D3
	ADD.L D7,D3
	SWAP D3
	LEA (0,A0,D3.W*4),A6
	MOVE.L D2,D3
	ROR.W #5,D3
	ADDA.L D3,A6
	ADDA.L D3,A6
	MOVE.L (A6),D3
	BFINS D3,(-$7000,A1){D4:D2}
	ADDA.L #$20000,A6
	MOVE.L (A6),D3
	BFINS D3,(-$3800,A1){D4:D2}
	ADDA.L #$20000,A6
	ADD.L A4,D0
	MOVE.L (A6),D3
	BFINS D3,(A1){D4:D2}
	ADDA.L #$20000,A6
	ADD.L A5,D1
	ADD.L A2,D6
	ADD.L A3,D7
	MOVE.L (A6),D3
	BFINS D3,($3800,A1){D4:D2}
	ADDA.L #$20000,A6
	MOVE.L (A6),D3
	BFINS D3,($7000,A1){D4:D2}
	LEA ($38,A1),A1
	DBF D5,$400C3C22
$400C3C98:
	RTS

@apolkosnik
Contributor Author

The same glitch was referenced in MIST repo rkrajnc/minimig-mist#67 .
The last set of tg68k fixes for Bitfield operations are contained in this commit rkrajnc/minimig-mist@0a73378

I've also found the WF68k30L core for comparison (at https://download.experiment-s.de/Configware/2K19A/rtl/vhdl.zip).

@nretro
Contributor

nretro commented Jul 17, 2019

OK... just checked... these easy cases produce the same results on real hardware (A1200 with DCE Typhoon 030@40MHz). Interestingly, with offset=0 and width=$ff, the result is $00000154. I have no idea why that is. I'll have both computers run through a couple of random numbers and compare the results. But I would guess it only fails in some special cases.
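
One possible explanation for the $00000154 result, assuming the test reused d0=$aa and a cleared destination as in the earlier test: for a register operand only the low 5 bits of the width are used, so $ff is treated as a width of 31. A 31-bit field at offset 0 occupies bits 31 down to 1, which puts $aa one bit to the left. The Python sketch below shows the arithmetic; the helper names are hypothetical.

```python
def effective_width(raw_width: int) -> int:
    """Width as used by the bit field instructions: low 5 bits, with 0 meaning 32."""
    return (raw_width & 31) or 32


def insert_at_offset_zero(src: int, raw_width: int) -> int:
    """Result of inserting `src` into a cleared 32-bit register at offset 0."""
    width = effective_width(raw_width)
    field = src & ((1 << width) - 1)   # keep the low `width` bits of the source
    return field << (32 - width)       # offset 0 means the field starts at bit 31


print(effective_width(0xFF), hex(insert_at_offset_zero(0xAA, 0xFF)))
```

The same helper gives $a8000000 for width 6 and $aa000000 for width 8, which is consistent with the "0 times shifted" values discussed earlier.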

@sorgelig
Member

> minimig.zip

it works, thanks!
It's the option Turbo which must be enabled at least for ChipRam

@sorgelig
Member

> I've also found the WF68k30L core for comparison (at https://download.experiment-s.de/Configware/2K19A/rtl/vhdl.zip).

good luck in porting that to Minimig :)

@apolkosnik
Contributor Author

I need to figure out how to load a frozen state under hrtmon, so that I can load the demo on MiSTer and under FS-UAE, then trace and compare.

@apolkosnik
Contributor Author

apolkosnik commented Jul 17, 2019

> > I've also found the WF68k30L core for comparison (at https://download.experiment-s.de/Configware/2K19A/rtl/vhdl.zip).
>
> good luck in porting that to Minimig :)

I didn't ask to port it, just to compare the implementation.

//from WF68k30L core:

    RESULT_BITFIELD <= (BF_OFFSET + BFFFO_CNT) & x"00";
when BFINS =>
    for i in RESULT_BITFIELD'range loop
        if i <= BF_WIDTH - 1 and (BF_UPPER_BND - i) >= 0 then
            RESULT_BITFIELD(BF_UPPER_BND - i) <= OP1(BF_WIDTH - i - 1);
        end if;
    end loop;
when BFSET =>
    for i in RESULT_BITFIELD'range loop
        if i >= BF_LOWER_BND and i <= BF_UPPER_BND then
            RESULT_BITFIELD(i) <= '1';
        end if;
    end loop;

@nretro
Contributor

nretro commented Jul 17, 2019

I just ran bfins 10000 times with random 32bit numbers on Mister and an A1200. At least when the destination is a register, the results always match. So it either only fails on rare cases or we've got a different problem. Or the problem arises only when the target is indexed indirect memory. OK, I can also check the latter.

@nretro
Contributor

nretro commented Jul 17, 2019

> It's the option Turbo which must be enabled at least for ChipRam

I don't see an easy reason for this to make a difference. Something to look into.

@sorgelig
Member

> I don't see an easy reason for this to make a difference. Something to look into.

As far as I can see, it only switches the access speed between RAM (max) and chipset (7MHz). I also don't see a reason for such an option. It was probably a workaround for the cache, as the cache didn't support updates from the chipset side.

I would rather make a general CPU speed option like 7MHz, 14MHz and max for better compatibility with some specific games/apps.

@apolkosnik
Contributor Author

I'm trying to use HRTMON to dump the memory (e.g. sa dh0:dump ) on Mister and try to load it on FS-UAE, but it looks like it just stays on the spinning galaxy screen on Mister without any fast mem. It runs just fine on FS-UAE without fast mem. The reason for trying to run without fast mem is that the hrtmon dump just dumps 2MB of chipmem; When I was loading after a reboot (e.g. la dh0:dump), it would load, but it wouldn't resume, as the PC would be in fastmem, and that part of fastmem would be zeroed out.

In the meantime, I've noticed something weird in the disassembly:

...
ADDA.L #20000,A6
LEA ($38,A1),A1
MOVE.L (A6),D5
DC.W $EFE9 ;weird
ADD.B D4,-(A2)
BLE.B $400C8C54
DBF D3,$400C8C1C

Not sure what $EFE9 is, and BLE jumps into the middle of a BFINS, which starts at $400C8C52.
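
A note on $EFE9: in the 68020 encoding, the bit field instructions have opcode words of the form 1110 1ooo 11 mmm rrr, and $EFC0 plus an effective address is BFINS. $EFE9 therefore decodes as BFINS with addressing mode (d16,A1). If the transcription is accurate, the ADD.B D4,-(A2) line assembles to $D922, which would be the BFINS extension word decoded as if it were an instruction, and the BLE.B word would be the 16-bit displacement. The Python sketch below decodes both words. The function names are hypothetical and bit 15 of the extension word is ignored.

```python
BITFIELD_OPS = {0: "BFTST", 1: "BFEXTU", 2: "BFCHG", 3: "BFEXTS",
                4: "BFCLR", 5: "BFFFO", 6: "BFSET", 7: "BFINS"}


def decode_bitfield_opcode(word: int):
    """Return (mnemonic, ea_mode, ea_reg), or None if not a bit field opcode."""
    if (word & 0xF8C0) != 0xE8C0:      # fixed bits: 1110 1... 11.. ....
        return None
    return BITFIELD_OPS[(word >> 8) & 7], (word >> 3) & 7, word & 7


def decode_bitfield_ext(ext: int):
    """Return (data register, offset, width) as operand strings."""
    reg = (ext >> 12) & 7              # source/destination data register
    offset = (ext >> 6) & 31
    width = ext & 31
    # Bit 11 (Do) and bit 5 (Dw) select a data register instead of an immediate.
    offset_text = f"D{offset & 7}" if ext & 0x0800 else str(offset)
    width_text = f"D{width & 7}" if ext & 0x0020 else str(width or 32)
    return f"D{reg}", offset_text, width_text


print(decode_bitfield_opcode(0xEFE9))   # EA mode 5 is (d16,An)
print(decode_bitfield_ext(0xD922))
```

Decoded this way, the pair reads as BFINS D5,(d16,A1){D4:D2}, which fits the MOVE.L (A6),D5 just before it and the shape of the loop transcribed earlier.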

@apolkosnik
Contributor Author

I was able to do 3 mem dumps (with 2MB chip mem only) on FS-UAE onto a .hdf, then I copied over the hdf to Mister SD card, and loaded the third one, that was taken when the Goraud Pulse was going on. After loading and running, the doughnut looked ok for a bit, and then it got the corrupted look. I'll try to step through and see where results change, but it might take a bit of time.

@nretro
Contributor

nretro commented Jul 18, 2019

I did notice a short flicker (like invalid sprite settings) shortly before the torus. This was absent on my A1200. Together with the fact that the problem also exists on Vampire, I would guess that it is unlikely to be a straightforward CPU problem and more likely to be chipset related.

And since the torus looks good when loaded differently, the problem looks more like some kind of side effect.

@apolkosnik
Contributor Author

apolkosnik commented Jul 18, 2019

I meant that the torus was already rendered on the screen when I dumped the memory, and after resuming the execution on MiSTer, it got corrupted as soon as it started moving. I'm also curious what might be causing the demo not to advance to the next parts. It was doing the same thing on Vampire (gold 3 alpha) unless I enabled the turtle mode. Though, when it runs on Vampire it shows even more glitches.

@apolkosnik
Contributor Author

apolkosnik commented Jul 19, 2019

Capture of the properly rendered torus:
[image: torus good]
Same as above, but with the 7th bitplane shifted left (from FS-UAE) to show the correct shape on the 7th bitplane:
[image: torus good_7th+bpl_shifted_left]
Corrupted 7th bitplane glitch (again shifted, to show that it's only the 7th bitplane; captured with a camera):
[image: corrupted_bpl_shifted_left]

After some poking around, it looks like only the data on the 7th bitplane gets mangled. All the artifacts should not be scattered around the torus, but should be around the area illuminated inside the torus.

@sorgelig
Member

So probably only some edge conditions cause the problem.
If it's BFINS, then all edge cases need to be tested to see, for example, whether the bit field is properly wrapped/unwrapped, whether bits at the end get trimmed or repeated, and so on, and therefore whether it should use arithmetic or logical shifts.

@apolkosnik
Contributor Author

apolkosnik commented Jul 20, 2019

I'm pretty confused now... It looks like just waiting for vertical blank triggers the issue.
If I set a breakpoint at $18e38e:

$0018E386 BTST #0,($5,A6)
$0018E38C BEQ.B $18E386
$0018E38E BTST #0,($5,A6)
$0018E394 BNE.B $18E38E
$0018E396 BSR.W $192D8E

where A6 = $dff000, it breaks before the instruction. I type either "cop" or "p 2" to show the bitplanes, and all is good...
Once I step through to the next instruction, the torus' 7th bitplane is corrupted.
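
For context, the two loops above poll bit 0 of the byte at $dff005, which is the low byte of VPOSR and holds V8, bit 8 of the vertical beam position. The first loop spins until V8 is set (line 256 or later), and the second spins until it clears again, that is, until the beam has wrapped to the top of the next frame. A small Python sketch of that logic, assuming a 313-line PAL frame and hypothetical function names:

```python
def v8(line: int) -> int:
    """Bit 8 of the vertical beam position, as seen in bit 0 of $dff005."""
    return (line >> 8) & 1


def wait_for_frame_start(lines):
    """Replay the two polling loops over a sequence of beam line numbers.

    Returns the line on which the second loop exits, or None if the
    sequence ends first.
    """
    beam = iter(lines)
    for line in beam:       # BTST/BEQ loop: repeat while V8 is 0
        if v8(line):
            break
    for line in beam:       # BTST/BNE loop: repeat while V8 is 1
        if not v8(line):
            return line
    return None


# Beam starting at line 100, running through the rest of the frame and the next one.
PAL_LINES = 313
sequence = list(range(100, PAL_LINES)) + list(range(PAL_LINES))
print(wait_for_frame_start(sequence))
```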

@apolkosnik
Contributor Author

apolkosnik commented Jul 20, 2019

I'm guessing that it might be due to non-word aligned access to the custom regs...

I found this in winuaechangelog.txt

Beta 6 (final?)
- 68020+ word moves from non-word aligned custom registers fixed (for example move.w $dff005,d0 uuargh..)
  (fixes 4k AGA intro, Fuck the Pc! by Anorganic/Promise!) 

It looks like the UAE peeps made a useful comment in their code.

https://github.com/amigrave/PUAE/blob/c2f3278d6fd33da198c471a21a7109a43935c4be/src/custom.c#L5656

I ran that demo, and the screen looks super messy on Mister
It's available as an exe at: https://demozoo.org/productions/32792/

Edit:
After initiating HRTMon, and going back, the drawing of the lines gets fixed, and the demo looks as intended.

apolkosnik changed the title from "BFINS possibly generating wrong values in tg68k" to "Screen corruption on some AGA demos" on Jul 21, 2019
@apolkosnik
Contributor Author

apolkosnik commented Jul 22, 2019

I've patched the Nexus7 demo to jump to a different location to wait for vertical blank, where I did word/long reads to a register and tested the bit. The same issue happened, so I'm thinking that this could be some timing glitch in the chipset or something weird is happening in vblank interrupt.

@sorgelig
Member

I'm afraid I cannot help with this issue.
You are welcome to provide fixes.
If you find a more specific place in the chipset which is glitching, then I may be able to fix it.

@apolkosnik
Contributor Author

apolkosnik commented Jul 22, 2019

Sounds good. It's a learning experience for me, since I just recently started poking around, and I can already build Minimig.rbf to test stuff. It will be plenty of fun.

@nretro
Contributor

nretro commented Jul 24, 2019

OK, looks like UAE reads both words from the custom registers when the CPU addresses an odd word. That should be doable in minimig as well. Maybe the odd word read behavior is documented in the 68020 manuals.
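
To make the idea concrete: a word read from an odd address such as $dff005 covers the low byte of the register at $dff004 and the high byte of the register at $dff006, so both aligned words have to be fetched and combined. The Python sketch below illustrates that combination only. The dictionary of register values and the function names are assumptions for illustration, not taken from UAE or Minimig.

```python
def read_word_aligned(regs: dict, addr: int) -> int:
    """Aligned 16-bit read from a table of custom register values."""
    return regs.get(addr & ~1, 0) & 0xFFFF


def read_word(regs: dict, addr: int) -> int:
    """16-bit big-endian read that also handles odd addresses."""
    if addr & 1 == 0:
        return read_word_aligned(regs, addr)
    high = read_word_aligned(regs, addr - 1) & 0xFF   # low byte of the first word
    low = read_word_aligned(regs, addr + 1) >> 8      # high byte of the next word
    return (high << 8) | low


# Made-up register contents for VPOSR ($dff004) and VHPOSR ($dff006).
custom = {0xDFF004: 0x8001, 0xDFF006: 0x2C40}
print(hex(read_word(custom, 0xDFF005)))
```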

@sorgelig
Member

sorgelig commented Jul 24, 2019

It can be troublesome, as reading the RAM in HDL is not like reading a variable in C++ ;)

P.S.: but if it's related only to registers, then it should be easier.

@apolkosnik
Contributor Author

apolkosnik commented Jul 24, 2019

As I mentioned earlier, I did rewrite the loops with an aligned read from $dff004 into a data register (d7), and I did btst.l d0,d7 (against bit 8), and the screen corruption was occurring. Any idea if the $dff000 area is marked as non-cacheable?

I saw that there was an alignment fix for movem, but I couldn't figure out how that is supposed to work. At some point I might try to go through the whole thing and add comments.

@nretro
Contributor

nretro commented Jul 24, 2019

The cache sits directly at the ram interface. By design, it can only cache ram access. So custom registers can't be cached.

@nretro
Contributor

nretro commented Jul 24, 2019

calling
move.w $dff005,$dff180
repeatedly, produces the same pattern on MiSTer and an A1200.

@apolkosnik
Contributor Author

apolkosnik commented Jul 25, 2019

Sometime next week, I'll post the memory dump from HRTMon along with the instructions to reproduce the issue, so that more people would be able to test on their own either in UAE, real hardware, minimig cores or whatever that runs HRTMon.

@apolkosnik
Contributor Author

apolkosnik commented Aug 3, 2019

I've attached a mem dump from HRTmon. I made it in FS-UAE initially, then loaded it onto both FS-UAE and MiSTer to compare execution.
To load it, unzip the attachment, place it on a hard drive (HDF) in your MiSTer, then boot the Minimig core.
F9 activates HRTmon on the Minimig core; FS-UAE uses ALT-C on Ubuntu.
To load the memory use
la dh0:nx7_4
Then you can trace the execution with: t <# of steps>
In my setup, "t 3210" usually gets me a good torus (use "cop" or "p 2"; with the p command you need to hit Esc to get back to the HRTmon command line). Then I can step to the next instruction with "t 1", and it's corrupted.
If I load the memory and go for "t 3220", the torus is often already corrupted. It looks like when the trace ends on $0018E38E BTST #0,(5,A6) ;$00DFF005, it still looks good; if the trace ends on something else, the torus is already corrupted.
Interestingly, after "t 1" from an uncorrupted torus, you end up with the corruption manifesting itself.

nx7_4.zip

@apolkosnik
Contributor Author

apolkosnik commented Aug 3, 2019

Interesting thing to note: on FS-UAE, I've set uae_immediate_blits = 1 and repeated the process above. It looks like the corruption showed up (see pic below) when going with the "cop" or "p 2" commands, but if I let it run with "g", it disappears.
Update: that actually happens even without the immediate blits enabled. Now I'm feeling really confused.
[image: Screenshot from 2019-08-03 11-29-11]

apolkosnik added a commit to apolkosnik/Minimig-AGA_MiSTer that referenced this issue Oct 17, 2019
sorgelig added a commit that referenced this issue Oct 17, 2019
Fixes to the tg68k, fixes Issue #11. Goraud Pulse in Nexus7 fixed.
@apolkosnik
Contributor Author

#23 fixed Nexus7 and the other demo mentioned in this issue. I'm closing this.
