Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange freezes when punching self in the face in ROOM.WAD #1076

Closed
SoDOOManiac opened this issue Jun 23, 2023 · 54 comments
Closed

Strange freezes when punching self in the face in ROOM.WAD #1076

SoDOOManiac opened this issue Jun 23, 2023 · 54 comments

Comments

@SoDOOManiac
Copy link
Collaborator

SoDOOManiac commented Jun 23, 2023

Background

Version of Crispy Doom: Chocolate Doom or any of its forks

Operating System and version: Windows 7 or 10

Game: Doom 2

Any loaded WADs and mods (please include full command line): ROOM https://www.moddb.com/mods/room/downloads/room-version-2
crispy-doom -iwad doom2.wad -merge room.wad -deh room.deh
more convenient for testing: so-doom -iwad doom2.wad -merge room.wad -deh room.deh
(in So Doom you can use HPS cheat to restore health, not sure if the bug is reproducible with IDDQD)

Bug description

Observed behavior: game sometimes freezes when Doomguy punches himself in the face.
@JNechaevsky says gdb doesn't throw any error, probably an infinite loop occurs.

@fabiangreffrath
Copy link
Owner

Doesn't the PWAD have an embedded DEHACKED lump?

@SoDOOManiac
Copy link
Collaborator Author

Doesn't the PWAD have an embedded DEHACKED lump?

No, the dehacked patch is a separate file. ROOM is from 2000, it was a common practice back then.

@turol
Copy link
Collaborator

turol commented Jun 26, 2023

Unable to reproduce on Linux.

Does this happen on Chocolate Doom?

You typoed the command line, missing the dot in what is supposed to be "room.wad"

Running under Valgrind with memory pooling disabled gives the following:

Invalid read of size 4
   at 0x14632F: DEH_LoadFile (deh_main.c:456)
   by 0x6: ???
   by 0x146526: DEH_ParseCommandLine (deh_main.c:565)
   by 0x148FCB: D_DoomMain (d_main.c:1800)
   by 0x12601E: main (i_main.c:87)
 Address 0x67f653c is 108 bytes inside a block of size 120 free'd
   at 0x484317B: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x14632B: DEH_LoadFile (deh_main.c:454)
   by 0x146526: DEH_ParseCommandLine (deh_main.c:565)
   by 0x148FCB: D_DoomMain (d_main.c:1800)
   by 0x12601E: main (i_main.c:87)
 Block was alloc'd at
   at 0x48407B4: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x145301: Z_Malloc (z_native.c:262)
   by 0x1456AB: DEH_NewContext (deh_io.c:72)
   by 0x1456AB: DEH_OpenFile (deh_io.c:100)
   by 0x14630B: DEH_LoadFile (deh_main.c:442)
   by 0x146526: DEH_ParseCommandLine (deh_main.c:565)
   by 0x148FCB: D_DoomMain (d_main.c:1800)
   by 0x12601E: main (i_main.c:87)

This happens during load time. Pretty clearly a real positive, the context is accessed after freeing. Also code is identical in Chocolate doom, someone (else 😄) should go and fix that. But this is not our culprit for this bug.

Invalid read of size 1
   at 0x12773D: M_StringCopy (m_misc.c:858)
   by 0x12773D: M_StringCopy (m_misc.c:843)
   by 0x172705: R_InitTextures (r_data.c:812)
   by 0x17337E: R_InitData (r_data.c:1261)
   by 0x175EB8: R_Init (r_main.c:960)
   by 0x1499BF: D_DoomMain (d_main.c:2327)
   by 0x12601E: main (i_main.c:87)
 Address 0x6962b54 is 0 bytes after a block of size 3,796 alloc'd
   at 0x48407B4: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x145301: Z_Malloc (z_native.c:262)
   by 0x143BBB: W_CacheLumpNum (w_wad.c:435)
   by 0x172594: R_InitTextures (r_data.c:763)
   by 0x17337E: R_InitData (r_data.c:1261)
   by 0x175EB8: R_Init (r_main.c:960)
   by 0x1499BF: D_DoomMain (d_main.c:2327)
   by 0x12601E: main (i_main.c:87)

Not sure what this is.

Invalid read of size 4
   at 0x171651: P_DeathThink (p_user.c:240)
   by 0x17128F: P_Ticker (p_tick.c:149)
   by 0x152094: G_Ticker (g_game.c:1372)
   by 0x12A01C: TryRunTics (d_loop.c:816)
   by 0x1482B1: D_RunFrame (d_main.c:540)
   by 0x148406: D_DoomLoop (d_main.c:610)
   by 0x149D8C: D_DoomMain (d_main.c:2405)
   by 0x12601E: main (i_main.c:87)
 Address 0x10ab85e4 is 68 bytes inside a block of size 288 free'd
   at 0x484317B: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x1711E9: P_RunThinkers (p_tick.c:108)
   by 0x171242: P_Ticker (p_tick.c:151)
   by 0x152094: G_Ticker (g_game.c:1372)
   by 0x12A01C: TryRunTics (d_loop.c:816)
   by 0x1482B1: D_RunFrame (d_main.c:540)
   by 0x148406: D_DoomLoop (d_main.c:610)
   by 0x149D8C: D_DoomMain (d_main.c:2405)
   by 0x12601E: main (i_main.c:87)
 Block was alloc'd at
   at 0x48407B4: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x145301: Z_Malloc (z_native.c:262)
   by 0x1621F5: P_SpawnMobjSafe (p_mobj.c:654)
   by 0x163BE9: P_SpawnMobj (p_mobj.c:746)
   by 0x163BE9: P_SpawnPlayerMissile (p_mobj.c:1316)
   by 0x163DAA: P_SetPsprite (p_pspr.c:109)
   by 0x16505C: P_MovePsprites (p_pspr.c:988)
   by 0x1718E2: P_PlayerThink (p_user.c:453)
   by 0x17128F: P_Ticker (p_tick.c:149)
   by 0x152094: G_Ticker (g_game.c:1372)
   by 0x12A01C: TryRunTics (d_loop.c:816)
   by 0x1482B1: D_RunFrame (d_main.c:540)
   by 0x148406: D_DoomLoop (d_main.c:610)

Invalid read of size 4
   at 0x171654: P_DeathThink (p_user.c:240)
   by 0x17128F: P_Ticker (p_tick.c:149)
   by 0x152094: G_Ticker (g_game.c:1372)
   by 0x12A01C: TryRunTics (d_loop.c:816)
   by 0x1482B1: D_RunFrame (d_main.c:540)
   by 0x148406: D_DoomLoop (d_main.c:610)
   by 0x149D8C: D_DoomMain (d_main.c:2405)
   by 0x12601E: main (i_main.c:87)
 Address 0x10ab85e0 is 64 bytes inside a block of size 288 free'd
   at 0x484317B: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x1711E9: P_RunThinkers (p_tick.c:108)
   by 0x171242: P_Ticker (p_tick.c:151)
   by 0x152094: G_Ticker (g_game.c:1372)
   by 0x12A01C: TryRunTics (d_loop.c:816)
   by 0x1482B1: D_RunFrame (d_main.c:540)
   by 0x148406: D_DoomLoop (d_main.c:610)
   by 0x149D8C: D_DoomMain (d_main.c:2405)
   by 0x12601E: main (i_main.c:87)
 Block was alloc'd at
   at 0x48407B4: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x145301: Z_Malloc (z_native.c:262)
   by 0x1621F5: P_SpawnMobjSafe (p_mobj.c:654)
   by 0x163BE9: P_SpawnMobj (p_mobj.c:746)
   by 0x163BE9: P_SpawnPlayerMissile (p_mobj.c:1316)
   by 0x163DAA: P_SetPsprite (p_pspr.c:109)
   by 0x16505C: P_MovePsprites (p_pspr.c:988)
   by 0x1718E2: P_PlayerThink (p_user.c:453)
   by 0x17128F: P_Ticker (p_tick.c:149)
   by 0x152094: G_Ticker (g_game.c:1372)
   by 0x12A01C: TryRunTics (d_loop.c:816)
   by 0x1482B1: D_RunFrame (d_main.c:540)
   by 0x148406: D_DoomLoop (d_main.c:610)

Now these look interesting. There's some kind of issue when the player dies. In normal mode with memory pooling enabled this would not crash or show up in a normal debugger. I don't see how this code could result in an infinite loop but this same issue could hit some other code and cause all kinds of funky bugs.

@turol
Copy link
Collaborator

turol commented Jun 26, 2023

Also is the issue more likely on certain levels? I only tried MAP01 under Valgrind. I think that one has no monsters so no other thinkers than the player. That might be a necessary condition to trigger the freeze.

@fabiangreffrath
Copy link
Owner

@turol Thank for investigating this! Interestingly, MBF does not remove mobj thinkers immediately [1], but sets their function pointer to a delayed removal function [2]. Maybe I should adapt that same approach for Crispy. But, anyway, it would be nice to have a reproducible test case for that.

[1] https://github.com/fabiangreffrath/woof/blob/c86904de68642c81c2cc7b61482fe200bf0b481e/src/p_tick.c#L169
[2] https://github.com/fabiangreffrath/woof/blob/c86904de68642c81c2cc7b61482fe200bf0b481e/src/p_tick.c#L141-L153

@SoDOOManiac
Copy link
Collaborator Author

SoDOOManiac commented Jun 27, 2023

Also is the issue more likely on certain levels? I only tried MAP01 under Valgrind. I think that one has no monsters so no other thinkers than the player. That might be a necessary condition to trigger the freeze.

I suppose that the maps MAP05-MAP07 with multiple monsters are the most likely ones where the freeze will happen, but I encountered the freeze in MAP03 with a single monster as well, not sure if that zombieman must be alerted to reproduce this or not.

Yes, the freeze happens in Chocolate Doom, but I suggest using So Doom for testing as there you can recover health and armor with cheat codes and punch yourself infinitely until the freeze happens.

you typoed in the command line

Fixed that, thanks!

@turol
Copy link
Collaborator

turol commented Jun 27, 2023

If this happens in Chocolate then it should be reported (and hopefully fixed) there. But then it's also likely that it would also happen in vanilla.

Ran a bunch of test in Chocolate and also got these:

Invalid read of size 1
   at 0x1595E8: R_DrawColumn (r_draw.c:139)
   by 0x15CC7B: R_DrawMaskedColumn (r_things.c:374)
   by 0x15CD75: R_DrawVisSprite (r_things.c:431)
   by 0x15D6F6: R_DrawSprite (r_things.c:942)
   by 0x15D917: R_DrawMasked (r_things.c:966)
   by 0x15AEB3: R_RenderPlayerView (r_main.c:886)
   by 0x13D1CE: D_Display (d_main.c:233)
   by 0x13D534: D_RunFrame (d_main.c:443)
   by 0x13D616: D_DoomLoop (d_main.c:496)
   by 0x13E8FD: D_DoomMain (d_main.c:1998)
   by 0x11DEB6: main (i_main.c:77)
 Address 0x1ce17fbd is 19 bytes before a block of size 380 alloc'd
   at 0x48407B4: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x13AE51: Z_Malloc (z_native.c:262)
   by 0x1399DB: W_CacheLumpNum (w_wad.c:418)
   by 0x159049: R_InitSpriteLumps (r_data.c:665)
   by 0x159106: R_InitData (r_data.c:701)
   by 0x15AC88: R_Init (r_main.c:768)
   by 0x13E707: D_DoomMain (d_main.c:1921)
   by 0x11DEB6: main (i_main.c:77)

Invalid read of size 4
   at 0x15F888: ST_updateFaceWidget (st_stuff.c:776)
   by 0x15FC07: ST_updateWidgets (st_stuff.c:924)
   by 0x15FC9A: ST_Ticker (st_stuff.c:955)
   by 0x1438BC: G_Ticker (g_game.c:1013)
   by 0x121D7C: TryRunTics (d_loop.c:784)
   by 0x13D4F1: D_RunFrame (d_main.c:436)
   by 0x13D616: D_DoomLoop (d_main.c:496)
   by 0x13E8FD: D_DoomMain (d_main.c:1998)
   by 0x11DEB6: main (i_main.c:77)
 Address 0x1ebaafe4 is 68 bytes inside a block of size 264 free'd
   at 0x484317B: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x157D79: P_RunThinkers (p_tick.c:107)
   by 0x157DD2: P_Ticker (p_tick.c:147)
   by 0x1438B7: G_Ticker (g_game.c:1012)
   by 0x121D7C: TryRunTics (d_loop.c:784)
   by 0x13D4F1: D_RunFrame (d_main.c:436)
   by 0x13D616: D_DoomLoop (d_main.c:496)
   by 0x13E8FD: D_DoomMain (d_main.c:1998)
   by 0x11DEB6: main (i_main.c:77)
 Block was alloc'd at
   at 0x48407B4: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x13AE51: Z_Malloc (z_native.c:262)
   by 0x14D6C0: P_SpawnMobj (p_mobj.c:528)
   by 0x14E9D8: P_SpawnPlayerMissile (p_mobj.c:1053)
   by 0x14EAFF: P_SetPsprite (p_pspr.c:84)
   by 0x14F98D: P_MovePsprites (p_pspr.c:878)
   by 0x158213: P_PlayerThink (p_user.c:327)
   by 0x157E27: P_Ticker (p_tick.c:145)
   by 0x1438B7: G_Ticker (g_game.c:1012)
   by 0x121D7C: TryRunTics (d_loop.c:784)
   by 0x13D4F1: D_RunFrame (d_main.c:436)
   by 0x13D616: D_DoomLoop (d_main.c:496)

The first one looks like something that could result in infinite loop but in this case it didn't. These did not happen reliably. What we need now is a reproducible test case, preferably a demo whose playback freezes.

@fabiangreffrath
Copy link
Owner

The second one is suspicious: We are reading data in ST_updateFaceWidget() -- which can only be one of the Doomguy face graphics -- which was allocated in P_SpawnMobj() called from P_SpawnPlayerMissile() before. So, is there a weapon in this mod which shoots player faces or something like this?

@mikeday0
Copy link
Collaborator

I have nothing technical to add, but I did notice this in the wad's txt file:

Known Bugs: currently you can't exit MAP08... rare crashing with the fist, impossible to exit levels on the baby skill

@SoDOOManiac
Copy link
Collaborator Author

If this happens in Chocolate then it should be reported (and hopefully fixed) there. But then it's also likely that it would also happen in vanilla.

That's why I reported this in Crispy tracker, just in case it's present in vanilla and is to be preserved in Choco according to philosophy.

I have nothing technical to add, but I did notice this in the wad's txt file:

The WAD author knew about this, couldn't fix but at least let us know.

@SoDOOManiac
Copy link
Collaborator Author

SoDOOManiac commented Jul 8, 2023

I set A_VileAttack damage to be 0 in the code to enable unlimited self-punching and reproduced freezing in MAP03 a couple of times while recording the demo, but alas, in case of the freeze no demo file is saved.

@turol
Copy link
Collaborator

turol commented Jul 8, 2023

If you can reliably reproduce this can you get a backtrace from the hang?

Can we figure out a way to flush the demo file from this state?

@SoDOOManiac
Copy link
Collaborator Author

I will try to get some output from GDB if the debug build from Ninja freezes.

@SoDOOManiac
Copy link
Collaborator Author

SoDOOManiac commented Jul 8, 2023

Ninja debug build won't show anything in GDB upon freeze. How can I get the demo file out? It isn't saved after game freezes.
Catching the freeze should be the easiest on MAP03 after alerting the zombieman in the secret area,

also make sure to zero the direct damage from arch-vile attack in p_enemy.c, void A_VileAttack (mobj_t* actor):
P_DamageMobj (actor->target, actor, actor, 0);

Desktop Screenshot 2023 07 08 - 19 48 35 82

@SoDOOManiac
Copy link
Collaborator Author

The freeze occurs in PUNGC state which has FireBFG action assigned, and BFG projectile's death (explosion) state is BFE1C which has VileAttack action assigned. A_VileAttack() does damage twice, only first damage infliction (of 20) is applied to the player.

@turol
Copy link
Collaborator

turol commented Jul 10, 2023

You should be able to force writing the demo by doing this in GDB:

call G_CheckDemoStatus()

This also terminates the game, just like if you pressed q to terminate recording. This might behave badly since it's probably in some completely unpredictable state.

Press Ctrl+C in gdb window to breakpoint if the game has frozen. Or we could add some way to sync write the demo but that would require some surgery of that code.

Can you get the backtrace? I really want to see where exactly the infinite loop is happening.
A_VileAttack() calls P_RadiusAttack() which will call P_BlockThingsIterator(). I'm interested if it hangs there or somewhere else.

@SoDOOManiac
Copy link
Collaborator Author

SoDOOManiac commented Jul 10, 2023

You should be able to force writing the demo by doing this in GDB:

call G_CheckDemoStatus()

This also terminates the game, just like if you pressed q to terminate recording. This might behave badly since it's probably in some completely unpredictable state.

Press Ctrl+C in gdb window to breakpoint if the game has frozen.

When the game freezes under GDB (I managed to freeze te game in Choco this time), I cannot either type or paste anything into the MSYS2 terminal where GDB is running :(

Other than that, it looks like that in
`void A_VileAttack (mobj_t* actor)
{
mobj_t* fire;
int an;

if (!actor->target)
return;

A_FaceTarget (actor);

if (!P_CheckSight (actor, actor->target) )
return;

S_StartSound (actor, sfx_barexp);
P_DamageMobj (actor->target, actor, actor, 0);
actor->target->momz = 1000*FRACUNIT/actor->target->info->mass;

an = actor->angle >> ANGLETOFINESHIFT;

fire = actor->tracer;

if (!fire)
return;
	
// move the fire between the vile and the player
fire->x = actor->target->x - FixedMul (24*FRACUNIT, finecosine[an]);
fire->y = actor->target->y - FixedMul (24*FRACUNIT, finesine[an]);	
P_RadiusAttack (fire, actor, 70 );

}`

P_RadiusAttack isn't called because there is no pointer to fire:
if (!fire) return;

@turol
Copy link
Collaborator

turol commented Jul 10, 2023

Does Ctrl+C work while the game is still running normally? It works on Linux...

You might have to install either WSL (not sure if it supports graphics though) or a Linux VM.

@fabiangreffrath Any thoughts on how to make the game flush the demo synchronously?

@SoDOOManiac
Copy link
Collaborator Author

Does Ctrl+C work while the game is still running normally? It works on Linux...

On my Win10 system even when the game is running normally, MSYS2 console doesn't respond to any typing or pasting.

@turol
Copy link
Collaborator

turol commented Jul 10, 2023

Try these: https://stackoverflow.com/questions/711086/in-gdb-on-mingw-how-do-i-make-ctrl-c-stop-the-program

@SoDOOManiac
Copy link
Collaborator Author

@fabiangreffrath
Copy link
Owner

Got it:

Thread 1 "crispy-doom" received signal SIGINT, Interrupt.
0x00005555555abbf6 in PIT_CheckThing (thing=0x7ffff4c0ac40) at p_map.c:296
296	    if (!(thing->flags & (MF_SOLID|MF_SPECIAL|MF_SHOOTABLE) ))
(gdb) bt
#0  0x00005555555abbf6 in PIT_CheckThing (thing=0x7ffff4c0ac40) at p_map.c:296
#1  0x00005555555adeee in P_BlockThingsIterator
    (x=x@entry=1, y=y@entry=8, func=func@entry=0x5555555abbf0 <PIT_CheckThing>)
    at p_maputl.c:529
#2  0x00005555555acab4 in P_CheckPosition
    (thing=thing@entry=0x7ffff4be1cb8, x=x@entry=-3757650, y=y@entry=-35771695)
    at p_map.c:540
#3  0x00005555555acb8f in P_TryMove
    (thing=thing@entry=0x7ffff4be1cb8, x=-3757650, y=-35771695) at p_map.c:576
#4  0x00005555555aee6f in P_XYMovement (mo=mo@entry=0x7ffff4be1cb8)
    at p_mobj.c:214
#5  0x00005555555af388 in P_XYMovement (mo=0x7ffff4be1cb8) at p_mobj.c:170
#6  P_MobjThinker (mobj=0x7ffff4be1cb8) at p_mobj.c:585
#7  0x00005555555bd54b in P_RunThinkers () at p_tick.c:113
#8  0x00005555555bd5d3 in P_Ticker () at p_tick.c:151
#9  0x000055555559e415 in G_Ticker () at g_game.c:1372
#10 0x000055555557601d in TryRunTics () at d_loop.c:816
#11 0x0000555555594632 in D_RunFrame () at d_main.c:540
#12 0x0000555555594787 in D_DoomLoop () at d_main.c:610
#13 0x000055555559610d in D_DoomMain () at d_main.c:2405
#14 0x000055555557201f in main (argc=<optimized out>, argv=0x7fffffffdf18)
    at i_main.c:87
(gdb) n
P_BlockThingsIterator (x=x@entry=1, y=y@entry=8, func=func@entry=0x5555555abbf0 <PIT_CheckThing>)
    at p_maputl.c:527
527		 mobj = mobj->bnext)
(gdb) n
526		 mobj ;
(gdb) n
529		if (!func( mobj ) )
(gdb) n
527		 mobj = mobj->bnext)
(gdb) n
526		 mobj ;
(gdb) n
529		if (!func( mobj ) )
(gdb) n
527		 mobj = mobj->bnext)
(gdb) n
526		 mobj ;
(gdb) n
529		if (!func( mobj ) )
(gdb) n
527		 mobj = mobj->bnext)

@turol
Copy link
Collaborator

turol commented Jul 11, 2023

Did you dump the demo from that? I whipped this up, can you give it a try:

diff --git a/src/doom/p_maputl.c b/src/doom/p_maputl.c
index 332fc5544..70ffafe40 100644
--- a/src/doom/p_maputl.c
+++ b/src/doom/p_maputl.c
@@ -512,6 +512,7 @@ P_BlockThingsIterator
   boolean(*func)(mobj_t*) )
 {
     mobj_t*		mobj;
+    mobj_t *fast = NULL;
 	
     if ( x<0
 	 || y<0
@@ -522,12 +523,34 @@ P_BlockThingsIterator
     }
     
 
-    for (mobj = blocklinks[y*bmapwidth+x] ;
-	 mobj ;
-	 mobj = mobj->bnext)
+    mobj = blocklinks[y*bmapwidth+x] ;
+    if (mobj) {
+        fast = mobj->bnext;
+    }
+    for (;
+         ;
+         )
     {
-	if (!func( mobj ) )
-	    return false;
+        if (!mobj) {
+            break;
+        }
+
+        if (mobj == fast) {
+            printf("error infinite loop");
+            abort();
+        }
+
+        if (!func( mobj ) )
+            return false;
+
+        mobj = mobj->bnext;
+        if (fast) {
+            fast = fast->bnext;
+            if (fast) {
+                fast = fast->bnext;
+            }
+
+        }
     }
     return true;
 }

Still doesn't seem to hang/abort for me reliably.

@SoDOOManiac
Copy link
Collaborator Author

SoDOOManiac commented Aug 1, 2023

Did you dump the demo from that? I whipped this up, can you give it a try: ...
Still doesn't seem to hang/abort for me reliably.

@turol , what does this code do? Is it meant to fix the freezes printing to the console every time a freeze is prevented?

Sorry for the closing/reopening, I accidentally clicked the button on the phone screen.

@turol
Copy link
Collaborator

turol commented Aug 1, 2023

It aborts if the loop is infinite instead of hanging. There's a better version merged now. Enable DEBUG_LINKED_LISTS define in src/m_misc.h and check if it still hangs.

Also someone please dump the demo if you can reproduce this.

@SoDOOManiac
Copy link
Collaborator Author

It aborts if the loop is infinite instead of hanging. There's a better version merged now. Enable DEBUG_LINKED_LISTS define in src/m_misc.h and check if it still hangs.

Is #define DEBUG_LINKED_LISTS 1 enough to enable debug?

Also someone please dump the demo if you can reproduce this.

Does it work the way that if debug is enabled, then the game aborts and the demo file is saved?

@turol
Copy link
Collaborator

turol commented Aug 1, 2023

Is #define DEBUG_LINKED_LISTS 1 enough to enable debug?

Yes

Does it work the way that if debug is enabled, then the game aborts and the demo file is saved?

No, it just aborts. There's a call command somewhere above which can be used to dump the demo from a debugger. Or you could edit the code yourself to call that function before aborting.

@SoDOOManiac
Copy link
Collaborator Author

I have recorded a freezing demo. To play it back properly, check out this branch.
In that branch the first part of Arch-Vile attack damage is set to 0 instead of the original 20 to let me punch myself in the face infinitely until the game freezes.

Here's the demo, ROOM.WAD, ROOM.DEH and the batch file containing the command line to play the demo back:
ROOM_freezing_demo.zip

@turol
Copy link
Collaborator

turol commented Aug 2, 2023

It reproduces. Using timedemo is faster. With Valgrind and memory pooling disabled:

https://gist.github.com/turol/a4c48d3866b13bb6fb447958b909a6a6

Lots of problems. Not sure which one it is or why but definitely use after free. Could be modify-while-iterating bug.

Also still kinda slow even with timedemo under Valgrind. A shorter demo would still be useful.

You could add more calls of that debug macro to places mentioned in those traces. This should help catch it as close to the cause as possible.

@SoDOOManiac
Copy link
Collaborator Author

Also still kinda slow even with timedemo under Valgrind. A shorter demo would still be useful.

I managed to record a much shorter demo
ROOMdemo3_faster.zip

The location and orientation where I catch the infinite loop most easily, just in case, may have to do with the room corner and the ceiling lamp sector border
DOOM0000

Feel free to add the loop debugging code everywhere you deem necessary in this branch (where I added the demo dumping line to it),
I'm burning out at work and feel uncapable to do that in the near future.

@fabiangreffrath
Copy link
Owner

Does this fix it? (Sorry, can't test myself right now)

--- a/src/doom/p_tick.c
+++ b/src/doom/p_tick.c
@@ -113,6 +113,10 @@ void P_RunThinkers (void)
                currentthinker->function.acp1 (currentthinker);
             nextthinker = currentthinker->next;
        }
+
+       if (nextthinker == currentthinker)
+           break;
+
        currentthinker = nextthinker;
     }

@SoDOOManiac
Copy link
Collaborator Author

Does this fix it?

Unfortunately not, the game still freezes.

@fabiangreffrath
Copy link
Owner

I now suspect that the culprit for the loop lies in P_UnsetThingPosition(), which flips around the roles of mobj and mobj->next in P_BlockThingsIterator() so that after one iteration through the loop the object pointed to by mobj->next is now mobj and its mobj->next is pointing to the object that formerly pointed to itself.

https://github.com/fabiangreffrath/crispy-doom/blob/master/src/doom/p_maputl.c#L344

@SoDOOManiac
Copy link
Collaborator Author

Great to hear that you have an idea! I'll gladly test any patches that you come up with.

@fabiangreffrath
Copy link
Owner

This seems to do the trick for me:

--- a/src/doom/p_maputl.c
+++ b/src/doom/p_maputl.c
@@ -513,6 +513,7 @@ P_BlockThingsIterator
   boolean(*func)(mobj_t*) )
 {
     mobj_t*             mobj;
+    mobj_t*             first;
 
     if ( x<0
          || y<0
@@ -524,12 +525,14 @@ P_BlockThingsIterator
 
     LINKED_LIST_CHECK_NO_CYCLE(mobj_t, blocklinks[y*bmapwidth+x], bnext);
 
-    for (mobj = blocklinks[y*bmapwidth+x] ;
+    for (mobj = blocklinks[y*bmapwidth+x], first = mobj ;
          mobj ;
          mobj = mobj->bnext)
     {
         if (!func( mobj ) )
             return false;
+        if (mobj->bnext == first)
+            break;
     }
     return true;
 }

@fabiangreffrath
Copy link
Owner

Nope, desyncs. But the idea is there...

@fabiangreffrath
Copy link
Owner

Seems to be working better in this variant:

diff --git a/src/doom/p_maputl.c b/src/doom/p_maputl.c
index ffccb8f5..83db1ac4 100644
--- a/src/doom/p_maputl.c
+++ b/src/doom/p_maputl.c
@@ -513,6 +513,7 @@ P_BlockThingsIterator
   boolean(*func)(mobj_t*) )
 {
     mobj_t*             mobj;
+    mobj_t*             first;
 
     if ( x<0
          || y<0
@@ -524,10 +525,12 @@ P_BlockThingsIterator
 
     LINKED_LIST_CHECK_NO_CYCLE(mobj_t, blocklinks[y*bmapwidth+x], bnext);
 
-    for (mobj = blocklinks[y*bmapwidth+x] ;
+    for (mobj = blocklinks[y*bmapwidth+x], first = mobj ;
          mobj ;
          mobj = mobj->bnext)
     {
+        if (mobj->bnext == first)
+            break;
         if (!func( mobj ) )
             return false;
     }

@turol
Copy link
Collaborator

turol commented Oct 20, 2023

I don't think break is correct here. Vanilla hangs so for compatibility and to prevent creating demos which hang we'd need to error out here.

@SoDOOManiac
Copy link
Collaborator Author

SoDOOManiac commented Oct 20, 2023

I don't think break is correct here. Vanilla hangs so for compatibility and to prevent creating demos which hang we'd need to error out here.

If I understand correctly, in this case we don't attempt to continue playing in the situation that occurs in my demo in ROOM, right?

Is it possible to continue playing in the situation that leads to freeze in Vanilla/Choco?

@fabiangreffrath
Copy link
Owner

I think that's one of the cases where the solution will differ a bit for Chocolate and Crispy. Other ports, e.g. MBF, won't hang here as well.

@turol
Copy link
Collaborator

turol commented Oct 20, 2023

If I understand correctly, in this case we don't attempt to continue playing in the situation that occurs in my demo in ROOM, right?

Yes

Is it possible to continue playing in the situation that leads to freeze in Vanilla/Choco?

Not in a way that's vanilla compatible.

@SoDOOManiac
Copy link
Collaborator Author

SoDOOManiac commented Oct 20, 2023

Not in a way that's vanilla compatible.

Should we stick to Vanilla compatibility in this specific case?

I and people who appreciate this kind if humor would love to be able to play ROOM, recording demos or not, in Crispy without freezes. Currently ROOM can cause a freeze even not recording a demo.

@fabiangreffrath
Copy link
Owner

What we mean is: Chocolate will probably get a "fix" that will detect the infinite loop and error out (Vanilla would loop forever). Crispy will get a slightly different fix that will break the loop and continue. No compatibility lost.

Actually, there are thousands of Vanilla complevel demos out there recorded with e.g. PrBoom+ that would cause Vanilla to crash, or hang or do whatever strange stuff that simply does not trigger in a limit-revoming port.

@turol
Copy link
Collaborator

turol commented Oct 20, 2023

Actually, there are thousands of Vanilla complevel demos out there recorded with e.g. PrBoom+ that would cause Vanilla to crash, or hang or do whatever strange stuff that simply does not trigger in a limit-revoming port.

Are there? I think triggering this requires some very strange DeHackery and there are very few wads which do that.

@fabiangreffrath
Copy link
Owner

I did not mean crash or hang by this specific bug, but there are dozens of ways to kill the Vanilla engine that ports starting with Boom circumvent while remaining demo compatible. This specific issue for example is unable to trigger in MBF (and thus PrBoom+), yet the ports remains Vanilla compatible.

@SoDOOManiac
Copy link
Collaborator Author

What we mean is: Chocolate will probably get a "fix" that will detect the infinite loop and error out (Vanilla would loop forever). Crispy will get a slightly different fix that will break the loop and continue. No compatibility lost.

I hope time for this comes soon :)

@SoDOOManiac
Copy link
Collaborator Author

Seems to be working better in this variant:

@fabiangreffrath, I hope as the major hard-to-apply changes like Sigil 2 support have been implemented, maybe it's time for this?

@fabiangreffrath
Copy link
Owner

@fabiangreffrath, I hope as the major hard-to-apply changes like Sigil 2 support have been implemented, maybe it's time for this?

Well, one does not have anything to do with the other.

The problem with the patch here is that it does not fix the underlying bug, but merely puts band-aid over its symptoms. However, there is a discussion currently going on in the DW forums about some code glitch which might be the reason for this bug, so there are chances that this may get fixed in a sane way.

@SoDOOManiac
Copy link
Collaborator Author

there is a discussion currently going on in the DW forums about some code glitch which might be the reason for this bug

Could you please point me to it?

@fabiangreffrath
Copy link
Owner

Sure, the issue is discussed in detail here:
https://www.doomworld.com/forum/topic/142646-p_checkmissilespawn-bug/

In short words, P_CheckMissileSpawn() moves the missile object "a little bit forward so an angle can be computed", but this "little bit" may be sufficient to move the missile into the next block in the blockmap. However, it is never unlinked from the current block and linked into the new one, so the blockmap is corrupted and the infinite loop occurs.

A possible fix might be something like this, but "unfortunately" we are not allowed to change Vanilla Doom's game logic and so any other demo will desync with this change:

--- a/src/doom/p_mobj.c
+++ b/src/doom/p_mobj.c
@@ -931,9 +931,11 @@ void P_CheckMissileSpawn (mobj_t* th)
     
     // move a little forward so an angle can
     // be computed if it immediately explodes
+    P_UnsetThingPosition (th);
     th->x += (th->momx>>1);
     th->y += (th->momy>>1);
     th->z += (th->momz>>1);
+    P_SetThingPosition (th);
 
     if (!P_TryMove (th, th->x, th->y))
        P_ExplodeMissile (th);

@SoDOOManiac
Copy link
Collaborator Author

this "little bit" may be sufficient to move the missile into the next block in the blockmap. However, it is never unlinked from the current block and linked into the new one, so the blockmap is corrupted and the infinite loop occurs.

For blockmap please see the last 2 comments, maybe I can fix it in ROOM.DEH, how do I make the created BFG missile MF_NOBLOCKMAP?

@fabiangreffrath
Copy link
Owner

And indeed, this little change makes the freeze disappear:

--- a/room.deh
+++ b/room_noblockmap.deh
@@ -34,7 +34,7 @@ Missile damage = 0
 Height = 0
 Initial frame = 118
 Width = 0
-Bits = 67080
+Bits = 67096
 Alert sound = 83
 Speed = -65536
 

@fabiangreffrath
Copy link
Owner

I'd consider this freeze consistent with Vanilla, the issue is not easy (impossible?) to fix in source code and a change of a mere number in the DEHACKED patch solves this issue for all ports and EXEs.

@SoDOOManiac
Copy link
Collaborator Author

Yes, nice that this bug turned out so simple to workaround in Dehacked!

Re-uploaded my re-release of ROOM with this fix
https://www.moddb.com/mods/room/downloads/room-berserk-edition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants