Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctrl-c after idf.py gdb isn't properly caught and is behaving as if its in the background in the terminal (IDFGH-10638) #11871

Closed
3 tasks done
cmorganBE opened this issue Jul 13, 2023 · 7 comments
Assignees
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally

Comments

@cmorganBE
Copy link

cmorganBE commented Jul 13, 2023

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

Ubuntu Linux 23.04 x86-64, esp-idf 5.1, using blinky example and esp32-s3-devkitc-1.

Plugged into 'usb' micro-usb-b port, able to idf.py flash but when debugging:

Running idf.py openocd in one terminal, then idf.py gdb in another terminal. The idf.py gdb was waiting for something so I hit ctrl-c so gdb would interrupt the running program and drop me back into a gdb prompt but instead I was dropped out to a shell prompt.

Oddly I continue to get gdb output in the same terminal. It's as if gdb is running in the background and never caught the ctrl-c.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Jul 13, 2023
@github-actions github-actions bot changed the title ctrl-c after idf.py gdb isn't properly caught and is behaving as if its in the background in the terminal ctrl-c after idf.py gdb isn't properly caught and is behaving as if its in the background in the terminal (IDFGH-10638) Jul 13, 2023
@cmorganBE
Copy link
Author

Here is the text from the terminal where you can see something is odd. I hit ctrl-c after a moment of the line '(gdb)' after the '79' showed up. ctrl-c resulted in 'Quit' and dropped me back to a shell where I hit enter and "A debugging session is active" and the following text appeared.

Hardware assisted breakpoint 1 at 0x42007bc8: file /home/cmorgan/esp/esp-idf/examples/get-started/blink/main/blink_example_main.c, line 79.
[esp32s3.cpu1] Debug controller was reset.
[esp32s3.cpu1] Core was reset.
[esp32s3.cpu1] Target halted, PC=0x40041A7C, debug_reason=00000000
[esp32s3.cpu1] Reset cause (3) - (Software core reset)
[esp32s3.cpu0] Target halted, PC=0x42007BC8, debug_reason=00000001
Set GDB target to 'esp32s3.cpu0'
[esp32s3.cpu1] Target halted, PC=0x40379346, debug_reason=00000000
[New Thread 1070174136]
[New Thread 1070177904]
[New Thread 1070176020]
[New Thread 1070165184]
[New Thread 1070169628]
[New Thread 1070163556]
[Switching to Thread 1070174136]

Thread 2 "main" hit Temporary breakpoint 1, app_main () at /home/cmorgan/esp/esp-idf/examples/get-started/blink/main/blink_example_main.c:79
79	{
(gdb) Quit
(gdb) cmorgan@cmorgan-bec:~/esp/esp-idf/examples/get-started/blink$ 
cmorgan@cmorgan-bec:~/esp/esp-idf/examples/get-started/blink$ 

quit
cmorgan@cmorgan-bec:~/esp/esp-idf/examples/get-started/blink$ A debugging session is active.

	Inferior 1 [Remote target] will be detached.

Quit anyway? (y or n) 

@cmorganBE
Copy link
Author

idf_py_gdb.webm

Here is a screen recording of the issue.

@fhrbata
Copy link
Collaborator

fhrbata commented Jul 15, 2023

Hello @cmorganBE ,
thank you for the report, detailed explanation and video. It's very much appreciated. It seems that the following cpython commit a22be4943c119fecf5433d999227ff78fc2e5741

commit a22be4943c119fecf5433d999227ff78fc2e5741
Author: Victor Stinner <[email protected]>
Date:   Mon Sep 27 14:20:31 2021 +0200

    bpo-45274: Fix Thread._wait_for_tstate_lock() race condition (GH-28532)

changed the behavior of the Thread.join() method. The KeyboardInterrupt and other exceptions are now handled in the threading module while waiting on the thread to finish. It also marks the thread as stopped, but it's IMHO false.
Meaning the second join() call, after the KeyboardInterrupt, in our code returns without waiting on the gdb process. This would explain the behavior your are seeing. It can be easily reproduced. We will have to adjust our code to this change.
Thank you

@fhrbata
Copy link
Collaborator

fhrbata commented Jul 16, 2023

Looking into this more closely I think this is problem in cpython. The if lock.locked(): doesn't seem right, because there is IMHO no way how to find out if the lock was actually acquired and is not held by something else(e.g. still held from the thread bootstrap code.
This is a simple reproducer for the problem.

#!/usr/bin/env python3

import threading
import time

def fun(count):
    while count:
        print('sleeping ...')
        time.sleep(1)
        count -= 1
    print('thread exited')

t = threading.Thread(target=fun, args=(30,))
t.start()

while True:
    try:
        print('join started')
        t.join()
        print('thread joined')
        break
    except KeyboardInterrupt:
        print('SIGINT')
        continue

join() after KeyboardInterrupt will return instead of waiting for the _tstate_lock, because in the first join() call the thread is wrongly marked as stopped, which is not true.

@fhrbata
Copy link
Collaborator

fhrbata commented Jul 16, 2023

There is already a bug reported about this in cpython python/cpython#90882. I guess we will have to workaround this somehow.

@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development Status: Reviewing Issue is being reviewed Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Opened Issue is new Status: Selected for Development Issue is selected for development Status: Reviewing Issue is being reviewed labels Jul 17, 2023
@cmorganBE
Copy link
Author

@fhrbata thank you for looking into this in such depth and for fixing it, I had reasonably low expectations, its one issue reported among hundreds or thousands, but I'm appreciative that this was resolved in what looks like a clean and correct manner

@fhrbata
Copy link
Collaborator

fhrbata commented Jul 21, 2023

@cmorganBE Thank you and hopefully the fix is working for you. I would like to thank you one more time for such detailed bug report. It's very much appreciated!

ilutchenko pushed a commit to ilutchenko/esp-idf that referenced this issue Jul 24, 2023
idf.py spawns gdb process within a thread and uses Thread.join() to wait
for the gdb process to finish. As CTRL+C(SIGINT) is used by gdb to interrupt the
running program, we catch the SIGINT while waiting on the gdb to finish,
and try Thread.join() again.

With cpython's commit

	commit a22be4943c119fecf5433d999227ff78fc2e5741
	Author: Victor Stinner <[email protected]>
	Date:   Mon Sep 27 14:20:31 2021 +0200

	    bpo-45274: Fix Thread._wait_for_tstate_lock() race condition (GH-28532)

this logic doesn't work anymore, because cpython internally marks the
thread as stopped when join() is interrupted with an exception. IMHO
this is broken in cpython and there is a bug report about this
python/cpython#90882. Problem is that
waiting on a thread to finish is based on acquiring a lock. Meaning
join() is waiting on _tstate_lock. If this wait is interrupted, the
above referenced commit adds a logic that checks if the lock is help,
meaning the thread is done and marks the thread as stopped. But there is
no way to tell if the lock was acquired by us running join() or if it's
held by someone else e.g. still by the thread bootstrap code. Meaning
the thread is still running.

I may be missing something, but I don't see any reason why to spawn gdb
process within a thread. This change removes the thread and spawns gdb
directly. Instead waiting on a thread, we wait on the process to finish,
replacing join() with wait() and avoiding this problem.

Closes espressif#11871

Signed-off-by: Frantisek Hrbata <[email protected]>
ilutchenko pushed a commit to ilutchenko/esp-idf that referenced this issue Jul 24, 2023
idf.py spawns gdb process within a thread and uses Thread.join() to wait
for the gdb process to finish. As CTRL+C(SIGINT) is used by gdb to interrupt the
running program, we catch the SIGINT while waiting on the gdb to finish,
and try Thread.join() again.

With cpython's commit

	commit a22be4943c119fecf5433d999227ff78fc2e5741
	Author: Victor Stinner <[email protected]>
	Date:   Mon Sep 27 14:20:31 2021 +0200

	    bpo-45274: Fix Thread._wait_for_tstate_lock() race condition (GH-28532)

this logic doesn't work anymore, because cpython internally marks the
thread as stopped when join() is interrupted with an exception. IMHO
this is broken in cpython and there is a bug report about this
python/cpython#90882. Problem is that
waiting on a thread to finish is based on acquiring a lock. Meaning
join() is waiting on _tstate_lock. If this wait is interrupted, the
above referenced commit adds a logic that checks if the lock is help,
meaning the thread is done and marks the thread as stopped. But there is
no way to tell if the lock was acquired by us running join() or if it's
held by someone else e.g. still by the thread bootstrap code. Meaning
the thread is still running.

I may be missing something, but I don't see any reason why to spawn gdb
process within a thread. This change removes the thread and spawns gdb
directly. Instead waiting on a thread, we wait on the process to finish,
replacing join() with wait() and avoiding this problem.

Closes espressif#11871

Signed-off-by: Frantisek Hrbata <[email protected]>
espressif-bot pushed a commit that referenced this issue Jul 27, 2023
idf.py spawns gdb process within a thread and uses Thread.join() to wait
for the gdb process to finish. As CTRL+C(SIGINT) is used by gdb to interrupt the
running program, we catch the SIGINT while waiting on the gdb to finish,
and try Thread.join() again.

With cpython's commit

	commit a22be4943c119fecf5433d999227ff78fc2e5741
	Author: Victor Stinner <[email protected]>
	Date:   Mon Sep 27 14:20:31 2021 +0200

	    bpo-45274: Fix Thread._wait_for_tstate_lock() race condition (GH-28532)

this logic doesn't work anymore, because cpython internally marks the
thread as stopped when join() is interrupted with an exception. IMHO
this is broken in cpython and there is a bug report about this
python/cpython#90882. Problem is that
waiting on a thread to finish is based on acquiring a lock. Meaning
join() is waiting on _tstate_lock. If this wait is interrupted, the
above referenced commit adds a logic that checks if the lock is help,
meaning the thread is done and marks the thread as stopped. But there is
no way to tell if the lock was acquired by us running join() or if it's
held by someone else e.g. still by the thread bootstrap code. Meaning
the thread is still running.

I may be missing something, but I don't see any reason why to spawn gdb
process within a thread. This change removes the thread and spawns gdb
directly. Instead waiting on a thread, we wait on the process to finish,
replacing join() with wait() and avoiding this problem.

Closes #11871

Signed-off-by: Frantisek Hrbata <[email protected]>
espressif-bot pushed a commit that referenced this issue Sep 1, 2023
idf.py spawns gdb process within a thread and uses Thread.join() to wait
for the gdb process to finish. As CTRL+C(SIGINT) is used by gdb to interrupt the
running program, we catch the SIGINT while waiting on the gdb to finish,
and try Thread.join() again.

With cpython's commit

	commit a22be4943c119fecf5433d999227ff78fc2e5741
	Author: Victor Stinner <[email protected]>
	Date:   Mon Sep 27 14:20:31 2021 +0200

	    bpo-45274: Fix Thread._wait_for_tstate_lock() race condition (GH-28532)

this logic doesn't work anymore, because cpython internally marks the
thread as stopped when join() is interrupted with an exception. IMHO
this is broken in cpython and there is a bug report about this
python/cpython#90882. Problem is that
waiting on a thread to finish is based on acquiring a lock. Meaning
join() is waiting on _tstate_lock. If this wait is interrupted, the
above referenced commit adds a logic that checks if the lock is help,
meaning the thread is done and marks the thread as stopped. But there is
no way to tell if the lock was acquired by us running join() or if it's
held by someone else e.g. still by the thread bootstrap code. Meaning
the thread is still running.

I may be missing something, but I don't see any reason why to spawn gdb
process within a thread. This change removes the thread and spawns gdb
directly. Instead waiting on a thread, we wait on the process to finish,
replacing join() with wait() and avoiding this problem.

Closes #11871

Signed-off-by: Frantisek Hrbata <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

3 participants