Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pyterm on stdio_cdc_acm stops working after a few seconds #16077

Closed
nicoHarel opened this issue Feb 23, 2021 · 13 comments
Closed

pyterm on stdio_cdc_acm stops working after a few seconds #16077

nicoHarel opened this issue Feb 23, 2021 · 13 comments
Assignees
Labels
Area: tools Area: Supplementary tools Area: USB Area: Universal Serial Bus Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@nicoHarel
Copy link

Hello everyone,

Context

So we've decided to update our RIOT to the newest version available (ours was a 2019 version) and we're encountering a very unexpected issue.

We were originally using the Particle Xenon board with an UART debugger but since the terminal can be accessed through the USB port now, we decided to try it.

Description

Unfortunately we're encountering a really weird issue:
The terminal stops displaying anything after a few seconds. To test this we used the very simple hello-world example and added a 3-4 seconds delay at the start of the main. Without the delay, the messages are displayed correctly, with it, nothing (except the "this is riot" text at boot).
Same thing is happening with our project, after a few seconds, nothing is displayed anymore, we only get a few seconds worth of messages. And everything is still connected. I honestly do not get it.

Of course everything still works fine when using the stdio_uart module.

Funny thing is, this seems to happen only when using pyterm on Fedora... No problem on Ubuntu. Leading me to think this might be a problem related to pyterm ?

We'd use our VM on Ubuntu, but each reset/reprogramming disconnect the board when using the USB port, meaning we miss the xenon's boot and have to redirect the USB device to our VM every single time...

I can't test with different boards at the moment except other particle xenons, I have none. Anyone ever encountered such an issue ?

PS: We use the particle JTAG programmer, so no flashing through the USB yet. One problem at a time =)

@benpicco benpicco added Area: tools Area: Supplementary tools Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) labels Feb 23, 2021
@benpicco benpicco added the Area: USB Area: Universal Serial Bus label Feb 23, 2021
@chrysn
Copy link
Member

chrysn commented Feb 23, 2021

To narrow down the problem, please report on the following:

  • What does sudo dmesg -Hw show from the point where you plug in the device to the point where the connection stops working?
  • Does the same happen when, instead of starting the terminal through make term, you run socat - /dev/ttyACM0,cfmakeraw? (Note that unlike pyterm this doesn't autoconnect, so you'll need to plug the device in and press Enter on the command briefly afterwards)
  • Does lsusb still report the device as present when things hang?

@nicoHarel
Copy link
Author

nicoHarel commented Feb 24, 2021

The worst thing is, nothing is disconnected from what I see:

  • sudo dmesg -Hw, everything looks normal (I manually disconnected the device at the end, that's why you see the USB disconnected message):
[Feb23 16:45] usb 2-3.4: new full-speed USB device number 112 using xhci_hcd
[  +0.081111] usb 2-3.4: New USB device found, idVendor=1209, idProduct=7d00, bcdDevice= 0.00
[  +0.000003] usb 2-3.4: New USB device strings: Mfr=3, Product=2, SerialNumber=4
[  +0.000002] usb 2-3.4: Product: USB device
[  +0.000002] usb 2-3.4: Manufacturer: RIOT-os.org
[  +0.000001] usb 2-3.4: SerialNumber: FBBC089FA940AC7C
[  +0.008450] cdc_acm 2-3.4:1.0: ttyACM0: USB ACM device
[Feb23 16:46] usb 2-3.4: USB disconnect, device number 112
  • lsusb, device shows whether I get an output or not :
Bus 002 Device 115: ID 1209:7d00 Generic
  • socat - /dev/ttyACM0,cfmakeraw, as opposed to make term, this works perfectly !

  • This is all I get from make term:

make BOARD=particle-xenon PORT=/dev/ttyACM0 term 
/home/nharel/RIOT/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200"  
2021-02-24 09:28:10,768 # Connect to serial port /dev/ttyACM0
Welcome to pyterm!
Type '/exit' to exit.
2021-02-24 09:28:11,773 # main(): This is RIOT! (Version: 2021.04-devel-704-g2f919)
  • With the sockets, I get everything:
socat - /dev/ttyACM0,cfmakeraw
main(): This is RIOT! (Version: 2021.04-devel-704-g2f919)
Hello World!
You are running RIOT on a(n) particle-xenon board.
This board features a(n) nrf52 MCU.

@chrysn chrysn changed the title Particle Xenon: USB std out stops working after a few seconds pyterm on stdio_cdc_acm stops working after a few seconds Feb 24, 2021
@chrysn
Copy link
Member

chrysn commented Feb 24, 2021

OK thanks, that narrows it down already a bit.

Just to verify that it's not any other build system related issue, please try running just ./dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200" (instead of make ... term) in the same way you ran socat -- ie. plug in board then start the command.

If that also runs into the error conditions, next steps based from there are to try:

  • leaving out the baud rate setting
  • adding --no-reconnect (just to see if it's around there)
  • prefixing the whole command with strace to see what it is doing in detail (with 2>/tmp/pyterm-strace.log appended as that's ... verbose)

@nicoHarel
Copy link
Author

nicoHarel commented Feb 24, 2021

Ok so, with ./dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200 I get very incoherent results. Most of the time I encounter the same problem, but there were very few occasions were I got the full text output. I don't know if the cmd works better or it's just luck at this point. Removing the baud rate setting doesn't seem to change anything. Same for --no-reconnect.

I took the liberty of adding an infinite loop at the end of the main with a puts command printing text ("STD out test") every two seconds. And guess what, after a while, I get an output again:

~/RIOT/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200"
2021-02-24 14:22:00,658 # Connect to serial port /dev/ttyACM0
Welcome to pyterm!
Type '/exit' to exit.
2021-02-24 14:22:01,660 # main(): This is RIOT! (Version: 2021.04-devel-704-g2f919)
2021-02-24 14:22:19,373 # HSSSSTD out test
2021-02-24 14:22:21,385 # STD out test
2021-02-24 14:22:23,397 # STD out test
2021-02-24 14:22:25,409 # STD out test
2021-02-24 14:22:27,420 # STD out test
2021-02-24 14:22:29,431 # STD out test

As you can see, at 14:22:19 it recovers but obviously something happened before. It takes approximately 18 seconds everytime. I tried it a few dozen times.

I tried using strace but the log file is huge, 6000 lines. I've tried runing the strace utility on both by Fedora Machine, and the Ubuntu VM. Well, it's hard to compare since it's two different OS. I'm guessing the part which is interesting is after this line:

write(1, "Welcome to pyterm!\nType '/exit' "..., 41) = 41

So, there's the Fedora log after that line, it's a bit hard to read, but we can see the /exit cmd that I used to close pyterm (starts with read(0, "/", 1) = 1) , between the start of pyterm and the exit command, not much is happening (I waited for a few "STD out test" message to be printed):

strace output
> write(1, "Welcome to pyterm!\nType '/exit' "..., 41) = 41
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, TIOCGWINSZ, {ws_row=24, ws_col=141, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(0, TIOCSWINSZ, {ws_row=24, ws_col=141, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, 8) = 0
pselect6(1, [0], NULL, NULL, NULL, {[], 8}) = 1 (in [0])
read(0, "/", 1)                         = 1
select(1, [0], NULL, [0], {tv_sec=0, tv_usec=0}) = 0 (Timeout)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "/", 1)                        = 1
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, 8) = 0
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, 8) = 0
pselect6(1, [0], NULL, NULL, NULL, {[], 8}) = 1 (in [0])
read(0, "e", 1)                         = 1
select(1, [0], NULL, [0], {tv_sec=0, tv_usec=0}) = 0 (Timeout)
write(1, "e", 1)                        = 1
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, 8) = 0
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, 8) = 0
pselect6(1, [0], NULL, NULL, NULL, {[], 8}) = 1 (in [0])
read(0, "x", 1)                         = 1
select(1, [0], NULL, [0], {tv_sec=0, tv_usec=0}) = 0 (Timeout)
write(1, "x", 1)                        = 1
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, 8) = 0
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, 8) = 0
pselect6(1, [0], NULL, NULL, NULL, {[], 8}) = 1 (in [0])
read(0, "i", 1)                         = 1
select(1, [0], NULL, [0], {tv_sec=0, tv_usec=0}) = 0 (Timeout)
write(1, "i", 1)                        = 1
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, 8) = 0
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, 8) = 0
pselect6(1, [0], NULL, NULL, NULL, {[], 8}) = 1 (in [0])
read(0, "t", 1)                         = 1
select(1, [0], NULL, [0], {tv_sec=0, tv_usec=0}) = 0 (Timeout)
write(1, "t", 1)                        = 1
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, 8) = 0
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, 8) = 0
pselect6(1, [0], NULL, NULL, NULL, {[], 8}) = 1 (in [0])
read(0, "\r", 1)                        = 1
write(1, "\n", 1)                       = 1
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
rt_sigaction(SIGWINCH, {sa_handler=0x7f7bd580bc30, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7bd5519970, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f7be2e45860}, 8) = 0
getpid()                                = 126536
write(6, "2021-02-24 14:50:52,422 # Exitin"..., 41) = 41
write(1, "2021-02-24 14:50:52,422 # Exitin"..., 41) = 41
stat("/home/nharel/.history", {st_mode=S_IFREG|0600, st_size=6, ...}) = 0
readlink("/home/nharel/.history", 0x7fff4e329870, 4096) = -1 EINVAL (Invalid argument)
getpid()                                = 126536
openat(AT_FDCWD, "/home/nharel/.history-26536.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0600) = 12
write(12, "/exit\n", 6)                 = 6
close(12)                               = 0
readlink("/home/nharel/.history", 0x7fff4e329870, 4096) = -1 EINVAL (Invalid argument)
rename("/home/nharel/.history-26536.tmp", "/home/nharel/.history") = 0
chown("/home/nharel/.history", 663613, 200471) = 0
close(6)                                = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, {sa_handler=0x7f7be2b87d75, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f7be2e45860}, 8) = 0
sigaltstack(NULL, {ss_sp=0x55b29bdd9e80, ss_flags=0, ss_size=16384}) = 0
sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}, NULL) = 0
munmap(0x7f7bd4e11000, 593920)          = 0
exit_group(0)                           = ?
+++ exited with 0 +++

[edit: formatting of strace output]

@dylad
Copy link
Member

dylad commented Feb 24, 2021

As you can see, at 14:22:19 it recovers but obviously something happened before. It takes approximately 18 seconds everytime. I tried it a few dozen times.

If you add more verbose to your output, does it still take ~ 18 seconds or less ?

@nicoHarel
Copy link
Author

If you add more verbose to your output, does it still take ~ 18 seconds or less ?

From what I can see it doesn't change anything.

Ok, so further testing, if I start pyterm while the board has already been running for a while, everything works fine:

ello-world]$ ~/RIOT/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200"
2021-02-24 16:01:12,031 # Connect to serial port /dev/ttyACM0
Welcome to pyterm!
Type '/exit' to exit.
2021-02-24 16:01:13,035 # STD out test
2021-02-24 16:01:14,824 # STD out test
2021-02-24 16:01:16,838 # STD out test
2021-02-24 16:01:18,851 # STD out test
/exit

@chrysn
Copy link
Member

chrysn commented Feb 25, 2021

Phew, the behavior sounds like it's waiting for a buffer to fill without a timeout, but then doesn't.

@OlegHahm, any do you have a clue?

@nicoHarel
Copy link
Author

Just so you know, we've started using the RS232 debugger again, it's not optimal but it is working fine for us. It's a bit of a struggle now since every time we flash an riot example/test on the board with have to add the stdio_uart module.

@chrysn
Copy link
Member

chrysn commented Mar 5, 2021 via email

@nicoHarel
Copy link
Author

nicoHarel commented Mar 5, 2021

It is a viable option, it is true.

But it's a bit less handy since every time you flash or reboot the board, socat disconnects and you have to restart it manually unlike the RS232 debugger where you can flash without disconnecting anything.

Since it will probably identify the problem, I'm going to use socat for a while see if anything weird occurs.

Edit: I'm going to try and add some options witch socat so it reconnects automatically =)

@nicoHarel
Copy link
Author

Ok so none of the options I tried work with socat, (forever, retry...), I get this output:
option "interval" not supported with this address type meaning I still get disconnected when I flash the board and that I have to restart socat manually every single time (most importantly we lose some valuable information on the first seconds of booting).

I'll try to make some sort of udev rule on my machine so it automatically launches a socat command when plugged in.

@chrysn
Copy link
Member

chrysn commented Mar 16, 2021

Before a udev rule, a simple while true; do sleep 1; socat - /dev/ttyACM0,cfmakeraw; done rather than make term should do just as fine. I've been asking around on IRC for more ideas on what could be done to track this down further.

@MrKevinWeiss MrKevinWeiss added this to the Release 2021.07 milestone Jun 21, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@maribu
Copy link
Member

maribu commented Jan 5, 2023

Sorry for the long latency. If I understand correctly, the issue is that pyterm gets stuck after a few seconds. The RIOT node itself works fine all the time?

I'd say this is likely an issue in pyserial that is used by pyterm. Maybe an update solves the issues?

I'll close this for now since this has been stale for so long. If there still is an issue, please re-open :)

@maribu maribu closed this as completed Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: tools Area: Supplementary tools Area: USB Area: Universal Serial Bus Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests

7 participants