Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STM32L496xx: MCU restarts after running st-util, loading image #980

Closed
mykmelez opened this issue Jun 8, 2020 · 19 comments · Fixed by #1124
Closed

STM32L496xx: MCU restarts after running st-util, loading image #980

mykmelez opened this issue Jun 8, 2020 · 19 comments · Fixed by #1124

Comments

@mykmelez
Copy link

mykmelez commented Jun 8, 2020

When I run st-util 1.6.1, the board restarts rather than waiting for me to start arm-none-eabi-gdb, load the image, and continue. And it often (although not always) also restarts right after I load the image, before I enter continue. In st-util 1.6.0, the board wouldn't restart until after I started arm-none-eabi-gdb, loaded the image, and entered continue.

  • Programmer/board type: ST-LINK/V2
  • Programmer firmware version: V2J29S7
  • Operating system and version: macOS Catalina 10.15.5
  • Stlink tools version: v1.6.1 (installed via Homebrew)
  • Stlink commandline tool name: st-util
  • Target chip (and board if applicable): STM32L496xx

Commandline-Output:

Running st-util 1.6.1 shows this output:

> st-util
st-util
2020-06-08T14:56:16 INFO common.c: L496x/L4A6x: 256 KiB SRAM, 1024 KiB flash in at least 2 KiB pages.
2020-06-08T14:56:16 INFO gdb-server.c: Listening at *:4242...

And then the board restarts. Starting arm-none-eabi-gdb and then loading the image often (but not always) also results in the board restarting after the image has been loaded, but before I enter continue.

Expected/description:

st-util 1.6.0 showed this instead:

st-util 1.6.0
2020-06-08T14:57:16 INFO common.c: Loading device parameters....
2020-06-08T14:57:16 INFO common.c: Device connected is: L496x/L4A6x device, id 0x20006461
2020-06-08T14:57:16 INFO common.c: SRAM size: 0x40000 bytes (256 KiB), Flash: 0x100000 bytes (1024 KiB) in pages of 2048 bytes
2020-06-08T14:57:16 INFO gdb-server.c: Chip ID is 00000461, Core ID is  xxxxxxxx.
2020-06-08T14:57:16 INFO gdb-server.c: Listening at *:4242...

And the board didn't restart until I started arm-none-eabi-gdb, loaded the image, and entered continue.

@Nightwalker-87
Copy link
Member

Honestly I don't really understand this issue report.
Maybe the correct workflow should be updated in our documentation, but for that we need someone who is clearly aware of how things should proceed step by step during a gdb debugging routine.

@Ant-ON
Copy link
Collaborator

Ant-ON commented Mar 26, 2021

@Nightwalker-87
@mykmelez st-util has the --no-reset argument (#1114 add synonim --hot-plug). This argument disables the reset on connect

@Nightwalker-87
Copy link
Member

There may be no response from the author any more. I still can't see the error yet.

@mykmelez
Copy link
Author

There may be no response from the author any more. I still can't see the error yet.

I'm still responsive, just not able to respond immediately. I'll dig in further to see how I can describe the issue better and/or isolate it to a specific change between 1.6.0 and 1.6.1.

@mykmelez
Copy link
Author

mykmelez commented Apr 1, 2021

I bisected between v1.6.0 and v1.6.1 and identified the regressing commit:

09ea99aea51f741ce9c8c0e2ad755d40b3d86021 is the first bad commit
commit 09ea99aea51f741ce9c8c0e2ad755d40b3d86021
Author: Miklós Márton <[email protected]>
Date:   Mon Feb 24 21:34:36 2020 +0100

    Add support for STLink V3
    Flash programming is broken

 include/stlink.h             |  68 +++++++-
 include/stlink/commands.h    |  48 +++---
 include/stlink/tools/flash.h |   2 +-
 include/stlink/usb.h         |  14 +-
 src/common.c                 | 100 +++++++-----
 src/flash_loader.c           |   4 +-
 src/gdbserver/gdb-server.c   |   2 +-
 src/logging.c                |   2 +-
 src/sg.c                     |  20 +--
 src/tools/flash.c            |  10 +-
 src/usb.c                    | 368 ++++++++++++++++++++++++++++++++++---------
 11 files changed, 479 insertions(+), 159 deletions(-)

Unfortunately, that's rather a large commit, and I'm not very familiar with this codebase (nor STM32 programming more generally), so it isn't immediately obvious to me where the problem might be nor how to debug it.

I can run my build of st-util in a debugger, however, and I'm happy to do so, if you have some ideas about where the issue might be that you'd like me to investigate.

FWIW, I use a .gdbinit file that looks like this:

set remotetimeout 20
target ext :4242
file name-of-file.elf
load
mem 0x20040000 0x20050000 rw nocache

(I know that last line is redundant in st-link 1.6.1, but my team is still pinned to 1.6.0 because of this issue.)

Here's what gdb output looks like when I start gdb with a build of st-util from before the regressing commit:

GNU gdb (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 8.0.50.20171128-git
…
0x08000718 in ?? ()
Loading section …
…
Start address 0x8009544, load size 815732
Transfer rate: 19 KB/sec, 15106 bytes/write.
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08002918 in ?? ()
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: name-of-file.elf 

Then my program starts running on the target.

When I start gdb with a build of st-util on/after the regressing commit, however, I see this:

GNU gdb (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 8.0.50.20171128-git
…
0xfffffffe in ?? ()
Loading section …
…
Start address 0x8009544, load size 815732
Transfer rate: 19 KB/sec, 15106 bytes/write.
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0xfffffff0 in ?? ()
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: name-of-file.elf 

Program received signal SIGTRAP, Trace/breakpoint trap.
0xfffffff8 in ?? ()
(gdb) 

In other words: the program fails to start, even on the second try. And subsequent tries continue to fail.

Note that the program sometimes fails to start on the first try even before the regressing commit. But when that happens, it always starts the second time.

After the regressing commit, however, sometimes it successfully starts on the first try; but when it doesn't, then it never succeeds.

@Nightwalker-87
Copy link
Member

@mykmelez Thanks for the update and the details.
The mentioned commit appears to have introduced some other issues as well in the past.
I'll try to find a more specific hint.

@Nightwalker-87
Copy link
Member

Here is the graphical view of the mentioned commit: 09ea99a

I may assume the relevant change is in src/gdbserver/gdb-server.c as it is related to the use of st-util.
Can you try if reverting that change there does have any effect?

@Nightwalker-87
Copy link
Member

Obviously also related to #692.

@mykmelez
Copy link
Author

mykmelez commented Apr 3, 2021

I tried reverting the change in gdb-server.c on top of the regressing commit, but that didn't have any effect. Then I started reverting additional chunks by hand, one at a time, and I discovered that reverting the changes to just two functions in usb.c resolved the issues. Here's the set of changes I made to restore the original behavior:

diff --git a/src/usb.c b/src/usb.c
index 12945d6..3a67749 100644
--- a/src/usb.c
+++ b/src/usb.c
@@ -505,10 +505,10 @@ int _stlink_usb_reset(stlink_t * sl) {
     int i = fill_command(sl, SG_DXFER_FROM_DEV, rep_len);
 
     cmd[i++] = STLINK_DEBUG_COMMAND;
-    if (sl->version.jtag_api == STLINK_JTAG_API_V1)
+    // if (sl->version.jtag_api == STLINK_JTAG_API_V1)
         cmd[i++] = STLINK_DEBUG_APIV1_RESETSYS;
-    else
-        cmd[i++] = STLINK_DEBUG_APIV2_RESETSYS;
+    // else
+    //     cmd[i++] = STLINK_DEBUG_APIV2_RESETSYS;
 
     size = send_recv(slu, 1, cmd, slu->cmd_len, data, rep_len);
     if (size == -1) {
@@ -718,14 +718,15 @@ int _stlink_usb_read_all_regs(stlink_t *sl, struct stlink_reg *regp) {
     unsigned char* const cmd = sl->c_buf;
     unsigned char* const data = sl->q_buf;
     ssize_t size;
-    uint32_t rep_len = sl->version.jtag_api == STLINK_JTAG_API_V1 ? 84 : 88;
+    // uint32_t rep_len = sl->version.jtag_api == STLINK_JTAG_API_V1 ? 84 : 88;
+    uint32_t rep_len = 84;
     int i = fill_command(sl, SG_DXFER_FROM_DEV, rep_len);
 
     cmd[i++] = STLINK_DEBUG_COMMAND;
-    if (sl->version.jtag_api == STLINK_JTAG_API_V1)
+    // if (sl->version.jtag_api == STLINK_JTAG_API_V1)
         cmd[i++] = STLINK_DEBUG_APIV1_READALLREGS;
-    else
-        cmd[i++] = STLINK_DEBUG_APIV2_READALLREGS;
+    // else
+    //     cmd[i++] = STLINK_DEBUG_APIV2_READALLREGS;
     size = send_recv(slu, 1, cmd, slu->cmd_len, data, rep_len);
     if (size == -1) {
         printf("[!] send_recv STLINK_DEBUG_READALLREGS\n");
@@ -734,7 +735,8 @@ int _stlink_usb_read_all_regs(stlink_t *sl, struct stlink_reg *regp) {
 
     /* V1: regs data from offset 0 */
     /* V2: status at offset 0, regs data from offset 4 */
-    int reg_offset = sl->version.jtag_api == STLINK_JTAG_API_V1 ? 0 : 4;
+    // int reg_offset = sl->version.jtag_api == STLINK_JTAG_API_V1 ? 0 : 4;
+    int reg_offset = 0;
     sl->q_len = (int) size;
     stlink_print_data(sl);
     for(i=reg_offset; i<16; i++)

The changes in _stlink_usb_reset() fixed the issue that the MPU restarts the program rather than halting when I invoke st-util; the changes in _stlink_usb_read_all_regs() fixed the issue that starting the program in GDB fails repeatedly.

Looking at those changes, my first thought was that perhaps my ST-LINK/V2 actually supports STLINK_JTAG_API_V1 rather than STLINK_JTAG_API_V2, so I can resolve these problems (and others) by simply setting slv->jtag_api to STLINK_JTAG_API_V1 in __parse_version()_. But that introduced new problems:

[!] send_recv read reply failed: LIBUSB_ERROR_TIMEOUT
[!] send_recv STLINK_DEBUG_ENTER

2021-04-03T13:24:36 WARN flash_loader.c: Failed to write flash loader to sram!
2021-04-03T13:24:36 ERROR common.c: stlink_flash_loader_init() == -1

So I guess it isn't that simple.

I also considered the possibility that I'm running older firmware on the ST-LINK/V2, and newer firmware would resolve the problem. So I updated the device from V2J29S7 to V2J37S7 (the newest version available on ST's website). But that doesn't seem to have made a difference.

I'm not sure what to try next. Perhaps my device really does support STLINK_JTAG_API_V2, but the code in usb.c is using the V2 API incorrectly in some cases? If so, then it wouldn't be correct to treat my device as supporting STLINK_JTAG_API_V1. Rather, it would be necessary to identify the specific cases of incorrect V2 API usage and correct them. I'm not sure if that's really the right approach, though.

(Of course I also know that any changeset would have to be developed against the tip of the develop branch, not on top of the regressing commit. I only started there in the hopes that it would help to narrow the regression.)

@mykmelez
Copy link
Author

mykmelez commented Apr 3, 2021

After thinking about it a bit more, I realized that maybe the issue is that some of the code that was introduced to support STLINK_JTAG_API_V3 is specific to that JTAG API version, and it doesn't apply to V2. In that case, using the V1 version of the code for both V1 and V2 might be the solution, i.e. something like this change on top of the tip of the develop branch:

diff --git a/src/stlink-lib/usb.c b/src/stlink-lib/usb.c
index ce0d672..a8cd0fa 100644
--- a/src/stlink-lib/usb.c
+++ b/src/stlink-lib/usb.c
@@ -523,10 +523,10 @@ int _stlink_usb_reset(stlink_t * sl) {
     i = fill_command(sl, SG_DXFER_FROM_DEV, rep_len);
     cmd[i++] = STLINK_DEBUG_COMMAND;
 
-    if (sl->version.jtag_api == STLINK_JTAG_API_V1) {
-        cmd[i++] = STLINK_DEBUG_APIV1_RESETSYS;
-    } else {
+    if (sl->version.jtag_api == STLINK_JTAG_API_V3) {
         cmd[i++] = STLINK_DEBUG_APIV2_RESETSYS;
+    } else {
+        cmd[i++] = STLINK_DEBUG_APIV1_RESETSYS;
     }
 
     size = send_recv(slu, 1, cmd, slu->cmd_len, data, rep_len);
@@ -796,15 +796,15 @@ int _stlink_usb_read_all_regs(stlink_t *sl, struct stlink_reg *regp) {
     unsigned char* const cmd = sl->c_buf;
     unsigned char* const data = sl->q_buf;
     ssize_t size;
-    uint32_t rep_len = sl->version.jtag_api == STLINK_JTAG_API_V1 ? 84 : 88;
+    uint32_t rep_len = sl->version.jtag_api == STLINK_JTAG_API_V3 ? 88 : 84;
     int i = fill_command(sl, SG_DXFER_FROM_DEV, rep_len);
 
     cmd[i++] = STLINK_DEBUG_COMMAND;
 
-    if (sl->version.jtag_api == STLINK_JTAG_API_V1) {
-        cmd[i++] = STLINK_DEBUG_APIV1_READALLREGS;
-    } else {
+    if (sl->version.jtag_api == STLINK_JTAG_API_V3) {
         cmd[i++] = STLINK_DEBUG_APIV2_READALLREGS;
+    } else {
+        cmd[i++] = STLINK_DEBUG_APIV1_READALLREGS;
     }
 
     size = send_recv(slu, 1, cmd, slu->cmd_len, data, rep_len);
@@ -814,9 +814,9 @@ int _stlink_usb_read_all_regs(stlink_t *sl, struct stlink_reg *regp) {
         return((int)size);
     }
 
-    /* V1: regs data from offset 0 */
-    /* V2: status at offset 0, regs data from offset 4 */
-    int reg_offset = sl->version.jtag_api == STLINK_JTAG_API_V1 ? 0 : 4;
+    /* V1–2: regs data from offset 0 */
+    /* V3: status at offset 0, regs data from offset 4 */
+    int reg_offset = sl->version.jtag_api == STLINK_JTAG_API_V3 ? 4 : 0;
     sl->q_len = (int)size;
     stlink_print_data(sl);
 

That change does fix the behavior when starting st-util, although I haven't been able to confirm that it fixes loading and starting a program in gdb, as I run into what looks like #995 when I try to do so:

> make clean && make debug && `find . -name st-util`

st-util
2021-04-03T14:02:19 WARN usb.c: NRST is not connected
2021-04-03T14:02:19 INFO common.c: L496x/L4A6x: 256 KiB SRAM, 1024 KiB flash in at least 2 KiB pages.
2021-04-03T14:02:19 WARN usb.c: NRST is not connected
2021-04-03T14:02:19 INFO gdb-server.c: Listening at *:4242...
2021-04-03T14:02:35 WARN usb.c: NRST is not connected
2021-04-03T14:02:35 INFO gdb-server.c: Found 6 hw breakpoint registers
2021-04-03T14:02:35 INFO gdb-server.c: GDB connected.
2021-04-03T14:02:35 WARN usb.c: NRST is not connected
2021-04-03T14:02:35 INFO gdb-server.c: flash_erase: block 08009000 -> 3d000
2021-04-03T14:02:35 INFO gdb-server.c: flash_erase: page 08009000
EraseFlash - Page:0x12 Size:0x800 2021-04-03T14:02:35 INFO gdb-server.c: flash_erase: page 08009800

EraseFlash - Page:0x8b Size:0x800 2021-04-03T14:02:38 INFO common.c: Starting Flash write for F2/F4/F7/L4
2021-04-03T14:02:38 INFO flash_loader.c: Successfully loaded flash loader in sram
2021-04-03T14:02:38 INFO gdb-server.c: flash_do: block 08009000 -> 3d000
2021-04-03T14:02:38 INFO gdb-server.c: flash_do: page 08009000
2021-04-03T14:03:28 ERROR flash_loader.c: Flash loader run error
2021-04-03T14:03:28 WARN flash_loader.c: Loader state: R2 0x80027CD R15 0x8002838
2021-04-03T14:03:28 WARN flash_loader.c: MCU state: DHCSR 0x1010009 DFSR 0x1 CFSR 0x0 HFSR 0x0
2021-04-03T14:03:28 ERROR common.c: stlink_flash_loader_run(0x8009000) failed! == -1

@Nightwalker-87
Copy link
Member

Thanks for investigating further. This is indeed very helpful 😉

STLINK_JTAG_API_V1 is solely used by the original STLINK/V1 programmer.
AFAIK the latter is only present on the STM32VLDiscovery and as a stand-alone programmer (now obsolete).
Your programmer does require the STLINK_JTAG_API_V2 API in order to work correctly.

With this information we may now be able to address not only this but also other similar issues.

Of course I also know that any changeset would have to be developed against the tip of the develop branch, not on top of the regressing commit. I only started there in the hopes that it would help to narrow the regression.

This is also the best possible approach from my point of view.

@Nightwalker-87
Copy link
Member

We should involve @Ant-ON in this discussion, as he edited this part of the code (programmer APIs) recently.

@Ant-ON
Copy link
Collaborator

Ant-ON commented Apr 4, 2021

@mykmelez Can you try develop branch? There really was a problem with stlink_usb_read_all_regs, but in a slightly different (#1027). About the differences in reset behavior I'll try to see it in the evening.

@mykmelez
Copy link
Author

mykmelez commented Apr 4, 2021

@Ant-ON, I tried the develop branch, and I see the same problems there. The second change I mentioned (#980 (comment)) is actually against the tip of the develop branch.

And it seems to fix both problems for me there (although I'm not entirely positive because I'm running into #995 when I try to test it).

@Ant-ON
Copy link
Collaborator

Ant-ON commented Apr 4, 2021

@mykmelez Current develop branch have small bug in the L4 flash loader. Bug with BSY flag waiting (Ant-ON@e4c8240). I fixed in https://github.com/Ant-ON/stlink/commits/flash_loader_fix but I haven't merge the changes yet. This is most likely the cause of error #995.

st-util does a reset at the startup. In version 1.6.0 and in version 1.6.1 too. It is unusual what the reset does not occur...

if (state.reset) {
stlink_reset(sl);
}

if (state.reset) {
stlink_reset(sl);
}

In the developer, you can start debugging without resetting by st-util --hot-plug

@mykmelez
Copy link
Author

mykmelez commented Apr 5, 2021

@mykmelez Current develop branch have small bug in the L4 flash loader. Bug with BSY flag waiting (Ant-ON/stlink@e4c8240). I fixed in https://github.com/Ant-ON/stlink/commits/flash_loader_fix but I haven't merge the changes yet. This is most likely the cause of error #995.

Indeed! With your fix, and my fixes in the comment above, I'm able to flash and run my large program successfully onto a device with an STM32L496 MCU.

st-util does a reset at the startup. In version 1.6.0 and in version 1.6.1 too. It is unusual what the reset does not occur...

A reset does occur for me in both 1.6.0 and 1.6.1, but what happens after the reset is different.

  • 1.6.0: after reset, the processor halts, so the program on the device stops running, and it doesn't start again (until I flash a new version of it and start it from gdb).
  • 1.6.1: after reset, the existing program on the device restarts and runs again.

With my fixes, the behavior returns to the behavior on 1.6.0. And flashing works again.

In the developer, you can start debugging without resetting by st-util --hot-plug

That's good to know, but I actually do want to reset when running st-util, since I start st-util in order to flash a new program onto my device. Thus the behavior in 1.6.0 is actually the ideal behavior for me: when starting st-util, halt the processor and wait for gdb to start and flash a new program.

However, if it's intentional for the device to restart rather than halting when starting st-util, then that's fine, even if it isn't ideal. In that case, perhaps only one of my fixes is necessary in order to fix the issue with flashing.

@Nightwalker-87
Copy link
Member

We should implement the most useful approach here, I think.

@Ant-ON
Copy link
Collaborator

Ant-ON commented Apr 6, 2021

@mykmelez As I understand it, different versions of the reset functions have the different behaviour. But in general, both work. I think we should save them as they are.

I have unified the reset function. Removed unnecessary reset calls from st-util. Changes can be found in https://github.com/Ant-ON/stlink/tree/fl_fix
To halt after connecting (as in 1.6.0) now need to use st-util --connect-under-reset. To run after connecting (as in 1.6.1) need to use st-util. To connect without reset need to use st-util --hot-plug.

@mykmelez
Copy link
Author

@Ant-ON, in my testing, your fl_fix branch seems to have fixed this issue for me:

  1. When starting st-util, if I specify --connect-under-reset, then I get the "halt MCU" behavior that was the default in 1.6.0.
  2. When flashing a binary using gdb, the binary flashes correctly and starts successfully in the debugger.

I also tested st-util without any flags and with --hot-plug, and those variants also seem to work as you described. So I think your branch has addressed this issue, and it can be closed once your branch has landed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants