STM32 - TIMEOUT issue since CMSIS5 merge #4459

Closed
jeromecoutant opened this issue Jun 6, 2017 · 33 comments

@jeromecoutant
Collaborator

Description

Since #4294 was merged, we have seen many issues with STM32 targets.

Tests are still ongoing, but it seems that:

  • OS2 tests all end in TIMEOUT for F4/F0/F7
  • OS5 tests are OK with F4/F7
  • OS5 tests end in TIMEOUT with F1

Toolchain version:
All

mbed-cli version:
1.1.1

@jeromecoutant
Collaborator Author

Example:

mbedgt: greentea test automation tool ver. 1.2.5
mbedgt: test specification file 'C:\github\mbed\BUILD\tests\NUCLEO_F103RB\GCC_ARM\test_spec.json' (specified with --test-spec option)
mbedgt: using 'C:\github\mbed\BUILD\tests\NUCLEO_F103RB\GCC_ARM\test_spec.json' from current directory!
mbedgt: detecting connected mbed-enabled devices...
mbedgt: detected 1 device
+---------------+----------------------+-------------+-------------+--------------------------+
| platform_name | platform_name_unique | serial_port | mount_point | target_id |
+---------------+----------------------+-------------+-------------+--------------------------+
| NUCLEO_F103RB | NUCLEO_F103RB[0] | COM26 | D: | 07000221D12A3E0FEE6EA832 |
+---------------+----------------------+-------------+-------------+--------------------------+
mbedgt: processing target 'NUCLEO_F103RB' toolchain 'GCC_ARM' compatible platforms... (note: switch set to --parallel 1)
+---------------+----------------------+-------------+-------------+--------------------------+
| platform_name | platform_name_unique | serial_port | mount_point | target_id |
+---------------+----------------------+-------------+-------------+--------------------------+
| NUCLEO_F103RB | NUCLEO_F103RB[0] | COM26:9600 | D: | 07000221D12A3E0FEE6EA832 |
+---------------+----------------------+-------------+-------------+--------------------------+
mbedgt: test case filter (specified with -n option)
test filtered in 'tests-mbed_drivers-echo'
mbedgt: running 1 test for platform 'NUCLEO_F103RB' and toolchain 'GCC_ARM'
use 1 instance of execution threads for testing
mbedgt: checking for 'host_tests' directory above image directory structure
found 'host_tests' directory in: 'TESTS\host_tests'
mbedgt: selecting test case observer...
calling mbedhtrun: mbedhtrun -m NUCLEO_F103RB -p COM26:9600 -f "BUILD/tests/NUCLEO_F103RB/GCC_ARM/TESTS/mbed_drivers/echo/echo.bin" -e "TESTS\host_tests" -d D: -C 4 -c shell -t 07000221D12A3E0FEE6EA832
mbedgt: mbed-host-test-runner: started
[1496743800.38][HTST][INF] host test executor ver. 1.1.8
[1496743800.38][HTST][INF] copy image onto target...
[1496743800.38][COPY][INF] Waiting up to 60 sec for '07000221D12A3E0FEE6EA832' mount point (current is 'D:')...
1 file(s) copied.
[1496743808.29][HTST][INF] starting host test process...
[1496743809.16][CONN][INF] starting connection process...
[1496743809.16][CONN][INF] notify event queue about extra 60 sec timeout for serial port pooling
[1496743809.16][CONN][INF] initializing serial port listener...
[1496743809.16][PLGN][INF] Waiting up to 60 sec for '07000221D12A3E0FEE6EA832' serial port (current is 'COM26')...
[1496743809.19][HTST][INF] setting timeout to: 60 sec
[1496743809.73][SERI][INF] serial(port=COM26, baudrate=9600, read_timeout=0.01, write_timeout=5)
[1496743809.76][SERI][INF] reset device using 'default' plugin...
[1496743810.01][SERI][INF] waiting 1.00 sec after reset
[1496743811.01][SERI][INF] wait for it...
[1496743811.05][SERI][TXD] mbedmbedmbedmbedmbedmbedmbedmbedmbedmbed
[1496743811.05][CONN][INF] sending up to 2 __sync packets (specified with --sync=2)
[1496743811.05][CONN][INF] sending preamble '29c6f0df-fe5a-43e9-a149-534722c7569f'
[1496743811.09][SERI][TXD] {{__sync;29c6f0df-fe5a-43e9-a149-534722c7569f}}
[1496743812.10][CONN][INF] resending new preamble '7f49e61d-8986-4d7a-b0c8-48fe59ba9511' after 1.00 sec
[1496743812.14][SERI][TXD] {{__sync;7f49e61d-8986-4d7a-b0c8-48fe59ba9511}}
[1496743870.02][HTST][INF] test suite run finished after 60.83 sec...
[1496743870.03][CONN][INF] received special even '__host_test_finished' value='True', finishing
[1496743870.04][HTST][INF] CONN exited with code: 0
[1496743870.04][HTST][INF] No events in queue
[1496743870.04][HTST][INF] stopped consuming events
[1496743870.04][HTST][INF] host test result(): None
[1496743870.04][HTST][WRN] missing __exit event from DUT
[1496743870.05][HTST][WRN] missing __exit_event_queue event from host test
[1496743870.05][HTST][ERR] missing __exit_event_queue event from host test and no result from host test, timeout...
[1496743870.05][HTST][INF] calling blocking teardown()
[1496743870.05][HTST][INF] teardown() finished
[1496743870.05][HTST][INF] {{result;timeout}}
mbedgt: checking for GCOV data...
mbedgt: mbed-host-test-runner: stopped and returned 'TIMEOUT'
mbedgt: test case summary event not found
no test case report present, assuming test suite to be a single test case!
test suite: tests-mbed_drivers-echo
test case: tests-mbed_drivers-echo
mbedgt: test on hardware with target id: 07000221D12A3E0FEE6EA832
mbedgt: test suite 'tests-mbed_drivers-echo' ......................................................... TIMEOUT in 70.80 sec
test case: 'tests-mbed_drivers-echo' ......................................................... ERROR in 70.80 sec
mbedgt: test case summary: 0 passes, 1 failure
mbedgt: all tests finished!
mbedgt: shuffle seed: 0.5906039481
mbedgt: test suite report:
+-----------------------+---------------+-------------------------+---------+--------------------+-------------+
| target | platform_name | test suite | result | elapsed_time (sec) | copy_method |
+-----------------------+---------------+-------------------------+---------+--------------------+-------------+
| NUCLEO_F103RB-GCC_ARM | NUCLEO_F103RB | tests-mbed_drivers-echo | TIMEOUT | 70.8 | shell |
+-----------------------+---------------+-------------------------+---------+--------------------+-------------+

@jeromecoutant
Collaborator Author

@bcostm
Contributor

bcostm commented Jun 6, 2017

OS2 tests also end in TIMEOUT for all L4 devices.

@0xc0170
Contributor

0xc0170 commented Jun 6, 2017

Thanks for the report, I'll test one of the devices. Today I ran the mbed OS 5 tests, not the mbed 2 ones; I will check. What is interesting is that it affects all toolchains but only some devices.

Does it affect all tests (RTOS, or even bare metal for mbed 2)?

@LMESTM
Contributor

LMESTM commented Jun 6, 2017

Yes, the bare metal mbed2 tests are also showing the problem.
It looks like we end up in the Default_Handler. Have there been changes in CMSIS concerning interrupts or the vector table?

@LMESTM
Contributor

LMESTM commented Jun 6, 2017

@bulislaw @0xc0170
One update: as we end up in Default_Handler, I suspected an issue with the interrupt vector table.

In #4294, there is a change that moves all platforms from a target-specific implementation to the CMSIS implementation.
b97ffe8

But those implementations are not the same. I actually moved back to the previous implementation for my test target, and I can start my OS2 basic tests again (tested on Cortex-M4, NUCLEO_L476RG).

Also, not all targets had the same implementation in the target-specific file. Can you explain how the CMSIS update is supposed to cope with these differences between target-specific implementations when aligning every target to the reference CMSIS one?

The target-specific one seems to be in charge of
// Copy and switch to dynamic vectors if the first time called
and I am not sure where this is supposed to be done now ...

... to be continued. feedback welcome
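
For illustration, the "copy and switch to dynamic vectors if the first time called" behaviour could be sketched as below. This is a host-side simulation under stated assumptions, not the mbed or CMSIS source: `flash_vectors`, `ram_vectors`, `sim_vtor`, `sim_set_vector` and `sim_get_vector` are hypothetical names, the table size is arbitrary, and a plain pointer stands in for the VTOR register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_VECTORS 16u  /* hypothetical table size; real parts have more entries */

/* Stand-ins for hardware: a read-only "flash" table, a RAM table, and VTOR. */
static const uint32_t flash_vectors[NUM_VECTORS] = { 0x20001000u, 0x08000101u };
static uint32_t ram_vectors[NUM_VECTORS];
static const uint32_t *sim_vtor = flash_vectors;

/* Install a handler, relocating the table to RAM on the first call. */
void sim_set_vector(unsigned index, uint32_t handler)
{
    assert(index < NUM_VECTORS);
    if (sim_vtor != ram_vectors) {
        /* First call: copy the flash table, then switch to the RAM copy. */
        memcpy(ram_vectors, flash_vectors, sizeof ram_vectors);
        sim_vtor = ram_vectors;
    }
    ram_vectors[index] = handler;
}

/* Read a handler through whichever table is currently active. */
uint32_t sim_get_vector(unsigned index)
{
    assert(index < NUM_VECTORS);
    return sim_vtor[index];
}
```

If nothing performs the copy step, a write through the active table would target the flash copy, which matches the symptom described in this thread.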

@LMESTM
Contributor

LMESTM commented Jun 7, 2017

@bulislaw @0xc0170 I need your feedback / help.
I found out that the copy is now supposed to be done in mbed_cpy_nvic in the mbed_boot.c file.
But I'm not sure how and when it will actually be called in the case of MBED2 tests,
also because it is only called conditionally, under
#if !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED))
So what if TARGET_UVISOR_SUPPORTED is not defined? Is this supposed to be defined for all targets?
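
On the preprocessor side of that question, a small sketch may help. `defined(X)` evaluates to 0 for an undefined macro, so with neither macro defined the guard evaluates to true and the guarded code is compiled in. The marker macro and function below are hypothetical names for this demo; whether the guarded call is actually reached in an mbed 2 build is a separate question that this sketch does not answer.

```c
#include <assert.h>

/* Neither FEATURE_UVISOR nor TARGET_UVISOR_SUPPORTED is defined in this file,
 * which mimics a target without uVisor support. */
#if !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED))
#define VECTOR_COPY_COMPILED_IN 1   /* hypothetical marker for this demo */
#else
#define VECTOR_COPY_COMPILED_IN 0
#endif

/* Compile-time check (C11): the guarded branch is the one that was selected. */
static_assert(VECTOR_COPY_COMPILED_IN == 1, "guarded code should be compiled in");

/* Expose the result at run time as well. */
int vector_copy_compiled_in(void)
{
    return VECTOR_COPY_COMPILED_IN;
}
```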

@0xc0170
Contributor

0xc0170 commented Jun 7, 2017

@LMESTM I am looking at this. One related issue is #4486.

The NVIC copy was moved, and I have to check whether we failed to keep it in the mbed 2 code. That might be the case; I'll fix it, provide the fix there as well, and test. I started looking at it yesterday, but tooling issues kept me from setting up the test cases to debug the timeouts (I was able to reproduce them).

@0xc0170
Contributor

0xc0170 commented Jun 7, 2017

@LMESTM Thanks for the description above. This seems to be the issue, and there might be more. I am currently reviewing all the previous VTOR reallocations.

To align all of these, we should provide a default implementation for VTOR reallocation (as in mbed 5), and targets that do not support it should provide their own implementation (do not define NVIC_RAM_VECTOR_ADDRESS in case it's a non-Cortex-M0 target). This is up to the target, so the startup code should invoke this function.

Plus, if needed, to override the default NVIC_SetVector/NVIC_GetVector functions, use these two macros:

            "CMSIS_VECTAB_VIRTUAL",
            "CMSIS_VECTAB_VIRTUAL_HEADER_FILE=\"cmsis_nvic.h\""

Does this clear the air a bit?
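
As a rough sketch of the macro-selected override idea: one macro switches between a default vector-setting function and a target-provided one. This is a host-side illustration, not the CMSIS headers. `DEMO_VECTAB_VIRTUAL` stands in for `CMSIS_VECTAB_VIRTUAL`, all function and variable names are hypothetical, and the assumption (from my reading of the CMSIS 5 core headers) is that the real macro pair makes the core header include the named header file instead of mapping the names to its built-in functions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_VECTORS 8u
#define DEMO_VECTAB_VIRTUAL   /* stand-in for CMSIS_VECTAB_VIRTUAL; remove to get the default */

static uint32_t demo_table[DEMO_NUM_VECTORS];
static int demo_override_calls;   /* counts calls to the target override */

/* Default implementation, playing the role of the core header's own function. */
void demo_default_set_vector(unsigned irq, uint32_t handler)
{
    assert(irq < DEMO_NUM_VECTORS);
    demo_table[irq] = handler;
}

#ifdef DEMO_VECTAB_VIRTUAL
/* Target-provided replacement; in a real setup its declaration would live in
 * the header named by CMSIS_VECTAB_VIRTUAL_HEADER_FILE (assumption). */
void demo_set_vector(unsigned irq, uint32_t handler)
{
    assert(irq < DEMO_NUM_VECTORS);
    demo_override_calls++;
    demo_table[irq] = handler;
}
#else
/* No override requested: map the public name onto the default. */
#define demo_set_vector demo_default_set_vector
#endif
```

Callers always use `demo_set_vector`; which body runs is decided at build time by the macro.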

@LMESTM
Contributor

LMESTM commented Jun 7, 2017

@0xc0170 OK, thanks. Waiting for the outcome of your review.

> To align with all these, we should provide default implementation for vtor reallocation (as it is in mbed 5),

So you'll add this implementation and have it called for MBED2 as well?
Even in MBED5, I think it is not called for now because of the UVISOR-related compilation switches.

> and targets that do not support it, should provide own implementation (do not define NVIC_RAM_VECTOR_ADDRESS in case its non cortex-m0). This is up to a target, therefore mbed sdk init should HAL should invoke own vtor realloc function.

This part is not so clear yet. Could you provide more details about the default implementation and the list of targets that do not support it (and where the hook will be)?

> Plus if needed, overwrite the default NVIC_Set/GetVectors functions use these 2 macros:
> "CMSIS_VECTAB_VIRTUAL",
> "CMSIS_VECTAB_VIRTUAL_HEADER_FILE=\"cmsis_nvic.h\""
> Does this clear the air a bit?

I'd prefer to avoid this if possible.

Do you think all of the above points will be solved in the short term? Or do you plan to revert the CMSIS5 branch in the meantime?

@0xc0170
Contributor

0xc0170 commented Jun 8, 2017

I have a default implementation that I'll send soon; it should fix a lot of targets. The rest needs to be investigated. I'll provide some details here so we can find a solution.

(I was debugging failures since yesterday, just found another issue that I'll address separately)

@jeromecoutant
Collaborator Author

Thanks.
You should also find out how such a big issue could pass the CI without any failure...

@LMESTM
Contributor

LMESTM commented Jun 9, 2017

Related PRs:
#4511
#4506
#4503

@LMESTM
Contributor

LMESTM commented Jun 12, 2017

@0xc0170 - today I tested on master after the related PRs listed above were merged. MBED2 tests boot OK on NUCLEO_L476RG, as I reported last week, but I still fail to boot on NUCLEO_F334R8. If I roll back to before the CMSIS5 update, it works again ...

@0xc0170
Contributor

0xc0170 commented Jun 13, 2017

> @0xc0170 -today I tested on master after the list of related PRs were merged. MBED2 tests boot ok on NUCLEO_L476RG as I reported last week, but I still fail to boot on NUCLEO_F334R8.

Is this the mbed 2 boot for NUCLEO_F334R8?

@LMESTM
Contributor

LMESTM commented Jun 13, 2017

> mbed 2 boot for NUCLEO_F334R8 ?

Not only. I also started automated tests on my analogout branch yesterday.
The CI shield tests run on NUCLEO_F303ZE on our test bench all timed out.
I rebased mbed back to before #4294, and the tests then passed.
Edit: this was using the ARM toolchain.

@0xc0170
Contributor

0xc0170 commented Jun 13, 2017

A quick debug session shows the call stack as:

SystemInit -> HAL TickInit -> Nvic SetVector -> hardfault

I noticed that VTOR points to the flash area (address 0x0800 0000), so writing to it results in the hard fault. Is that correct?

We are looking at this and will provide a solution here. Ideally, ISRs would not be set up before everything else is (sdk init, rtx init, heap/stack set). As I recall, TickInit is called in SystemInit because of the C++ ctors? Or is there any other reason? I'll go into the git history to refresh my memory.
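
To make the failure mode concrete, here is a minimal sketch of a check that a vector write could perform first: is the active table in RAM at all? It assumes the standard Cortex-M memory map, where 0x20000000 to 0x3FFFFFFF is the SRAM region, and the flash address quoted above. The function name is hypothetical, and this is not what the eventual fix does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cortex-M architectural SRAM region (assumed standard memory map). */
#define SRAM_REGION_START 0x20000000u
#define SRAM_REGION_END   0x3FFFFFFFu

/* True if the vector table address lies in the SRAM region, i.e. the table
 * can be modified with plain stores. */
bool vector_table_is_in_ram(uint32_t vtor)
{
    return vtor >= SRAM_REGION_START && vtor <= SRAM_REGION_END;
}
```

With VTOR still at 0x08000000 the check returns false, so the table would have to be relocated before any handler is installed.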

@0xc0170
Contributor

0xc0170 commented Jun 13, 2017

cc @c1728p9

@LMESTM
Contributor

LMESTM commented Jun 13, 2017

> As I recall TickInit is called in SystemInit because of C++ ctor ? Or is there any other reason? I'll go into git history to refresh my memory.

Yes. HAL_TickInit needs to be called before C++ ctors

@c1728p9
Contributor

c1728p9 commented Jun 13, 2017

I did some digging and found that HAL_Init is called twice, both before C++ global constructors are called. HAL_Init is called in both SystemInit and in sdk_init. The call to HAL_Init in SystemInit is what causes many devices to crash. Can this be removed?

@c1728p9
Contributor

c1728p9 commented Jun 13, 2017

Looking at version 1.4.0 of the F1 cube library (V4.1.0 in the source code), there isn't a call to HAL_Init in SystemInit. It looks like the one in mbed is F1 cube V1.5.0 (V4.2.0 in the source code). Do you know where I can find this version? The latest I can download is 1.4.0. Was the addition of HAL_Init to SystemInit done in version 1.5.0, or is it an mbed-os specific change?

@c1728p9
Contributor

c1728p9 commented Jun 13, 2017

@c1728p9
Contributor

c1728p9 commented Jun 13, 2017

Created PR #4543 for this. Let me know if you have any feedback on it

@jeromecoutant
Collaborator Author

Hi

> It looks like the one in mbed is F1 cube V1.5.0 (V4.2.0 in the source code). Do you know where I can find this version? The latest I can download is 1.4.0.

Yes, you are right. V1.5.0 is official but not public yet... We introduced it in MBED in advance because we needed the Low Layer drivers, which this version introduces for the F1 family.

> Was the addition of HAL_Init to SystemInit done in version 1.5.0, or are these mbed-os specific changes?

It is an MBED-specific change.

@jeromecoutant
Collaborator Author

Hi
Latest status with the master branch:

  • OS5 tests seem OK for all families and all toolchains
  • For OS2, it seems that uARM doesn't work...

@0xc0170
Contributor

0xc0170 commented Jun 21, 2017

Thanks @jeromecoutant, I'll retest uARM. Is there a specific target you tested, so I can reproduce quickly?

What do you mean by "doesn't work"? Is it not reaching main, or what is the error?

@0xc0170
Contributor

0xc0170 commented Jun 21, 2017

@jeromecoutant I can't reproduce. uARM for Nucleo L476RG - MBED_A21 - works for me on the latest master. Please provide more details; otherwise we can neither reproduce nor fix anything.

@jeromecoutant
Collaborator Author

Example with the Nucleo you have:

| OK | NUCLEO_L476RG | uARM | DTCT_1 | Simple detect test | 0.53 | 10 | 1/1 |
| OK | NUCLEO_L476RG | uARM | EXAMPLE_1 | /dev/null | 3.45 | 20 | 1/1 |
| OK | NUCLEO_L476RG | uARM | MBED_10 | Hello World | 0.39 | 5 | 1/1 |
| TIMEOUT | NUCLEO_L476RG | uARM | MBED_11 | Ticker Int | 60.28 | 30 | 0/1 |
| OK | NUCLEO_L476RG | uARM | MBED_12 | C++ | 1.41 | 10 | 1/1 |
| FAIL | NUCLEO_L476RG | uARM | MBED_16 | RTC | 10.4 | 20 | 0/1 |
| OK | NUCLEO_L476RG | uARM | MBED_2 | stdio | 0.8 | 20 | 1/1 |
| TIMEOUT | NUCLEO_L476RG | uARM | MBED_23 | Ticker Int us | 60.27 | 30 | 0/1 |
| TIMEOUT | NUCLEO_L476RG | uARM | MBED_24 | Timeout Int us | 60.28 | 30 | 0/1 |
| TIMEOUT | NUCLEO_L476RG | uARM | MBED_25 | Time us | 30.36 | 15 | 0/1 |
| OK | NUCLEO_L476RG | uARM | MBED_26 | Integer constant division | 1.39 | 20 | 1/1 |
| TIMEOUT | NUCLEO_L476RG | uARM | MBED_34 | Ticker Two callbacks | 60.28 | 30 | 0/1 |
| IOERR_SERIAL | NUCLEO_L476RG | uARM | MBED_37 | Serial NC RX | 6.43 | 20 | 0/1 |
| OK | NUCLEO_L476RG | uARM | MBED_38 | Serial NC TX | 5.94 | 20 | 1/1 |
| OK | NUCLEO_L476RG | uARM | MBED_A1 | Basic | 1.37 | 20 | 1/1 |
| OK | NUCLEO_L476RG | uARM | MBED_A21 | Call function before main (mbed_main) | 1.41 | 20 | 1/1 |
| TIMEOUT | NUCLEO_L476RG | uARM | MBED_A28 | CAN loopback test | 40.37 | 20 | 0/1 |
| OK | NUCLEO_L476RG | uARM | MBED_A30 | CAN API | 1.39 | 20 | 1/1 |
| TIMEOUT | NUCLEO_L476RG | uARM | MBED_A9 | Serial Echo at 115200 | 40.43 | 20 | 0/1 |
| TIMEOUT | NUCLEO_L476RG | uARM | MBED_BUSOUT | BusOut | 60.32 | 30 | 0/1 |

@jeromecoutant
Collaborator Author

Hi, any updates on the uARM issue?
Thanks

@0xc0170
Contributor

0xc0170 commented Jun 29, 2017

I'll have a look soon to reproduce the above timeouts.

@0xc0170
Contributor

0xc0170 commented Jun 29, 2017

I believe I found it, @jeromecoutant. This was discovered once before: https://github.com/ARMmbed/mbed-os/pull/2160/files (you can read there about microlib not having a post stack/heap hook), so the init was called in the retarget open function. This would explain what I am seeing: the NVIC table is not copied, and mbed sdk init is not called either.

I'll send a patch shortly for review
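
For readers unfamiliar with this kind of workaround, the general pattern can be sketched as follows: when the C library offers no startup hook, a one-shot guard in the first low-level open call triggers the platform init exactly once. This is a host-side illustration with hypothetical names (`demo_sdk_init`, `demo_sys_open`), not the code from #2160 or from the upcoming patch.

```c
#include <assert.h>
#include <stddef.h>

static int sdk_init_calls;   /* counts calls to the simulated init */

/* Stand-in for the platform init that would normally run from a startup hook. */
void demo_sdk_init(void)
{
    sdk_init_calls++;
}

/* Stand-in for the C library's low-level open hook. */
int demo_sys_open(const char *name)
{
    static int initialised = 0;
    static int next_handle = 0;

    assert(name != NULL);
    if (!initialised) {
        /* First open of any stream: run the init that no hook ran for us. */
        initialised = 1;
        demo_sdk_init();
    }
    return next_handle++;
}
```

The guard makes later opens cheap and keeps the init from running twice.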

@0xc0170
Contributor

0xc0170 commented Jun 29, 2017

Done, look at #4671 please

@jeromecoutant
Collaborator Author

Non-regression tests with the latest label are back to a good level.
Thanks
