murdock: enable esp32-wroom-32 for CI testing #11449
a2c392e
to
7e95b97
Compare
Hrmpf, the reset circuitry isn't working properly. The boards need capacitors between EN and GND. :(
tests/shell/tests/01-run.py
Outdated
'\t 1 | pending Q | 15',
'\t 2 | running Q | 7'
r'\s+pid | state Q | pri',
r'\s+1 | pending Q |\s+\d+',
Is this related to ESP32? SCHED_PRIO_LEVELS does not seem to be architecture-specific; it is a common variable that might also change from application to application.
This is not ESP32 related. I'll cut out the commit into another PR.
The esp Makefile states "# if any WiFi interface is used, the number of priority levels has to be 32".
Do you remember the reason why?
A couple of tests fail because the priorities change with SCHED_PRIO_LEVELS=32.
Yes, the number of priority levels is simply too small if esp_wifi is used. If esp_wifi is used, there are 3 additional high-priority threads:
- WiFi hardware handler wifi with priority 1
- WiFi event loop wifi-event-loop with priority 4, which is defined by the ESP32 SDK
- WiFi netdev netdev-esp-wifi with priority 10, which is simply the priority defined by GNRC_NETIF_PRIO = (THREAD_PRIORITY_MAIN - 5)
> ps
pid | name | state Q | pri | stack ( used) | base addr | current
- | isr_stack | - - | - | 2048 ( 400) | 0x3ffb0220 | 0x3ffb0a20
1 | wifi-event-loop | bl rx _ | 4 | 2100 ( 1020) | 0x3ffaee18 | 0x3ffaf430
2 | idle | pending Q | 31 | 2048 ( 448) | 0x3ffb4b80 | 0x3ffb51e0
3 | main | running Q | 15 | 3072 ( 1616) | 0x3ffb5380 | 0x3ffb5d00
4 | pktdump | bl rx _ | 14 | 3072 ( 560) | 0x3ffb9da0 | 0x3ffba770
5 | ipv6 | bl rx _ | 12 | 2048 ( 752) | 0x3ffb7890 | 0x3ffb7e40
6 | udp | bl rx _ | 13 | 2048 ( 576) | 0x3ffbb3f0 | 0x3ffbb9b0
7 | wifi | bl rx _ | 1 | 3636 ( 1848) | 0x3ffbe004 | 0x3ffbec10
8 | netdev-esp-wifi | bl rx _ | 10 | 2048 ( 848) | 0x3ffb6490 | 0x3ffb6a10
9 | RPL | bl rx _ | 13 | 2048 ( 492) | 0x3ffbabec | 0x3ffbb200
| SUM | | | 24168 ( 8560)
If you used the default number of priority levels, THREAD_PRIORITY_MAIN would be 7 and GNRC_NETIF_PRIO would be 2. This could lead to deadlock situations because the upper-layer thread netdev-esp-wifi would have a higher priority than the lower-layer thread wifi-event-loop, which handles events from the hardware driver thread wifi. At the very least, there will be instabilities if the thread wifi-event-loop can't handle events from the WiFi hardware in time.
The following log shows the priorities with the default number of priority levels.
> ps
pid | name | state Q | pri | stack ( used) | base addr | current
- | isr_stack | - - | - | 2048 ( 400) | 0x3ffb0220 | 0x3ffb0a20
1 | wifi-event-loop | bl rx _ | 4 | 2100 ( 1020) | 0x3ffaee18 | 0x3ffaf430
2 | idle | pending Q | 15 | 2048 ( 416) | 0x3ffb4b40 | 0x3ffb51a0
3 | main | running Q | 7 | 3072 ( 1616) | 0x3ffb5340 | 0x3ffb5cc0
4 | pktdump | bl rx _ | 6 | 3072 ( 560) | 0x3ffb9d60 | 0x3ffba730
5 | ipv6 | bl rx _ | 4 | 2048 ( 752) | 0x3ffb7850 | 0x3ffb7e00
6 | udp | bl rx _ | 5 | 2048 ( 576) | 0x3ffbb3b0 | 0x3ffbb970
7 | wifi | bl rx _ | 1 | 3636 ( 1864) | 0x3ffbdfc4 | 0x3ffbebd0
8 | netdev-esp-wifi | bl rx _ | 2 | 2048 ( 848) | 0x3ffb6450 | 0x3ffb69d0
9 | RPL | bl rx _ | 5 | 2048 ( 492) | 0x3ffbabac | 0x3ffbb1c0
| SUM | | | 24168 ( 8544)
It seems that there are too many threads with the same priority. I don't remember where, but I have read somewhere in the Wiki or the online doc that each thread should have its own priority, since threads with the same priority don't lead to a context switch as long as both are in state pending.
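The priority arithmetic behind the two logs above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the formula for THREAD_PRIORITY_MAIN is an assumption chosen because it reproduces the values shown in the logs (main = 15 with 32 levels, main = 7 with 16 levels). Only GNRC_NETIF_PRIO = (THREAD_PRIORITY_MAIN - 5) and the fixed priority 4 of wifi-event-loop are taken from the comment itself.

```python
# Lower number = higher priority, as in the ps output above.
WIFI_EVENT_LOOP_PRIO = 4  # fixed by the ESP32 SDK according to the comment


def thread_priority_main(sched_prio_levels: int) -> int:
    """Assumed formula; it matches main = 15 (32 levels) and 7 (16 levels)."""
    return (sched_prio_levels - 1) - sched_prio_levels // 2


def gnrc_netif_prio(sched_prio_levels: int) -> int:
    """GNRC_NETIF_PRIO = (THREAD_PRIORITY_MAIN - 5), as quoted in the comment."""
    return thread_priority_main(sched_prio_levels) - 5


def netdev_outranks_event_loop(sched_prio_levels: int) -> bool:
    """True if netdev-esp-wifi would run before wifi-event-loop."""
    return gnrc_netif_prio(sched_prio_levels) < WIFI_EVENT_LOOP_PRIO


for levels in (32, 16):
    print(levels, thread_priority_main(levels), gnrc_netif_prio(levels),
          netdev_outranks_event_loop(levels))
```

With 32 levels netdev-esp-wifi ends up at priority 10, below wifi-event-loop; with 16 levels it ends up at priority 2, above it, which is the inversion described in the comment.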
BTW, I have seen that there is still an inconsistency in cpu/esp32/Makefile.include. SCHED_PRIO_LEVELS is always set to 32.
CFLAGS += -DSCHED_PRIO_LEVELS=32
...
# if any WiFi interface is used, the number of priority levels has to be 32
ifneq (,$(filter esp_wifi_any,$(USEMODULE)))
CFLAGS += -DSCHED_PRIO_LEVELS=32
endif
I took a look at the tests that failed. None of them requires the module esp_wifi. So, I would prefer to reactivate the conditional setting of SCHED_PRIO_LEVELS.
CFLAGS += -DSCHED_PRIO_LEVELS=32
-CFLAGS += -DSCHED_PRIO_LEVELS=32
...
# if any WiFi interface is used, the number of priority levels has to be 32
ifneq (,$(filter esp_wifi_any,$(USEMODULE)))
CFLAGS += -DSCHED_PRIO_LEVELS=32
endif
This should help the tests which insist on certain PIDs and will work for networking applications as before.
So, I would prefer to reactivate the conditional setting of SCHED_PRIO_LEVELS.
ok!
This did the trick for most tests.
The ps regex is currently wrong, as | is a regex metacharacter (alternation). It should be escaped to have a proper test.
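A short Python sketch of why the unescaped pattern is too permissive. The sample strings are made up for illustration, and the variable names are hypothetical; the two patterns are the one from the diff under review and an escaped variant of it.

```python
import re

# A line shaped like the ps output the test expects (illustrative sample).
line = "\t  1 | pending Q |  15"

# Pattern from the diff: each | splits it into three independent alternatives.
unescaped = r'\s+1 | pending Q |\s+\d+'
# Escaped variant: | is matched as a literal column separator.
escaped = r'\s+1 \| pending Q \|\s+\d+'

# The unescaped pattern accepts any text containing " pending Q ".
print(bool(re.search(unescaped, "garbage pending Q garbage")))
# The escaped pattern rejects that text but still matches the real line.
print(bool(re.search(escaped, "garbage pending Q garbage")))
print(bool(re.search(escaped, line)))
```

The three calls print True, False and True, so only the escaped pattern actually checks the whole row.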
These changes are not necessary any longer. After changing the thread priority levels in PR #12753 everything is working without changes.
cpu/esp32/Makefile.include
Outdated
@@ -140,6 +140,8 @@ LINKFLAGS += -Wl,--warn-unresolved-symbols
FLASH_MODE ?= dout # FIX configuration, DO NOT CHANGE
FLASH_FREQ = 40m # FIX configuration, DO NOT CHANGE
FLASH_SIZE ?= 2MB
FLASHFILE ?= $(ELFFILE)
For which purpose is this define required? Is it something used by murdock?
Yes, the "test-murdock" target needs to know which file to send to the RasPi worker nodes.
When setting FLASHFILE, the flash/preflash targets should actually be using FLASHFILE instead of ELFFILE. I have two commits in #8838 that I can PR.
This now has its own PR: #11648
boards/common/esp32/Makefile.include
Outdated
# used by Murdock
export RESET = dtr $(PORT) 0 && dtr $(PORT) 1
else
export RESET ?= esptool.py --before default_reset run
Doesn't it do the job you need?
Unfortunately not, it doesn't work with the "make test" target. In that use-case, the test terminal program has the tty open, then "make reset" is called. esptool tries to connect to the bootloader instead of just resetting via RTS/DTR, and fails because the test terminal is hijacking the communication.
Also, on the RPi workers, spawning python takes seconds. ;)
Isn't there a Python alternative that could be included in the RIOT repo?
I mean, flashing is done with Python and the tests are run with Python, so using Python for this would be easier than having to install external programs.
And the RPi could use its own C implementation anyway by overriding RESET and RESET_FLAGS.
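For illustration, a Python-only reset along these lines could look like the sketch below. It is not the dtr tool from this PR: the function names and the hold parameter are hypothetical, it assumes a Linux tty, and it assumes that "dtr $(PORT) 0" clears the DTR line and "dtr $(PORT) 1" sets it again. Whether that polarity actually resets a given board depends on its reset circuitry.

```python
import fcntl
import os
import struct
import termios
import time


def pulse_dtr(fd, hold=0.1, ioctl=fcntl.ioctl, sleep=time.sleep):
    """Clear DTR, wait `hold` seconds, then set DTR again on an open tty."""
    dtr = struct.pack("I", termios.TIOCM_DTR)
    ioctl(fd, termios.TIOCMBIC, dtr)  # assumed equivalent of "dtr $(PORT) 0"
    sleep(hold)
    ioctl(fd, termios.TIOCMBIS, dtr)  # assumed equivalent of "dtr $(PORT) 1"


def reset_port(path):
    """Open the serial device without becoming its controlling tty and pulse DTR."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        pulse_dtr(fd)
    finally:
        os.close(fd)


# Example call (the device path is only an example):
# reset_port("/dev/ttyUSB0")
```

Because it only toggles a modem control line, this does not try to talk to the bootloader while the test terminal has the tty open. It does not address the Python start-up time on the RPi workers.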
Ideally local testing is not much different from what murdock does. And for Murdock, spawning python is slow. We should maybe just integrate the (5 line) c tool into the RIOT repo.
I guess it depends on the board you really have. The original ESP32 DevKitC V4, which is the master for all clones with ESP32-WROOM-32 modules, has a capacitor between EN and GND.
They look like this, but I got them cheaply from China, so they might be clones. The printing on the backside is a little off.
I tried adding one, but only had 100uF caps lying around. Weirdly, esptool.py can now properly change to boot mode, reset or flash the device and get its chip id etc., but the user code doesn't run anymore (the board outputs only the garbage seen in this PR's failed tests).
What exactly does the garbage look like? Maybe there is a timing problem when changing the system clock and the UART no longer has the right baud rate.
That's how it looked (wrong baudrate). Anyhow, I changed the caps to 2.2uF (from 100uF) and now it seems to work fine! Let's see how many tests run through...
So it was just the timing.
I've pulled out a general fix for tests/posix_semaphore in #11467.
@gschorcht could you maybe take over this PR? I think all the CI specific parts should be working. Some tests are still failing for reasons you might be better suited to fix...
@kaspar030 I took a look at the remaining tests that fail. For most of them it is not really clear to me why they fail. Even though the ESP32 produces the right output, the Python script runs into a timeout. For example,
Even though the test application prints out
4a11c1e
to
cf786ee
Compare
might be the line endings ( |
67ac0a9
to
b5a4841
Compare
This really does seem to be line-ending related. Somehow, sometimes there's a "0x0a" missing.
@kaspar030 CI tests succeed. The PR still needs squashing.
#12816 is only temporarily included here; it is not merged yet. Need to wait...
aa9fdf7
to
2bbb504
Compare
tests/pkg_spiffs failed with timeout...
It doesn't seem to be related to the test. The log only shows
whatever it means.
It failed again on
That means the test took longer than 5 minutes to complete and has been cancelled. Currently Murdock cannot capture the test output in that case.
I see. Might it be a problem of the ESP32 node on
Well, the ESP is a new one. Let's hope that Pi isn't eating esp boards... Here's the local uart output of the test application:
Oops, that's a crash. Let me check.
I can reproduce the crash, which is new. SPIFFS was definitely working with the last tests.
It seems that the problem came in with the merge of PR #12810. Even though the ESP32 flash driver was touched by this PR, the changes only affect the log output during initialization. The problem doesn't occur when compiled with the debug option. I have to investigate it further.
This is one of the problems I really hate. The test works with the debug compile option, it works in QEMU, it works when the board configuration is printed at startup and it works if any
Sure, we could enable the output of the startup information as it was before as a workaround for this test. However, this would probably not solve the cause of the problem. My guess is that the problem is caused by an asynchronous UART interrupt while the cache for the program code in SPI flash is disabled due to a write operation to the SPI flash drive.
@kaspar030 Problem solved. I had to place the code of the VFS module in IRAM because the cache for program code in SPI flash memory is disabled during SPI flash write operations. PR #12890 fixes the problem. I have also reduced the test timeouts for
2bbb504
to
f88d55c
Compare
Nice catch! I've rebased with #12890 merged.
Wow, all lights are green. Do we want to merge it now?
Yeah! I'll let maintainers@ know to watch out for esp32 CI results for a while.
Good job @gschorcht!
Contribution description
I've added three esp32-wroom-32 boards to the CI. This PR enables them for testing.
Testing procedure
Let CI run with "run tests" set.
Issues/PRs references
none