-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite PicoW LWIP interface, major stability increase #916
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fixes random crashes while on the PicoW with WiFi. The system malloc/etc. is not re-entrant, so it is not safe to do any memory (de)allocation from an interrupt. The LWIP stack is configured not to use the memory manager, but it may end up calling callback functions in userspace which may well use new or malloc() (i.e. to manage a new connection info or process a UDP packet like in MDNS). This could cause unpredicatable crashes due to memory corruption at some random time after the interrupt/malloc occurred. Avoid the issue completely by locking interrupts out while in the memory manager. Inter-core will be protected by the malloc_lock already in Newlib. This should not have any effect on absolute performance of the system.
Brute-force protect all LWIP calls from re-rentrancy
Was causing memory corruption during long, high frequency, multi-client workloads. It is illegal to call netif->input() from an IRQ while in LWIP code. For now, throw them out and let upper layer handle a re-transmit.
A |
earlephilhower
force-pushed
the
nogoboomnow
branch
from
October 15, 2022 05:54
aaa8f27
to
b581ff1
Compare
earlephilhower
force-pushed
the
nogoboomnow
branch
from
October 15, 2022 05:56
b581ff1
to
8785bab
Compare
During a ClientContext we set TCP connection arguments to nullptr and then turn off all the callbacks. It is possible for an interrupt to occur between setting the arg nullptr and the clearing of all CBs. The the CB function will get nullptr as (this), causing a crash like: ```` Thread 1 received signal SIGTRAP, Trace/breakpoint trap. isr_hardfault () at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/rp2_common/pico_standard_link/crt0.S:98 98 decl_isr_bkpt isr_hardfault (gdb) where e/Arduino/hardware/pico/rp2040/libraries/WiFi/src/include/ClientContext.h:619 ries/WiFi/src/include/ClientContext.h:612 raries/WiFi/src/include/ClientContext.h:671 ) at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/lib/lwip/src/core/tcp_in.c:541 ) at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/lib/lwip/src/core/ipv4/ip4.c:743 rdware/pico/rp2040/pico-sdk/lib/lwip/src/netif/ethernet.c:186 buf=0x20001b7a <cyw43_state+226> "(\315\301") at /home/earle/Arduino/hardware/pico/rp2040/libraries/lwIP_CYW43/src/utility/CYW43shim.cpp:132 2040/pico-sdk/lib/cyw43-driver/src/cyw43_ll.c:1212 cores/rp2040/lwip_wrap.cpp:163 e/ClientContext.h:85 /libraries/WiFi/src/WiFiClient.cpp:81 /libraries/WiFi/src/WiFiClient.cpp:78 rver.cpp:284 ico/rp2040/libraries/WebServer/src/WebServerTemplate.h:86 ```` Avoid by checking and silently ignoring CBs who have a nullptr arg.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Rewrite PicoW LWIP interface for major stability increase
Random crashes, infinite loops, and other lockups were happening to the PicoW
while under high load from multiple clients.
This seems to have been due to two issues:
the core was in LWIP code. LWIP is not re-entrant or multi-cire/thread safe
so this is a bad thing. Some calls may not have been locked with a manual
addition of the LWIPMutex object to hit this.
legal in an interrupt, but actually calling netif->input() from an IRQ to
queue up the Ethernet packet for processing is illegal if LWIP is already
in progress.
Rearchitect the LWIP interface to fix these problems:
periodic LWIP timeout check interrupting and potentially calling user
code which did a memory operation
manually eyeballing and adding in protection in user code.
just throw the packet away for now. The upper layers can handle retransmit.
(A possible optimization would be to set the packet aside and add a
housekeeping portion to the LWIP wrappers to netif->input() them when safe).
passed from LWIP == nullptr