You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
The interrupt examples use an enableInterrupt variable to suppress interrupt handling while the result of a previous interrupt is handled and a new request (TX/RX/CAD/etc) is being configured. Typically, the flow is something like:
if (IrqFlag) { // Check flag set by ISR
enableInterrupt = false;
irqFlag = false;
... // Handle event
state = radio.startChannelScan();
... // print state
enableInterrupt = true;
}
Here, a race condition occurs when the IRQ is triggered after the call to e.g. startChannelScan() and before enableInterrupt is set again. Usually this does not happen, but since timings like these can vary in different circumstances, there is a window for problems. And if this problem occurs, the IRQ will not be handled and the code will lock up, indefinitely waiting for the IRQ that already happenedd.
On the STM32WL with CAD I've ran into this problem in practice. I've seen that the IRQ happens after around 20ms or so (so that's a lot quicker than a TX and, in most cases an RX interrupt) and (for some reason I have not investigated), the "print state" step is usually fast but consistently takes around 50ms on the 18th (or so) try.
To confirm this is wat happened, I applied the following patch:
--- a/examples/STM32WLx/STM32WLx_Channel_Activity_Detection_Interrupt/STM32WLx_Channel_Activity_Detection_Interrupt.ino+++ b/examples/STM32WLx/STM32WLx_Channel_Activity_Detection_Interrupt/STM32WLx_Channel_Activity_Detection_Interrupt.ino@@ -47,6 +47,9 @@ void setup() {
Serial.print(F("failed, code "));
Serial.println(state);
}
+ pinMode(2, OUTPUT);+ pinMode(3, OUTPUT);+ pinMode(4, OUTPUT);
}
// flag to indicate that a packet was detected or CAD timed out
@@ -60,6 +63,7 @@ volatile bool enableInterrupt = true;
// IMPORTANT: this function MUST be 'void' type
// and MUST NOT have any arguments!
void setFlag(void) {
+ digitalWrite(3, !digitalRead(3));
// check if the interrupt is enabled
if(!enableInterrupt) {
return;
@@ -67,6 +71,8 @@ void setFlag(void) {
// something happened, set the flag
scanFlag = true;
++ digitalWrite(4, !digitalRead(4));
}
void loop() {
@@ -99,6 +105,7 @@ void loop() {
// start scanning the channel again
Serial.print(F("[STM32WLx] Starting scan for LoRa preamble ... "));
+ digitalWrite(2, HIGH);
state = radio.startChannelScan();
if (state == RADIOLIB_ERR_NONE) {
Serial.println(F("success!"));
@@ -106,6 +113,7 @@ void loop() {
Serial.print(F("failed, code "));
Serial.println(state);
}
+ digitalWrite(2, LOW);
// enable interrupt service routine
enableInterrupt = true;
Which produced the following trace on my logic analyzer:
In other words: slowness in serial printing (which might happen for various reasons, especially if USB Serial is involved) causes the code to lock up.
To Reproduce
This can be easily reproduced by taking e.g. the SX126x_Channel_Activity_Detection_Interrupt example and adding a delay to fake slow serial printing:
Alternatively, slow down serial printing by repeating "success" a few times (this depends on the board and serial driver, I've tested this on AVR HardwareSerial (Arduino Uno), it might not work on native USB serial ports or serial ports with larger buffers).
To fix
To fix, the enableInterrupt = true line should be moved up. At least before the serial printing, but that still leaves a small race condition (e.g. when some other IRQ happens at the right moment that takes 20ms). To really fix this, enableInterrupt must be before the call to startChannelScan(), but I'm not sure if this would introduce other problems.
In general, I wonder if enableInterrupt is really needed at all, though. If the hardware guarantees that only one IRQ is triggered for every operation requested, then I think there would be no need to suppress interrupts? But I guess it was introduced for a reason....
The text was updated successfully, but these errors were encountered:
In general, I wonder if enableInterrupt is really needed at all, though.
You're corect, it is not needed. I think the original idea was in Receive_Interrupt, to make sure reception of a second packet doesn't start before the previous one was processed. But that doesn't really make sense, since to receive a packet, the device would first have to set back to Rx mode e.g. by calling startReceive(), which obviously cannot happen until the first packet is processed. I guess I then cargo-culted this flag variable into every other interrupt example.
I tested all the examples for SX127x without the enable flag, and they are working correctly. I was also able to force the race condition by including a delay, and fix it by removeing the enable flag.
Thanks for reporting this, I will remove the enable flag from all the examples.
Describe the bug
The interrupt examples use an
enableInterrupt
variable to suppress interrupt handling while the result of a previous interrupt is handled and a new request (TX/RX/CAD/etc) is being configured. Typically, the flow is something like:Here, a race condition occurs when the IRQ is triggered after the call to e.g.
startChannelScan()
and beforeenableInterrupt
is set again. Usually this does not happen, but since timings like these can vary in different circumstances, there is a window for problems. And if this problem occurs, the IRQ will not be handled and the code will lock up, indefinitely waiting for the IRQ that already happenedd.On the STM32WL with CAD I've ran into this problem in practice. I've seen that the IRQ happens after around 20ms or so (so that's a lot quicker than a TX and, in most cases an RX interrupt) and (for some reason I have not investigated), the "print state" step is usually fast but consistently takes around 50ms on the 18th (or so) try.
To confirm this is wat happened, I applied the following patch:
Which produced the following trace on my logic analyzer:
In other words: slowness in serial printing (which might happen for various reasons, especially if USB Serial is involved) causes the code to lock up.
To Reproduce
This can be easily reproduced by taking e.g. the SX126x_Channel_Activity_Detection_Interrupt example and adding a delay to fake slow serial printing:
Alternatively, slow down serial printing by repeating "success" a few times (this depends on the board and serial driver, I've tested this on AVR HardwareSerial (Arduino Uno), it might not work on native USB serial ports or serial ports with larger buffers).
With either of these patches, the sketch can do CAD once and then locks up:
To fix
To fix, the
enableInterrupt = true
line should be moved up. At least before the serial printing, but that still leaves a small race condition (e.g. when some other IRQ happens at the right moment that takes 20ms). To really fix this,enableInterrupt
must be before the call tostartChannelScan()
, but I'm not sure if this would introduce other problems.In general, I wonder if
enableInterrupt
is really needed at all, though. If the hardware guarantees that only one IRQ is triggered for every operation requested, then I think there would be no need to suppress interrupts? But I guess it was introduced for a reason....The text was updated successfully, but these errors were encountered: