RF24 library does not use interrupt and starts polling when waiting for data to be sent #877

Closed
stefan123t opened this issue Nov 4, 2022 · 6 comments · Fixed by #947

Comments

@stefan123t

stefan123t commented Nov 4, 2022

Thanks for your great library and endless effort you put into maintaining this!

Please read about common issues first. It addresses the most common problems that people have (whether they know it or not).
We have a very short and concise ISR callback which sets a flag only.

Describe the bug

The library is used to send and receive datagrams to and from our solar PV inverters (Hoymiles, TSUN and MBOG brands) via NRF24L01+ at a 250 kbps data rate. We use the IRQ output of the NRF module to trigger our receive code. At night there is little chance that the solar inverter is able to answer our requests, so only sending of packets occurs. This is when we see a lot of communication on the SPI bus which, according to our analysis, looks like constant polling of the status register 0x07 to check whether the send buffer is clear for the next datagram to be sent.

Please include:

  1. Code to reproduce
    For the code, see our issue "NRF24 polling trotz aktiviertem IRQ" ("NRF24 polling despite enabled IRQ") lumapu/ahoy#83 (sorry, mostly German).
    The code is under the tools/esp8266 path in a PlatformIO project: https://github.com/lumapu/ahoy/tree/main/tools/esp8266
    If you need to see any specific code, we can answer you in our issue there or link the relevant sections from here.

  2. Expected behaviour
    We would assume that the IRQ is used for both sending and receiving.
    Apparently the interrupt is only triggered when new messages are received from the NRF24 module,
    but not while we wait for the send buffer to be emptied / processed.
    Instead, the status register (command 0x07) is polled constantly.
    See the following screenshot by our project lead @lumapu, who traced this behaviour using his oscilloscope:
    [Image: trace of SPI during sending-only times]

  3. What device(s) are you using? Please specify make, model, and Operating System if applicable.
    We use nRF24L01+ modules at the low 250 kbps data rate, which gives better range for reaching the manufacturers' inverters. The higher data rates of 1 Mbps and 2 Mbps are not supported by the inverters' firmware. We recommend that our users choose LNA+PA modules with external antennas, which usually work fine in PA_MIN / PA_LOW mode, whereas modules with circuit-board antennas may require PA_MAX / PA_HIGH to send / receive over the same distance. We also recommend stabilizing the supply voltage during sending with an electrolytic capacitor of ~47..100 uF across the NRF24 module's VCC / GND pins (1 & 2).
    On the MCU side we use ESP8266 modules (NodeMCU v3 and Wemos D1 mini / Pro) as well as ESP32 modules.

Additional context

The problem occurs when there is a lot of sending and the library starts to poll whether the send buffer of the NRF24L01+ module has been emptied. As far as we investigated, the Nordic Semiconductor data sheets describe different interrupts that could be enabled, which would allow the interrupt to be used for both sending and receiving.

@TMRh20
Member

TMRh20 commented Nov 4, 2022

If you want to use interrupts for sending, you can use the startWrite() function. The normal write() function will poll until data is sent, but startWrite() will just write the packet to the FIFO buffer and return you to your code. You can then use interrupts to determine whether the packet was sent successfully or not.
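
A minimal sketch of the interrupt-driven send flow described above, under a few assumptions. `onRadioIrq`, `beginSend`, `pollSend` and `FakeTxRadio` are hypothetical names, and the ISR is assumed to be attached to the falling edge of the module's IRQ pin. The radio is passed as a template parameter so the same logic can be exercised on a host with the stand-in class; the only RF24 calls assumed are `startWrite(buf, len, multicast)` and `whatHappened(tx_ok, tx_fail, rx_ready)` as documented for the 1.x releases. The TX events also have to be unmasked for this to work: as far as I read the RF24 docs, `maskIRQ(true, true, false)` routes only the RX event to the IRQ pin.

```cpp
#include <assert.h>
#include <stdint.h>

// Set by the ISR, consumed by the main loop.
volatile bool irqFired = false;

// Keep the ISR tiny: it only raises a flag.
void onRadioIrq() { irqFired = true; }

enum class TxResult { Pending, Sent, Failed };

// Queue one payload without blocking. Radio is any type that offers
// startWrite(buf, len, multicast), as RF24 does.
template <typename Radio>
void beginSend(Radio& radio, const void* buf, uint8_t len) {
    assert(len <= 32);  // nRF24L01+ payloads are at most 32 bytes
    irqFired = false;
    radio.startWrite(buf, len, false);  // returns without waiting for TX to finish
}

// Call from the main loop. The radio is only touched after the IRQ flag
// is set, so there is no SPI traffic while the packet is still in flight.
template <typename Radio>
TxResult pollSend(Radio& radio) {
    if (!irqFired) return TxResult::Pending;
    irqFired = false;
    bool txOk = false, txFail = false, rxReady = false;
    radio.whatHappened(txOk, txFail, rxReady);  // reads and clears the IRQ flags
    if (txOk) return TxResult::Sent;
    if (txFail) return TxResult::Failed;
    return TxResult::Pending;  // IRQ had another cause, e.g. a received packet
}

// Hypothetical stand-in for the radio (NOT the RF24 class). It counts
// SPI transactions so the absence of polling can be checked on a host.
struct FakeTxRadio {
    int spiTransactions = 0;
    bool delivered = false;
    void startWrite(const void*, uint8_t, bool) { ++spiTransactions; }
    void whatHappened(bool& txOk, bool& txFail, bool& rxReady) {
        ++spiTransactions;
        txOk = delivered;
        txFail = !delivered;
        rxReady = false;
    }
};
```

On the MCU the ISR would typically be registered with `attachInterrupt(digitalPinToInterrupt(irqPin), onRadioIrq, FALLING)`, where `irqPin` stands for whichever GPIO the module's IRQ line is wired to.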

@2bndy5
Member

2bndy5 commented Nov 4, 2022

This will sound like an info dump, but I really don't know the exact cause of the problem here. I figure if I just put everything on the table, something might lead to a solution 🤷🏼‍♂️ .

Constant "polling" of the status register (0x07) indicates (to me) that whatHappened() is getting called constantly or the app is stuck constantly transmitting. There are few places where we actually write to the 0x07 offset. Writing data to that register would actually look like 0x27 over MOSI. In fact, we usually get the STATUS byte from the 0x07 offset using the radio's non-op command (the 0xFF on MOSI) because we get the STATUS byte quicker that way (full duplex SPI transactions). The only time we need to write to the 0x07 offset is to reset the IRQ flags, which is done during most write methods and in whatHappened(). Since your app disabled auto-ack, it's hard to tell if it is stuck transmitting or constantly calling whatHappened(). With auto-ack enabled, write() would spam the radio with non-op commands until the auto-ack was returned from the receiver or the max auto-retries count was reached.
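
For reference when reading an SPI trace, here is a small sketch of how the first MOSI byte and the returned STATUS byte can be interpreted. The command words and bit positions are taken from the nRF24L01+ product specification (register read `000A AAAA`, register write `001A AAAA`, NOP `0xFF`); the `k`-prefixed constants and the helper names are hypothetical and chosen so they do not collide with the macros in the library's own headers.

```cpp
#include <stdint.h>

// SPI command words (nRF24L01+ product specification).
constexpr uint8_t kReadRegister  = 0x00;  // 000A AAAA: read register A
constexpr uint8_t kWriteRegister = 0x20;  // 001A AAAA: write register A
constexpr uint8_t kRegisterMask  = 0x1F;  // lower 5 bits carry the address
constexpr uint8_t kNop           = 0xFF;  // no operation; STATUS still comes back on MISO
constexpr uint8_t kStatusReg     = 0x07;

// Interrupt flags inside the STATUS byte.
constexpr uint8_t kRxDr  = 1u << 6;  // data ready in RX FIFO
constexpr uint8_t kTxDs  = 1u << 5;  // data sent (and ACKed, if auto-ack is on)
constexpr uint8_t kMaxRt = 1u << 4;  // maximum number of retransmits reached

// First byte clocked out on MOSI for a register read or write.
constexpr uint8_t readCommand(uint8_t reg) {
    return static_cast<uint8_t>(kReadRegister | (reg & kRegisterMask));
}
constexpr uint8_t writeCommand(uint8_t reg) {
    return static_cast<uint8_t>(kWriteRegister | (reg & kRegisterMask));
}

struct StatusFlags {
    bool rxReady;
    bool txOk;
    bool txFail;
};

// Split a STATUS byte (as returned during any SPI command) into its flags.
constexpr StatusFlags decodeStatus(uint8_t status) {
    return StatusFlags{(status & kRxDr) != 0,
                       (status & kTxDs) != 0,
                       (status & kMaxRt) != 0};
}
```

With these helpers, `readCommand(kStatusReg)` gives 0x07 and `writeCommand(kStatusReg)` gives 0x27, which is one way to tell status reads from flag resets in a capture.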

After calling whatHappened(), the IRQ pin should reset until triggered by another event. If there is another event that triggers the IRQ immediately, then this could lead to constant polling of the 0x07 offset. However, I see your project's hmRadio.h file calls maskIRQ(true, true, false), so it is unlikely that another event is getting triggered immediately. I don't know much about disabling ISRs on the ESP8266, but I would double check the macros your project is using to manipulate the MCU (DISABLE_IRQ and RESTORE_IRQ).

problems I noticed with the code

I see your project is using the *etPayloadSize() functions (for statically sized payloads), but you have dynamic payloads enabled. This is erroneous if the received payload is not exactly 32 bytes. It would be better to use getDynamicPayloadSize() because that will tell you the actual size of the payload you're about to read from the RX FIFO. So the following snippet seems erroneous to me:

            mNrf24.setPayloadSize(MAX_RF_PAYLOAD_SIZE); // not used for dynamic payloads
            mNrf24.enableDynamicPayloads(); // payload size will be the amount of data passed to write*()

                        len = mNrf24.getPayloadSize(); // Does nothing over SPI; returns the int from setPayloadSize()
                        if(len > MAX_RF_PAYLOAD_SIZE)  // ??
                            len = MAX_RF_PAYLOAD_SIZE; // should never get executed
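
A sketch of the read path suggested above, assuming dynamic payloads are enabled. `readDynamicPayload`, `kMaxPayload` and `FakeRxRadio` are hypothetical names; the radio calls assumed are RF24's `available()`, `getDynamicPayloadSize()` and `read(buf, len)`. A reported length of 0 is treated as "nothing usable", on the assumption that the library signals a corrupt length that way.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

constexpr uint8_t kMaxPayload = 32;  // nRF24L01+ limit for a single payload

// Read the next payload using the length the radio reports for it,
// instead of the static size configured with setPayloadSize().
// Returns the number of bytes copied into buf, or 0 if nothing usable
// was waiting. buf must have room for a full 32-byte payload.
template <typename Radio>
uint8_t readDynamicPayload(Radio& radio, uint8_t* buf, uint8_t bufLen) {
    assert(buf != nullptr);
    assert(bufLen >= kMaxPayload);
    if (!radio.available()) return 0;
    const uint8_t len = radio.getDynamicPayloadSize();  // queried from the radio
    if (len == 0 || len > kMaxPayload) return 0;        // corrupt or empty
    radio.read(buf, len);
    return len;
}

// Hypothetical stand-in for the radio (NOT the RF24 class), holding a
// single pending payload so the helper can be tried on a host.
struct FakeRxRadio {
    uint8_t payload[kMaxPayload];
    uint8_t size;
    bool pending;
    bool available() const { return pending; }
    uint8_t getDynamicPayloadSize() const { return size; }
    void read(void* dst, uint8_t len) {
        memcpy(dst, payload, len);
        pending = false;
    }
};
```

The buffer-size assertion is a design choice for the sketch: requiring room for a full payload avoids having to decide what a partial read should do.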

I also found this comment, which makes me think there are wiring/connection problems as well. It is possible that long wires can cause data to get corrupted in transit from radio to MCU (or vice versa).
Furthermore,

            if(!mNrf24.isChipConnected()) {
                DPRINTLN(DBG_WARN, F("WARNING! your NRF24 module can't be reached, check the wiring"));
            }

this warning could be detected earlier using RF24::begin() instead of RF24::isChipConnected(), just FYI.

@stefan123t
Author

Thanks for the responses, we will follow up on the suggested ideas and come back to you.

@TMRh20
Member

TMRh20 commented Feb 18, 2024

Closing, all related issues appear to be closed. Please update if further info etc needed.

@TMRh20 TMRh20 closed this as completed Feb 18, 2024
@stefan123t
Author

As far as I followed and understood, the solution now is to use startFastWrite() instead of startWrite(). The reason is that ACKs were off for a considerable time when switching between write and read mode using startWrite(). With startFastWrite() the transition from write to read mode seems to be much quicker.
So switching between the modes left the PA+LNA switched off, which resulted in lost ACK packets and therefore led to retransmissions.

Maybe @lumapu or @tictrick can comment on the solution which was merged downstream in lumapu/ahoy#1414

@TMRh20 & @2bndy5 thanks for your valuable insights and suggestions!

I do not know whether you want to add a warning about this switching behaviour in normal mode when ACK is activated, add some caveat notes to the documentation, or simply make startFastWrite() the recommended default upstream?

@TMRh20
Member

TMRh20 commented Feb 19, 2024

Re-opening issue as a reminder to put more info in the docs regarding the difference between write() functions. We have a pinned issue too, so obviously this is an issue to note better.

@TMRh20 TMRh20 reopened this Feb 19, 2024
@TMRh20 TMRh20 self-assigned this Feb 24, 2024
TMRh20 added a commit that referenced this issue Feb 24, 2024
- Update comment regarding troubleshooting and CE Pin
- Add info on the different write() functions

#816 #877
TMRh20 added a commit that referenced this issue Feb 24, 2024
* Update COMMON_ISSUES re: write() functions

- Update comment regarding troubleshooting and CE Pin
- Add info on the different write() functions

#816 #877

* Formatting & IRQ info