drivers/enc28j60: fix of #9784 #9806

gschorcht · 2018-08-21T07:27:15Z

Fixes the problem described in #9784. There are no locks anymore. Stress test can be done by flooding the enc28j60 node with IMCP echo request packets with maximum size of 65.507 byte, e.g.:

sudo ping6 -f fe80::b48a:eeff:fe23:2323 -I eth0 -s 65507

Thanks to @gebart, I could fix the problem by dropping packets. The fix contains following changes:

Packets are dropped in the nd_recv function when buf = NULL and max_len != 0.
According to the EC28J60 data sheet, section 12.0, the host controller should clear the global enable bit for the interrupt pin before servicing the interrupt. This is done now in the on_int function. The global enable interrupt bit is enabled again in the nd_isr function once all pending interrupts were served.
The pending packet interrupt flag (PKTIF) is cleared now explicitly in the nd_isr function.
Additional debugging outputs left in nd_recv and nd_isr functions.

maribu · 2018-08-21T08:49:15Z

@gschorcht: Thanks for your work. It does solve the problem with the infinite loop. Sadly, a different bug seems to be also present. I guess when the on-device RX buffer is overflown, the content needs to be still actively dropped. Below is the log. I didn't have the time to confirm that theory with the datasheet, though. Do you want to investigate that as well? Otherwise, I hope that I will have some time tomorrow or Thursday.

https://gist.github.com/maribu/423b08ab675c553a766b617678ec5ea3

maribu · 2018-08-21T08:50:39Z

Is there a way to add a scrollbar to a long pastes? Dumping huge logs like above makes the thread quite unreadable :-(

aabadie · 2018-08-21T08:52:59Z

Is there a way to add a scrollbar to a long pastes?

You can put the log in a gist and give the link instead.

gschorcht · 2018-08-21T09:43:22Z

@maribu Unfortunatly, I can't reproduce the problem.

Once the packets are arriving faster than they can be processed, function nd_isr remains in the while loop

        do {
            DEBUG("[enc28j60] isr: packet received\n");
            netdev->event_callback(netdev, NETDEV_EVENT_RX_COMPLETE);
        } while (cmd_rcr(dev, REG_B1_EPKTCNT, 1) > 0);

and no other interrupt is handled anymore as long as there are packets in the receive buffer. Normally, the enc28j60 generates an interrupt (EIR_RXERIF) when its receive buffer overflows and arriving packets are dropped. However, since we are in the do ... while loop all the time, this interrupt is not handled. As soon as the traffic stops and RIOT has processed all the packets, the EIR_RXERIF interrupt is handled exactly once. In my test environment, I can see only this one EIR_RXERIF interrupt. After that everything is working as expected.

maribu · 2018-08-21T10:46:12Z

The code related to the remaining problem is here:

RIOT/drivers/enc28j60/enc28j60.c

Lines 474 to 477 in ead6c1f

    
           if (eir & EIR_RXERIF) { 
        
               DEBUG("[enc28j60] isr: incoming packet dropped - RX buffer full\n"); 
        
               cmd_bfc(dev, REG_EIR, -1, EIR_RXERIF); 
        
           }

I think in section 12.1.2 on page 68 in the datasheet the required information is written. RXERIF may occur when:

The RX buffer is overflown, or
The packet counter is overflown

I can confirm the overflow of the packet counter is not the problem by applying:

diff --git a/drivers/enc28j60/enc28j60.c b/drivers/enc28j60/enc28j60.c
index f8b8dec76..dbe9387e5 100644
--- a/drivers/enc28j60/enc28j60.c
+++ b/drivers/enc28j60/enc28j60.c
@@ -31,7 +31,7 @@
 #include "enc28j60.h"
 #include "enc28j60_regs.h"
 
-#define ENABLE_DEBUG    (0)
+#define ENABLE_DEBUG    (1)
 #include "debug.h"
 
 /**
@@ -485,7 +485,9 @@ static void nd_isr(netdev_t *netdev)
             cmd_bfc(dev, REG_EIR, -1, EIR_PKTIF);
         }
         if (eir & EIR_RXERIF) {
-            DEBUG("[enc28j60] isr: incoming packet dropped - RX buffer full\n");
+            DEBUG("[enc28j60] isr: incoming packet dropped - "
+                  "RX buffer / EPKTCNT overflow (EPKTCNT=%i)\n",
+                  (int)cmd_rcr(dev, REG_B1_EPKTCNT, 1));
             cmd_bfc(dev, REG_EIR, -1, EIR_RXERIF);
         }
         if (eir & EIR_TXIF) {

the console looks like this:

2018-08-21 12:34:58,997 - INFO # [enc28j60] isr: incoming packet dropped - RX buffer / EPKTCNT overflow (EPKTCNT=0)
2018-08-21 12:34:59,293 - INFO # [enc28j60] isr: incoming packet dropped - RX buffer / EPKTCNT overflow (EPKTCNT=0)
2018-08-21 12:34:59,640 - INFO # [enc28j60] isr: incoming packet dropped - RX buffer / EPKTCNT overflow (EPKTCNT=0)
2018-08-21 12:34:59,979 - INFO # [enc28j60] isr: incoming packet dropped - RX buffer / EPKTCNT overflow (EPKTCNT=0)

maribu · 2018-08-21T10:48:39Z

And another issue: Now according to GDB the system froze in:

RIOT/cpu/stm32_common/periph/spi.c

Lines 155 to 161 in 69b2bae

    
           static inline void _wait_for_end(spi_t bus) 
        
           { 
        
               /* make sure the transfer is completed before continuing, see reference 
        
                * manual(s) -> section 'Disabling the SPI' */ 
        
               while (!(dev(bus)->SR & SPI_SR_TXE)) {} 
        
               while (dev(bus)->SR & SPI_SR_BSY) {} 
        
           }

(The first loop is spinning infinitely.)

Maybe on chip debugging breaks SPI communication.

gschorcht · 2018-08-21T10:58:58Z

2018-08-21 12:34:58,997 - INFO # [enc28j60] isr: incoming packet dropped - RX buffer / EPKTCNT overflow (EPKTCNT=0)
2018-08-21 12:34:59,293 - INFO # [enc28j60] isr: incoming packet dropped - RX buffer / EPKTCNT overflow (EPKTCNT=0)
2018-08-21 12:34:59,640 - INFO # [enc28j60] isr: incoming packet dropped - RX buffer / EPKTCNT overflow (EPKTCNT=0)
2018-08-21 12:34:59,979 - INFO # [enc28j60] isr: incoming packet dropped - RX buffer / EPKTCNT overflow

Could you check whether the nd_isr is left everytime or if it is iterating endless in the while (eir != 0) { ... } loop by placing a debug message outside the while loop? At least the EIR_RXERIF everytime.

gschorcht · 2018-08-21T10:59:42Z

And another issue: Now according to GDB the system froze in:

Seems to be a platform specific SPI problem.

maribu · 2018-08-21T11:07:20Z

@gschorcht: The loop is left, so the RXERIF bit is successfully unset.

diff --git a/drivers/enc28j60/enc28j60.c b/drivers/enc28j60/enc28j60.c
index f8b8dec76..eba287f9f 100644
--- a/drivers/enc28j60/enc28j60.c
+++ b/drivers/enc28j60/enc28j60.c
@@ -31,7 +31,7 @@
 #include "enc28j60.h"
 #include "enc28j60_regs.h"
 
-#define ENABLE_DEBUG    (0)
+#define ENABLE_DEBUG    (1)
 #include "debug.h"
 
 /**
@@ -458,6 +458,19 @@ static int nd_init(netdev_t *netdev)
     return 0;
 }
 
+static void drop_rx_buffer(enc28j60_t *dev)
+{
+    uint16_t rx_rd_ptr, next;
+    uint8_t head[6];
+    rx_rd_ptr = cmd_r_addr(dev, ADDR_RX_READ);
+    cmd_w_addr(dev, ADDR_READ_PTR, ERXRDPT_TO_NEXT(rx_rd_ptr));
+    /* read packet header */
+    cmd_rbm(dev, head, 6);
+    /* TODO: care for endianess */
+    next = (uint16_t)((head[1] << 8) | head[0]);
+    cmd_w_addr(dev, ADDR_RX_READ, NEXT_TO_ERXRDPT(next));
+}
+
 static void nd_isr(netdev_t *netdev)
 {
     enc28j60_t *dev = (enc28j60_t *)netdev;
@@ -486,6 +499,9 @@ static void nd_isr(netdev_t *netdev)
         }
         if (eir & EIR_RXERIF) {
             DEBUG("[enc28j60] isr: incoming packet dropped - RX buffer full\n");
+            /* Drop partly received packet */
+            drop_rx_buffer(dev);
+            /* Clear RXERIF bit */
             cmd_bfc(dev, REG_EIR, -1, EIR_RXERIF);
         }
         if (eir & EIR_TXIF) {

The time until the problem occurs seems to be increased by the patch above. But it doesn't solve it :-(

maribu · 2018-08-21T11:12:01Z

Seems to be a platform specific SPI problem.

Yes, it is unrelated and seems to only occur while interrupting the MCU during on-chip debugging.

gschorcht · 2018-08-21T11:15:45Z

I guess you have also to decrement the packet counter in your drop function. But I'm not sure.

cmd_bfs(dev, REG_ECON2, -1, ECON2_PKTDEC);

maribu · 2018-08-21T11:17:41Z

Yes, that was the issue. I just had the same idea and so far the driver is working fine. Let me wait another 5 minutes to be sure :-)

maribu · 2018-08-21T11:29:13Z

diff --git a/drivers/enc28j60/enc28j60.c b/drivers/enc28j60/enc28j60.c
index f8b8dec76..aa405c2f7 100644
--- a/drivers/enc28j60/enc28j60.c
+++ b/drivers/enc28j60/enc28j60.c
@@ -314,10 +314,37 @@ static int nd_send(netdev_t *netdev, const iolist_t *iolist)
 #define NEXT_TO_ERXRDPT(n) ((n == BUF_RX_START || n - 1 > BUF_RX_END) ? BUF_RX_END : n - 1)
 #define ERXRDPT_TO_NEXT(e) ((e >= BUF_RX_END) ? BUF_RX_START : e + 1)
 
+static void get_rx_header(enc28j60_t *dev, uint16_t *size, uint16_t *next)
+{
+    uint16_t rx_rd_ptr;
+    uint8_t head[6];
+
+    /* set read pointer to RX read address */
+    rx_rd_ptr = cmd_r_addr(dev, ADDR_RX_READ);
+    cmd_w_addr(dev, ADDR_READ_PTR, ERXRDPT_TO_NEXT(rx_rd_ptr));
+    /* read packet header */
+    cmd_rbm(dev, head, 6);
+    /* TODO: care for endianess */
+    if (next != NULL) {
+        *next = (uint16_t)((head[1] << 8) | head[0]);
+    }
+
+    if (size != NULL) {
+        *size = (uint16_t)((head[3] << 8) | head[2]) - 4;  /* discard CRC */
+    }
+}
+
+static void drop_rx_buffer(enc28j60_t *dev)
+{
+    uint16_t next;
+    get_rx_header(dev, NULL, &next);
+    cmd_w_addr(dev, ADDR_RX_READ, NEXT_TO_ERXRDPT(next));
+    cmd_bfs(dev, REG_ECON2, -1, ECON2_PKTDEC);
+}
+
 static int nd_recv(netdev_t *netdev, void *buf, size_t max_len, void *info)
 {
     enc28j60_t *dev = (enc28j60_t *)netdev;
-    uint8_t head[6];
     uint16_t size;
     uint16_t next;
 
@@ -325,14 +352,7 @@ static int nd_recv(netdev_t *netdev, void *buf, size_t max_len, void *info)
     mutex_lock(&dev->devlock);
 
     /* set read pointer to RX read address */
-    uint16_t rx_rd_ptr = cmd_r_addr(dev, ADDR_RX_READ);
-    cmd_w_addr(dev, ADDR_READ_PTR, ERXRDPT_TO_NEXT(rx_rd_ptr));
-    /* read packet header */
-    cmd_rbm(dev, head, 6);
-    /* TODO: care for endianess */
-    next = (uint16_t)((head[1] << 8) | head[0]);
-    size = (uint16_t)((head[3] << 8) | head[2]) - 4;  /* discard CRC */
-
+    get_rx_header(dev, &size, &next);
     DEBUG("[enc28j60] recv: size=%i next=%i buf=%p len=%d\n",
           (int)size, (int)next, buf, max_len);
 
@@ -486,6 +506,9 @@ static void nd_isr(netdev_t *netdev)
         }
         if (eir & EIR_RXERIF) {
             DEBUG("[enc28j60] isr: incoming packet dropped - RX buffer full\n");
+            /* Drop partly received packet */
+            drop_rx_buffer(dev);
+            /* Clear RXERIF bit */
             cmd_bfc(dev, REG_EIR, -1, EIR_RXERIF);
         }
         if (eir & EIR_TXIF) {

It worked for 5 minutes :-) . The above patch is an updated version in which the header parsing is moved to a separate function in order to de-duplicate that code. Do you care to add that fix (or similar) to your PR? Or should I open a separate PR?

gschorcht · 2018-08-21T11:33:25Z

I can add it to my PR but you would have to test it again.

maribu · 2018-08-21T11:34:15Z

Crap. With DEBUG_ENABLE set to 0 for me ping6 stop's to work (times out). :-( With it enabled I get 99 pongs for 100 pings send.

maribu · 2018-08-21T11:37:00Z

I can add it to my PR but you would have to test it again.

Sure :-) Who thought that a couple of wrongly configured switches sending Ethernet frames to random nodes could provide such an excellent testing environment :-D

maribu · 2018-08-21T11:41:14Z

Wait. It happened again. The fix I proposed obviously doesn't solve the issue. I think I will have a look into it some other day...

gschorcht · 2018-08-21T11:41:47Z

Crap. With DEBUG_ENABLE set to 0 for me ping6 stop's to work (times out). :-(
With it enabled I get 99 pongs for 100 pings send.

I was just committing your changes, but I have not pushed them yet. That is, it still does not work for you?

gschorcht · 2018-08-21T11:42:51Z

I think I will have a look into it some other day...
Ok

maribu · 2018-08-21T11:49:47Z

I was just committing your changes, but I have not pushed them yet. That is, it still does not work for you?

Yes, the RX buffer overflow is still not handled correctly with my suggested fix. Additionally, there seem to be some unrelated timing issue, as ping6 stops working for me when the debug output is disabled.

However, I can confirm that your PR fixes the missing implementation of dropping packets in netdev_driver::recv and the driver works a lot more reliable because of that.

gschorcht · 2018-10-30T07:38:47Z

@maribu @haukepetersen
How do we continue with this PR? At least, it fixes the problem with dropping packets when the receive buffer is full and increase the stability a lot.

maribu · 2018-10-30T16:55:44Z

I suggest to merge it :-)

My reasoning:
The API change I suggested will either come not at all, or not any time soon. Thus, at least fixing the bugs against the old API and fixing the DoS security hole is a good thing. If and when the API is changed, this could be done just as easily on top of fixed drivers. Also, the additional DEBUG() macros will make future debugging easier :-)

PeterKietzmann · 2018-11-05T10:26:56Z

Without in-depth review or testing I'm also in favor of merging this. @smlng do you still have the device on your desk to run a short test?

gschorcht · 2018-11-05T10:58:24Z

@maribu The PR still doesn't contain your last changes. Is that OK or should we add them?

maribu · 2018-11-05T11:51:57Z

I personally would be in favor of merging it as it is. I still had sporadic dead-locks with my last changes. It seems to occur much less frequently with the changes, but this could be just bad luck. So I would like to investigate this further (hopefully some time soon) and be sure what causes the remaining deadlocks.

The missing drop feature added in this PR does reliably solve one cause of a deadlock, so I'm personally in favor of merging the PR as it is :-)

maribu

Thank you for addressing this issue. I believe clearing the PKTIF flag in the EIR register in line 485 is not required according to the datasheet (see comment above) - but I'm not sure about that.

Maybe you could add a comment, if it is actually needed. I will test this PR again this afternoon.

maribu · 2018-11-19T08:47:15Z

drivers/enc28j60/enc28j60.c

@@ -470,6 +482,7 @@ static void nd_isr(netdev_t *netdev)
                DEBUG("[enc28j60] isr: packet received\n");
                netdev->event_callback(netdev, NETDEV_EVENT_RX_COMPLETE);
            } while (cmd_rcr(dev, REG_B1_EPKTCNT, 1) > 0);
+            cmd_bfc(dev, REG_EIR, -1, EIR_PKTIF);


After decrementing, if EPKTCNT is ‘0’, the EIR.PKTIF flag will automatically be cleared.

Citing section 7.2.4 on page 45 in the datasheet. I assume this line could be dropped.

Probably yes, but it shouldn't cause errors if it is there. Since I don't have time to test it, I would leave it as it is for the moment. If you could test, we could decide to remove it.

@maribu Have you tried it?

Did so now. Works fine without manually clearing EIR_PKTIF.

aabadie · 2018-12-20T08:47:25Z

@maribu @gschorcht, what is the status here ? Do you think it can be merged or is it waiting for other updates ?

jia200x · 2019-01-09T15:52:33Z

ping @maribu @gschorcht

maribu · 2019-01-10T19:15:53Z

@jia200x, @gschorcht : Sorry, I again did not got to test it today. Tomorrow is hopefully less busy and I will finally be able to confirm whether cmd_bfc(dev, REG_EIR, -1, EIR_PKTIF) is indeed required or as the datasheet suggests not.

gschorcht · 2019-01-24T14:38:24Z

@maribu I removed this line but I had not the time to test it.

maribu · 2019-01-24T22:02:12Z

Mind to squash? I will verify that it still works tomorrow in a free minute and hopefully merge it then :-)

gschorcht · 2019-01-24T22:25:55Z

Rebase ans squashed.

maribu

I just tested on the Nucleo-F303RE and with the PR applied, the network stack works reliable even when frames larger than the buffer come in :-)

gschorcht · 2019-01-25T15:27:19Z

Thanks for testing.

maribu · 2019-01-25T16:10:18Z

No problem, thanks for identifying and fixing the issue :-)

gschorcht force-pushed the enc82j60_fix_9784 branch from bcf4b4f to aa00087 Compare August 21, 2018 07:38

maribu mentioned this pull request Sep 12, 2018

Misleading API in netdev driver #9805

Closed

miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: drivers Area: Device drivers labels Oct 17, 2018

maribu mentioned this pull request Nov 16, 2018

Missing drop implementations in netdev_driver_t::recv #10410

Open

maribu reviewed Nov 19, 2018

View reviewed changes

aabadie added this to the Release 2019.01 milestone Dec 3, 2018

drivers/enc28j60: fix of RIOT-OS#9784

5cb2dd3

gschorcht force-pushed the enc82j60_fix_9784 branch from 73f94a1 to 5cb2dd3 Compare January 24, 2019 22:26

maribu approved these changes Jan 25, 2019

View reviewed changes

maribu merged commit 370f33a into RIOT-OS:master Jan 25, 2019

gschorcht deleted the enc82j60_fix_9784 branch February 7, 2019 14:09

gschorcht mentioned this pull request Aug 26, 2019

send data with UDP at 10HZ, the program die #11860

Closed

chrysn mentioned this pull request Feb 12, 2022

enc28j60: irq handling #17593

Closed

drivers/enc28j60: fix of #9784 #9806

drivers/enc28j60: fix of #9784 #9806

Conversation

gschorcht commented Aug 21, 2018

maribu commented Aug 21, 2018 • edited Loading

maribu commented Aug 21, 2018

aabadie commented Aug 21, 2018 • edited Loading

gschorcht commented Aug 21, 2018

maribu commented Aug 21, 2018

maribu commented Aug 21, 2018 • edited Loading

gschorcht commented Aug 21, 2018

gschorcht commented Aug 21, 2018

maribu commented Aug 21, 2018

maribu commented Aug 21, 2018

gschorcht commented Aug 21, 2018

maribu commented Aug 21, 2018

maribu commented Aug 21, 2018

gschorcht commented Aug 21, 2018

maribu commented Aug 21, 2018

maribu commented Aug 21, 2018

maribu commented Aug 21, 2018

gschorcht commented Aug 21, 2018

gschorcht commented Aug 21, 2018

maribu commented Aug 21, 2018

gschorcht commented Oct 30, 2018

maribu commented Oct 30, 2018

PeterKietzmann commented Nov 5, 2018

gschorcht commented Nov 5, 2018

maribu commented Nov 5, 2018

maribu left a comment • edited Loading

Choose a reason for hiding this comment

maribu Nov 19, 2018

Choose a reason for hiding this comment

gschorcht Dec 20, 2018 • edited Loading

Choose a reason for hiding this comment

gschorcht Jan 9, 2019

Choose a reason for hiding this comment

maribu Jan 14, 2019

Choose a reason for hiding this comment

aabadie commented Dec 20, 2018

jia200x commented Jan 9, 2019

maribu commented Jan 10, 2019

gschorcht commented Jan 24, 2019 • edited Loading

maribu commented Jan 24, 2019

gschorcht commented Jan 24, 2019

maribu left a comment

Choose a reason for hiding this comment

gschorcht commented Jan 25, 2019

maribu commented Jan 25, 2019

maribu commented Aug 21, 2018 •

edited

Loading

aabadie commented Aug 21, 2018 •

edited

Loading

maribu commented Aug 21, 2018 •

edited

Loading

maribu left a comment •

edited

Loading

gschorcht Dec 20, 2018 •

edited

Loading

gschorcht commented Jan 24, 2019 •

edited

Loading