gnrc_netif: add packet to queue when device is busy #11263

Merged: 3 commits, Sep 7, 2020

Conversation

@miri64 (Member) commented Mar 25, 2019

Contribution description

This fixes the issue showcased in #11256 (assuming the gnrc_netif_pktq module is compiled in and the network device supports returning -EBUSY) by providing a very simple MAC scheme: if the device is busy on send, queue the packet and don't send. When RX_COMPLETE or TX_COMPLETE is issued, try to send the first packet in the queue.
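
In pseudo-C, the scheme boils down to something like the sketch below. This is not the actual gnrc_netif_pktq API: queue_put() and queue_pop() are placeholder names for the queue operations.

    #include <errno.h>
    #include "net/gnrc/netif.h"
    #include "net/gnrc/pkt.h"

    /* Hypothetical helpers standing in for the packet queue: */
    void queue_put(gnrc_netif_t *netif, gnrc_pktsnip_t *pkt);
    gnrc_pktsnip_t *queue_pop(gnrc_netif_t *netif);

    /* On send: if the device reports -EBUSY, park the packet instead */
    static void _send_or_queue(gnrc_netif_t *netif, gnrc_pktsnip_t *pkt)
    {
        if (netif->ops->send(netif, pkt) == -EBUSY) {
            queue_put(netif, pkt);
        }
    }

    /* On RX_COMPLETE / TX_COMPLETE: retry the oldest queued packet */
    static void _on_complete_event(gnrc_netif_t *netif)
    {
        gnrc_pktsnip_t *pkt = queue_pop(netif);
        if (pkt != NULL) {
            _send_or_queue(netif, pkt); /* may be re-queued on -EBUSY */
        }
    }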

Testing procedure

Run make tests-gnrc_netif_pktq test in tests/unittests. Testing the actual integration needs a device adaptation, which I will provide for at86rf2xx.

Since I edited gnrc_netif.c, pinging with gnrc_networking should still work.

Issues/PRs references

Provides a fix for the race condition showcased in #11256.

Basis for #11068 (Route to 6Lo minimal fragment forwarding).

@miri64 added the "Area: network" and "Type: new feature" labels on Mar 25, 2019
@bergzand (Member) commented:

Initial gut feeling for this API change is positive. I think it simplifies a few device drivers as those don't have to block anymore until the radio is done transmitting the previous frame. I know that at least the mrf24j40 has a clunky while(not_in_the_right_state) {} loop that might be refactored out with this change.

> MAC scheme

Maybe this should be called a Device Access Control Scheme? 😄

@miri64 (Member Author) commented Mar 25, 2019

See #11264 for my at86rf2xx adaptation.

(Review thread on sys/net/gnrc/netif/gnrc_netif.c; outdated, resolved.)
@kaspar030 (Contributor) commented:

> Initial gut feeling for this API change is positive. I think it simplifies a few device drivers as those don't have to block anymore until the radio is done transmitting the previous frame. I know that at least the mrf24j40 has a clunky while(not_in_the_right_state) {} loop that might be refactored out with this change.

My feelings exactly. Now we can finally make the drivers just return -EBUSY when they currently cannot send.

Conceptually this is a rather large change, so I suggest we don't rush it into the release.
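
As a sketch of the driver-side pattern being discussed here: my_radio_t and my_radio_is_busy() below are made-up stand-ins for a real driver's state check, not any existing driver code.

    #include <errno.h>
    #include <stdbool.h>
    #include "net/netdev.h"

    typedef struct { netdev_t netdev; /* ... radio state ... */ } my_radio_t;
    bool my_radio_is_busy(my_radio_t *dev);

    static int _send(netdev_t *netdev, const iolist_t *iolist)
    {
        my_radio_t *dev = (my_radio_t *)netdev;

        /* old pattern: spin until the previous TX is done:
         *     while (my_radio_is_busy(dev)) {}
         * new pattern: report busy, let gnrc_netif_pktq queue the packet */
        if (my_radio_is_busy(dev)) {
            return -EBUSY;
        }
        /* ... copy iolist into the frame buffer and trigger TX ... */
        (void)iolist;
        return 0;
    }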

@miri64 (Member Author) commented Mar 26, 2019

> Conceptually this is a rather large change, so I suggest we don't rush it into the release.

Yepp, fully agree with that. I just need it for my current work in #11068, but I believe #11068 itself isn't ready for this release either ;-).

@miri64 (Member Author) commented Mar 26, 2019

Also, I just noticed that it is not that easy: the packet handed to netif->ops->send() is currently released no matter what, so it is lost and can't be queued. This means we need to change all the link-layer glue code and adapt the documentation (i.e. an API change) to make this possible.

@miri64 (Member Author) commented Mar 27, 2019

Rebased to current master to resolve conflict.

@miri64 added the "State: WIP" label on Mar 27, 2019
@miri64 (Member Author) commented Mar 27, 2019

And set to WIP due to #11263 (comment)

@miri64 (Member Author) commented Apr 9, 2019

Made the queue larger, since it was way too small for larger fragmented packets.

@miri64 (Member Author) commented Jul 12, 2019

Reworked a little bit, so a rewrite of all the gnrc_netif_* modules is not required when gnrc_netif_pktq is included.

@miri64 (Member Author) commented Aug 30, 2019

Rebased to current master to resolve a conflict caused by the merging of #11837.

@miri64 (Member Author) commented Sep 25, 2019

Squashed to simplify some integration into a separate branch.

miri64 added a commit to 5G-I3/RIOT-public that referenced this pull request on Sep 25, 2019:
(cherry picked from commit 5cc265f, see RIOT-OS#11263)
miri64 added a commit to 5G-I3/RIOT-public that referenced this pull request on Sep 25, 2019:
... and also send on send error (i.e. when *medium* was busy)
(cherry picked from commit 825b0f5, see RIOT-OS#11263)
miri64 added a commit to 5G-I3/RIOT-public that referenced this pull request Sep 25, 2019
miri64 added a commit to 5G-I3/RIOT-public that referenced this pull request Sep 25, 2019
@miri64 (Member Author) commented Sep 2, 2020

Should I squash then or are you still going through the changes?

@jia200x (Member) commented Sep 2, 2020

> Should I squash then or are you still going through the changes?

Sure, please squash

@miri64 (Member Author) commented Sep 2, 2020

I just saw: also need to rebase ... again -.-

@miri64 (Member Author) commented Sep 2, 2020

Squashed

@miri64 (Member Author) commented Sep 2, 2020

And rebased and adapted to current master.

@jia200x (Member) commented Sep 3, 2020

I tested this one carefully using #14787 and noticed something strange: if a radio returns -EBUSY instead of blocking, the RTT increases.

For instance, using this feature with at86rf2xx:

ping6 fe80::204:2519:1801:bd0e -i 170 -s 1024 -c 100
...

2020-09-03 16:04:53,168 # round-trip min/avg/max = 123.286/132.150/142.522 ms

If I modify the driver to return -EBUSY:

diff --git a/drivers/at86rf2xx/at86rf2xx_netdev.c b/drivers/at86rf2xx/at86rf2xx_netdev.c
index 3c83a43947..f8ff12d0da 100644
--- a/drivers/at86rf2xx/at86rf2xx_netdev.c
+++ b/drivers/at86rf2xx/at86rf2xx_netdev.c
@@ -114,6 +114,10 @@ static int _send(netdev_t *netdev, const iolist_t *iolist)
     at86rf2xx_t *dev = (at86rf2xx_t *)netdev;
     size_t len = 0;
 
+    if (at86rf2xx_get_status(dev) == AT86RF2XX_STATE_BUSY_TX_ARET) {
+        return -EBUSY;
+    }
+

I get:

ping6 fe80::204:2519:1801:bd0e -i 170 -s 1024 -c 100
...

2020-09-03 15:59:34,245 # round-trip min/avg/max = 130.729/140.632/155.966 ms

I consistently get an extra ~8-10 ms if -EBUSY is returned, for 11 fragments (1024 byte ping).

Any idea on why this could happen?

@benpicco (Contributor) commented Sep 4, 2020

With at86rf215 and this patch:
--- a/drivers/at86rf215/at86rf215.c
+++ b/drivers/at86rf215/at86rf215.c
@@ -250,6 +250,10 @@ static void _block_while_busy(at86rf215_t *dev)
 
 static void at86rf215_block_while_busy(at86rf215_t *dev)
 {
+    if (IS_ACTIVE(GNRC_NETIF_PKTQ)) {
+        return;
+    }
+
     if (_tx_ongoing(dev)) {
         DEBUG("[at86rf215] Block while TXing\n");
         _block_while_busy(dev);
@@ -262,7 +266,12 @@ int at86rf215_tx_prepare(at86rf215_t *dev)
         return -EAGAIN;
     }
 
-    at86rf215_block_while_busy(dev);
+    if (IS_ACTIVE(GNRC_NETIF_PKTQ) && _tx_ongoing(dev)) {
+        return -EBUSY;
+    } else {
+        at86rf215_block_while_busy(dev);
+    }
+
     dev->tx_frame_len = IEEE802154_FCS_LEN;
 
     return 0;

I get better results:

master

2020-09-04 18:39:16,406 #  ping6 2001:db8::204:2519:1801:c8c5 -s 512 -c 10
2020-09-04 18:39:16,533 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=0 ttl=64 rssi=-54 dBm time=113.710 ms
2020-09-04 18:39:17,525 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=1 ttl=64 rssi=-57 dBm time=109.456 ms
2020-09-04 18:39:18,534 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=2 ttl=64 rssi=-56 dBm time=120.383 ms
2020-09-04 18:39:19,540 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=3 ttl=64 rssi=-62 dBm time=118.382 ms
2020-09-04 18:39:20,533 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=4 ttl=64 rssi=-62 dBm time=110.219 ms
2020-09-04 18:39:22,533 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=6 ttl=64 rssi=-65 dBm time=112.776 ms
2020-09-04 18:39:23,539 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=7 ttl=64 rssi=-61 dBm time=113.988 ms
2020-09-04 18:39:24,548 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=8 ttl=64 rssi=-61 dBm time=123.613 ms
2020-09-04 18:39:26,401 # 
2020-09-04 18:39:26,420 # --- 2001:db8::204:2519:1801:c8c5 PING statistics ---
2020-09-04 18:39:26,425 # 10 packets transmitted, 8 packets received, 20% packet loss
2020-09-04 18:39:26,428 # round-trip min/avg/max = 109.456/115.315/123.613 ms

gnrc_netif_pktq

2020-09-04 18:38:11,040 #  ping6 2001:db8::204:2519:1801:c8c5 -s 512 -c 10
2020-09-04 18:38:11,133 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=0 ttl=64 rssi=-57 dBm time=77.002 ms
2020-09-04 18:38:12,128 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=1 ttl=64 rssi=-55 dBm time=73.160 ms
2020-09-04 18:38:13,121 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=2 ttl=64 rssi=-56 dBm time=75.721 ms
2020-09-04 18:38:14,129 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=3 ttl=64 rssi=-56 dBm time=74.772 ms
2020-09-04 18:38:15,120 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=4 ttl=64 rssi=-55 dBm time=73.810 ms
2020-09-04 18:38:16,127 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=5 ttl=64 rssi=-56 dBm time=72.549 ms
2020-09-04 18:38:17,134 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=6 ttl=64 rssi=-56 dBm time=77.309 ms
2020-09-04 18:38:18,130 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=7 ttl=64 rssi=-57 dBm time=83.665 ms
2020-09-04 18:38:19,133 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=8 ttl=64 rssi=-58 dBm time=75.729 ms
2020-09-04 18:38:20,128 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=9 ttl=64 rssi=-59 dBm time=76.030 ms
2020-09-04 18:38:20,129 # 
2020-09-04 18:38:20,131 # --- 2001:db8::204:2519:1801:c8c5 PING statistics ---
2020-09-04 18:38:20,142 # 10 packets transmitted, 10 packets received, 0% packet loss
2020-09-04 18:38:20,147 # round-trip min/avg/max = 72.549/75.974/83.665 ms

@benpicco added and then removed the "State: waiting for CI update" label on Sep 4, 2020
@benpicco (Contributor) left a review:

Code looks good and is working fine.
Let's finally get this in!

@aabadie (Contributor) commented Sep 4, 2020

@kaspar030 your concerns seem to be addressed. Can we dismiss your stale review? (or you can just ACK :))

@miri64 (Member Author) commented Sep 7, 2020

> I tested this one carefully using #14787 and noticed something strange: if a radio returns -EBUSY instead of blocking, the RTT increases.

Is it strange that there might be some overhead when queuing is involved?

@miri64 (Member Author) commented Sep 7, 2020

> I consistently get an extra ~8-10 ms if -EBUSY is returned, for 11 fragments (1024 byte ping).

If you are worried that this is a high overhead, remember to divide the round-trip time by 22 (11 fragments sent each way) and you get a more accurate picture.
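
(Worked out with the numbers quoted above: ~9 ms of extra RTT spread over 2 × 11 = 22 frame transmissions comes to roughly 9 / 22 ≈ 0.4 ms of added latency per frame.)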

@jia200x (Member) commented Sep 7, 2020

> Is it strange that there might be some overhead when queuing is involved?

I think so. I would expect a memory overhead, but not such an RTT overhead. Considering that the queue is there because the network stack is much faster than the network device (including queue operations), I wouldn't expect such a time difference.

> If you are worried that this is a high overhead, remember to divide the round-trip time by 22 (11 fragments sent each way) and you get a more accurate picture.

I'm not worried about the higher RTT, considering that this fixes several problems; I'm just wondering why there's such a difference.

@jia200x (Member) commented Sep 7, 2020

(I'm not blocking this PR, I'm just trying to figure out where this time difference comes from.)

@benpicco dismissed kaspar030's stale review on September 7, 2020:

> Comments were addressed over a year ago

@benpicco benpicco merged commit a336fdc into RIOT-OS:master Sep 7, 2020
@miri64 miri64 deleted the gnrc_netif/new/pktq branch September 8, 2020 10:43
}
/* hold in case device was busy to not having to rewrite *all* the link
* layer implementations in case `gnrc_netif_pktq` is included */
gnrc_pktbuf_hold(pkt, 1);
@benpicco (Contributor) commented Nov 13, 2024, on the snippet above:

What was meant by this comment? 😨

@miri64 (Member Author) replied Nov 13, 2024:

When this was written, the netif implementations just threw away the packet when they were unable to send (might not be the case with netdev_new, I don't know).

So, to prevent the packet from being lost in that case, we hold it (i.e. the next release does not remove it from the packet buffer) so it can be put into the queue later in lines 1431-1437. If the device was not busy, we just release it again in line 1447.
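
A condensed sketch of that flow, inside the netif send path (queue_put() is a placeholder name here, not the actual gnrc_netif_pktq call):

    gnrc_pktbuf_hold(pkt, 1);           /* extra reference survives the driver's release */
    int res = netif->ops->send(netif, pkt);
    if (res == -EBUSY) {
        queue_put(netif, pkt);          /* busy: the held reference is what gets queued */
    }
    else {
        gnrc_pktbuf_release(pkt);       /* not busy: drop the extra hold again */
    }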

A contributor replied:

netdev_new does indeed not release on send
