BLE Throughput #15171

hwagner66 · 2019-04-03T21:30:29Z

After implementing a throughput test for BLE using nrf52840 hardware (BL654-USB) and Zephyr 1.14-rc1, notice some curious behavior in data we are collecting. The test seems to show that sending data from the central (client) to peripheral (server) throughput performance is relatively balanced per connection but one peripheral seems to get little data and one peripheral get more. In data flow from the peripherals to the central, it is much more unbalanced with one or two peripherals having all the throughput and the others having none.

The central to peripheral is 1:N where the central is a single point source. The many peripherals to central is an N:1 with many sources. In my case N is 1 up to 5, but would like to be greater. The overall aggregate throughput is constant, just not evenly distributed among the BLE devices or comm directions. Using DLE and 2M PHY.

Any thoughts to why this behavior is present and how to diagnose/resolve to produce consistent and balanced throughput among the BLE devices? Do I have something misconfigured in my devices? Can share if that will help.

hwagner66 · 2019-04-04T15:53:37Z

In this test, all of the devices are Laird BL654-USB devices built around the nrf52840 and an FTDI UART-USB chip. The devices are all plugged into a USB hub for testing.
One device operates as the BLE central and is a client (call it dev0). The remaining 5 devices are BLE peripherals and act as servers (dev1-5). I use Zephyr 1.14-rc1 enabling the Shell and BLE devices. Shell implements commands via UART0 and USB vComm port to Putty window or Python script. Communication operates at 115.2 baud.

Currently the central has an address list of peripherals and connections are initiated for each peripheral via shell commands to the central. After connecting a larger MTU (247) is negotiated. The peripheral devices also have shell cmds implemented, so a predefined value can be written to a pair of characteristics to exchange data between devices. (The intent of this device is a serial bus replacement)
The exchange is in both directions simultaneously for 1:N devices.

Tested throughput with different connection intervals of 50mS, 100mS and 250mS. without significant changes in behavior. The connection peripheral parameters were set to min and max conn intervals of 40 (50mS), 80 (100mS) or 200 (250mS), slave latency of 0 and peripheral timeout of 40 (400mS). All devices had the same values for a given test. In the central, characteristics are written using bt_gatt_write_without_response_cb. In the peripheral, bt_gatt_notify is utilized to write data to the characteristic for the central to consume.

Attached is data from several different tests that were run. The red highlight differences from Test 1 (baseline).
Throughput data.pdf

cvinayak · 2019-04-08T08:16:07Z

@hwagner66 The central scheduling would place the slave connections of similar connection interval in a group next to each other with a minimum possibility of one default sized PDU exchange. When you have 251 bytes PDU and n peripheral connection, there will be chance for only the central to send a single PDU. The peripheral does not have enough on air time before the next peripheral connection is scheduled. Only the last scheduled peripheral in the connection interval get all the air time, hence reflecting higher throughput on the last connection (as reflected in your Test 1)

This is an implementation detail and there is no interface to tune the on air quality of service when PDU lengths and PHY is updated.

The required implementation change is not difficult, but I would like to collaborate with you by means of a conference call to be able to implement HCI connection event values that suits your requirement.

carlescufi · 2019-04-11T11:35:02Z

From @Vudentz in the mailing list:

I wonder if this could be related to the lack of buffers on the host, I assume you would use bt_gatt_notify with NULL conn which will take care of notifying each connection but that may end up consuming all the buffers blocking which could the unbalanced since the connection that first appear on the list would probably never block.
You could perhaps try to increase the buffer pool with:
CONFIG_BT_L2CAP_TX_BUF_COUNT 5
If that solves it we may have to consider making the default the number of connection + 1 so we can actually emit notifications to all connection at once, though that doesn't guarantee there is always going to be buffer available if there are more traffic going on.

hwagner66 · 2019-04-16T15:07:01Z

@cvinayak @carlescufi Can we setup a conference call this week to discuss potential resolution of this issue? I am meeting with the customer Friday and would like to have a potential plan in place before then.

cvinayak · 2019-05-02T01:10:31Z

@hwagner66 any follow up on this issue?

hwagner66 · 2019-05-03T03:47:01Z

Hi Vinayak, Yes I can report some progress. When we last spoke, we changed the slot timing in bluetooth/controller/ll_sw from 328 to a value more appropriate for an packet size of 251. That by itself gave unexpected results. I searched the ctrl.c file for other instances of 328 (there are 4 total) and changed all of them to a more appropriate value. I used a value of 1048 calculated from an max pkt of 251 with a 2MPHY. This change of value seems to work rather well in testing with a central and 5 peripherals. I was able to achieve consistent performance of 92 to 98kbps in the central to peri direction across all 5 peripherals. In the peri to central, the performance was 30-80kbps. The first device connected did the best and the last the worst (lowest throughput). Not sure exactly why this is but may have to do with the test conditions. I need to do more testing to understand this. I *think* this may give me enough performance to continue my testing. Unfortunately, my build system broke 2 days ago and I haven’t been able to get it working correctly since then. Also in thinking about how this should operate, I think the slot timing for each device needs to accommodate the negotiated MTU on a per connection basis and be updated when the MTU is negotiated. I can use the fixed value for this test, but I think the right way is to set the slot timing to accommodate the MTUs. Not doing that and always using the maximum slot duration would be a very inefficient use of bandwidth. -Henry

hwagner66 · 2019-05-23T17:29:04Z

After making the slot timing change, more testing was performed with various input data sets. All testing has been done with unreliable messaging for maximum throughput and better understanding of network performance. The input data is approximately 19.5kbps (averages to one 244 byte payload every 100mS). Using DLE and 2Mphy on Zephyr 1.14. The connection interval is set to 100mS.
The central device is configured to support up to 10 simultaneous connections always. The question I have is why do I see poorer throughput with <10 connections? The tput is best at 1, 5 and 10 connections but worse when other numbers of devices are connected for the test. Is this an issue of scheduling the packets to communicate within each connection interval, not enough buffers in the central, or something else? I've attached a graph of performance vs active connections and the prj.conf for the central device.
BLE throughput vs connections.pdf
prj.conf.txt

cvinayak · 2019-05-24T11:48:22Z

@hwagner66 Could you let me know the value of CONFIG_BT_CTLR_TX_BUFFERS? Are you able to use any sort of fairness in enqueuing the transfer across n connections?

My guesses are, the connections dont get enqueued packets within the 100 ms connection interval.

Could you use, say 13 tx buffers, and then every 100ms fairly enqueue 1 packet each to the n connections? (Do try to detect latency of bt_gatt_notify API call for any delays due to on-air retransmissions, if any).

hwagner66 · 2019-05-24T15:17:45Z

CONFIG_BT_CTLR_TX_BUFFERS=19 which appears to be the max allowed. As far a fairness, the messages are enqueued at a rate of 100mS/#connections in a round robin fashion, that is: 1 conn every 100mS, for 2 conn every 50mS alternating between conn 0 & 1, for 3 conn every 33mS cycling 0,1,2,0,1,2,… etc. At 10 connected peripherals, a msg is enqueued every 10mS to one of the connections. I am not using the bt_gatt_notify for any latency or timing. Are saying use that as a measure of the OTA retries? That could tell me that more OTA retries are required for successful tx but how could I encourage fewer retries? Right now I present messages to the BLE stack and track the # messages presented in the central and received in the peripherals. These numbers match indicating no msg loss but the throughput suffers as I add connections. 1, 5, and 10 connections offer the best throughput but when I have 9 connections the throughput is only half of what it should be (10.7kbps/conn vs the ideal of 19.5kbps/conn). This is shown in the attached pdf. From: Vinayak Kariappa Chettimada <[email protected]> Sent: Friday, May 24, 2019 6:49 AM To: zephyrproject-rtos/zephyr <[email protected]> Cc: hwagner66 <[email protected]>; Mention <[email protected]> Subject: Re: [zephyrproject-rtos/zephyr] BLE Throughput (#15171) EXTERNAL EMAIL: This email originated outside of Laird. Be careful with attachments and links. @hwagner66<https://github.com/hwagner66> Could you let me know the value of CONFIG_BT_CTLR_TX_BUFFERS? Are you able to use any sort of fairness in enqueuing the transfer across n connections? My guesses are, the connections dont get enqueued packets within the 100 ms connection interval. Could you use, say 13 tx buffers, and then every 100ms fairly enqueue 1 packet each to the n connections? (Do try to detect latency of bt_gatt_notify API call for any delays due to on-air retransmissions, if any). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#15171?email_source=notifications&email_token=AFA5EOCSWEPFVYRL52A4IX3PW7IZNA5CNFSM4HDN524KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWFBVTQ#issuecomment-495590094>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AFA5EOAVZRQMFQ2RHUXCJLDPW7IZNANCNFSM4HDN524A>. THIS MESSAGE, ANY ATTACHMENT(S), AND THE INFORMATION CONTAINED HEREIN MAY BE PROPRIETARY TO LAIRD CONNECTIVITY, INC. AND/OR ANOTHER PARTY, AND MAY FURTHER BE INTENDED TO BE KEPT CONFIDENTIAL. IF YOU ARE NOT THE INTENDED RECIPIENT, PLEASE DELETE THE EMAIL AND ANY ATTACHMENTS, AND IMMEDIATELY NOTIFY THE SENDER BY RETURN EMAIL. THIS MESSAGE AND ITS CONTENTS ARE THE PROPERTY OF LAIRD CONNECTIVITY, INC. AND MAY NOT BE REPRODUCED OR USED WITHOUT THE EXPRESS WRITTEN CONSENT OF LAIRD CONNECTIVITY, INC.

hwagner66 · 2019-05-28T14:10:57Z

I am retrying the test with 13 tx buffers and using bt_gatt_notify to throttle the enqueing of msgs. Will let you know of results.
Any other thoughts come to mind?

cvinayak · 2019-05-28T14:52:02Z

@hwagner66 i am trying to fix my broken shell app before I start trying to reproduce the behaviors you are seeing. Please bare with the delay.

hwagner66 · 2019-05-28T14:58:39Z

Not a problem. From: Vinayak Kariappa Chettimada <[email protected]> Sent: Tuesday, May 28, 2019 9:52 AM To: zephyrproject-rtos/zephyr <[email protected]> Cc: hwagner66 <[email protected]>; Mention <[email protected]> Subject: Re: [zephyrproject-rtos/zephyr] BLE Throughput (#15171) EXTERNAL EMAIL: This email originated outside of Laird. Be careful with attachments and links. @hwagner66<https://github.com/hwagner66> i am trying to fix my broken shell app before I start trying to reproduce the behaviors you are seeing. Please bare with the delay. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#15171?email_source=notifications&email_token=AFA5EOGPZ7EHPRFZCYPOAVLPXVBKZA5CNFSM4HDN524KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWMMJBI#issuecomment-496551045>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AFA5EOGWKOSYP42NWNKA7XDPXVBKZANCNFSM4HDN524A>. THIS MESSAGE, ANY ATTACHMENT(S), AND THE INFORMATION CONTAINED HEREIN MAY BE PROPRIETARY TO LAIRD CONNECTIVITY, INC. AND/OR ANOTHER PARTY, AND MAY FURTHER BE INTENDED TO BE KEPT CONFIDENTIAL. IF YOU ARE NOT THE INTENDED RECIPIENT, PLEASE DELETE THE EMAIL AND ANY ATTACHMENTS, AND IMMEDIATELY NOTIFY THE SENDER BY RETURN EMAIL. THIS MESSAGE AND ITS CONTENTS ARE THE PROPERTY OF LAIRD CONNECTIVITY, INC. AND MAY NOT BE REPRODUCED OR USED WITHOUT THE EXPRESS WRITTEN CONSENT OF LAIRD CONNECTIVITY, INC.

hwagner66 · 2019-05-28T18:04:11Z

@cvinayak Modified central to use CONFIG_BT_CTLR_TX_BUFFERS=13 and counting semaphore on tx buf msgs. Performance was nearly identical to prior test using CONFIG_BT_CTLR_TX_BUFFERS=19 and no counting semaphore. Both tests used uniform random message sizes and 100mS connection interval. See graph in attached PDF (yellow latest test vs blue prior test).
BLE throughput vs connections 13Tx buf sem.pdf

hwagner66 · 2019-05-30T13:54:16Z

@cvinayak Any thoughts or progress on this issue? If there is something more I should look at in my code or configuration, or to check behavior in the stack, please let me know. I need to give an update to my customer today and any progress would be helpful in that regard.

cvinayak · 2019-06-28T13:44:44Z

@hwagner66 I have done some changes #17097 in controller implementation related to connection event length. Is it possible for you to share the test procedure/script so that I can try to reproduce your observations? You could also help me test your scenarios at your end.

hwagner66 · 2019-06-28T14:42:39Z

I can help test here if that helps. My test setup involves shell commands driven by a Python script. The shell commands are handled and call methods to create connections and transfer data. It is messy but it works for the testing I need to do. The test itself transfers 1 or 2 packets nearly full length (244 byte payload) packets from central to peripheral every 100mS. The average throughput per connection is 19-20kbps. Up to 10 connections have been tested per central. The reverse data flow path is smaller packet (60 byte payload) sent also at a 100mS interval per connection for a throughput of 4.7-4.8kbps. Then up to 5 centrals, each with 10 peripheral connections each (total 50 peripherals) are operated simultaneously. The connections are all made under control of the script with a table of known peripherals for each central. This way I ensure the connections are exactly the same for each test (a given node is the same connection # on a central each test run). The connection interval is 100mS. From: Vinayak Kariappa Chettimada <[email protected]> Sent: Friday, June 28, 2019 8:45 AM To: zephyrproject-rtos/zephyr <[email protected]> Cc: hwagner66 <[email protected]>; Mention <[email protected]> Subject: Re: [zephyrproject-rtos/zephyr] BLE Throughput (#15171) EXTERNAL EMAIL: This email originated outside of Laird. Be careful with attachments and links. @hwagner66<https://github.com/hwagner66> I have done some changes #17097<#17097> in controller implementation related to connection event length. Is it possible for you to share the test procedure/script so that I can try to reproduce your observations? You could also help me test your scenarios at your end. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#15171?email_source=notifications&email_token=AFA5EOFAK3W7MHID54MBEADP4YIWDA5CNFSM4HDN524KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODY2DVGA#issuecomment-506739352>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AFA5EOAP6Q5UJ34LXIAEEWDP4YIWDANCNFSM4HDN524A>. THIS MESSAGE, ANY ATTACHMENT(S), AND THE INFORMATION CONTAINED HEREIN MAY BE PROPRIETARY TO LAIRD CONNECTIVITY, INC. AND/OR ANOTHER PARTY, AND MAY FURTHER BE INTENDED TO BE KEPT CONFIDENTIAL. IF YOU ARE NOT THE INTENDED RECIPIENT, PLEASE DELETE THE EMAIL AND ANY ATTACHMENTS, AND IMMEDIATELY NOTIFY THE SENDER BY RETURN EMAIL. THIS MESSAGE AND ITS CONTENTS ARE THE PROPERTY OF LAIRD CONNECTIVITY, INC. AND MAY NOT BE REPRODUCED OR USED WITHOUT THE EXPRESS WRITTEN CONSENT OF LAIRD CONNECTIVITY, INC.

Fix the controller implementation to perform connection event length reservation based on the completed Data Length Update and/or PHY Update Procedure. This fix with avoid states/roles from stepping on each others event length. Connection would have supervision timed out or have stalled data transmissions due to insufficient reserved air time. Relates to zephyrproject-rtos#15171. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

Fix the controller implementation to perform connection event length reservation based on the completed Data Length Update and/or PHY Update Procedure. This fix with avoid states/roles from stepping on each others event length. Connection would have supervision timed out or have stalled data transmissions due to insufficient reserved air time. Relates to #15171. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

Fix the controller implementation to perform connection event length reservation based on the completed Data Length Update and/or PHY Update Procedure. This fix with avoid states/roles from stepping on each others event length. Connection would have supervision timed out or have stalled data transmissions due to insufficient reserved air time. Relates to zephyrproject-rtos#15171. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

Fix the controller implementation to perform connection event length reservation based on the completed Data Length Update and/or PHY Update Procedure. This fix with avoid states/roles from stepping on each others event length. Connection would have supervision timed out or have stalled data transmissions due to insufficient reserved air time. Relates to #15171. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>

aescolar · 2020-02-14T12:28:13Z

It would seem this is now resolved, please reopen if you disagree.

carlescufi added the question label Apr 3, 2019

carlescufi assigned joerchan and cvinayak Apr 3, 2019

carlescufi added the area: Bluetooth label Apr 3, 2019

joerchan removed their assignment Jun 12, 2019

aescolar closed this as completed Feb 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BLE Throughput #15171

BLE Throughput #15171

hwagner66 commented Apr 3, 2019

hwagner66 commented Apr 4, 2019

cvinayak commented Apr 8, 2019

carlescufi commented Apr 11, 2019

hwagner66 commented Apr 16, 2019

cvinayak commented May 2, 2019

hwagner66 commented May 3, 2019 via email •

edited

Loading

hwagner66 commented May 23, 2019

cvinayak commented May 24, 2019

hwagner66 commented May 24, 2019 via email

hwagner66 commented May 28, 2019

cvinayak commented May 28, 2019

hwagner66 commented May 28, 2019 via email

hwagner66 commented May 28, 2019

hwagner66 commented May 30, 2019

cvinayak commented Jun 28, 2019

hwagner66 commented Jun 28, 2019 via email

aescolar commented Feb 14, 2020

BLE Throughput #15171

BLE Throughput #15171

Comments

hwagner66 commented Apr 3, 2019

hwagner66 commented Apr 4, 2019

cvinayak commented Apr 8, 2019

carlescufi commented Apr 11, 2019

hwagner66 commented Apr 16, 2019

cvinayak commented May 2, 2019

hwagner66 commented May 3, 2019 via email • edited Loading

hwagner66 commented May 23, 2019

cvinayak commented May 24, 2019

hwagner66 commented May 24, 2019 via email

hwagner66 commented May 28, 2019

cvinayak commented May 28, 2019

hwagner66 commented May 28, 2019 via email

hwagner66 commented May 28, 2019

hwagner66 commented May 30, 2019

cvinayak commented Jun 28, 2019

hwagner66 commented Jun 28, 2019 via email

aescolar commented Feb 14, 2020

hwagner66 commented May 3, 2019 via email •

edited

Loading