Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HCI-UART: Bluetooth ACL data packets with 251 bytes not acknowledged #21854

Closed
AdrianEggenberger opened this issue Jan 13, 2020 · 11 comments
Closed
Assignees
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@AdrianEggenberger
Copy link

Describe the bug
I'm using the Zephyr HCI UART sample on an PCA10040 board as a HCI controller. As HCI host I've written an application on Windows that communicates over the HCI UART with the PCA10040 board and acts as a BLE host.

When a BLE peripehral is connected and data length extentions are activated (MTU => 251 bytes) sent ACL data packets with size of 251 bytes are not confirmed. If only 250 bytes are sent the confirmation is received by getting an HCI event "Number Of Completed Packets Event".

To Reproduce

Steps to reproduce the behavior:

  1. Modify the configuration for the HCI UART sample with the following values:
    BT_RX_BUF_LEN = 260
    BT_CTLR_TX_BUFFER_SIZE = 251
    BT_CTLR_DATA_LENGTH_MAX = 251
    Screenshot of the configuration and a .config file is attached.
    ZephyrKernelConfiguration.zip

  2. Create Bluetooth Low Energy connection to a peripheral with DLE support for 251 bytes.

  3. Trigger data length update to enhance the MTU to 251 bytes.

  4. Send ACL data with a length of 250 bytes (e.g. an ATT write command with 243bytes data.

  5. The sent ACL data is confirmed by a "Number Of Completed Packets Event" HCI event.

  6. Send ACL data with a length of 251 bytes (e.g. an ATT write command with 244bytes data.

  7. The sent ACL data is NOT confirmed by a "Number Of Completed Packets Event" HCI event.

Expected behavior
It is expected that the HCI UART device sends back an "Number Of Completed Packets Event" HCI event after sending the data to the peer device.

Environment:

  • OS: Windows
  • Toolchain: Zephyr SDK
  • Commit: d24e5e5
@AdrianEggenberger AdrianEggenberger added the bug The issue is a bug, or the PR is fixing a bug label Jan 13, 2020
@aescolar
Copy link
Member

CC @joerchan @cvinayak @carlescufi

@joerchan
Copy link
Contributor

@AdrianEggenberger Could you enable CONFIG_ASSERT=y and check if you run into any asserts? You have to use the very latest commit, otherwise you need to enable CONFIG_NET_BUF_LOG=y to get the asserts in net_buf. Then we could see if any buffers are being overrun.

@AdrianEggenberger
Copy link
Author

I've modified the configuration with the ASSERT and the NET_BUF_LOG support and tried again. There was no assert.

To get more information I've switched from HCI-UART to HCI-USB. This allowed to have a CONSOLE output over UART while using the USB interface for HCI. There error behavior is the same wiht HCI-USB.

I've created a log file with the same operation, but one time with the ACL data limited to 250 bytes and one time with the full 251 bytes.
ACL-DATA-250Bytes.log
ACL-DATA-251Bytes.log

Sending the 250 bytes works and there is always a "Num Complete" following the bt_send:
[00:00:21.904,754] bt_hci_raw.bt_send: buf 0x20006e38 len 254
[00:00:21.904,785] bt_ctlr_hci_driver.hci_driver_send: enter
[00:00:21.904,876] bt_ctlr_hci_driver.hci_driver_send: exit: 0
[00:00:21.905,059] net_buf: bt_buf_get_rx():54: Pool 0x20008118 blocked for 0 secs
[00:00:21.905,090] bt_ctlr_hci_driver.prio_recv_thread: Num Complete: 0x0000:1

If 251 bytes are sent there is no "Num Complete" entry following.
[00:00:55.236,114] bt_hci_raw.bt_send: buf 0x20006e20 len 255
[00:00:55.236,145] bt_ctlr_hci_driver.hci_driver_send: enter
[00:00:55.236,206] bt_ctlr_hci_driver.hci_driver_send: exit: 0

@cvinayak cvinayak self-assigned this Jan 13, 2020
@jhedberg jhedberg added the priority: medium Medium impact/importance bug label Jan 14, 2020
@cvinayak cvinayak removed their assignment Jan 15, 2020
@cvinayak
Copy link
Contributor

@AdrianEggenberger as I dont have your host, I have tested a modified Zephyr peripheral and central_hr application transmitting 251 bytes of HCI data payload (or 247 bytes ATT MTU) for GATT write without response. Above case, both with and without Data Length Update to 251 bytes PDUs.

Both the cases have no issue with 251 bytes of HCI data payload (247 ATT + 4 L2CAP header). Also, Bluetooth Conformance Tests pass for 251 byte transmissions.

Do let me know the peer device you are using...

central_hr_peripheral_full_duplex.zip

@AdrianEggenberger
Copy link
Author

After your test that the transfer seems to work properly I decided to setup a test environment that can be attached to this issue. I setup a minor application that connects to a peripheral, does DLE exchange to set the MTU to 251 and then initiates an ACL transfer with 251 bytes (ATT Write Command with 244 bytes). The strange thing is that in this simplified setup the transfer works as expected... So the failure behavior seems to be dependent to the state of the device/connection.

After this finding I changed back to the environment where the failure occurs. I tried to get deeper into the Zephyr HCi-USB sample and extended debugging to be able to deliver more facts about the issue. What I found out is that:

  • The ACL packet is sent over the air as it is visible in the Sodera Protocol Analyzer (see screenshot)
    ACL-TX-251Bytes-PDU
  • The "Num Complete" is not triggered

By adding more debug code to the source code I've been able to track down the issue to the function "tx_cmplt_get" in the ull.c module. There exists a check to the ll_id for values of PDU_DATA_LLID_DATA_START or PDU_DATA_LLID_DATA_CONTINUE. In the error case the packets are also there, but they are not counted by this check as the ll_id of this packet is 0 (== PDU_DATA_LLID_RESV). What needs to happen that an ACL packet is declared as "completed"?

@cvinayak
Copy link
Contributor

@AdrianEggenberger I am suspecting memory overflow in one of the HCI data when being copied into controller's tx node, this can cause an already enqueued tx node's ll_id being overwritten.

Please attach the sniffer log, so I can analyse it.

@AdrianEggenberger
Copy link
Author

A memory overflow sounds like a reasonable explanation. In the logs previously added in my second post there is no line that would indicate that.

Or do you mean a different log? What settings should i set for the log?

I‘m still trying to create a simple setup to be able to reproduce the issue, but till now without success.

@cvinayak
Copy link
Contributor

Or do you mean a different log? What settings should i set for the log?

I meant the sniffer log, or in your case Sodera Protocol Analyzer logs

I‘m still trying to create a simple setup to be able to reproduce the issue, but till now without success.

If you share the (simple) application when ever you have it, usually I should be able to analyze the issue and provide a solution quicker.

@AdrianEggenberger
Copy link
Author

I've managed to create an environment that should allow to reproduce the issue. In the attached zip-file you may find firmware for the peripheral, the HCI controller (=> zephyr HCI USB) and a .NET application that acts as HCI Host. For instruction see the "readme.txt" file in the the Zip. Also in the Zip-File are logs and recordings of the SoderaLE taken with this setup on my side.

IssueReproduction.zip

@AdrianEggenberger
Copy link
Author

I today went back to this issue to get deeper into the topic. I checked out the latest release v2.2.0-rc1 und repeated my test that I've created for this issue. In short the issue seems to be fixed in the latest release.

Were there changes in the last weeks that explain the fixed behavior?

Anyhow as it's fixed now we can close the issue.

@cvinayak
Copy link
Contributor

@AdrianEggenberger Thank you for testing. It could be 51c46d9 could be related, and have fixed the issue you see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

5 participants