Bluetooth: Fix various deadlock issues when running low on buffers #16870

Closed
wants to merge 3 commits into from

jhedberg
Member

@jhedberg jhedberg commented Jun 17, 2019

The first and third patches are for fixing potential deadlocks when running low on buffers. The Number of Completed Packets patch (the first one) affects pretty much all HCI drivers, including the native controller one. The last patch only affects the H:4 HCI driver. The middle patch (second one) is to make it valid to use some of the upper bits of buf->flags as extended user data.

Fixes #16864

@jhedberg jhedberg added bug The issue is a bug, or the PR is fixing a bug area: Bluetooth labels Jun 17, 2019
@jhedberg jhedberg requested a review from thoh-ot as a code owner June 17, 2019 12:59
@jhedberg jhedberg added the DNM This PR should not be merged (Do Not Merge) label Jun 17, 2019
@jhedberg
Member Author

Added DNM label until it's verified that this actually fixes the issue and nobody is unhappy with the added HCI driver API for allocating event buffers.

@zephyrbot zephyrbot added area: API Changes to public APIs area: Tests Issues related to a particular existing or missing test labels Jun 17, 2019
@zephyrproject-rtos zephyrproject-rtos deleted a comment from zephyrbot Jun 17, 2019
@zephyrbot
Collaborator

zephyrbot commented Jun 17, 2019

All checks are passing now.

Review history of this comment for details about previous failed status.
Note that some checks might have not completed yet.

@jhedberg jhedberg force-pushed the num_cmplt_pool branch 3 times, most recently from 405e82d to a9bd744 Compare June 18, 2019 07:55
}

/* Insert non-discarded packets back to the RX queue */
k_fifo_put_slist(&rx.fifo, &list);
Member Author

@andyross asking you since you seem to have touched the k_queue code most recently. The above code lines, starting with line 205, iterate the contents of a k_fifo with the aim of removing one element from it. At this point in the code we have a guarantee that no other thread or ISR will attempt to access the k_fifo while we do this. However, in the absence of any more suitable public APIs, I'm having to remove each element one by one and then re-insert the non-discarded elements using k_fifo_put_slist(). What I'd really want to do is the equivalent of SYS_SFLIST_FOR_EACH_NODE() and sys_sflist_remove(), and stop iterating once a matching element is found and removed. Currently that requires accessing what I assume is private data of k_fifo (and k_queue). Would that still be ok, or is there a cleaner way to solve this?
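For illustration only, the in-place removal wished for above could look roughly like this minimal sketch. It uses a plain singly linked list with hypothetical `struct pkt` and `struct pkt_queue` types, not k_fifo, k_queue or sys_sflist, so none of these names are Zephyr APIs. It assumes, as stated above, that the caller has exclusive access to the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued buffer; not a Zephyr type. */
struct pkt {
	struct pkt *next;
	bool discardable;
	int id;
};

/* Hypothetical stand-in for the RX queue; not a Zephyr type. */
struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
};

/* Append a packet to the end of the queue. */
static void pkt_queue_append(struct pkt_queue *q, struct pkt *p)
{
	assert(q != NULL && p != NULL);

	p->next = NULL;
	if (q->tail) {
		q->tail->next = p;
	} else {
		q->head = p;
	}
	q->tail = p;
}

/* Unlink and return the first discardable packet, or NULL if there is
 * none. Iteration stops at the first match and the order of the
 * remaining packets is untouched.
 */
static struct pkt *pkt_queue_remove_first_discardable(struct pkt_queue *q)
{
	struct pkt *prev = NULL;

	assert(q != NULL);

	for (struct pkt *cur = q->head; cur; prev = cur, cur = cur->next) {
		if (!cur->discardable) {
			continue;
		}

		/* Bypass the matching node. */
		if (prev) {
			prev->next = cur->next;
		} else {
			q->head = cur->next;
		}

		/* Keep the tail pointer valid if the last node was removed. */
		if (q->tail == cur) {
			q->tail = prev;
		}

		cur->next = NULL;
		return cur;
	}

	return NULL;
}
```

Compared with draining and re-inserting every element, this touches only the nodes up to the first match.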

This event is a priority one, so it's not safe to have it use the RX
buffer pool which may be depleted due to non-priority events (e.g.
advertising events). Since the event is consumed synchronously it's
safe to have a single-buffer pool for it. Also introduce a new
bt_buf_get_evt() API for HCI drivers to simplify the driver-side code.
This effectively also deprecates bt_buf_get_cmd_complete(), which now
has no in-tree HCI driver users anymore.

Fixes zephyrproject-rtos#16864

Signed-off-by: Johan Hedberg <[email protected]>
@jhedberg jhedberg changed the title from "Bluetooth: Add dedicated pool for HCI_Num_Completed_Packets HCI event" to "Bluetooth: Fix various deadlock issues when running low on buffers" Jun 19, 2019
@jhedberg
Member Author

@jukkar I've added a net_buf patch here to make it officially ok to use some bits of buf->flags as extended user data, could you take a look?

@carlescufi @joerchan since you use the H:4 driver on the nRF91 it'd be good if you also check at least the changes I'm doing to that driver.

@xiaoliang314

Through my tests, the latest patches work well on my system.

Member

@jukkar jukkar left a comment

+1 to net_buf changes.

@@ -474,6 +474,12 @@ static inline void net_buf_simple_restore(struct net_buf_simple *buf,
*/
#define NET_BUF_EXTERNAL_DATA BIT(1)

/**
* Bitmask of the ower bits of buf->flags that are reserved for net_buf
Contributor

Typo: ower.

Member Author

will fix, thanks

* events from the RX queue.
*/
if (IS_ENABLED(CONFIG_BT_HCI_ACL_FLOW_CONTROL) && rx.type == H4_ACL) {
return get_rx(K_FOREVER);
Contributor

I thought the nature of such patches is to make sure that buffer-getting functions are never called with K_FOREVER.

Member Author

The scenario here is that we're in the RX thread with the UART RX interrupt disabled, i.e. there's nothing else that can be done than to sit and wait for a buffer. With ACL flow control this should be guaranteed to succeed since otherwise the controller is sending us an ACL packet when we haven't given it permission (credits) to do so. Another option would be K_NO_WAIT + ASSERT.

Contributor

> i.e. there's nothing else that can be done than to sit and wait for a buffer.

There's always something to do, e.g. to report a warning after some period of time, that this extended wait happens.

That said, that's just a random comment from someone not familiar with the BT subsys. (In IP networking, I believe we decided that we don't want any K_FOREVER's, and it was even implemented IIRC).

Member Author

That might be useful for debugging purposes, but it should be done for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option which, when enabled, causes any threads sleeping with K_FOREVER to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that they're still sleeping. The kernel APIs wouldn't return but rather just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.
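As a rough sketch of that idea, assuming a timed-wait primitive is available to build on, a "wait forever" wrapper could retry in fixed intervals and log between attempts. Everything here is hypothetical: `timed_take_fn`, `take_forever_with_report()` and `fake_take()` are illustration-only names, and the fake primitive merely simulates timeouts so the example runs without a kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical timed-wait primitive: returns true once the resource is
 * obtained, false if timeout_ms elapsed first.
 */
typedef bool (*timed_take_fn)(void *obj, int timeout_ms);

/* Emulate a "wait forever" call that announces itself every report_ms.
 * It never returns without the resource, so callers keep the usual
 * wait-forever guarantee. Returns the number of reports that were logged.
 */
static int take_forever_with_report(timed_take_fn take, void *obj,
				    int report_ms)
{
	int reports = 0;

	assert(take != NULL);

	while (!take(obj, report_ms)) {
		reports++;
		printf("still waiting after %d ms\n", reports * report_ms);
	}

	return reports;
}

/* Fake primitive for demonstration: times out *remaining times, then
 * succeeds. The timeout value is ignored because nothing really sleeps.
 */
static bool fake_take(void *obj, int timeout_ms)
{
	int *remaining = obj;

	(void)timeout_ms;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}

	return true;
}
```

Note that a wrapper like this re-queues the waiter on every retry, so it does not preserve the waiter's position among other waiters. A real kernel feature would need to handle that differently.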

Contributor

> That might be useful for debugging purposes, but it should be done for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option which, when enabled, causes any threads sleeping with K_FOREVER to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that they're still sleeping. The kernel APIs wouldn't return but rather just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.

Interesting, I was thinking of something similar some time ago. However, if we wake up the threads just to make them wait again, that would perhaps change the order in the waiters list, so it might have to be done without touching the waiters list.

@pfalcon
Contributor

pfalcon commented Jun 19, 2019

> Add support to iterate the RX queue and discard (discardable) packets
> in case the RX buffer pool is empty. This will eliminate potential
> deadlock scenarios.

"Make less likely", perhaps. "Eliminate", unlikely.

Johan Hedberg added 2 commits June 19, 2019 13:16
Currently net_buf only needs the lower two bits. It may be useful to
let users of the API take advantage of some bits as a form of extended
user data (if using user data isn't appropriate or there's no space
there).

Signed-off-by: Johan Hedberg <[email protected]>
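
A minimal sketch of the flags idea described in the commit message above, under stated assumptions: an 8-bit flags field, the two lowest bits reserved for the buffer library, and a user flag placed in the top bit. `struct demo_buf`, `FLAGS_RESERVED_MASK` and the helper functions are hypothetical names for illustration, and the value chosen for `BUF_DISCARDABLE` is an assumption rather than the value used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Assumption: the two lowest bits belong to the buffer library. */
#define FLAGS_RESERVED_MASK (BIT(0) | BIT(1))

/* User-defined flag placed above the reserved bits (assumed value). */
#define BUF_DISCARDABLE BIT(7)

/* Hypothetical stand-in for a buffer with a flags field. */
struct demo_buf {
	uint8_t flags;
};

/* Set user flags, refusing anything that overlaps the reserved bits. */
static void demo_buf_set_user_flags(struct demo_buf *buf, uint8_t user)
{
	assert((user & FLAGS_RESERVED_MASK) == 0);
	buf->flags |= user;
}

/* Check the user-defined discardable flag. */
static bool demo_buf_is_discardable(const struct demo_buf *buf)
{
	return (buf->flags & BUF_DISCARDABLE) != 0;
}
```

Publishing the reserved mask is what lets users pick bits that cannot collide with the library's own flags.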
Add support to iterate the RX queue and discard (discardable) packets
in case the RX buffer pool is empty. This will make potential deadlock
scenarios less likely.

Signed-off-by: Johan Hedberg <[email protected]>
@jhedberg jhedberg removed the DNM This PR should not be merged (Do Not Merge) label Jun 19, 2019
@jhedberg
Member Author

@carlescufi @Vudentz @joerchan this PR has been left hanging around - it has an informal approval from the person who reported the original issue, as well as one formal approval, however nothing from actual Bluetooth maintainers yet.

@carlescufi
Member

I've read through the original issue and then through the code. I see that the new patch for number of completed packets addresses the issue with unblocking the TX thread that might be waiting on those. But what is the need for BT_H4_DISCARD_RX_WAIT? What is the deadlock that we are solving there, given that, if there are discardable packets, those will be processed and will free items in the rx queue?

BT_WARN("Attempting to discard packets from RX queue");

while ((buf = net_buf_get(&rx.fifo, K_NO_WAIT))) {
if (!discarded && (buf->flags & BUF_DISCARDABLE)) {
Contributor

@Vudentz Vudentz Jun 26, 2019

So this seems to discard only one buffer at a time? If that is the correct behavior, perhaps you can check whether anything has been added to the list. If nothing has, the order is preserved and we can return, breaking out of the while loop, since you don't have to call k_fifo_put_slist.

Member Author

Not quite. Even if we don't find anything to discard we'll have emptied rx.fifo through the net_buf_get calls. So we anyway have to do the put_slist(). And this code is discarding at most one buffer to minimise packet loss for something like mesh that's waiting for advertising reports.
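
To make the point concrete, here is a minimal sketch of the drain-and-re-insert pattern with at most one discard. It uses a hypothetical plain-C `struct fifo` and `struct buf` instead of k_fifo and net_buf, so the helper names are assumptions for illustration, and it is single-threaded by construction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for net_buf and k_fifo; not Zephyr types. */
struct buf {
	struct buf *next;
	bool discardable;
};

struct fifo {
	struct buf *head;
	struct buf *tail;
};

/* Append a buffer to the end of the FIFO. */
static void fifo_put(struct fifo *f, struct buf *b)
{
	assert(f != NULL && b != NULL);

	b->next = NULL;
	if (f->tail) {
		f->tail->next = b;
	} else {
		f->head = b;
	}
	f->tail = b;
}

/* Remove and return the oldest buffer, or NULL if the FIFO is empty. */
static struct buf *fifo_get(struct fifo *f)
{
	struct buf *b = f->head;

	if (b) {
		f->head = b->next;
		if (!f->head) {
			f->tail = NULL;
		}
		b->next = NULL;
	}

	return b;
}

/* Drain rx, drop at most one discardable buffer and put everything else
 * back in the original order. Returns the dropped buffer, or NULL.
 */
static struct buf *discard_one(struct fifo *rx)
{
	struct fifo kept = { NULL, NULL };
	struct buf *discarded = NULL;
	struct buf *b;

	while ((b = fifo_get(rx)) != NULL) {
		if (!discarded && b->discardable) {
			discarded = b;
		} else {
			fifo_put(&kept, b);
		}
	}

	/* rx is empty here even if nothing was discarded, so the kept
	 * buffers always have to be re-inserted.
	 */
	*rx = kept;

	return discarded;
}
```

Because the loop empties the queue regardless of the outcome, the re-insert step cannot be skipped even when no discardable buffer is found.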

@jhedberg
Member Author

> what is the need for BT_H4_DISCARD_RX_WAIT? What is the deadlock that we are solving there, given that, if there are discardable packets, those will be processed and will free items in the rx queue?

@carlescufi IIRC I explained this on Slack already, but I'll repeat here:

There may be multiple pending HCI packets that the controller wants to send to the host. At the same time the host might be stuck waiting for something like a Number of Completed Packets event. If that event is the next one in the queue for the controller to send to the host then all is fine (as guaranteed by the other patch in this PR), but if there are other, non-discardable, packets before it then we have to be able to receive those packets in order to eventually receive the Number of Completed Packets event. By digging into the host RX queue we're able to free up buffers to receive the pending packets from the controller.

@jhedberg
Member Author

jhedberg commented Jun 26, 2019

> what is the need for BT_H4_DISCARD_RX_WAIT?

@carlescufi seems I failed to address this specific question: the purpose of this initial wait is that digging into the RX queue is expensive. So we give the other threads in the system a chance to process and free up buffers before commencing our own expensive free-up operation.

I.e. this wait isn't necessary, but I was thinking it might give better performance. If the wait is successful it also means we don't have to lose any data from the controller.

@carlescufi
Member

> > what is the need for BT_H4_DISCARD_RX_WAIT? What is the deadlock that we are solving there, given that, if there are discardable packets, those will be processed and will free items in the rx queue?
>
> @carlescufi IIRC I explained this on Slack already, but I'll repeat here:
>
> There may be multiple pending HCI packets that the controller wants to send to the host. At the same time the host might be stuck waiting for something like a Number of Completed Packets event. If that event is the next one in the queue for the controller to send to the host then all is fine (as guaranteed by the other patch in this PR), but if there are other, non-discardable, packets before it then we have to be able to receive those packets in order to eventually receive the Number of Completed Packets event. By digging into the host RX queue we're able to free up buffers to receive the pending packets from the controller.

Right, but where is the actual deadlock? Why is a pending Number of Completed Packets event preventing us from processing other, unrelated events? Since the memory pools are separate for events and data anyway, that's the bit I don't understand. Sorry if you already explained this and I missed it.

@jhedberg
Member Author

> Right, but where is the actual deadlock? Why is a pending Number of Completed Packets event preventing us from processing other, unrelated events? Since the memory pools are separate for events and data anyway, that's the bit I don't understand. Sorry if you already explained this and I missed it.

@carlescufi
That event releases the TX thread to send more packets to the controller. Until then TX buffers will sit around waiting in the TX queue. If some thread tries to send more data they'll be blocked on trying to allocate a TX buffer.

@carlescufi
Member

> > Right, but where is the actual deadlock? Why is a pending Number of Completed Packets event preventing us from processing other, unrelated events? Since the memory pools are separate for events and data anyway, that's the bit I don't understand. Sorry if you already explained this and I missed it.
>
> @carlescufi
> That event releases the TX thread to send more packets to the controller. Until then TX buffers will sit around waiting in the TX queue. If some thread tries to send more data they'll be blocked on trying to allocate a TX buffer.

Yes, but how is that a deadlock? The currently allocated event buffers will be processed and freed up by the RX thread regardless of what is happening to the TX thread?

@jhedberg
Member Author

> Yes, but how is that a deadlock? The currently allocated event buffers will be processed and freed up by the RX thread regardless of what is happening to the TX thread?

They don't have to be processed and freed up by the RX thread. E.g. both in Mesh and L2CAP cases they may be passed to the system workqueue for processing. net_buf's in general have the whole ref/unref feature and the ability to be passed through FIFOs to other contexts. So there's no guarantee that once you've done a bt_recv() call that the buffer will have gotten freed up.

Disclaimer: In all honesty I'm not 100% clear on what the precise deadlock is in #16864 or the various theoretical ways things can deadlock, but I'm fairly confident that we have to try to prevent the H:4 driver from halting the flow of packets from the controller, so that we eventually get to process any packets that may unblock threads that are waiting for them.

@carlescufi
Member

> So there's no guarantee that once you've done a bt_recv() call that the buffer will have gotten freed up.

Thanks for explaining. I agree that this is safe and reasonable.

@xiaoliang314

xiaoliang314 commented Jun 26, 2019

@carlescufi The Mesh stack sends two ACL packets synchronously in the Rx thread. This blocks the Rx thread until numCompleted is received. Because advertising data packets are reported continuously, the Rx buffers are exhausted. Therefore, the Tx thread cannot continue to report numCompleted events, resulting in a deadlock.

This is the call stack for Rx thread:
#0 __swap (key=0, key@entry=16965509) at /home/ubuntu/zephyr/arch/arm/core/swap.c:68
#1 z_swap_irqlock (key=16965509) at /home/ubuntu/zephyr/kernel/include/kswap.h:128
#2 z_swap (key=..., lock=0x2000346c <k_sys_work_q>) at /home/ubuntu/zephyr/kernel/include/kswap.h:145
#3 z_pend_curr (lock=lock@entry=0x2000346c <k_sys_work_q>, key=..., key@entry=..., wait_q=wait_q@entry=0x0, timeout=timeout@entry=-1) at /home/ubuntu/zephyr/kernel/sched.c:448
#4 z_impl_k_sem_take (sem=sem@entry=0x0, timeout=timeout@entry=-1) at /home/ubuntu/zephyr/kernel/sem.c:160
#5 k_sem_take (timeout=-1, sem=0x0) at /home/ubuntu/zephyr/samples/bluetooth/mesh_test_platform/build/xxxx/zephyr/include/generated/syscalls/kernel.h:103
#6 bt_att_send (conn=0x0, conn@entry=0x20000ae4 , buf=buf@entry=0x200083a8 <net_buf_acl_tx_pool+48>, cb=cb@entry=0x0) at /home/ubuntu/zephyr/subsys/bluetooth/host/att.c:2222
#7 gatt_notify (conn=conn@entry=0x20000ae4 , handle=, data=0x0, data@entry=0x20004980 <rx_thread_stack+3120>, len=len@entry=28, cb=cb@entry=0x0) at /home/ubuntu/zephyr/subsys/bluetooth/host/gatt.c:1167
#8 bt_gatt_notify_cb (conn=conn@entry=0x20000ae4 , attr=0x20009320 <proxy_attrs+80>, attr@entry=0x2000930c <proxy_attrs+60>, data=0x20004980 <rx_thread_stack+3120>, len=, func=func@entry=0x0) at /home/ubuntu/zephyr/subsys/bluetooth/host/gatt.c:1379
#9 bt_gatt_notify (len=, data=, attr=0x2000930c <proxy_attrs+60>, conn=0x20000ae4 ) at /home/ubuntu/zephyr/include/bluetooth/gatt.h:759
#10 proxy_send (conn=conn@entry=0x20000ae4 , data=, len=) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/proxy.c:893
#11 proxy_segment_and_send (conn=conn@entry=0x20000ae4 , type=type@entry=0 '\000', msg=msg@entry=0x20004974 <rx_thread_stack+3108>) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/proxy.c:918
#12 bt_mesh_proxy_send (conn=0x20000ae4 , type=type@entry=0 '\000', msg=msg@entry=0x20004974 <rx_thread_stack+3108>) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/proxy.c:955
#13 bt_mesh_proxy_relay (buf=buf@entry=0x200086b4 <net_buf_adv_buf_pool+56>, dst=) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/proxy.c:878
#14 bt_mesh_net_send (tx=tx@entry=0x20004a64 <rx_thread_stack+3348>, buf=buf@entry=0x200086ac <net_buf_adv_buf_pool+48>, cb=0x1030210 <seg_sent_cb>, cb_data=cb_data@entry=0x200001d8 <seg_tx>) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/net.c:885
#15 send_seg (net_tx=net_tx@entry=0x20004a64 <rx_thread_stack+3348>, sdu=sdu@entry=0x20004a90 <rx_thread_stack+3392>, cb=cb@entry=0x0, cb_data=cb_data@entry=0x0) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/transport.c:411
#16 bt_mesh_trans_send (tx=tx@entry=0x20004a64 <rx_thread_stack+3348>, msg=msg@entry=0x20004a90 <rx_thread_stack+3392>, cb=cb@entry=0x0, cb_data=cb_data@entry=0x0) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/transport.c:507
#17 model_send (model=model@entry=0x20008a6c <root_models>, tx=tx@entry=0x20004a64 <rx_thread_stack+3348>, implicit_bind=implicit_bind@entry=false, msg=msg@entry=0x20004a90 <rx_thread_stack+3392>, cb=cb@entry=0x0, cb_data=cb_data@entry=0x0) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/access.c:638
#18 bt_mesh_model_send (model=model@entry=0x20008a6c <root_models>, ctx=ctx@entry=0x20004be8 <rx_thread_stack+3736>, msg=msg@entry=0x20004a90 <rx_thread_stack+3392>, cb=cb@entry=0x0, cb_data=cb_data@entry=0x0) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/access.c:654
#19 dev_comp_data_get (model=0x20008a6c <root_models>, ctx=0x20004be8 <rx_thread_stack+3736>, buf=) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/cfg_srv.c:198
#20 bt_mesh_model_recv (rx=rx@entry=0x20004be4 <rx_thread_stack+3732>, buf=buf@entry=0x20004b28 <rx_thread_stack+3544>) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/access.c:579
#21 sdu_recv (rx=rx@entry=0x20004be4 <rx_thread_stack+3732>, seq=1, hdr=, aszmic=aszmic@entry=0 '\000', buf=buf@entry=0x20004bfc <rx_thread_stack+3756>) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/transport.c:627
#22 trans_unseg (buf=buf@entry=0x20004bfc <rx_thread_stack+3756>, rx=rx@entry=0x20004be4 <rx_thread_stack+3732>, seq_auth=seq_auth@entry=0x20004bc0 <rx_thread_stack+3696>) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/transport.c:898
#23 bt_mesh_trans_recv (buf=buf@entry=0x20004bfc <rx_thread_stack+3756>, rx=rx@entry=0x20004be4 <rx_thread_stack+3732>) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/transport.c:1400
#24 bt_mesh_net_recv (data=data@entry=0x20009218 <clients+24>, rssi=rssi@entry=0 '\000', net_if=net_if@entry=BT_MESH_NET_IF_PROXY) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/net.c:1324
#25 proxy_complete_pdu (client=client@entry=0x20009200 ) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/proxy.c:405
#26 proxy_recv (conn=, attr=, buf=0x20008261 <net_buf_data_hci_rx_pool+109>, len=, offset=0, flags=2 '\002') at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/proxy.c:467
#27 write_cb (attr=0x200092f8 <proxy_attrs+40>, user_data=0x20004c98 <rx_thread_stack+3912>) at /home/ubuntu/zephyr/subsys/bluetooth/host/att.c:1225
#28 bt_gatt_foreach_attr (start_handle=start_handle@entry=25, end_handle=end_handle@entry=25, func=func@entry=0x10287f5 <write_cb>, user_data=user_data@entry=0x20004c98 <rx_thread_stack+3912>) at /home/ubuntu/zephyr/subsys/bluetooth/host/gatt.c:935
#29 att_write_rsp (conn=conn@entry=0x20000ae4 , req=req@entry=0 '\000', rsp=rsp@entry=0 '\000', handle=, offset=offset@entry=0, value=0x20008261 <net_buf_data_hci_rx_pool+109>, len=22 '\026') at /home/ubuntu/zephyr/subsys/bluetooth/host/att.c:1268
#30 att_write_cmd (att=, buf=0x200081ac <net_buf_hci_rx_pool+24>) at /home/ubuntu/zephyr/subsys/bluetooth/host/att.c:1500
#31 bt_att_recv (chan=0x20000c18 <bt_req_pool>, buf=0x200081ac <net_buf_hci_rx_pool+24>) at /home/ubuntu/zephyr/subsys/bluetooth/host/att.c:1947
#32 l2cap_chan_recv (chan=, buf=buf@entry=0x200081ac <net_buf_hci_rx_pool+24>) at /home/ubuntu/zephyr/subsys/bluetooth/host/l2cap.c:1573
#33 bt_l2cap_recv (conn=conn@entry=0x20000ae4 , buf=buf@entry=0x200081ac <net_buf_hci_rx_pool+24>) at /home/ubuntu/zephyr/subsys/bluetooth/host/l2cap.c:1606
#34 bt_conn_recv (conn=conn@entry=0x20000ae4 , buf=buf@entry=0x200081ac <net_buf_hci_rx_pool+24>, flags=flags@entry=2 '\002') at /home/ubuntu/zephyr/subsys/bluetooth/host/conn.c:1149
#35 hci_acl (buf=buf@entry=0x200081ac <net_buf_hci_rx_pool+24>) at /home/ubuntu/zephyr/subsys/bluetooth/host/hci_core.c:551
#36 hci_rx_thread () at /home/ubuntu/zephyr/subsys/bluetooth/host/hci_core.c:4683
#37 z_thread_entry (entry=0x1005b11 <hci_rx_thread>, p1=, p2=, p3=) at /home/ubuntu/zephyr/lib/os/thread_entry.c:29
#38 0xaaaaaaaa in ?? ())

@jhedberg
Member Author

> hci_rx_thread () at /home/ubuntu/zephyr/subsys/bluetooth/host/hci_core.c:4683

@xiaoliang314 this is something I don't understand. In #16864 you said you're using the H4 driver, which enables CONFIG_BT_RECV_IS_RX_THREAD. This option in turn disables hci_rx_thread in hci_core.c, so I don't understand how you have that in your backtrace?

@xiaoliang314

@jhedberg Sorry, I should have given you more details. We ported the Bluetooth controller to our chip. The HCI driver for our controller is based on the H4 driver, but the actual H4 interface is not used.

@@ -191,7 +255,7 @@ static void rx_thread(void *p1, void *p2, void *p3)
* original command buffer (if available).
Member

Does this comment need updating now that there is a separate pool for num complete?

I'm not sure. To be honest, I don't fully understand how the host stack works.

@jhedberg jhedberg added the DNM This PR should not be merged (Do Not Merge) label Jun 26, 2019
@jhedberg
Member Author

The above comment brings up an important point though: if it's the bt_recv() call that's blocking there's still an issue, since it's in the same thread that we try to free up buffers, i.e. the freeing would need to be moved to some independent thread. Since the problem seems to be fairly complex I'm thinking I'll move the less controversial patches to a separate PR that could be merged first (i.e. the new one-buffer pool and the net_buf flags usage).

@jhedberg
Member Author

Closing this PR - I'll open a new one for the num_completed pool and some patches to a different approach I'm planning to take. This also means that no change is needed to the net_buf API (the flags stuff) for now.

Labels
area: API Changes to public APIs area: Bluetooth area: Networking area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug DNM This PR should not be merged (Do Not Merge)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bluetooth: Mesh: Rx buffer exhaustion causes deadlock
7 participants