Bluetooth: Fix various deadlock issues when running low on buffers #16870
Changes from 1 commit
```diff
@@ -34,6 +34,8 @@
 #define H4_SCO  0x03
 #define H4_EVT  0x04
 
+#define BUF_DISCARDABLE BIT(7)
+
 static K_THREAD_STACK_DEFINE(rx_thread_stack, CONFIG_BT_RX_STACK_SIZE);
 static struct k_thread rx_thread_data;
```
```diff
@@ -159,15 +161,82 @@ static void reset_rx(void)
 
 static struct net_buf *get_rx(int timeout)
 {
+	struct net_buf *buf;
+
 	BT_DBG("type 0x%02x, evt 0x%02x", rx.type, rx.evt.evt);
 
 	if (rx.type == H4_EVT) {
-		return bt_buf_get_evt(rx.evt.evt, timeout);
+		buf = bt_buf_get_evt(rx.evt.evt, timeout);
+	} else {
+		buf = bt_buf_get_rx(BT_BUF_ACL_IN, timeout);
+	}
+
+	if (buf) {
+		if (rx.discardable) {
+			buf->flags |= BUF_DISCARDABLE;
+		} else {
+			buf->flags &= ~BUF_DISCARDABLE;
+		}
 	}
 
-	return bt_buf_get_rx(BT_BUF_ACL_IN, timeout);
+	return buf;
 }
```
```diff
+#if (CONFIG_BT_H4_DISCARD_RX_WAIT >= 0)
+static struct net_buf *get_rx_blocking(void)
+{
+	sys_slist_t list = SYS_SLIST_STATIC_INIT(&list);
+	bool discarded = false;
+	struct net_buf *buf;
+
+	/* If we have ACL flow control enabled then ACL buffers come from a
+	 * dedicated pool, and we cannot benefit from trying to discard
+	 * events from the RX queue.
+	 */
+	if (IS_ENABLED(CONFIG_BT_HCI_ACL_FLOW_CONTROL) && rx.type == H4_ACL) {
+		return get_rx(K_FOREVER);
+	}
+
+	buf = get_rx(CONFIG_BT_H4_DISCARD_RX_WAIT);
+	if (buf) {
+		return buf;
+	}
+
+	BT_WARN("Attempting to discard packets from RX queue");
+
+	while ((buf = net_buf_get(&rx.fifo, K_NO_WAIT))) {
+		if (!discarded && (buf->flags & BUF_DISCARDABLE)) {
```
> **Review comment:** So this seems to discard only one buffer at a time? If that is the correct behavior, perhaps you can check whether anything has been added to the list: if there isn't anything, the order is preserved and we can return, breaking out of the while loop, since you don't have to call `k_fifo_put_slist()`.
>
> **Author reply:** Not quite. Even if we don't find anything to discard we'll have emptied `rx.fifo` through the `net_buf_get()` calls, so we have to do the `put_slist()` anyway. And this code is discarding at most one buffer to minimise packet loss for something like Mesh that's waiting for advertising reports.
```diff
+			net_buf_unref(buf);
+			discarded = true;
+		} else {
+			/* Since we use k_fifo_put_slist() instead of
+			 * net_buf APIs to insert back to the FIFO we
+			 * cannot support fragmented buffers. This should
+			 * never happen but better to "document" it in the
+			 * form of an assert here.
+			 */
+			__ASSERT(!(buf->flags & NET_BUF_FRAGS),
+				 "Fragmented buffers not supported");
+			sys_slist_append(&list, &buf->node);
+		}
+	}
+
+	if (!discarded) {
+		BT_WARN("Unable to discard anything from the RX queue");
+	}
+
+	/* Insert non-discarded packets back to the RX queue */
+	k_fifo_put_slist(&rx.fifo, &list);
```
> **Author comment (to @andyross):** Asking you since you seem to have touched the k_queue code most recently. The code above, starting with line 205, is iterating the contents of a k_fifo with the aim of removing one element from it. At this point in the code we have a guarantee that no other thread or ISR will attempt to access the k_fifo while we do this; however, in the absence of any more suitable public APIs I'm having to remove each element one by one and then re-insert the non-discarded elements using `k_fifo_put_slist()`.
```diff
+
+	return get_rx(K_FOREVER);
+}
+#else
+static inline struct net_buf *get_rx_blocking(void)
+{
+	return get_rx(K_FOREVER);
+}
+#endif
+
 static void rx_thread(void *p1, void *p2, void *p3)
 {
 	struct net_buf *buf;
```
```diff
@@ -186,7 +255,7 @@ static void rx_thread(void *p1, void *p2, void *p3)
 	 * original command buffer (if available).
```
> **Review comment:** Does this comment need updating now that there is a separate pool for num complete?
>
> **Author reply:** I'm not sure. To be honest, I don't fully understand how the host stack works.
```diff
 	 */
 	if (rx.have_hdr && !rx.buf) {
-		rx.buf = get_rx(K_FOREVER);
+		rx.buf = get_rx_blocking();
 		BT_DBG("Got rx.buf %p", rx.buf);
 		if (rx.remaining > net_buf_tailroom(rx.buf)) {
 			BT_ERR("Not enough space in buffer");
```
> **Review comment:** I thought the nature of such patches is to make sure that buffer-getting functions are never called with K_FOREVER.
>
> **Author reply:** The scenario here is that we're in the RX thread with the UART RX interrupt disabled, i.e. there's nothing else that can be done than to sit and wait for a buffer. With ACL flow control this should be guaranteed to succeed, since otherwise the controller is sending us an ACL packet when we haven't given it permission (credits) to do so. Another option would be K_NO_WAIT + ASSERT.
>
> **Review comment:** There's always something to do, e.g. report a warning after some period of time that this extended wait is happening. That said, that's just a random comment from someone not familiar with the BT subsystem. (In IP networking, I believe we decided that we don't want any K_FOREVERs, and IIRC that was even implemented.)
>
> **Author reply:** For debugging purposes something like that might be useful, but it should be done for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option which, when enabled, causes any thread sleeping with K_FOREVER to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that it's still sleeping. The kernel APIs wouldn't return; they'd just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.
>
> **Review comment:** Interesting, I was thinking of something similar some time ago, though if we just wake the threads up to make them wait again, that might change their order in the waiters list, so it might have to be done without touching the waiters list.