Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: Fix various deadlock issues when running low on buffers #16870

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions drivers/bluetooth/hci/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,20 @@ config BT_UART_ON_DEV_NAME
This option specifies the name of UART device to be used
for Bluetooth.

if BT_H4

config BT_H4_DISCARD_RX_WAIT
int "How many milliseconds to initially wait for RX buffers"
range -1 6000000
default 100
help
This option specifies how many milliseconds the H4 HCI driver
will initially wait for RX buffers to become available before
attempting to discard discardable buffers from the RX queue. Set
to -1 (K_FOREVER) to disable the feature.

endif # BT_H4

if BT_SPI

config BT_BLUENRG_ACI
Expand Down
80 changes: 72 additions & 8 deletions drivers/bluetooth/hci/h4.c
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@
#define H4_SCO 0x03
#define H4_EVT 0x04

#define BUF_DISCARDABLE BIT(7)

static K_THREAD_STACK_DEFINE(rx_thread_stack, CONFIG_BT_RX_STACK_SIZE);
static struct k_thread rx_thread_data;

Expand Down Expand Up @@ -159,19 +161,81 @@ static void reset_rx(void)

static struct net_buf *get_rx(int timeout)
{
struct net_buf *buf;

BT_DBG("type 0x%02x, evt 0x%02x", rx.type, rx.evt.evt);

if (rx.type == H4_EVT && (rx.evt.evt == BT_HCI_EVT_CMD_COMPLETE ||
rx.evt.evt == BT_HCI_EVT_CMD_STATUS)) {
return bt_buf_get_cmd_complete(timeout);
if (rx.type == H4_EVT) {
buf = bt_buf_get_evt(rx.evt.evt, timeout);
} else {
buf = bt_buf_get_rx(BT_BUF_ACL_IN, timeout);
}

if (rx.type == H4_ACL) {
return bt_buf_get_rx(BT_BUF_ACL_IN, timeout);
} else {
return bt_buf_get_rx(BT_BUF_EVT, timeout);
if (buf) {
if (rx.discardable) {
buf->flags |= BUF_DISCARDABLE;
} else {
buf->flags &= ~BUF_DISCARDABLE;
}
}

return buf;
}

#if (CONFIG_BT_H4_DISCARD_RX_WAIT >= 0)
static struct net_buf *get_rx_blocking(void)
{
sys_slist_t list = SYS_SLIST_STATIC_INIT(&list);
bool discarded = false;
struct net_buf *buf;

/* If we have ACL flow control enabled then ACL buffers come from a
* dedicated pool, and we cannot benefit from trying to discard
* events from the RX queue.
*/
if (IS_ENABLED(CONFIG_BT_HCI_ACL_FLOW_CONTROL) && rx.type == H4_ACL) {
return get_rx(K_FOREVER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the nature of such patches is to make sure that buffer-getting functions are never called with K_FOREVER.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario here is that we're in the RX thread with the UART RX interrupt disabled, i.e. there's nothing else that can be done than to sit and wait for a buffer. With ACL flow control this should be guaranteed to succeed since otherwise the controller is sending us an ACL packet when we haven't given it permission (credits) to do so. Another option would be K_NO_WAIT + ASSERT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. there's nothing else that can be done than to sit and wait for a buffer.

There's always something to do, e.g. to report a warning after some period of time, that this extended wait happens.

That said, that's just a random comment from someone not familiar with the BT subsys. (In IP networking, I believe we decided that we don't want any K_FOREVER's, and it was even implemented IIRC).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging purposes something that might be useful, but it should be made for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option, which when enabled causes any threads using K_FOREVER sleep to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that they're still sleeping. The kernel APIs wouldn't return, rather just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging purposes something that might be useful, but it should be made for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option, which when enabled causes any threads using K_FOREVER sleep to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that they're still sleeping. The kernel APIs wouldn't return, rather just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.

Interrestingm I was thinking something similar sometime ago, though if we just wake up the threads just to make them wait again that perhaps would change the order in the waiters list so we it might have to be done without touching the waiters list.

}

buf = get_rx(CONFIG_BT_H4_DISCARD_RX_WAIT);
if (buf) {
return buf;
}

BT_WARN("Attempting to discard packets from RX queue");

while ((buf = net_buf_get(&rx.fifo, K_NO_WAIT))) {
if (!discarded && (buf->flags & BUF_DISCARDABLE)) {
Copy link
Contributor

@Vudentz Vudentz Jun 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this seems to discard only one buffer at time? If that is the correct behavior perhaps you can check if there have been anything added to the list, if there isn't anything that means the order is preserved and we can return interrupting the while loop since you don't have to call k_fifo_put_slist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. Even if we don't find anything to discard we'll have emptied rx.fifo through the net_buf_get calls. So we anyway have to do the put_slist(). And this code is discarding at most one buffer to minimise packet loss for something like mesh that's waiting for advertising reports.

net_buf_unref(buf);
discarded = true;
} else {
/* Since we use k_fifo_put_slist() instead of
* net_buf APIs to insert back to the FIFO we
* cannot support fragmented buffers. This should
* never happen but better to "document" it in the
* form of an assert here.
*/
__ASSERT(!(buf->flags & NET_BUF_FRAGS),
"Fragmented buffers not supported");
sys_slist_append(&list, &buf->node);
}
}

if (!discarded) {
BT_WARN("Unable to discard anything from the RX queue");
}

/* Insert non-discarded packets back to the RX queue */
k_fifo_put_slist(&rx.fifo, &list);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross asking you since you seem to have touched the k_queue code most recently. The above code lines, starting with line 205, are iterating the contents of a k_fifo with the aim to remove one element from there. In this place of the code we have a guarantee that no other thread or ISR will attempt to access the k_fifo while we do this, however in the absence of any more suitable public APIs I'm having to remove each element one-by one and then re-insert the non-discarded elements using k_fifo_put_slist(). What I'd really want to do is the equivalent of SYS_SFLIST_FOR_EACH_NODE() and sys_sflist_remove() and stop iterating once a matching element is found and removed, however currently that requires accessing what I assume is private data of k_fifo (and k_queue). Would that still be ok, or any cleaner way to solve this?


return get_rx(K_FOREVER);
}
#else
static inline struct net_buf *get_rx_blocking(void)
{
return get_rx(K_FOREVER);
}
#endif

static void rx_thread(void *p1, void *p2, void *p3)
{
Expand All @@ -191,7 +255,7 @@ static void rx_thread(void *p1, void *p2, void *p3)
* original command buffer (if available).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment need updating now that there is a separate pool for num complete?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. To be honest, I don't fully understand how the host stack works.

*/
if (rx.have_hdr && !rx.buf) {
rx.buf = get_rx(K_FOREVER);
rx.buf = get_rx_blocking();
BT_DBG("Got rx.buf %p", rx.buf);
if (rx.remaining > net_buf_tailroom(rx.buf)) {
BT_ERR("Not enough space in buffer");
Expand Down
11 changes: 1 addition & 10 deletions drivers/bluetooth/hci/h5.c
Original file line number Diff line number Diff line change
Expand Up @@ -408,16 +408,7 @@ static inline struct net_buf *get_evt_buf(u8_t evt)
{
struct net_buf *buf;

switch (evt) {
case BT_HCI_EVT_CMD_COMPLETE:
case BT_HCI_EVT_CMD_STATUS:
buf = bt_buf_get_cmd_complete(K_NO_WAIT);
break;
default:
buf = bt_buf_get_rx(BT_BUF_EVT, K_NO_WAIT);
break;
}

buf = bt_buf_get_evt(evt, K_NO_WAIT);
if (buf) {
net_buf_add_u8(h5.rx_buf, evt);
}
Expand Down
6 changes: 1 addition & 5 deletions drivers/bluetooth/hci/ipm_stm32wb.c
Original file line number Diff line number Diff line change
Expand Up @@ -114,12 +114,8 @@ void TM_EvtReceivedCb(TL_EvtPacket_t *hcievt)
BT_ERR("Unknown evtcode type 0x%02x",
hcievt->evtserial.evt.evtcode);
goto out;
case BT_HCI_EVT_CMD_COMPLETE:
case BT_HCI_EVT_CMD_STATUS:
buf = bt_buf_get_cmd_complete(K_FOREVER);
break;
default:
buf = bt_buf_get_rx(BT_BUF_EVT, K_FOREVER);
buf = bt_buf_get_evt(evtserial.evt.evtcode, K_FOREVER);
break;
}
net_buf_add_mem(buf, &hcievt->evtserial.evt,
Expand Down
7 changes: 2 additions & 5 deletions drivers/bluetooth/hci/spi.c
Original file line number Diff line number Diff line change
Expand Up @@ -353,12 +353,9 @@ static void bt_spi_rx_thread(void)
/* Vendor events are currently unsupported */
bt_spi_handle_vendor_evt(rxmsg);
continue;
case BT_HCI_EVT_CMD_COMPLETE:
case BT_HCI_EVT_CMD_STATUS:
buf = bt_buf_get_cmd_complete(K_FOREVER);
break;
default:
buf = bt_buf_get_rx(BT_BUF_EVT, K_FOREVER);
buf = bt_buf_get_evt(rxmsg[EVT_HEADER_EVENT],
K_FOREVER);
break;
}

Expand Down
11 changes: 3 additions & 8 deletions drivers/bluetooth/hci/userchan.c
Original file line number Diff line number Diff line change
Expand Up @@ -57,16 +57,11 @@ static int bt_dev_index = -1;

static struct net_buf *get_rx(const u8_t *buf)
{
if (buf[0] == H4_EVT && (buf[1] == BT_HCI_EVT_CMD_COMPLETE ||
buf[1] == BT_HCI_EVT_CMD_STATUS)) {
return bt_buf_get_cmd_complete(K_FOREVER);
if (buf[0] == H4_EVT) {
return bt_buf_get_evt(buf[1], K_FOREVER);
}

if (buf[0] == H4_ACL) {
return bt_buf_get_rx(BT_BUF_ACL_IN, K_FOREVER);
} else {
return bt_buf_get_rx(BT_BUF_EVT, K_FOREVER);
}
return bt_buf_get_rx(BT_BUF_ACL_IN, K_FOREVER);
}

static bool uc_ready(void)
Expand Down
12 changes: 12 additions & 0 deletions include/bluetooth/buf.h
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,18 @@ struct net_buf *bt_buf_get_rx(enum bt_buf_type type, s32_t timeout);
*/
struct net_buf *bt_buf_get_cmd_complete(s32_t timeout);

/** Allocate a buffer for an HCI Event
*
* This will set the buffer type so bt_buf_set_type() does not need to
* be explicitly called before bt_recv_prio() or bt_recv().
*
* @param evt HCI event code
* @param timeout Timeout in milliseconds, or one of the special values
* K_NO_WAIT and K_FOREVER.
* @return A new buffer.
*/
struct net_buf *bt_buf_get_evt(u8_t evt, s32_t timeout);

/** Set the buffer type
*
* @param buf Bluetooth buffer
Expand Down
11 changes: 10 additions & 1 deletion include/net/buf.h
Original file line number Diff line number Diff line change
Expand Up @@ -474,6 +474,12 @@ static inline void net_buf_simple_restore(struct net_buf_simple *buf,
*/
#define NET_BUF_EXTERNAL_DATA BIT(1)

/**
* Bitmask of the lower bits of buf->flags that are reserved for net_buf
* internal use.
*/
#define NET_BUF_FLAGS_MASK BIT_MASK(5)

/**
* @brief Network buffer representation.
*
Expand All @@ -493,7 +499,10 @@ struct net_buf {
/** Reference count. */
u8_t ref;

/** Bit-field of buffer flags. */
/** Bit-field of buffer flags. Lower 5 bits are reserved for net_buf
* internal usage. Upper 3 bits can be freely used externally as
* a form of extended user data.
*/
u8_t flags;

/** Where the buffer should go when freed up. */
Expand Down
4 changes: 2 additions & 2 deletions subsys/bluetooth/controller/hci/hci.c
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ void *hci_cmd_complete(struct net_buf **buf, u8_t plen)
{
struct bt_hci_evt_cmd_complete *cc;

*buf = bt_buf_get_cmd_complete(K_FOREVER);
*buf = bt_buf_get_evt(BT_HCI_EVT_CMD_COMPLETE, K_FOREVER);

hci_evt_create(*buf, BT_HCI_EVT_CMD_COMPLETE, sizeof(*cc) + plen);

Expand All @@ -137,7 +137,7 @@ static struct net_buf *cmd_status(u8_t status)
struct bt_hci_evt_cmd_status *cs;
struct net_buf *buf;

buf = bt_buf_get_cmd_complete(K_FOREVER);
buf = bt_buf_get_evt(BT_HCI_EVT_CMD_STATUS, K_FOREVER);
hci_evt_create(buf, BT_HCI_EVT_CMD_STATUS, sizeof(*cs));

cs = net_buf_add(buf, sizeof(*cs));
Expand Down
35 changes: 35 additions & 0 deletions subsys/bluetooth/host/hci_core.c
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,16 @@ NET_BUF_POOL_DEFINE(hci_cmd_pool, CONFIG_BT_HCI_CMD_COUNT,
NET_BUF_POOL_DEFINE(hci_rx_pool, CONFIG_BT_RX_BUF_COUNT,
BT_BUF_RX_SIZE, BT_BUF_USER_DATA_MIN, NULL);

#if defined(CONFIG_BT_CONN)
/* Dedicated pool for HCI_Number_of_Completed_Packets. This event is always
* consumed synchronously by bt_recv_prio() so a single buffer is enough.
* Having a dedicated pool for it ensures that exhaustion of the RX pool
* cannot block the delivery of this priority event.
*/
NET_BUF_POOL_DEFINE(num_complete_pool, 1, BT_BUF_RX_SIZE,
BT_BUF_USER_DATA_MIN, NULL);
#endif /* CONFIG_BT_CONN */

struct event_handler {
u8_t event;
u8_t min_len;
Expand Down Expand Up @@ -5656,6 +5666,31 @@ struct net_buf *bt_buf_get_cmd_complete(s32_t timeout)
return bt_buf_get_rx(BT_BUF_EVT, timeout);
}

struct net_buf *bt_buf_get_evt(u8_t evt, s32_t timeout)
{
switch (evt) {
#if defined(CONFIG_BT_CONN)
case BT_HCI_EVT_NUM_COMPLETED_PACKETS:
{
struct net_buf *buf;

buf = net_buf_alloc(&num_complete_pool, timeout);
if (buf) {
net_buf_reserve(buf, CONFIG_BT_HCI_RESERVE);
bt_buf_set_type(buf, BT_BUF_EVT);
}

return buf;
}
#endif /* CONFIG_BT_CONN */
case BT_HCI_EVT_CMD_COMPLETE:
case BT_HCI_EVT_CMD_STATUS:
return bt_buf_get_cmd_complete(timeout);
default:
return bt_buf_get_rx(BT_BUF_EVT, timeout);
}
}

#if defined(CONFIG_BT_BREDR)
static int br_start_inquiry(const struct bt_br_discovery_param *param)
{
Expand Down
2 changes: 1 addition & 1 deletion subsys/bluetooth/host/hci_ecc.c
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ static void send_cmd_status(u16_t opcode, u8_t status)

BT_DBG("opcode %x status %x", opcode, status);

buf = bt_buf_get_cmd_complete(K_FOREVER);
buf = bt_buf_get_evt(BT_HCI_EVT_CMD_STATUS, K_FOREVER);
bt_buf_set_type(buf, BT_BUF_EVT);

hdr = net_buf_add(buf, sizeof(*hdr));
Expand Down
12 changes: 12 additions & 0 deletions subsys/bluetooth/host/hci_raw.c
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,18 @@ struct net_buf *bt_buf_get_cmd_complete(s32_t timeout)
return buf;
}

struct net_buf *bt_buf_get_evt(u8_t evt, s32_t timeout)
{
struct net_buf *buf;

buf = net_buf_alloc(&hci_rx_pool, timeout);
if (buf) {
bt_buf_set_type(buf, BT_BUF_EVT);
}

return buf;
}

int bt_recv(struct net_buf *buf)
{
BT_DBG("buf %p len %u", buf, buf->len);
Expand Down
2 changes: 1 addition & 1 deletion subsys/net/buf.c
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ static inline struct net_buf *pool_get_uninit(struct net_buf_pool *pool,

void net_buf_reset(struct net_buf *buf)
{
NET_BUF_ASSERT(buf->flags == 0U);
NET_BUF_ASSERT((buf->flags & NET_BUF_FLAGS_MASK) == 0U);
NET_BUF_ASSERT(buf->frags == NULL);

net_buf_simple_reset(&buf->b);
Expand Down
2 changes: 1 addition & 1 deletion tests/bluetooth/hci_prop_evt/src/main.c
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ static void *cmd_complete(struct net_buf **buf, u8_t plen, u16_t opcode)
{
struct bt_hci_evt_cmd_complete *cc;

*buf = bt_buf_get_cmd_complete(K_FOREVER);
*buf = bt_buf_get_evt(BT_HCI_EVT_CMD_COMPLETE, K_FOREVER);
evt_create(*buf, BT_HCI_EVT_CMD_COMPLETE, sizeof(*cc) + plen);
cc = net_buf_add(*buf, sizeof(*cc));
cc->ncmd = 1U;
Expand Down