-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(hid): Eliminate data race in USB pathway causing dropped keys #2257
base: main
Are you sure you want to change the base?
Conversation
3e0d978
to
c661988
Compare
app/src/usb_hid.c
Outdated
#if IS_ENABLED(CONFIG_ZMK_USB_BOOT) | ||
#define HID_BOOT_REPORT_SIZE sizeof(zmk_hid_boot_report_t) | ||
#else | ||
#define HID_BOOT_REPORT_SIZE 0 | ||
#endif | ||
|
||
#if IS_ENABLED(CONFIG_ZMK_MOUSE) | ||
#define HID_MOUSE_REPORT_SIZE sizeof(struct zmk_hid_mouse_report) | ||
#else | ||
#define HID_MOUSE_REPORT_SIZE 0 | ||
#endif | ||
|
||
#define HID_EP_WRITE_BUF_SIZE \ | ||
MAX(MAX(HID_BOOT_REPORT_SIZE, sizeof(struct zmk_hid_keyboard_report)), \ | ||
MAX(sizeof(struct zmk_hid_consumer_report), HID_MOUSE_REPORT_SIZE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative here is define the buffer size to be a certain fixed constant and then add compile time checks that the buffer is large enough to handle any possible response.
Good catch! I am on the fence on your fix. In practice, the semaphore gating should properly prevent the issue with the buffer changing out from under the USB device driver, but I also share the concerns about the buffer sizing macros being... painful, but perhaps necessary. Any chance you found any upstream guidance on whether the "copy early" or not approach in the drivers is expected? Accidental? Suggested "best practice" for handling? |
@petejohanson Thanks so much for reviewing this so quickly, given how much activity the project has! I'm likely saying something you totally already understand, but while the semaphore completely prevents another call to That said, deferring to your confidence that the current implementation is supposed to work, I just had a look at all of the Zephyr The question is whether it is a bug in In the meantime, since this bug means that really wired-USB key reports are getting through by luck, would you at all consider nonetheless implementing a fix along the lines described here temporarily while we depend on a buggy version of Zephyr? An alternative less-intrusive workaround is that we could arrange that for now the semaphore just blocks the entire function Another option which solves the problem and also potentially improves the current implementation would be for Please let me know what you think. I'm just sad my |
The function `usb_dc_ep_write()` races against its caller because it does not copy the passed data buffer as expected by its contract, and as the other drivers do. Thus the TX data may change from underneath the driver while it is pending. Alongside the output endpoint RX buffers which already exist, we define an input endpoint TX buffer for each endpoint and copy into TX data into it before transmitting. The new buffer is protected by the same lock which prevents a write being issued while an existing write is in progress. This bug was discovered on a Kinesis Adv360 keyboard running ZMK, and was observed to very reliably cause keys with a sufficiently high keycode (hence last to be transmitted) to be dropped. With Wireshark two TX messages were recorded on each keypress (corresponding to key press and key release), but both messages contained the same contents (no keys pressed). Only a key press-release combination generated by a macro-like mode of ZMK is fast enough to trigger the bug. My proposed fix to ZMK, the PR zmkfirmware/zmk#2257, simply copies the data into a temporary buffer before the call and immediately fixed the problem. This commit also fixes the bug now using a vanilla copy of ZMK, and has been tested to work on real hardware when backported to ZMK's Zephyr fork. Closes zephyrproject-rtos#71299. Signed-off-by: Keeley Hoek <[email protected]>
FYI, after trying to carefully read all of the Zephyr USB drivers I believe the problem is localized to the |
It seems that the current view is that Zephyr is unable to make the guarantee that So it appears to me that something along the lines of this PR will be necessary for the time being, at least for |
999f443
to
0a435a1
Compare
1e1ff22
to
43636ee
Compare
There's now a Kconfig flag guarding the workaround. |
As I found out diagnosing #2253, there is a data race in the USB HID core on many platforms; the call to
hid_int_ep_write()
, a Zephyr function, doesn't always copy the passed buffer before beginning the USB transaction. I've only read a few of the HAL drivers, but e.g.nrfx
andstm32
don't, whilerpi_pico
,native_posix
, andkinetis
do.So the USB keycode report is changing underneath the HAL driver while, for example, a tapped
<
quickly activate-deactivates the key. I originally thought this had something to do with extended NKRO, but actually noone ever noticed for smaller keycode bits just because they were first and thus sent more quickly!Amazing to me that this hadn't come up before! I guess there are many more bluetooth users, where it seems in practice that there is no such problem (though I haven't checked the code).
Tested working on a real Adv360.
Closes #2253