Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NKRO breaks layer-tap with "extended" keycodes #2253

Open
khoek opened this issue Apr 5, 2024 · 9 comments · May be fixed by #2257
Open

NKRO breaks layer-tap with "extended" keycodes #2253

khoek opened this issue Apr 5, 2024 · 9 comments · May be fixed by #2257

Comments

@khoek
Copy link
Contributor

khoek commented Apr 5, 2024

The binding &lt my_layer F24 doesn't work properly with NKRO on, even with extended reporting enabled. I observe that: with CONFIG_ZMK_HID_KEYBOARD_NKRO_EXTENDED_REPORT=n I never see an F24 at all when tapping (as expected), but with CONFIG_ZMK_HID_KEYBOARD_NKRO_EXTENDED_REPORT=y I only infrequently get a F24 keycode after spamming, alternating between the layer-tap key and another ordinary e.g. &kp A key. Tapping the layer-tap key by itself, I never see any F24s at all. Replacing the original layer-tap binding with &lt my_layer A works fine in any case.

However, by disabling NKRO altogether with CONFIG_ZMK_HID_REPORT_TYPE_NKRO=n the problem completely goes away. I am connected over USB. Does anyone know what the cause of this could be?

@caksoylar
Copy link
Contributor

The only difference with &lt .. F24 and &kp F24 is likely the tap duration. Former sends it for a very short duration (one HID report?) so maybe whatever is listening doesn't like that, combined with the NKRO protocol.

@khoek
Copy link
Contributor Author

khoek commented Apr 6, 2024

Thanks for your help! The thing is, from what I can see this behavior is happening on both Windows and Ubuntu, and doesn't happen for the < 12 function keys e.g. with F11 instead of F24. I'm 90% confident ZMK is deciding not to send any keycode at all when I am tapping the key, in the situation where it was compiled with NKRO on.

@khoek
Copy link
Contributor Author

khoek commented Apr 6, 2024

Yes, after setting up wireshark for USB capture I can confirm that a single tap of my &lt my_layer F24 key does cause 2 USB packets to be sent (press and release) but both of them are identical with the payload consisting of 010000... i.e. the body is all zeroes indicating no keys are being pressed. This doesn't happen over bluetooth (that is, bluetooth is working perfectly).

So there is definitely a bug in ZMK over USB + Extended Report NKRO. I'm not familiar with the codebase so it's going to take a while for me to set up a development environment and insert print statements everywhere for me to understand.

Complicating the issue is the fact that I can get the F24 to come through unreliably by alternating spamming another (working) key; from what I can see the only order which works is that I need to press the &lt my_layer F24 key, then press the other key, then release the other key, then release the &lt my_layer F24 key, all within the 200ms window for &lt to be cancelled. Perhaps there is a second pathway in the hold-tap behavior which is avoiding the bug in this case somehow? (I am just guessing, I don't know how any of this works yet.)

If anyone more knowledgeable can infer what is going wrong from this information please feel free to point me in the right direction!

@khoek
Copy link
Contributor Author

khoek commented Apr 6, 2024

Another thing is that, even more rarely, once I get the "right combination" of spammy presses to get the F24 to appear tapping the key will successfully emit more F24s for a little while. But it eventually stops working again---I get the sense that maybe there is an uninitialized memory-type problem potentially going on?

@khoek
Copy link
Contributor Author

khoek commented Apr 6, 2024

OK, I believe this is a bug in upstream Zephyr. By inserting print statements I see ZMK making a call to hid_int_ep_write() passing the 22-byte string 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00, but wireshark captures the payload 01000000000000000000000000000000000000000000 actually transmitted to the interrupt endpoint. (For the avoidance of doubt, I was testing with the F17 key here.)

Weird.

I've been testing this on an Adv360.

@khoek
Copy link
Contributor Author

khoek commented Apr 7, 2024

Okay, found the problem. There is a data race in usb_hid.c on many platforms; the call to hid_int_ep_write(), a Zephyr function, is not in general synchronous and doesn't always copy the passed buffer before beginning the USB transaction. I've only read a few of the HAL drivers, but e.g. nrfx and stm32 don't, while rpi_pico, native_posix, and kinetis do.

I conclude that copying the buffer is not part of the Zephyr function's contract (the docs are ambiguous, and on first glance would cause me to guess hid_int_ep_write() is synchronous, which is even less true). So the USB keycode report is changing underneath the HAL driver as &lt quickly deactivates the key. Noone ever noticed for smaller keycode bits because they were sent so quickly!

Pull request incoming.

@khoek
Copy link
Contributor Author

khoek commented Apr 7, 2024

The fix is in #2257. Tested working on a real Adv360.

@jhorology
Copy link
Contributor

In my experience ,CONFIG_HID_INTERRUPT_EP_MPS=32 will solve your problem.

@khoek
Copy link
Contributor Author

khoek commented Apr 18, 2024

@jhorology Unfortunately the bug isn't not having enough bytes sent to the USB endpoint, but rather a race condition caused by an ambiguous API in upstream Zephyr (which they acknowledge, but can't change---the relevant conversation is zephyrproject-rtos/zephyr#71306). So ZMK is going to have to work around this (at least until apparently a new USB subsystem for Zephyr which will be available at some point in the future).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants