Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix libusb usage for at least freebsd around the worker thread and transfer cancellation #805

Merged

Conversation

erikarn
Copy link

@erikarn erikarn commented Nov 9, 2020

This fixes Issue #804 .

On at least freebsd-13 trying to cancel a transfer whilst the libusb thread
is not running results in the transfers not completing cancellation.
The next time they're attempted to be re-added the libusb code thinks
they're still active, and returns BUSY on the buffers.

This causes gqrx to error out when one makes DSP changes or stops/starts it.
You have to restart gqrx to fix it.

After digging into it a bit, the libusb code expects that you're actively
running the main loop in order to have some deferred actions run in the
context of said main loop thread. This includes processing cancelled
transfers - the callbacks have to be run (if they exist) before the
buffers are properly cancelled and have their tracking metadata (a couple of
private pointers and state) removed from inside of libusb.

This patch does the following:

  • separate out adding and cancelling transfers from the libusb worker thread
    create/destroy path
  • create the libusb worker thread when opening the device
  • destroy the libusb worker thread when closing the device
  • only add and cancel transfers when starting and stopping tx/rx
  • handle cancelled transfers gracefully in the USB callback

Whilst here, also make the libusb device memory zeroed by using
calloc instead of malloc.

This fixes all of the weird libusb related buffer management problems
on FreeBSD.

…ansfer cancellation

On at least freebsd-13 trying to cancel a transfer whilst the libusb thread
is not running results in the transfers not completing cancellation.
The next time they're attempted to be re-added the libusb code thinks
they're still active, and returns BUSY on the buffers.

This causes gqrx to error out when one makes DSP changes or stops/starts it.
You have to restart gqrx to fix it.

After digging into it a bit, the libusb code expects that you're actively
running the main loop in order to have some deferred actions run in the
context of said main loop thread.  This includes processing cancelled
transfers - the callbacks have to be run (if they exist) before the
buffers are properly cancelled and have their tracking metadata (a couple of
private pointers and state) removed from inside of libusb.

This patch does the following:

* separate out adding and cancelling transfers from the libusb worker thread
  create/destroy path
* create the libusb worker thread when opening the device
* destroy the libusb worker thread when closing the device
* only add and cancel transfers when starting and stopping tx/rx
* handle cancelled transfers gracefully in the USB callback

Whilst here, also make the libusb device memory zeroed by using
calloc instead of malloc.

This fixes all of the weird libusb related buffer management problems
on FreeBSD.
@erikarn
Copy link
Author

erikarn commented Nov 9, 2020

(hm, it seems to work fine for stop/start but not for changing operating mode in gqrx for some odd reason, so let me go dig into it a bit more to see what else needs to be fixed with libusb use.)

@erikarn
Copy link
Author

erikarn commented Nov 9, 2020

(Update - so far the OTHER issues are inside of gr-osmosdr and their lack of clear lock / condition wait handing in the buffer management path; I'll keep poking at this and update the patch when I'm ready.)

* Update device->streaming to reflect whether we're streaming data,
  rather than just whether the streaming thread is active.
  The streaming thread is now always active!
@erikarn
Copy link
Author

erikarn commented Nov 9, 2020

Ok, this commit stack (3 commits) fixes all the libhackrf issues I've been seeing. there are hangs in gr-osmosdr because of how they currently implement receive streaming and they wait forever for buffers that may never come; I've fixed that and I'll try to upstream that asap.

@erikarn
Copy link
Author

erikarn commented Nov 9, 2020

Ugh, failing on WIndows? Ok, lemme go make it work there using Sleep() or something.

This seems to stop consumers that are doing quick back to back stop/start
(eg gqrx changing decode mode / filter bandwidth) from hanging the
device.

I now don't have any weird hangs on hackrf with gqrx/freebsd/libusb!

When things hang it isn't erroring out in any way; it just doesn't
start receive.  It doesn't look like a libusb issue; I'd have to get
some USB bus sniffing to see what's going on behind the scenes.
@erikarn erikarn force-pushed the ahc_20201108_fix_libusb_cancel branch from 4ae1a96 to b4ea51a Compare November 9, 2020 18:40
@erikarn
Copy link
Author

erikarn commented Nov 9, 2020

Ok, this passed. I'd like this patch set to be considered. thanks!

@ktemkin
Copy link
Contributor

ktemkin commented Nov 12, 2020

It'll be a little bit before I can take a look at it -- I suspect @miek might beat me to it. :)

@erikarn
Copy link
Author

erikarn commented Nov 12, 2020

oki thanks!

@erikarn
Copy link
Author

erikarn commented Nov 24, 2020

Ping @ktemkin ! Don't suppose you could eyeball this a bit pretty please?

Thanks!

-erikarn

@ktemkin
Copy link
Contributor

ktemkin commented Nov 30, 2020

Apologies; it's been a while and I'm only now just getting a chance to look at things. (This potentially needs testing on multiple platforms, so it's had to wait until I had a good enough chunk of time.)

I don't have a FreeBSD box around to do testing with; so I'm going to take it on faith that you've tested enough over there and check things out using other platforms.

@ktemkin
Copy link
Contributor

ktemkin commented Nov 30, 2020

I haven't taken a deep look at this, but at the very least this breaks API compatibility on Linux -- hackrf_close now fails with a return code:

$ hackrf_info

hackrf_info version: git-b4ea51a
libhackrf version: git-b4ea51a (0.5)
Found HackRF
Index: 0
Serial number: 000000000000000087c867dc2a8d4d5f
Board ID Number: 2 (HackRF One)
Firmware Version: 2018.01.1 (API:1.02)
Part ID Number: 0xa000cb3c 0x0056475f
hackrf_close() failed: unspecified error (-9999)

I haven't had a chance to take a deeper look, but I suspect one of the libusb calls needs to be handled with a bit more subtlety on Linux.

@erikarn
Copy link
Author

erikarn commented Nov 30, 2020 via email

My previous commits didn't handle the specific case of hackrf_close()
being called without the transfers being active.

In this instance the transfers haven't been setup, so calling
cancel_transfers() returned an error.

Instead:

* refactor out the tx/rx stop command from canceling transfers
* send the tx/rx stop command without canceling transfers, since ..
* ... we can then destroy the transfer thread.

I may also need to put an explicit cancel_transfers() before the
call to send the tx/rx stop commands; I'll look at doing that
in a subsequent commit.
@erikarn
Copy link
Author

erikarn commented Dec 11, 2020

ok, I addressed why it was throwing an error internally; no transfers were setup so checking the return value of check_transfers() failed.

Wanna try again and let me know if that works any better? thanks a bunch!

@ktemkin
Copy link
Contributor

ktemkin commented Dec 14, 2020

Yep -- will check this out today and get back to you. :)

@ktemkin
Copy link
Contributor

ktemkin commented Dec 14, 2020

This does seem to fix the issues on Linux; though I still need to test this macOS and Windows. :)

@ktemkin
Copy link
Contributor

ktemkin commented Dec 14, 2020

Okays; it seems like basic tests pass over here -- and assuming the sleeps added are a workaround because other methods were non-trivial to implement cross-platform, LGTM.

I'll defer merging to @mossmann or @miek; since they're the ones who are going to have to deal with any breakage. :)

@erikarn
Copy link
Author

erikarn commented Dec 18, 2020

Okays; it seems like basic tests pass over here -- and assuming the sleeps added are a workaround because other methods were non-trivial to implement cross-platform, LGTM.

I'll defer merging to @mossmann or @miek; since they're the ones who are going to have to deal with any breakage. :)

Thanks Kate! I look forward for further review and hopefully a merge!

@erikarn
Copy link
Author

erikarn commented Jan 4, 2021

ping! Is there anything else I can do to help get this over the line? :-) Thanks!

@mossmann mossmann merged commit ee44d2d into greatscottgadgets:master Jan 24, 2021
@mossmann
Copy link
Member

Thank you very much for this contribution!

@mossmann mossmann self-assigned this Jan 24, 2021
@erikarn
Copy link
Author

erikarn commented Jan 24, 2021 via email

martinling added a commit to martinling/hackrf that referenced this pull request Mar 18, 2022
These were added in greatscottgadgets#805, as a workaround to prevent their parent
functions from returning before transfer cancellations had completed.
This has since been fixed properly in greatscottgadgets#1029.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants