# `thingbuf::mpsc::Sender` hanging up for parallel `try_send_ref` and `send` / `send_ref` from sync thread and async `tokio::task` #83
We are also using `StaticThingBuf` in the DragonOS kernel and have encountered the exact same problem. The issue appears randomly and is difficult to reproduce after the kernel is restarted. In our use case, once the buffer is full, everything becomes very sluggish. After a period of time, the performance drops off a cliff, and ultimately it freezes.
## v0.1.5 (2024-04-06)

#### Features

* **mpsc:** add `len`, `capacity`, and `remaining` methods to mpsc (#72) ([00213c1](00213c1), closes [#71](#71))

#### Bug Fixes

* unused import with `alloc` enabled ([ac1eafc](ac1eafc))
* skip slots with active reading `Ref`s in `push_ref` (#81) ([a72a286](a72a286), closes [#83](#83), [#80](#80))
After updating the repository, the performance issue still remains unresolved. Below is the backtrace of the program execution. We made some minor modifications to the code to adapt it to our requirements; here is the repository after merging the latest version: https://github.com/xiaolin2004/thingbuf.git
Hmm, that's unfortunate. Thanks for following up --- I'll keep looking.
It looks like the test still fails after 11089327 iterations. Full logs from that iteration:
I have an idea what's wrong with the test but I have to debug it. Will try to find some time this week to work on it.
I investigated the test and the code in https://github.com/sgasse/thingbuf_hangup/. The test fails because of an edge case, and my initial thought is to adjust the test itself. Here's a breakdown of the issue:

We have a buffer with two slots. Initially, we write to slot [0], then to slot [1], and start reading from slot [0]. If we try writing again, slot [0] is unavailable (as it's still being read), so we skip it. Once the reading from slot [0] is completed, the head pointer moves to slot [1]. We attempt to write to slot [1] but can't, since it's not yet read, and the head pointer is already on slot [1]. Consequently, we declare the buffer as full. The problem in my test arises when starting to read from slot [1]: it never gets released, making the buffer perpetually empty for reading and full for writing.

However, I identified a genuine issue within the code at https://github.com/sgasse/thingbuf_hangup/, which I've temporarily fixed and re-executed on my local setup. I will re-examine my findings and aim to submit a PR later this week.
Fixes #83

Previously, to determine if the buffer was full, we checked whether the head and tail were pointing to the same slot with the head one generation behind. However, this check fails if we skip slots, leading to scenarios where the `head` and `tail` point to different slots even though the buffer is full.

For example, consider a buffer with 3 slots. Initially, we write to the buffer three times (gen + 0). Then, we read from slot 0 and slot 1, holding the reference from slot 1, and read from slot 2 (gen + 0). Next, we write to slot 0 (gen + 1) and read from slot 0 (gen + 1), which moves our `head` to slot 1 (gen + 1). Then we try to write to slot 1 (gen + 1) and skip it, so we write to slot 2 (gen + 1). Then again we write to slot 0 (gen + 2). And then we attempt to write to slot 1, but we skip it and attempt to write to slot 2 (gen + 2). However, we can't write into it because it still contains data from the previous generation (gen + 1), and our `head` points to slot 1 instead of slot 2.

This fix ensures the buffer full condition accurately reflects the actual status of the slots, particularly when writes are skipped.
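A simplified model of this sequence (not thingbuf's actual implementation; the `Model` type, its slot states, and the `naive_full` check are illustrative stand-ins for the head/tail/generation bookkeeping described above) shows why the old check never reports the buffer as full in this state:

```rust
// Simplified model of the walkthrough above. This is NOT thingbuf's real
// implementation: the slot states, the skip handling, and the "naive"
// fullness check are stand-ins, modeled just closely enough to show why a
// "head points at the same slot as tail, one generation behind" check
// misses a full buffer once slots have been skipped.

const CAP: usize = 3;

#[derive(Clone, Copy)]
enum Slot {
    Empty,        // no unread data
    Holds(usize), // unread data written in this generation
    Reading,      // a reader still holds a `Ref` to this slot
}

struct Model {
    slots: [Slot; CAP],
    head: usize, // next read position, as a monotonically increasing counter
    tail: usize, // next write position, as a monotonically increasing counter
}

impl Model {
    fn new() -> Self {
        Self { slots: [Slot::Empty; CAP], head: 0, tail: 0 }
    }

    /// The old fullness check: tail points at the same slot as head,
    /// exactly one generation ahead.
    fn naive_full(&self) -> bool {
        self.tail == self.head + CAP
    }

    /// Try to write, skipping slots whose `Ref` is still held by a reader.
    /// Returns `false` if no slot can accept the write.
    fn write(&mut self) -> bool {
        loop {
            let idx = self.tail % CAP;
            let slot = self.slots[idx];
            match slot {
                Slot::Empty => {
                    self.slots[idx] = Slot::Holds(self.tail / CAP);
                    self.tail += 1;
                    return true;
                }
                // Skip the slot whose `Ref` is still held.
                Slot::Reading => self.tail += 1,
                // The slot still holds unread data from an earlier generation.
                Slot::Holds(data_gen) => {
                    println!("blocked: slot {idx} still holds unread gen {data_gen} data");
                    return false;
                }
            }
        }
    }

    /// Read the slot at `head`; with `hold == true` the `Ref` is kept alive,
    /// so the slot stays unavailable for writers.
    fn read(&mut self, hold: bool) {
        let idx = self.head % CAP;
        self.slots[idx] = if hold { Slot::Reading } else { Slot::Empty };
        self.head += 1;
    }
}

fn main() {
    let mut m = Model::new();
    m.write(); m.write(); m.write(); // fill slots 0, 1, 2 (gen 0)
    m.read(false);                   // read slot 0 (gen 0)
    m.read(true);                    // read slot 1 (gen 0) and keep the Ref
    m.read(false);                   // read slot 2 (gen 0)
    m.write();                       // write slot 0 (gen 1)
    m.read(false);                   // read slot 0 (gen 1); head is now at slot 1 (gen 1)
    m.write();                       // skip slot 1, write slot 2 (gen 1)
    m.write();                       // write slot 0 (gen 2)
    let ok = m.write();              // skip slot 1; slot 2 still holds gen 1 data

    // The write is blocked, yet the old check does not report "full", so a
    // producer relying on it would keep spinning instead of parking.
    assert!(!ok && !m.naive_full());
}
```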
Hi! We are using `thingbuf` in a performance-sensitive application for its speed while supporting sync and async interaction on the same sender type. Recently I ran into an issue where `thingbuf` seems to hang up our async application completely in busy loops. The profiling in the degraded state showed almost only calls into `thingbuf`.

I was able to reproduce the issue in a minimal example. You can run several variations of it yourself by checking out this repo: https://github.com/sgasse/thingbuf_hangup/

The initial setup (binary `thingbuf_sendref`) to get into the hangup was this (a sketch of the setup follows the list):

- A `std::thread` sends with `try_send_ref` in a loop every 10ms.
- A `tokio::task` receives on the channel with `recv_ref().await` in a loop. After 10s, a delay of 10s is introduced between receive calls. This simulates badly handled backpressure from a downstream task.
- A `tokio::task` starts sending with `send(..).await` in a loop after 20s.
- A `tokio::task` logs an alive message every second.
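A minimal sketch of this setup (not the actual `thingbuf_hangup` code; the channel capacity, the `String` element type, and the tokio configuration are assumptions) looks roughly like this:

```rust
// A minimal sketch of the reproduction setup listed above (not the actual
// `thingbuf_hangup` code). The channel capacity, the `String` element type,
// and the tokio runtime configuration (full feature set) are assumptions.
use std::time::Duration;
use thingbuf::mpsc;

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel::<String>(16);

    // 1) Sync producer: a plain std::thread calling try_send_ref every 10 ms.
    let sync_tx = tx.clone();
    std::thread::spawn(move || loop {
        if let Ok(mut slot) = sync_tx.try_send_ref() {
            slot.clear();
            slot.push_str("from sync thread");
        }
        std::thread::sleep(Duration::from_millis(10));
    });

    // 2) Async consumer: recv_ref().await in a loop; after 10 s, add a 10 s
    //    delay between receive calls to simulate badly handled backpressure.
    tokio::spawn(async move {
        let start = tokio::time::Instant::now();
        while let Some(msg) = rx.recv_ref().await {
            drop(msg); // release the RecvRef before the artificial delay
            if start.elapsed() > Duration::from_secs(10) {
                tokio::time::sleep(Duration::from_secs(10)).await;
            }
        }
    });

    // 3) Second producer: an async task sending with send(..).await after 20 s.
    let async_tx = tx.clone();
    tokio::spawn(async move {
        tokio::time::sleep(Duration::from_secs(20)).await;
        loop {
            let _ = async_tx.send(String::from("from async task")).await;
        }
    });

    // 4) Liveness logger: once the hangup occurs, these messages stop.
    loop {
        tokio::time::sleep(Duration::from_secs(1)).await;
        println!("alive");
    }
}
```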
Once the second sender becomes active, we no longer see any alive logs. Introducing logs to `thingbuf` shows that two threads (one tokio worker and the self-spawned thread) are both stuck in this loop in `push_ref`.
Here is some log output with line numbers from `src/lib.rs` of `thingbuf` for the hang-up scenario; this repeats infinitely:

The initial setup mimics the behavior of a part of our real application. However, I varied the setup in other examples; here are some findings:
- Whether the first sender uses `try_send_ref()` (sync), `send_ref().await` or `send(..).await` (async) does not seem to matter, see examples `thingbuf_sendref`, `thingbuf_sendref_pure`, `thingbuf_send_recvref` and `thingbuf_send_no_try_recvref`, which all hang up.
- With `recv().await` instead of `recv_ref().await`, there is no hangup, see example `thingbuf_send`.
- I reproduced the issue on `x86_64-unknown-linux-gnu`, but I initially found it on `aarch64-linux-android`, so I guess it does not depend on the platform.
- I tested with `rustc` in version `1.76` and `nightly`, so it is probably not related to the compiler.
- Replacing `thingbuf::mpsc` with `tokio::mpsc` in one example works as expected: it still logs the alive messages and does not hang up.
tokio::mpsc
, so I guess it is a bug. But please let me know if there is a limitation which I overlooked or if I can provide further info.The text was updated successfully, but these errors were encountered: