ui_waker: coalesce batched repaint requests into one #178
Update:
Taking a step back, I no longer think the reasoning below is correct: even though the proposed solution does fix the issue, I likely got the cause wrong.
Still, the fact that coalescing those repaint requests fixes the issue might be a good hint of what's going on for someone who knows better?
Stuff I'd like to try when I get some time:
Background
This started with me trying to load large `.rrd` files full of `TextEntries` into the viewer. Unfortunately, more often than not, doing so would lead to a complete deadlock of the app.
After fiddling around with it for a while, I eventually came to the conclusion that the quantity of data had no bearing on the issue: the speed of ingestion was the real culprit.
So, here's what this looks like in practice:
22-10-09_13.22.05.patched.mp4
Sometimes everything would deadlock before we could even spawn a window, sometimes it would all work fine, and sometimes it would deadlock later on and leave us with a totally unresponsive window. 🤷‍♂️
I spent hours trying to find anything suspicious on our end, and tried running everything through a patched version of our `deadlock_detection` feature... to no avail. So in the end I just threw `ThreadSanitizer` at it and started sifting through the noise, until I stumbled upon this one:

Race report
What's going on here (as far as I understand, please correct me if I'm wrong) is this: on native targets, `wake_up_ui_thread_on_each_msg` will catch each incoming raw datapoint (aka `LogMsg`) and notify `egui` that the UI needs to be updated, which will in turn notify `eframe`, which will notify `winit`, and so on until we finally reach the platform-dependent libraries sitting at the edge. In my case, that would be the Smithay ecosystem of libraries, as I'm targeting Wayland.

Upon receiving a repaint request event, `smithay` will simply queue it into an mpsc channel using their `calloop` abstraction.
Now it turns out that, for reasons unknown to me and that I don't want to spend any more time looking into, there's a data race lurking somewhere in there (i.e. in smithay / calloop) that is almost impossible to hit... unless, of course, you are ingesting 100k `TextEntries` (which is 300k `LogMsg`s, i.e. 300k repaint requests) in a couple hundred microseconds, in which case you are almost guaranteed to hit it. And once you do, it's game over: the communication pipeline with Wayland is deadlocked and the entire window becomes unresponsive. Your only way out is a `SIGKILL`.
.Mitigation
This PR mitigates the issue simply by coalescing batches of repaint requests into a single one. In practice, this makes the Wayland issue completely disappear. And, beyond Wayland, it just seems like a decent performance improvement that should apply to all platforms anyway?
If this is indeed the right approach, then perhaps it should be done directly inside `egui`, so that it benefits every caller of `request_repaint()`. But then again, I've been knee-deep in backtraces for so long that I might just be talking absolute nonsense and missing something completely obvious here, so... you tell me.