table: fix deadlocks caused by lock fairness #2156
Merged
Alright, story time :D
Background
Some context first: we have a simple application that runs on two threads:

- Thread 0 calls `ctx.request_repaint()` once in a while, depending on various external factors that are irrelevant to this issue.
- Thread 1 renders a `Table` every frame.

Problem is: more often than not, that application will deadlock.
In particular, it seems the more thread 0 tries to request repaints, the more likely we are to deadlock.
The situation just described can be dumbed down to this:
and, indeed, that piece of code will always deadlock without fail.
The fix
The solution to all our problems is the following..
..which really just raises more questions.
First off, here's how the compiler desugars that condition:
i.e. we end up with two read guards that live concurrently until the end of the scope, the second of which effectively behaves as a reentrant read.
That still doesn't explain why or how we end up with a deadlock.
Let's recap the situation: we have one thread fighting for the write side of the lock, and two readers on another thread fighting for the read side. Sure, we expect some contention, but besides that this shouldn't be an issue... unless `parking_lot`'s `RwLock`s aren't reentrant? But they are though, right?

Well, our fix is saying otherwise... gotta dig deeper.
Is `parking_lot::RwLock` reentrant??
At this point we can just take `egui` out of the equation and focus on `parking_lot`:
:This will reproduce the deadlock all the same.
Let's take a step back: intuitively, it seems that an `RwLock` should be read-reentrant, right? Why wouldn't it be? After all, it can have all the readers it wants!

In fact, the example shown in `parking_lot`'s documentation seems to agree with that first intuition. Even better: that example is doing exactly the same thing as our condition from earlier!
So, `parking_lot`'s `RwLock`s: read-reentrant or not? Weeeell... it depends.
In particular, it depends on whether there is another thread concurrently trying to grab the lock exclusively.
This has to do with how `parking_lot` handles fairness internally: the implementation is biased towards writers.
When a thread tries to grab an `RwLock` exclusively, a flag is immediately set on the lock to block future readers, thereby preventing writer starvation.
At this point, the writer-to-be is guaranteed to get exclusivity of the lock at some point, and just waits there for existing readers to leave.
Knowing this, it's pretty straightforward to infer what's going on in our case, based on the desugared condition:

1. Thread 1 grabs `guard1`.
2. Thread 0 tries to grab the lock exclusively: the writer flag is set, and thread 0 now waits for `guard1` to leave.
3. Thread 1 tries to grab `guard2`, except it can't (the writer flag blocks new readers), so it waits.

Thread 0 is waiting for `guard1` to be released, but thread 1 won't release `guard1` until it manages to grab `guard2`: deadlock.
Make sure those two read-guards don't live concurrently, and the issue goes away.
Follow up
This issue is known and has already been documented here.
Though I feel the current example in the documentation is quite misleading; maybe I'll communicate that upstream.
(In fairness to `parking_lot` though, the documentation of the method itself does clearly state that it is not reentrant!)

While debugging this, I stumbled upon `parking_lot`'s experimental deadlock detection feature, which works quite well and is well aware of `parking_lot`'s implementation details, contrary to ours.
I'll publish a follow-up PR to replace our custom detection with theirs when running on native.
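For reference, using that feature looks roughly like this: `deadlock_detection` is an off-by-default cargo feature, and checks run on demand via `parking_lot::deadlock::check_deadlock()` (a sketch adapted from `parking_lot`'s docs; the polling interval is arbitrary):

```rust
// Cargo.toml:
//     parking_lot = { version = "0.12", features = ["deadlock_detection"] }
use std::{thread, time::Duration};

fn main() {
    // A background thread that periodically checks for deadlocks.
    thread::spawn(|| loop {
        thread::sleep(Duration::from_secs(10));
        let deadlocks = parking_lot::deadlock::check_deadlock();
        if deadlocks.is_empty() {
            continue;
        }
        eprintln!("{} deadlock(s) detected", deadlocks.len());
        for (i, threads) in deadlocks.iter().enumerate() {
            eprintln!("Deadlock #{i}");
            for t in threads {
                eprintln!("Thread Id {:#?}", t.thread_id());
                eprintln!("{:#?}", t.backtrace());
            }
        }
    });

    // ... the rest of the application ...
}
```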