table: fix deadlocks caused by lock fairness #2156

Merged
merged 1 commit into emilk:master on Oct 16, 2022

Conversation

@teh-cmc (Collaborator) commented on Oct 16, 2022

Alright, story time :D

Background

Some context first: we have a simple application that runs on two threads:

  • Thread 0 just calls ctx.request_repaint() once in a while, depending on various external factors that are irrelevant to this issue.
  • Thread 1 just draws a resizable egui Table every frame.

Problem is: more often than not, that application will deadlock.
In particular, it seems that the more often thread 0 requests repaints, the more likely we are to deadlock.

The situation just described can be dumbed down to this:

use eframe::egui;

fn main() {
    let options = eframe::NativeOptions::default();
    eframe::run_native(
        "deadlock",
        options,
        Box::new(|cc| Box::new(MyApp::new(cc))),
    );
}

struct MyApp {}

impl MyApp {
    fn new(cc: &eframe::CreationContext<'_>) -> Self {
        std::thread::Builder::new()
            .name("ui_waker".to_owned())
            .spawn({
                let egui_ctx = cc.egui_ctx.clone();
                move || loop {
                    egui_ctx.request_repaint();
                }
            })
            .unwrap();

        Self {}
    }
}

impl eframe::App for MyApp {
    fn update(&mut self, ctx: &egui::Context, _frame: &mut eframe::Frame) {
        egui::CentralPanel::default().show(ctx, |ui| {
            show_table(ui, 10_000);
        });
    }
}

static mut I: usize = 0;
fn show_table(ui: &mut egui::Ui, _size: usize) {
    egui_extras::TableBuilder::new(ui)
        .resizable(true) // not a random detail!!!!!
        .cell_layout(egui::Layout::left_to_right(egui::Align::Center))
        .column(egui_extras::Size::initial(1000.0).at_least(1000.0))
        .body(|_body| {
            // don't even need to render actual rows
            unsafe {
                I += 1;
                println!("i = {I}"); // will stop printing _very_ soon
            }
        });
}

and, indeed, that piece of code will deadlock without fail.

The fix

The solution to all our problems is the following..

@@ -377,8 +378,10 @@ impl<'a> Table<'a> {
-                let dragging_something_else =
-                    ui.input().pointer.any_down() || ui.input().pointer.any_pressed();
+                let dragging_something_else = {
+                    let pointer = &ui.input().pointer;
+                    pointer.any_down() || pointer.any_pressed()
+                };

..which really just raises more questions.

First off, here's how the compiler desugars that condition:

let dragging_something_else = ui.input().pointer.any_down() || ui.input().pointer.any_pressed();

becomes:

let dragging_something_else = {
    let guard1 = ui.input();
    let any_down = guard1.pointer.any_down();
    let guard2 = ui.input();
    let any_pressed = guard2.pointer.any_pressed();
    any_down || any_pressed
};

i.e. we end up with two read guards that live concurrently until the end of the scope, the second of which effectively behaves as a reentrant read.
That still doesn't explain why or how we end up with a deadlock.

Let's recap the situation: we have one thread fighting for the write side of the lock, and two readers on another thread fighting for the read side. Sure, we expect some contention, but beyond that this shouldn't be an issue... unless parking_lot's RwLocks aren't reentrant? But they are, right?

Well, our fix is saying otherwise... gotta dig deeper.

Is parking_lot::RwLock reentrant??

At this point we can just take egui out of the equation and focus on parking_lot:

fn main() {
    use parking_lot::RwLock;
    use std::sync::Arc;

    let lock = Arc::new(RwLock::new(0usize));

    std::thread::Builder::new()
        .name("inc".to_owned())
        .spawn({
            let lock = Arc::clone(&lock);
            move || loop {
                let mut n = lock.write();
                *n += 1;
            }
        })
        .unwrap();

    let mut i = 0usize;
    loop {
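        // NB: this single statement takes *two* read guards, both of
        // which stay alive until the end of the statement.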
        let a_or_b = *lock.read() == 42 || *lock.read() == 666;

        // This won't be printing for long.
        if a_or_b {
            println!("yay {i}");
        } else {
            println!("nay {i}");
        }
        i += 1;
    }
}

This will reproduce the deadlock all the same.

Let's take a step back: intuitively, it seems that an RwLock should be read-reentrant, right? Why wouldn't it be? After all, it can have all the readers it wants!
In fact, the example shown in parking_lot's documentation seems to agree with that first intuition:

// Copied verbatim from parking_lot::RwLock's documentation:

use parking_lot::RwLock;

let lock = RwLock::new(5);

// many reader locks can be held at once
{
    let r1 = lock.read();
    let r2 = lock.read();
    assert_eq!(*r1, 5);
    assert_eq!(*r2, 5);
} // read locks are dropped at this point

// only one write lock may be held, however
{
    let mut w = lock.write();
    *w += 1;
    assert_eq!(*w, 6);
} // write lock is dropped here

Even better: this example is doing exactly the same thing as our condition from earlier!

So, parking_lot's RwLocks: read-reentrant or not? Weeeell... it depends.
In particular, it depends on whether there is another thread concurrently trying to grab the lock exclusively.

This has to do with how parking_lot handles fairness internally: the implementation is biased towards writers.
When a thread tries to grab an RwLock exclusively, a flag is immediately set on the lock to block future readers, thereby preventing writer starvation.
From then on, the writer-to-be is guaranteed to eventually get exclusive access, and simply waits for the existing readers to leave.

Knowing this, it's pretty straightforward to infer what's going on in our case, based on the desugared condition:

let dragging_something_else = {
    let guard1 = ui.input();
    let any_down = guard1.pointer.any_down();
    let guard2 = ui.input();
    let any_pressed = guard2.pointer.any_pressed();
    any_down || any_pressed
};

  1. Thread 1 grabs a read guard and stashes it in guard1.
  2. Thread 0 tries to grab a write guard, fails, sets the internal flag to block future readers, and waits for guard1 to leave.
  3. Thread 1 tries to grab a read guard to stash into guard2, except it can't, so it waits.
  4. Deadlocked.

Make sure those two read guards don't live concurrently, and the issue goes away.
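
Applied to the standalone parking_lot repro above, that means something like this (a sketch, not the actual Table fix): take a single read guard, use it for both comparisons, and drop it before the next iteration so the writer can make progress.

let a_or_b = {
    let n = lock.read();
    *n == 42 || *n == 666
}; // the one and only read guard is dropped here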

Follow up

This issue is known and is already covered in parking_lot's documentation.
Though I feel the current example in the documentation is quite misleading; maybe I'll communicate that upstream.
(In fairness to parking_lot though, the documentation of the method itself does clearly state that it is not reentrant!)
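Worth noting as an aside (this is not part of the PR's fix): parking_lot also exposes read_recursive(), an explicit escape hatch for this exact situation. It is guaranteed not to block if the current thread already holds a read guard, at the cost of potentially starving writers. A minimal sketch:

use parking_lot::RwLock;

fn main() {
    let lock = RwLock::new(0usize);

    let guard1 = lock.read();
    // Unlike a plain `lock.read()`, this cannot block here, even if a
    // writer is already parked on another thread.
    let guard2 = lock.read_recursive();
    assert_eq!(*guard1, *guard2);
}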

While debugging this, I stumbled upon parking_lot's experimental deadlock detection feature, which works quite well and is well aware of parking_lot's implementation details, contrary to ours.
I'll publish a follow-up PR to replace our custom detection with theirs when running on native.
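For reference, enabling it looks roughly like this (adapted from parking_lot's own example; it requires the deadlock_detection cargo feature, and the 10-second interval is arbitrary):

#[cfg(feature = "deadlock_detection")]
fn spawn_deadlock_checker() {
    use parking_lot::deadlock;
    use std::thread;
    use std::time::Duration;

    // Background thread that checks for deadlocks every 10s.
    thread::spawn(move || loop {
        thread::sleep(Duration::from_secs(10));
        let deadlocks = deadlock::check_deadlock();
        if deadlocks.is_empty() {
            continue;
        }

        println!("{} deadlocks detected", deadlocks.len());
        for (i, threads) in deadlocks.iter().enumerate() {
            println!("Deadlock #{}", i);
            for t in threads {
                println!("Thread Id {:#?}", t.thread_id());
                println!("{:#?}", t.backtrace());
            }
        }
    });
}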

@emilk (Owner) commented on Oct 16, 2022

Amazing job @teh-cmc. I think we need to get something like #1969 merged asap

@emilk emilk merged commit 3d36a20 into emilk:master Oct 16, 2022
@emilk emilk deleted the cmc/fix_table_deadlock branch October 16, 2022 18:10