Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using pattern match directly on LockResult causes deadlock #37612

Open
lloydmeta opened this issue Nov 6, 2016 · 20 comments
Open

Using pattern match directly on LockResult causes deadlock #37612

lloydmeta opened this issue Nov 6, 2016 · 20 comments
Labels
A-destructors Area: Destructors (`Drop`, …) A-lifetimes Area: Lifetimes / regions C-bug Category: This is a bug. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@lloydmeta
Copy link
Member

Came across this as I was playing around with implementing memoisation via double-checked-locking with sync::RWLock:

The following, which puts the .read() result into a temporary binding works fine:

// ...
{
    let lock_result = self.cache.read(); // <-- temporary binding
    match lock_result {
        Ok(ref data) if data.len() > to => Some(data[to].clone()),
        _ => None,
    }
}
.unwrap_or_else(|| {
    // We need to write now, so get a write lock
    let mut data = self.cache.write().unwrap();
// ...

The following, which directly matches on the result of .read(), dies in a deadlock:

// ...
{
    match self.cache.read() { // <-- direct pattern matching
        Ok(ref data) if data.len() > to => Some(data[to].clone()),
        _ => None,
    }
}
.unwrap_or_else(|| {
    // We need to write now, so get a write lock
    let mut data = self.cache.write().unwrap();
// ...

I'm guessing the direct pattern match is compiled/desugared in a way that the RwLockReadGuard never goes out of scope and thus the read lock is never released, but I'm not sure if this is the expected behaviour. It certainly surprised me, especially since the entire "read" block is scoped by its own set of braces.

For extra context, here is the relevant part in my scratchpad project. I have a concurrency test for it that fails to finish if the pattern matching section is changed to match the latter case.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Nov 6, 2016

Temporaries live for the entire statement, never shorter. There are cases where they are made to live longer (e.g. to make stuff like let x = &5; work) and obviously the surrounding expression can move the temporary into a function where it'll be dropped earlier, but "left to its own devices" a temporary will live until the end of the statement.

The second snippet shows part of one huge multi-line statement, and the extra set of braces around the match is as irrelevant as it is in, say, let x = {5};. In contrast, in the first snippet the self.cache.read() temporary is moved into a short-lived variable and thus dropped at the brace between the match and the unwrap_or_else call.

So everything is working as intended IMHO. Still, what can we do to help people who run into it in the future? Maybe a lint could look for multiple locking API calls within the same expression. However, it would need to be carefully written to avoid a false positive for code like the second. Or maybe it's better to work on MIR more generally and try to detect overlapping code regions holding the same lock, even if it's spread across statements. Sounds like a nice idea for clippy (cc @Manishearth @llogiq).

@lloydmeta
Copy link
Member Author

lloydmeta commented Nov 6, 2016

Interesting, I think for me, the main point of confusion is this:

The second snippet shows part of one huge multi-line statement, and the extra set of braces around the match is as irrelevant as it is in, say, let x = {5};

From my reading of the Rust book, I was under the impression that braces {} define blocks and blocks define binding scope, so that any bindings defined inside them will be out of scope outside them. For example this section on bindings alludes to that.

In this case, it was primarily non-obvious to me that the braces would be irrelevant for the scoping of pattern match bindings (causing the lock to not be dropped), and I'm now wondering what the reasoning is behind that. Is it reasonable and/or possible to make the braces significant in this case?

Otherwise, yes, a warning would be nice :)

PS for what it's worth, turning it into an if let to make the binding more explicit also deadlocks

if let Ok(ref data) = self.cache.read() {
    if data.len() > to {
        Some(data[to].clone())
    } else {
        None
    }
} else {
    None
}

EDIT: a few more findings

Adding a (); (or anything really println!, let x = 1) to make the block multiline doesn't help, the following still deadlocks:

{
    (); // block consists of an expression statement and an expression but still fails to release lock
    match self.cache.read() {
        Ok(ref data) if data.len() > to => Some(data[to].clone()),
        _ => None,
    }
}

This however, works fine:

{
    let t = match self.cache.read() {
        Ok(ref data) if data.len() > to => Some(data[to].clone()),
        _ => None,
    };
    t
}

@arielb1
Copy link
Contributor

arielb1 commented Nov 6, 2016

The destructors for temporaries in tail expressions run when the containing block ends. The intent is to make braces "transparent". We need good examples for that.

@lloydmeta
Copy link
Member Author

Writing sound multithreaded code is hard enough as it is, and so is testing it correctly to find bugs. In this case I got almost everything right, but at the end, got snagged by what looks like a hard-to-understand syntactical mistake.

I still don't really understand the full breadth of why some examples work and others don't. But I'd like to pitch in to make it harder to for others to make the same mistake.

@Mark-Simulacrum
Copy link
Member

This, as expected, reproduces today. I'm uncertain whether we don't want it to... it feels like this isn't actually a bug (cc @rust-lang/lang, can you confirm?)

@lloydmeta
Copy link
Member Author

lloydmeta commented May 16, 2017

Reviewing this after 1/2 year, I have to say I'm still uneasy about the current behaviour. I get that there may be reasons for it from a language-implementations point of view, but from a end-user/language-user standpoint, it doesn't seem desirable.

In this particular case, my expectations of referential transparency as well as brace-based scoping were broken, leading to a deadly silent runtime failure.

Having said that, I'd really like to see a reason why the current behaviour is desirable for the end user. Short of one, IMHO the implementation should be fixed, or at the very least, docs updated and a warning thrown at compile time

EDIT: grammar fix.

@Mark-Simulacrum
Copy link
Member

Oh, no, I forgot to include my minimal example. Well, I can retype it up if anyone wants me to, otherwise I'll leave it as-is since composing it was rather.. difficult, IIRC.

@nikomatsakis
Copy link
Contributor

I agree that these rules can be confusing, particularly around match appearing in tail position. That said, I'm not sure what's the best fix -- and obviously any change here would most likely be backwards incompatible (i.e., some people may be relying on the dtor to run when it does).

@Havvy
Copy link
Contributor

Havvy commented May 29, 2017

I believe this is a duplicate of #21114.

@lloydmeta
Copy link
Member Author

@Havvy Definitely looks related given this comment and the following response.

Feels good to see that it is a bug that needs to be fixed after all.

@arielb1
Copy link
Contributor

arielb1 commented May 29, 2017

The reason this works is because you are supposed to be able to write this:

{
    match self.cache.read() { // <-- direct pattern matching
        Ok(ref data) => Some(data)
        _ => None,
    }
}.map(|data| {
    // use `data` - the lock better be held
})

That pattern is quite common in some places and should better work.

@arielb1
Copy link
Contributor

arielb1 commented May 29, 2017

Of course, because destructor locations aren't sensitive to lifetime inference, the lock must also be held in your example.

Also, #21114 is about the borrow checker being overly stupid, not about destructor ordering, so @pnkfelix was wrong in his diagnosis.

@lloydmeta
Copy link
Member Author

@arielb1 Thanks for that example. I must say, it's one of the most convincing examples I've seen thus far. That said, I think:

  1. The braces surrounding the match and before map are superfluous; you can do without them (Playground link):
match mutex.lock() {
    Ok(ref data) => Some(data.0),
    _ => None,
}.map(|data| {
    data + 1
})
  1. That pattern seems a bit unnecessarily verbose; why not just work with ref data inside the pattern match arm?

I have no doubt though that there is code like this out there; it does seem very convenient to do something like this and perhaps sometimes it's can happen due to personal preference for style.

@ryantheleach
Copy link

I've not programmed in rust ever, but just from the little I've read about it,

{ //start scoping.
    match self.cache.read() { //gain a lock and read.
        Ok(ref data) => Some(data) 
        _ => None,
    } //no semi colon so data leaks out of the next scope e.g. it's not unit()
//end scope.
}.map(|data| { //data is available, but the lock is out of scope.
    // use `data` - the lock better be held, but it's not, because you ended the scope, error?
})

vs some pseudo code ish looking thing.

{ 
    let x = match self.cache.read() { 
        Ok(ref data) => Some(data) 
        _ => None,
    } 
   x.maintainLock? x.lend? //how do you punch the lock through this? guessing you don't or can't. so why does it make sense to have brackets in the first place?
}.map(|data| {
    // use `data` - the lock better be held
})
let x = { 
    match self.cache.read() { 
        Ok(ref data) => Some(data) 
        _ => None,
    } 
}
//this deadzone is worrying though.
x.lock().map(|data| {
    // use `data` - the lock better be held
})

I can't think of a way that makes syntactic sense, given the simple rules I've briefly heard about.

@Havvy
Copy link
Contributor

Havvy commented Jun 3, 2017

@ryantheleach - Temporaries created in the final expression of a block live as long as temporaries in the statement containing the block.

@ryantheleach
Copy link

That makes a little more sense now.

@PieterPenninckx
Copy link
Contributor

PieterPenninckx commented Jul 26, 2017

Not knowing the precise rules, my intuition was that temporaries would be dropped "as soon as possible". This is clearly wrong, as people explained in their comments and as explained in the reference.

Adjusting the language so that it matches my intuition is impossible in general, I think, because it would change the behavior of existing code. So instead it's the intuition -- or at least the knowledge -- that should be changed. So I opened two issues: one for The Book and one for clippy.

I think however that this issue is precisely what linear data types (rust-lang/rfcs#814, or at least part thereof) intend to solve. In particular, the compiler could give a warning when a value of a data type that is marked as "must be dropped explicitly", goes out of scope without being dropped explicitly. (Edit: please note that RFC 814 is different from what I propose here.) I'm not sure however what would be the best way to treat compound data types (e.g. when a field of a struct is marked as "must be dropped explicitly"), so I'm leaving this as a suggestion.

@pnkfelix
Copy link
Member

pnkfelix commented Nov 9, 2018

Just doing some triage.

I do not believe this to be a duplicate of #21114.

Some people connected the two issues due to this comment #21114 (comment) and my response #21114 (comment), but its important to note a couple things.

The examples given here, however, are about the consequences of the dynamic semantics of the language. The consequences listed here are, by definition, not cases where adding of a semi-colon or a let x = ...; x is spurious; rather, it has an observable effect on the drop order of rvalue temporaries.

Anyway, this is a long winded build up to me saying that while I do not believe this to be a duplicate of #21114, I do think it is a duplicate or perhaps sub-bug of #46413.

prototypicalpro pushed a commit to prototypicalpro/embedded-executor that referenced this issue Jun 13, 2020
Apparently this is a problem that's existed for quite a while and
*still* hasn't been fixed.

rust-lang/rust#37612
huntiep added a commit to Wiggin-Labs/incarnation that referenced this issue Nov 26, 2020
This also helps to fix a bug I ran into with Rust.

rust-lang/rust#37612
flip1995 pushed a commit to flip1995/rust that referenced this issue Jun 30, 2023
…s, r=giraffate

Bring up Rust lang rust-lang#37612 as a known problem for let_and_return

Fixes rust-lang/rust-clippy#4182.

I don't think conforming to this lint could trigger the issue immediately, only if subsequent code-changes go wrong, but I may be mistaken.

Since the lint can't trigger it by itself, just closing this issue might be reasonable, if not maybe this PR fixes it.

changelog: Update docs for `let_and_return`, mention rust-lang rust-lang#37612
@ichn-hu
Copy link

ichn-hu commented Sep 15, 2023

the following case looks much more confusing (and error-prone) than matching directly on lock result, and it seems to be of the same reason, we happen to find it in our production code which no one spot it may cause problem when the code is merged.

in the following POC, using wait_for_update1 would not deadlock, and wait_for_update2 will

use std::sync::Arc;
use tokio::sync::{Notify, RwLock};

#[derive(Default)]
struct ProxyInner {
    nr_updates: u64,
}

impl ProxyInner {
    fn update(&mut self) {
        self.nr_updates += 1;
    }

    fn get_update(&self) -> Option<u64> {
        if self.nr_updates % 2 == 0 {
            Some(self.nr_updates)
        } else {
            None
        }
    }
}

#[derive(Clone)]
struct ProxyClient {
    inner: Arc<RwLock<ProxyInner>>,
    notify: Arc<Notify>,
}

impl ProxyClient {
    fn new() -> ProxyClient {
        ProxyClient {
            inner: Arc::new(RwLock::new(ProxyInner::default())),
            notify: Arc::new(Notify::new()),
        }
    }

    async fn update(&self) {
        self.inner.write().await.update();
        // do something
        self.notify.notify_waiters();
    }

    // works fine
    async fn wait_for_update1(&self) -> Option<u64> {
        loop {
            let res = self.inner.read().await.get_update();
            match res {
                Some(nr) => return Some(nr),
                None => {
                    self.notify.notified().await;
                }
            }
        }
    }

    // may deadlock
    async fn wait_for_update2(&self) -> Option<u64> {
        loop {
            match self.inner.read().await.get_update() {
                Some(nr) => return Some(nr),
                None => {
                    self.notify.notified().await;
                }
            }
        }
    }
}

#[tokio::main(worker_threads = 2)]
async fn main() {
    let pc = ProxyClient::new();

    let task1 = {
        let pc = pc.clone();
        tokio::spawn(async move {
            loop {
                pc.update().await;
            }
        })
    };

    let task2 = {
        let pc = pc.clone();
        tokio::spawn(async move {
            loop {
                let nr = pc.wait_for_update1().await;
                println!("task2: nr = {:?}", nr);
            }
        })
    };

    task1.await.unwrap();
    task2.await.unwrap();
}

@tzachar
Copy link

tzachar commented Oct 10, 2024

I think I encountered a similar bug, in the following code:

use std::sync::Mutex;
                                                                                                                                                                    #[derive(Debug)]
struct Test
{
    a: i32,
    b: i32,
}

impl Test
{
    pub fn new(value: &Mutex<i32>) -> Self
    {
        Self {
            a: {
                *value.lock().expect("cannot lock")
            },
            b: {
                *value.lock().expect("cannot lock")
            },
        }
    }
}

fn main() {
    let value = Mutex::new(0);
    let t = Test::new(&value);
    println!("{:?}", t);
}

the new function deadlocks. I could not find a convincing explanation why the LockGuard is not dropped between the initializations of a and b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-destructors Area: Destructors (`Drop`, …) A-lifetimes Area: Lifetimes / regions C-bug Category: This is a bug. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests