Expand lockorder testing to look at mutexes, not specific instances #1420
Codecov Report
@@ Coverage Diff @@
## main #1420 +/- ##
==========================================
+ Coverage 90.87% 91.89% +1.02%
==========================================
Files 80 80
Lines 44569 50132 +5563
Branches 44569 50132 +5563
==========================================
+ Hits 40500 46069 +5569
+ Misses 4069 4063 -6
Hmm, may need to be smarter about how we determine which stack frame is the "mutex construction" one.
In the next commit we add lockorder testing based on the line each mutex was created on rather than the particular mutex instance. This causes some additional test failures because of lockorder inversions for the same mutex across different tests, which are fixed here.
Force-pushed from 457f4d6 to 52aa726.
Force-pushed from d8562f0 to 0104080.
OOOOOOoooooooooooo-KKKKKKkkkkkkkkkk, so this actually works on not-my-computer platforms now. Would be great to land it!
Force-pushed from bca44d0 to 1d0b473.
Had to rewrite a bit to address issues that turned up once I fixed the bug where I wasn't even inserting the lock metadata into the map.
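The "not inserting into the map" bug is easy to see in a simplified model. This is a minimal sketch, assuming a process-wide registry keyed by a construction-site string; `LockMetadata`, `metadata_for_site`, and the `file:line:col` keys are hypothetical simplifications, not the PR's actual code. It needs Rust 1.70+ for `std::sync::OnceLock`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical, simplified stand-in for `LockMetadata`: one entry per
/// construction site, shared by every mutex created at that site.
#[derive(Debug)]
pub struct LockMetadata {
    pub lock_idx: u64,
    pub construction_site: String,
}

static NEXT_IDX: AtomicU64 = AtomicU64::new(0);
static REGISTRY: OnceLock<Mutex<HashMap<String, Arc<LockMetadata>>>> = OnceLock::new();

/// Returns the shared metadata for `site`, creating and inserting it on first
/// use. If the new entry were built but never inserted, every mutex would get
/// fresh metadata and the per-site grouping would silently never happen.
pub fn metadata_for_site(site: &str) -> Arc<LockMetadata> {
    let registry = REGISTRY.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = registry.lock().unwrap();
    let entry = map.entry(site.to_string()).or_insert_with(|| {
        Arc::new(LockMetadata {
            lock_idx: NEXT_IDX.fetch_add(1, Ordering::Relaxed),
            construction_site: site.to_string(),
        })
    });
    Arc::clone(entry)
}

fn main() {
    let a1 = metadata_for_site("src/foo.rs:10:5");
    let a2 = metadata_for_site("src/foo.rs:10:5");
    let b = metadata_for_site("src/bar.rs:42:9");
    // Two mutexes from the same site share one metadata object.
    assert!(Arc::ptr_eq(&a1, &a2));
    // A different site gets a different index.
    assert_ne!(a1.lock_idx, b.lock_idx);
}
```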
Force-pushed from 520a00a to 919c11a.
Force-pushed from 1668072 to e45bb7b.
Looks good, feel free to squash.
Force-pushed from e3338f7 to f14a324.
Squashed + added one more fixup. Needs a second reviewer still.
lightning/src/debug_sync.rs (outdated)
impl LockDep {
    /// Note that `Backtrace::new()` is rather expensive so we rely on the caller to fill in the
    /// `lockdep_trace` field after ensuring we need it.
    fn new_without_bt(lock: &Arc<LockMetadata>) -> Self {
        Self { lock: Arc::clone(lock), lockdep_trace: None }
    }
}
impl PartialEq for LockDep {
    fn eq(&self, o: &LockDep) -> bool { self.lock.lock_idx == o.lock.lock_idx }
}
impl Eq for LockDep {}
impl std::hash::Hash for LockDep {
    fn hash<H: std::hash::Hasher>(&self, hasher: &mut H) { hasher.write_u64(self.lock.lock_idx); }
}
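The snippet relies on `PartialEq` and `Hash` agreeing: both look only at `lock_idx`, so a `HashSet<LockDep>` treats two entries for the same lock as duplicates no matter which trace they carry. A minimal sketch with trimmed-down, hypothetical versions of `LockMetadata` and `LockDep` (the trace is an `Option<String>` here instead of a backtrace):

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

// Hypothetical, trimmed-down versions of the types in the snippet above.
pub struct LockMetadata {
    pub lock_idx: u64,
}

pub struct LockDep {
    pub lock: Arc<LockMetadata>,
    // Stand-in for the (expensive) captured backtrace.
    pub lockdep_trace: Option<String>,
}

// Equality and hashing both use only `lock_idx`, which keeps the
// `a == b` implies `hash(a) == hash(b)` invariant that `HashSet` requires.
impl PartialEq for LockDep {
    fn eq(&self, o: &LockDep) -> bool { self.lock.lock_idx == o.lock.lock_idx }
}
impl Eq for LockDep {}
impl Hash for LockDep {
    fn hash<H: Hasher>(&self, hasher: &mut H) { hasher.write_u64(self.lock.lock_idx); }
}

fn main() {
    let meta = Arc::new(LockMetadata { lock_idx: 7 });
    let mut locked_before = HashSet::new();
    // The first insert succeeds; the second is a duplicate by `lock_idx`.
    assert!(locked_before.insert(LockDep { lock: Arc::clone(&meta), lockdep_trace: Some("first".into()) }));
    assert!(!locked_before.insert(LockDep { lock: Arc::clone(&meta), lockdep_trace: Some("second".into()) }));
    assert_eq!(locked_before.len(), 1);
    // `HashSet::insert` keeps the existing element, so the first trace survives.
    let kept = locked_before.iter().next().unwrap();
    assert_eq!(kept.lockdep_trace.as_deref(), Some("first"));
}
```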
Could we avoid the need for all this by using a `HashMap` instead of a `HashSet` for `locked_before`?
I pushed a commit to do this, but I'm not convinced it's the right design. Ultimately we do have a set here - it happens to be unique by some index rather than by the default `PartialEq`, but so what. We don't really have a map, but we can mangle the set into a map if we do this.
Ah, I guess the thought was you could get rid of `new_without_bt`, too, since you don't need to create a `LockDep` until you are ready to insert it. Well, the `Option` within `LockDep`, at least, as you'd still want a constructor.
Ah, good point, yea, I guess that's worth it.
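For comparison, the map-based design discussed above can be sketched as follows, assuming `locked_before` becomes a `HashMap` keyed by lock index. `entry(...).or_insert_with(...)` only runs the trace-capturing closure when the key is new, which is what lets both the `Option` field and `new_without_bt` go away. The types and the `note_locked_before` helper are hypothetical; a `String`-returning closure stands in for the expensive `Backtrace::new()` call.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, trimmed-down types.
pub struct LockMetadata {
    pub lock_idx: u64,
}

// With the map keyed by `lock_idx`, `LockDep` needs no custom
// `PartialEq`/`Hash` impls, and the trace is no longer optional.
pub struct LockDep {
    pub lock: Arc<LockMetadata>,
    pub lockdep_trace: String, // stand-in for a captured backtrace
}

/// Hypothetical helper: records `lock` in `locked_before` (the locks already
/// held when the owning lock was taken). `capture_trace`, the expensive part,
/// only runs if this lock index has not been recorded yet.
pub fn note_locked_before<F: FnOnce() -> String>(
    locked_before: &mut HashMap<u64, LockDep>,
    lock: &Arc<LockMetadata>,
    capture_trace: F,
) {
    locked_before.entry(lock.lock_idx).or_insert_with(|| LockDep {
        lock: Arc::clone(lock),
        lockdep_trace: capture_trace(),
    });
}

fn main() {
    let calls = Cell::new(0);
    let meta = Arc::new(LockMetadata { lock_idx: 3 });
    let mut locked_before = HashMap::new();
    for _ in 0..3 {
        note_locked_before(&mut locked_before, &meta, || {
            calls.set(calls.get() + 1);
            "trace".to_string()
        });
    }
    // The "backtrace" was captured once, on the first insert only.
    assert_eq!(calls.get(), 1);
    assert_eq!(locked_before.len(), 1);
}
```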
if found_debug_sync {
    if let Some(col) = symbol.colno() {
        return format!("{}:{}:{}", symbol.filename().unwrap().display(), symbol.lineno().unwrap(), col);
    } else {
        // Windows debug symbols don't support column numbers, so fall back to
        // line numbers only if no `colno` is available
        return format!("{}:{}", symbol.filename().unwrap().display(), symbol.lineno().unwrap());
    }
}
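The snippet above unwraps `filename()` and `lineno()`, which in the `backtrace` crate both return `Option`s, as does `colno()`. A minimal sketch of the same `file:line:col` / `file:line` fallback that returns `None` instead of panicking when either piece is missing; `format_site` is a hypothetical helper that takes plain values so it runs without the `backtrace` crate.

```rust
/// Hypothetical helper: format a construction site as `file:line:col`, or as
/// `file:line` when no column is available (e.g. Windows debug symbols).
/// Returns `None` if the file or line is unknown, instead of unwrapping.
pub fn format_site(file: Option<&str>, line: Option<u32>, col: Option<u32>) -> Option<String> {
    let (file, line) = (file?, line?);
    Some(match col {
        Some(col) => format!("{}:{}:{}", file, line, col),
        None => format!("{}:{}", file, line),
    })
}

fn main() {
    assert_eq!(
        format_site(Some("src/debug_sync.rs"), Some(42), Some(7)).as_deref(),
        Some("src/debug_sync.rs:42:7")
    );
    assert_eq!(
        format_site(Some("src/debug_sync.rs"), Some(42), None).as_deref(),
        Some("src/debug_sync.rs:42")
    );
}
```

Note that a site formatted without a column is a different key from one with a column, so the same mutex would only be identified consistently within a single platform's run.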
Could this be moved outside and before the enclosing `if`, which could be eliminated? Shouldn't need to fail matching the regex once `found_debug_sync` is true?
if found_debug_sync {
// ...
}
let symbol_name = symbol.name().unwrap().as_str().unwrap();
found_debug_sync = sync_mutex_constr_regex.is_match(symbol_name);
I don't think so. If a call to `LockMetadata::new` doesn't get inlined (as done in the `debug_sync` tests), we'll have multiple frames that can match `found_debug_sync`, i.e., one frame for `Mutex::new` and another for its underlying `LockMetadata::new`. We want to go up the call stack until we find the last match, rather than the first one.
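To make the difference concrete, here is a reconstruction of both loop shapes over a list of symbol names, innermost frame first. This is a sketch: the full loop isn't quoted in this thread, `is_constructor` is a hypothetical stand-in for `sync_mutex_constr_regex` (plain suffix matching, no `regex` crate), and the frame names are made up. With two stacked constructor frames, the suggested version stops at `Mutex::new`, while the original walks past the whole run of constructor frames to the real call site.

```rust
/// Hypothetical stand-in for `sync_mutex_constr_regex`: does this symbol
/// belong to a mutex constructor?
fn is_constructor(name: &str) -> bool {
    name.ends_with("LockMetadata::new") || name.ends_with("Mutex::new")
}

/// Reconstructed shape of the loop under review: once a constructor frame has
/// been seen, return the first frame that is *not* a constructor.
pub fn construction_site<'a>(frames: &[&'a str]) -> Option<&'a str> {
    let mut found_debug_sync = false;
    for &name in frames {
        if is_constructor(name) {
            found_debug_sync = true;
        } else if found_debug_sync {
            return Some(name);
        }
    }
    None
}

/// The suggested simplification: check the flag first, then update it.
/// It returns the frame right after the *first* constructor frame.
pub fn construction_site_naive<'a>(frames: &[&'a str]) -> Option<&'a str> {
    let mut found_debug_sync = false;
    for &name in frames {
        if found_debug_sync {
            return Some(name);
        }
        found_debug_sync = is_constructor(name);
    }
    None
}

fn main() {
    // Hypothetical frames, innermost first, with `LockMetadata::new` not inlined.
    let frames = [
        "backtrace::Backtrace::new",
        "lightning::debug_sync::LockMetadata::new",
        "lightning::debug_sync::Mutex::new",
        "my_crate::make_monitor",
        "my_crate::test_foo",
    ];
    assert_eq!(construction_site(&frames), Some("my_crate::make_monitor"));
    assert_eq!(construction_site_naive(&frames), Some("lightning::debug_sync::Mutex::new"));
}
```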
> I don't think so. If a call to `LockMetadata::new` doesn't get inlined (as done in the `debug_sync` tests), we'll have multiple frames that can match `found_debug_sync`, i.e., one frame for `Mutex::new` and another for its underlying `LockMetadata::new`.
Ok, I read the `backtrace` docs and have a better understanding of how this works now. I feel like there may be a more succinct way of doing this with `Iterator::position` that might be more readable, but then again maybe not lol.
> We want to go up the call stack until we find the last match, rather than the first one.
Hmm... if I read the docs correctly, the first item is the top of the call stack, so iterating through in order would be moving down the call stack?
Heh, yea... so I guess "top" and "bottom" depend on what architecture you're on and whether your stack grows down or up. Sorry for the lack of clarity here.
I always think of the call stack as a logical data structure (a stack) regardless of how the architecture represents it. So a function call is added to the top of the stack, since that is the only way to add an item to a stack.
Heh yeah I think about them the other way around but that's mostly due to previous experience with disassemblers.
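On the `Iterator::position` idea raised in this thread: assuming the same semantics as the loop (skip to the first constructor frame, then past the whole run of constructor frames, innermost frame first), iterator adapters can express it directly. Both versions below are sketches over hypothetical frame names, with a hypothetical `is_constructor` matcher in place of the regex.

```rust
/// Hypothetical stand-in for `sync_mutex_constr_regex`.
fn is_constructor(name: &str) -> bool {
    name.ends_with("LockMetadata::new") || name.ends_with("Mutex::new")
}

/// Two-phase `skip_while`: drop frames before the first constructor frame,
/// drop the run of constructor frames, then take whatever comes next.
pub fn construction_site<'a>(frames: &[&'a str]) -> Option<&'a str> {
    frames
        .iter()
        .copied()
        .skip_while(|name| !is_constructor(name))
        .skip_while(|name| is_constructor(name))
        .next()
}

/// The same thing via `position`: find where the constructor run starts,
/// then find the first non-constructor frame after that point.
pub fn construction_site_position<'a>(frames: &[&'a str]) -> Option<&'a str> {
    let start = frames.iter().position(|name| is_constructor(name))?;
    frames[start..].iter().find(|name| !is_constructor(name)).copied()
}

fn main() {
    let frames = [
        "backtrace::Backtrace::new",
        "lightning::debug_sync::LockMetadata::new",
        "lightning::debug_sync::Mutex::new",
        "my_crate::make_monitor",
        "my_crate::test_foo",
    ];
    assert_eq!(construction_site(&frames), Some("my_crate::make_monitor"));
    assert_eq!(construction_site_position(&frames), Some("my_crate::make_monitor"));
}
```

Whether this reads better than the explicit loop is a matter of taste; the `skip_while` form at least makes the two phases visible.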
Force-pushed from 25c3dba to 6d54ebb.
lightning/src/debug_sync.rs (outdated)
// that as the mutex construction site. Note that the first few frames may be in
// `backtrace`, so we have to ignore those.
I don't quite follow the last sentence. What is "first few frames" referring to? When wouldn't the first few frames be in `backtrace`?
Dunno, but on my machine locally there are no reported frames in `backtrace`; the first one is directly the site that calls `Backtrace::new`, so maybe they're filtering those on some platforms.
Ok, I think the confusion is that I read "in `backtrace`" as "some `BacktraceFrame` returned by `Backtrace::frames`", whereas you mean "frames corresponding to calls on the `backtrace` object". Maybe just s/in/on?
Hmm, but it's not on a `backtrace` "object", it's "in the `backtrace` crate"? I said that instead.
Force-pushed from 6d54ebb to 2b2863e.
Ready to land once squashed.
Force-pushed from 2b2863e to 0737783.
Added one more fixup commit with a wording tweak, can squash after that.
When we add lockorder detection based on mutex construction site rather than mutex instance in the next commit, ChannelMonitor's PartialEq implementation causes spurious failures. This is caused by the lockorder detection logic considering the ChannelMonitor inner mutex to be two distinct mutexes - one when monitors are deserialized and one when monitors are created fresh. Instead, we attempt to tell the lockorder detection logic that they are the same by ensuring they're constructed in the same place - in this case a util method.
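The idea in this commit is that the lockorder logic identifies a mutex by where it was constructed, so routing both construction paths through one helper makes them the same mutex as far as the checker is concerned. The sketch below illustrates that with `#[track_caller]` and `std::panic::Location::caller()` instead of a backtrace, which is a swapped-in technique rather than what the PR does; `TrackedMutex`, `new_monitor_inner`, `create_fresh`, and `deserialize` are hypothetical names.

```rust
use std::panic::Location;
use std::sync::Mutex;

/// Hypothetical debug mutex that remembers where it was constructed.
pub struct TrackedMutex<T> {
    pub inner: Mutex<T>,
    pub site: &'static Location<'static>,
}

impl<T> TrackedMutex<T> {
    /// `#[track_caller]` makes `Location::caller()` report the line that
    /// called `new`, not a line inside `new` itself.
    #[track_caller]
    pub fn new(t: T) -> Self {
        TrackedMutex { inner: Mutex::new(t), site: Location::caller() }
    }
}

/// Hypothetical shared constructor playing the role of the util method. It is
/// not `#[track_caller]`, so every mutex built here records this line.
fn new_monitor_inner(state: u32) -> TrackedMutex<u32> {
    TrackedMutex::new(state)
}

pub fn create_fresh() -> TrackedMutex<u32> { new_monitor_inner(0) }
pub fn deserialize(state: u32) -> TrackedMutex<u32> { new_monitor_inner(state) }

fn main() {
    // Constructed directly on two different lines: two different sites.
    let a = TrackedMutex::new(1u32);
    let b = TrackedMutex::new(2u32);
    assert_ne!(a.site.line(), b.site.line());

    // Both paths go through the helper: one site, so a checker keyed on
    // construction site treats them as the same mutex.
    let fresh = create_fresh();
    let read = deserialize(5);
    assert_eq!(fresh.site, read.site);
}
```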
Force-pushed from 0737783 to ff20203.
Went ahead and squashed:
Our existing lockorder inversion checks look at specific instances
of mutexes rather than the general mutex itself. This changes that
behavior to look at the instruction pointer at which a mutex was
created and treat all mutexes which were created at the same
location as equivalent.
This allows us to detect lockorder inversions which occur across
tests, though it does substantially reduce parallelism during test
runs.
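A minimal sketch of why keying by construction site catches inversions across tests: if the checker's state is keyed by site and outlives a single test, one test taking A then B and another taking B then A trip the check even though they never share a mutex instance. `LockOrder`, `on_lock`, and the site strings are hypothetical; this sketch only detects direct two-lock inversions, not longer cycles, and it ignores pairs of mutexes from the same site.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical construction-site key, e.g. "file:line:col".
type Site = &'static str;

#[derive(Default)]
pub struct LockOrder {
    /// `locked_before[b]` holds every site that was held while `b` was taken.
    locked_before: HashMap<Site, HashSet<Site>>,
}

impl LockOrder {
    /// Record that `acquiring` is being locked while `held` are held.
    /// Returns the first inversion found as `(held, acquiring)`.
    pub fn on_lock(&mut self, held: &[Site], acquiring: Site) -> Result<(), (Site, Site)> {
        for &h in held {
            // Two mutexes from the same site are indistinguishable here;
            // this sketch simply does not order them.
            if h == acquiring {
                continue;
            }
            // If `acquiring` was ever held while `h` was taken, we now have
            // both orders, h -> acquiring and acquiring -> h: an inversion.
            if self.locked_before.get(h).map_or(false, |s| s.contains(acquiring)) {
                return Err((h, acquiring));
            }
            self.locked_before.entry(acquiring).or_default().insert(h);
        }
        Ok(())
    }
}

fn main() {
    // Imagine one checker shared by the whole test binary.
    let mut order = LockOrder::default();
    // "Test 1": a mutex from site A is held while one from site B is taken.
    assert_eq!(order.on_lock(&["monitor.rs:10"], "manager.rs:20"), Ok(()));
    // "Test 2": different instances, same sites, opposite order.
    assert_eq!(
        order.on_lock(&["manager.rs:20"], "monitor.rs:10"),
        Err(("manager.rs:20", "monitor.rs:10"))
    );
}
```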