[Consensus] Protect mapBlockIndex with its own mutex #951
Conversation
One thing that bugs me is that cs_main is used as a general lock, when we should be locking individual things. Since this is targeting mapBlockIndex, do you see a reason to not create a lock specifically for mapBlockIndex so we can start separating from this giant super lock that is preventing significant multithreading to happen?
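A minimal sketch of the separation being suggested: give mapBlockIndex its own narrow mutex so lookups no longer need the giant cs_main lock. This uses a plain std::mutex as a stand-in for the codebase's annotated mutex type, and a simplified CBlockIndex; names follow the discussion but the details are illustrative.

```cpp
#include <map>
#include <mutex>
#include <string>

struct CBlockIndex { int nHeight = 0; };  // simplified stand-in

std::mutex cs_mapblockindex;                       // protects mapBlockIndex only
std::map<std::string, CBlockIndex> mapBlockIndex;  // block hash -> index entry

// The lookup takes the narrow lock itself, so callers no longer need cs_main
// just to resolve a hash to an index entry.
CBlockIndex* LookupBlockIndex(const std::string& hash)
{
    std::lock_guard<std::mutex> lock(cs_mapblockindex);
    auto it = mapBlockIndex.find(hash);
    return it == mapBlockIndex.end() ? nullptr : &it->second;
}
```

The benefit is that threads contending only on block-index lookups stop serializing behind every other cs_main user.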
src/validation.cpp (outdated):

    LOCK(cs_main);
    // If mapBlockIndex isn't bloated, don't bother taking the time.
    if (chainActive.Height() > static_cast<int>(mapBlockIndex.size() - PRUNE_COUNT)) {
        return;
This area of code was a major source of inefficiency (primarily later on, since IsAncestor took far longer than chainActive.Contains, yet was being done first). It's probably not something I want to hold the lock through in its entirety, since it's looking for stale tips that are old (PRUNE_DEPTH).
I think I would prefer locking to check if mapBlockIndex is bloated; and if so, make a copy, unlock and then let it run with the copy even if mapBlockIndex changes; so we're not locked for the whole check if it's a long check [e.g. on startup].
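The copy-then-unlock pattern described above can be sketched as follows. The lock is held only long enough to check whether the map is bloated and to snapshot it; the slow stale-tip scan then runs on the copy, unaffected by concurrent inserts. PRUNE_COUNT, the staleness depth, and FindStaleCandidates are placeholder names for illustration, not the real code.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <vector>

struct CBlockIndex { int nHeight = 0; };  // simplified stand-in

std::mutex cs_mapblockindex;
std::map<std::string, CBlockIndex> mapBlockIndex;

constexpr size_t PRUNE_COUNT = 4;  // placeholder threshold for this sketch

std::vector<std::string> FindStaleCandidates(int nChainHeight)
{
    std::map<std::string, CBlockIndex> snapshot;
    {
        std::lock_guard<std::mutex> lock(cs_mapblockindex);
        if (mapBlockIndex.size() <= PRUNE_COUNT) return {};  // not bloated; bail early
        snapshot = mapBlockIndex;  // copy while locked
    }
    // The long scan runs unlocked: a concurrent insert into mapBlockIndex is
    // harmless because we only read the snapshot from here on.
    std::vector<std::string> stale;
    for (const auto& [hash, index] : snapshot) {
        if (index.nHeight < nChainHeight - 100) stale.push_back(hash);  // placeholder depth
    }
    return stale;
}
```

The trade-off is a one-time copy of the map versus holding the lock for the whole scan, which matters most when the check is long (e.g. on startup).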
I updated to use a separate lock, but missed this section in the subsequent passes. Will get back to this when I have a chance.
Updated to split this into two critical sections, and to not use cs_main at all.
great work so far!
Let me look into using a separate lock. I think at the time it was not especially clear whether cs_main was already locking for mapBlockIndex in some places.
it was definitely requiring it (in some places ;))
Force-pushed fb97aab to 042bcdc
tsan reported a data race in GetAncestor, so this is not ready to merge. I will have to look into that.
Declare mapBlockIndex to be protected by cs_mapblockindex. Move a lot of generic lookups to LookupBlockIndex, while other more complex uses (e.g. atomic lookup+insert) are left alone. Make sure the lock is taken everywhere necessary (places found with clang thread-safety warnings) and only as tightly as possible to avoid potential lock cycles (detectable by tsan). Many uses of cs_main are replaced by cs_mapblockindex if they were only protecting mapBlockIndex, or removed (LookupBlockIndex taking the lock itself), or moved to after LookupBlockIndex calls. Rework PruneStaleBlockIndexes to do most work without the lock.
=== Veil: cannot presently remove SetNull helper due to other use, however, this is still worthwhile due to the initialization changes.
Cherry-picked explicit default initialization in CBlockIndex; this quiets the tsan warning about a read conflicting with the write in operator new.
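The initialization change mentioned above can be sketched like this: in-class default member initializers make every field deterministically initialized at construction, rather than relying on a separate zeroing pass, while the SetNull helper is kept (as the commit notes it must be) and simply restores the same defaults. The fields shown are illustrative, not the full CBlockIndex.

```cpp
// Sketch under the assumption that CBlockIndex previously relied on a
// SetNull() call to zero its members after construction.
struct CBlockIndex {
    int nHeight{0};                 // in-class defaults: initialized at construction,
    unsigned int nTime{0};          // so there is no window where fields hold garbage
    CBlockIndex* pprev{nullptr};

    // Kept because other code still calls it; now just restores the defaults.
    void SetNull() { *this = CBlockIndex{}; }
};
```

With explicit defaults, tsan no longer sees a racy read of uninitialized storage between allocation and the old zeroing step.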
ACK ad442a4
utACK 13dd025
Problem
tsan identified a data race on CChainState, particularly between AcceptBlock and ActivateBestChain (used often by the miner threads).
Solution
Mark mapBlockIndex as guarded by cs_mapblockindex, and ensure that the lock is taken prior to any use. Some code has been reorganized slightly to reduce the time spent holding the lock and to avoid locking a second time for the same information. Move many lookups to use LookupBlockIndex, which grabs the lock itself, and remove many cs_main uses that would otherwise have blocked only for LookupBlockIndex.
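A sketch of what "marked as guarded by" means in practice, using clang's thread-safety attributes (compiled away on other compilers). With -Wthread-safety enabled, clang flags any access to the annotated variable made without the named mutex held; the macro and helper below are illustrative, not the project's actual definitions.

```cpp
#include <map>
#include <mutex>
#include <string>

// Minimal stand-in for the codebase's annotation macro.
#if defined(__clang__)
#define GUARDED_BY(x) __attribute__((guarded_by(x)))
#else
#define GUARDED_BY(x)
#endif

struct CBlockIndex { int nHeight = 0; };  // simplified stand-in

std::mutex cs_mapblockindex;
std::map<std::string, CBlockIndex> mapBlockIndex GUARDED_BY(cs_mapblockindex);

// Compiles cleanly under -Wthread-safety because the guard is held for
// the whole access; dropping the lock_guard would produce a warning.
int GetHeight(const std::string& hash)
{
    std::lock_guard<std::mutex> lock(cs_mapblockindex);
    auto it = mapBlockIndex.find(hash);
    return it == mapBlockIndex.end() ? -1 : it->second.nHeight;
}
```

This is how the commit's "places found with clang thread-safety warnings" works: the annotation turns missed locking into a compile-time diagnostic instead of a runtime race.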
Tested
Original: configured with --with-sanitizers=tsan, built with clang, run on regtest with 2 SHA mining threads enabled.
With cs_mapblockindex: configured with --with-sanitizers=tsan, built with clang, run on testnet to catch up and mine with 2 SHA threads.