Use only one shard with a single thread #111755
(rustbot has picked a reviewer for you, use r? to override)
This needs to be considered in conjunction with #111713.
     }

     /// The shard is selected by hashing `val` with `FxHasher`.
     #[inline]
     pub fn get_shard_by_value<K: Hash + ?Sized>(&self, val: &K) -> &Lock<T> {
-        if SHARDS == 1 { &self.shards[0].0 } else { self.get_shard_by_hash(make_hash(val)) }
+        self.get_shard_by_hash(if SHARDS == 1 { 0 } else { make_hash(val) })
Should we return `&self.shards.get_unchecked(0).0` directly if `SHARDS == 1`?
It gets masked by 0 later, so it doesn't matter.
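To illustrate the reply above, here is a minimal sketch of why an early return is unnecessary once the index is masked. The helper `shard_index` is hypothetical and only stands in for the final index computation; it is not the code in `Sharded`, and the mask values are assumptions chosen for the example.

```rust
/// Hypothetical stand-in for the last step of shard selection:
/// combine a hash with the current mask to get a shard index.
fn shard_index(hash: usize, mask: usize) -> usize {
    hash & mask
}

fn main() {
    // With a mask of 0, every hash maps to shard 0, so it makes no
    // difference which hash value was passed in.
    for hash in [0usize, 1, 31, 0xdead_beef, usize::MAX] {
        assert_eq!(shard_index(hash, 0), 0);
    }

    // With an assumed 32 shards the mask is 31 and the low five bits
    // of the hash pick the shard.
    assert_eq!(shard_index(0b1010_0101, 31), 0b0_0101);
}
```

Under that assumption, passing `0` instead of a real hash only saves the hashing work; the selected shard is the same either way.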
Could you make this a branch over `cfg(debug_assertions)`? I'd rather avoid having some logic over the cfg and some other over the value of `SHARDS`, even if they are equivalent.
Assuming you mean `cfg(parallel_compiler)` here, it's still useful to disable just sharding by setting `SHARDS` to 1 to evaluate how effective sharding is.
r? @SparrowLii
It looks good to me. I have no privileges so r? @cjgillot
-        Sharded { shards: [(); SHARDS].map(|()| CacheAligned(Lock::new(value()))) }
+        Sharded {
+            #[cfg(parallel_compiler)]
+            mask: if is_dyn_thread_safe() { SHARDS - 1 } else { 0 },
`mask` needs a comment explaining what it's for.
What is the reason for the perf improvement? Is it just a cache effect, or do we gain from skipping some key hashing?
    #[inline(always)]
    fn count(&self) -> usize {
        self.mask() + 1
This needs a reminder that `mask` is `2^n - 1`, so adding 1 is correct.
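The arithmetic behind this comment can be checked with a small sketch. The free function `count` below is a hypothetical stand-in for the method in the diff, and the range of `n` is an arbitrary assumption for the example.

```rust
/// Hypothetical stand-in for `count`: the number of shards in use,
/// derived from a mask of the form 2^n - 1.
fn count(mask: usize) -> usize {
    mask + 1
}

fn main() {
    for n in 0..=5u32 {
        let shards = 1usize << n; // 2^n shards
        let mask = shards - 1; // 2^n - 1, i.e. the low n bits set
        assert_eq!(count(mask), shards);

        // Every masked hash lands inside 0..count(mask).
        for hash in 0..1000usize {
            assert!(hash & mask < count(mask));
        }
    }
}
```

The case `n = 0` corresponds to a mask of `0`, where `count` reports a single shard.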
@bors try @rust-timer queue
⌛ Trying commit 4d0150d81e6bd2b63caec8aa801ce8098c7d66eb with merge 0224514e56967ce45ee3658e8baf806c75f63c53...
☀️ Try build successful - checks-actions
Finished benchmarking commit (0224514e56967ce45ee3658e8baf806c75f63c53): comparison URL.
Overall result: no relevant changes - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count
This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage)
Results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
Results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 645.989s -> 646.069s (0.01%)
It's probably just the cache effect and the replacement of the bounds check by a masking operation. I'm not sure there's any place where hashing can be skipped at the moment.
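As a rough illustration of the two effects named in this reply, the sketch below uses a hypothetical `MiniSharded` type, not the real `Sharded`. The shard count of 32, the `single_threaded` flag, and the `numbered` helper are all assumptions made for the example. Because the mask never exceeds `SHARDS - 1`, the masked index is always in range, so the lookup can skip the bounds check; with a mask of `0`, every lookup also touches the same shard.

```rust
// Assumed shard count for the sketch; must be a power of two.
const SHARDS: usize = 32;

/// Hypothetical, simplified model of a sharded container.
struct MiniSharded<T> {
    /// Either `SHARDS - 1` (all shards in use) or `0` (only shard 0).
    mask: usize,
    shards: [T; SHARDS],
}

impl<T> MiniSharded<T> {
    fn new(single_threaded: bool, mut value: impl FnMut() -> T) -> Self {
        MiniSharded {
            mask: if single_threaded { 0 } else { SHARDS - 1 },
            shards: [(); SHARDS].map(|()| value()),
        }
    }

    fn get_by_hash(&self, hash: usize) -> &T {
        let i = hash & self.mask;
        // SAFETY: `mask <= SHARDS - 1`, so `i < SHARDS` always holds.
        unsafe { self.shards.get_unchecked(i) }
    }
}

/// Builds a container whose shards hold their own index, for inspection.
fn numbered(single_threaded: bool) -> MiniSharded<usize> {
    let mut next = 0usize;
    MiniSharded::new(single_threaded, || {
        let id = next;
        next += 1;
        id
    })
}

fn main() {
    // Single-threaded: every hash resolves to shard 0.
    let single = numbered(true);
    assert_eq!(*single.get_by_hash(0xabcdef), 0);
    assert_eq!(*single.get_by_hash(33), 0);

    // Multi-threaded: the low bits of the hash pick the shard.
    let multi = numbered(false);
    assert_eq!(*multi.get_by_hash(33), 1);
}
```

This sketch says nothing about the size of the effect; it only shows the access pattern the reply describes.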
@bors r+
☀️ Test successful - checks-actions
Finished benchmarking commit (3fae1b9): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count
This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage)
Results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
This benchmark run did not return any relevant results for this metric.
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 646.068s -> 646.98s (0.14%)
This changes `Sharded` to only access a single shard using a mask set to `0` when a single thread is used, which leads to cache utilization improvements.
Performance improvement with 1 thread and `cfg(parallel_compiler)`:
cc @SparrowLii