Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incr.comp.: Use a set implementation optimized for small item counts for deduplicating read-edges. #45577

Closed

Conversation

michaelwoerister
Copy link
Member

Many kinds of DepNodes will only ever have between zero and three edges originating from them (see e.g. #45063 (comment)) so let's try to avoid allocating a HashSet in those cases.

r? @nikomatsakis

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 27, 2017
@leonardo-m
Copy link

Something like this? https://crates.io/crates/vec_map

@kennytm
Copy link
Member

kennytm commented Oct 28, 2017

@leonardo-m No, this structure is more like https://docs.rs/david-set/0.1.2/david_set/struct.Set.html.

// Many kinds of nodes often only have between 0 and 3 edges, so we provide a
// specialized set implementation that does not allocate for those some counts.
#[derive(Debug, PartialEq, Eq)]
enum DepNodeIndexSet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this to a generic data structure in rustc_data_structures?

@arielb1
Copy link
Contributor

arielb1 commented Oct 28, 2017

Could you try and get some numbers on the performance improvement or the hit rate?

Even if you don't, r=me with the set moved to rustc_data_structures.

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 28, 2017
@kennytm
Copy link
Member

kennytm commented Oct 28, 2017

@bors try

Preparing for perf.

@bors
Copy link
Contributor

bors commented Oct 28, 2017

⌛ Trying commit be27d8b with merge 877833f...

bors added a commit that referenced this pull request Oct 28, 2017
incr.comp.: Use a set implementation optimized for small item counts for deduplicating read-edges.

Many kinds of `DepNodes` will only ever have between zero and three edges originating from them (see e.g. #45063 (comment)) so let's try to avoid allocating a `HashSet` in those cases.

r? @nikomatsakis
@bors
Copy link
Contributor

bors commented Oct 28, 2017

☀️ Test successful - status-travis
State: approved= try=True

@kennytm
Copy link
Member

kennytm commented Oct 28, 2017

@rust-lang/infra perf check requested from #45577 (comment).

@arthurprs
Copy link
Contributor

This is fine, but an enum InlineSet<T> {Inline(ArrayVec<T>), Extended(FxHashSet<T>) } would be nicer and reusable.

Note: ArrayVec is already in the codebase.

@Mark-Simulacrum
Copy link
Member

http://perf.rust-lang.org/compare.html?start=7da9a5e178e28b2e387e6296aa1b0289acdf5781&end=877833ffc410fea76cf3eb1bee275ccb92a43e7d

@kennytm
Copy link
Member

kennytm commented Oct 30, 2017

The improvement/regression is negligible. There is even a +17.4% max-rss (memory use) in the regex-opt-0.1.80@020-incr-from... test case. Overall landing this PR does not seem beneficial.

Measure ∑before ∑after Δ
cpu-clock 414267.47 411066.66 -0.77%
cycles:u 1582921482931 1570998057890 -0.75%
faults 1842937 1836494 -0.35%
instructions:u 1832233383606 1832161663635 -0.00%
max-rss 17936396 18100956 +0.92%
task-clock 413244.27 409110.92 -1.00%
wall-time 220.22 220.70 +0.22%

@michaelwoerister
Copy link
Member Author

michaelwoerister commented Oct 30, 2017

Thanks for kicking off the performance measurement, @kennytm!

Could you try and get some numbers on the performance improvement or the hit rate?

@julian-seward1 and I ran into this function as on of the hotter ones while profiling a incremental building of the regex crate. The compiler spends roughly 0.9% and 1.6% of cycles in the read_index() function, depending on how much of the dependency graph can be marked as green. So that's the upper limit of the performance improvement here.

The main intended optimization here is to get rid of the heap allocation for small hash sets. It's a bit surprising that the effect on the overall instruction count is so small. I would have hoped for -0.5% instead of -0.1%.

There is even a +17.4% max-rss (memory use) in the regex-opt-0.1.80@020-incr-from... test case.

This must be some other effect. I don't see how this could introduce any noticeable increase in memory consumption.

Thanks for your comments, everyone! Maybe I'll tinker some more with this in my spare time. Closing for now.

@kennytm
Copy link
Member

kennytm commented Oct 30, 2017

The main intended optimization here is to get rid of the heap allocation for small hash sets. It's a bit surprising that the effect on the overall instruction count is so small. I would have hoped for -0.5% instead of -0.1%.

@michaelwoerister Probably it's because jemalloc is very efficient? 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants