Use vector-of-structs of preds/semi for Lengauer-Tarjan #408

Open
wants to merge 2 commits into develop

Conversation

samolisov
Contributor

Closes #383

@samolisov
Contributor Author

I used the following benchmark: dominator_tree_benchmark.cpp
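
For context, the report below is in Google Benchmark's output format. Here is a minimal sketch of how such a harness might be wired up, assuming Google Benchmark and Boost.Graph's lengauer_tarjan_dominator_tree; the fixture name, the placeholder graph, and the helper make_small_cfg are invented for illustration and are not taken from dominator_tree_benchmark.cpp.

// Minimal sketch, not the actual dominator_tree_benchmark.cpp: a Google
// Benchmark fixture that times Boost.Graph's Lengauer-Tarjan implementation
// on a tiny placeholder CFG.
#include <benchmark/benchmark.h>

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dominator_tree.hpp>
#include <boost/property_map/property_map.hpp>

#include <vector>

using Graph  = boost::adjacency_list<boost::listS, boost::vecS, boost::bidirectionalS>;
using Vertex = boost::graph_traits<Graph>::vertex_descriptor;

// Placeholder CFG; the real benchmark builds the graphs from the papers
// listed in the report below.
static Graph make_small_cfg()
{
    Graph g(5);
    boost::add_edge(0, 1, g);
    boost::add_edge(0, 2, g);
    boost::add_edge(1, 3, g);
    boost::add_edge(2, 3, g);
    boost::add_edge(3, 4, g);
    return g;
}

static void BM_DominatorTree(benchmark::State& state)
{
    const Graph g = make_small_cfg();
    const Vertex entry = *boost::vertices(g).first;

    // The dominator-tree predecessor map is a vector indexed by vertex_index.
    std::vector<Vertex> dom_tree_pred(
        boost::num_vertices(g), boost::graph_traits<Graph>::null_vertex());
    auto dom_map = boost::make_iterator_property_map(
        dom_tree_pred.begin(), boost::get(boost::vertex_index, g));

    for (auto _ : state)
        boost::lengauer_tarjan_dominator_tree(g, entry, dom_map);
}
BENCHMARK(BM_DominatorTree);

BENCHMARK_MAIN();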

On my machine (32 x 1792.7 MHz CPUs with hyper-threading and an almost-zero load average, Ubuntu 20.04), the report is the following (we can use the state after merging #407 as a baseline):

----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
Tarjan's paper (vertex list)                   934 ns          934 ns       748347
Tarjan's paper  (vertex vector)                845 ns          845 ns       830574
Appel. fig. 19.8 (vertex list)                 960 ns          959 ns       731191
Appel. fig. 19.8  (vertex vector)              860 ns          860 ns       813827
Muchnick. fig. 8.18 (vertex list)              561 ns          560 ns      1248586
Muchnick. fig. 8.18  (vertex vector)           538 ns          538 ns      1302725
Cytron's paper, fig. 9 (vertex list)          1145 ns         1145 ns       613263
Cytron's paper, fig. 9  (vertex vector)       1046 ns         1046 ns       674659
From a code, 186 BBs (vertex list)           12938 ns        12937 ns        54742
From a code, 186 BBs (vertex vector)         11528 ns        11527 ns        62319

After implementing a "vector-of-structs" solution, the numbers are the following:

----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
Tarjan's paper (vertex list)                   919 ns          919 ns       768302
Tarjan's paper  (vertex vector)                835 ns          835 ns       838532
Appel. fig. 19.8 (vertex list)                 944 ns          944 ns       739354
Appel. fig. 19.8  (vertex vector)              854 ns          854 ns       825316
Muchnick. fig. 8.18 (vertex list)              527 ns          527 ns      1285818
Muchnick. fig. 8.18  (vertex vector)           488 ns          488 ns      1433765
Cytron's paper, fig. 9 (vertex list)          1101 ns         1101 ns       636063
Cytron's paper, fig. 9  (vertex vector)       1024 ns         1024 ns       685137
From a code, 186 BBs (vertex list)           12754 ns        12753 ns        54584
From a code, 186 BBs (vertex vector)         11623 ns        11622 ns        6116

Here we can see about a 1% speedup for the "large" cases (the CFG with 186 basic blocks) and about 10% for the small ones (Muchnick, fig. 8.18, with 8 vertices).

I'm still thinking about what to do with the semedom_ vector: should we put samedoms into the struct as well? The access pattern is a little different, so some more experiments are required.
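
To illustrate the idea (the struct and field names below are hypothetical and are not the PR's actual code), the change is essentially from parallel per-field containers to one container of per-vertex records:

#include <cstddef>
#include <vector>

// Hypothetical illustration only; names do not necessarily match the PR.
//
// Before ("struct of vectors"): one container per field, indexed by DFS number.
//   std::vector<std::vector<Vertex>> preds_; // preds_[dfnum]
//   std::vector<Vertex>              semi_;  // semi_[dfnum]
//
// After ("vector of structs"): one record per vertex, so the fields of the
// same vertex are adjacent in memory and tend to share cache lines.
template <class Vertex>
struct lt_vertex_state
{
    std::vector<Vertex> preds; // whatever "preds" holds for this vertex
    Vertex semi{};             // semidominator-related field

    // samedom could also be folded in here (the open question above), but
    // its access pattern differs, hence the extra experiments.
};

int main()
{
    using V = std::size_t;
    // One vector of structs, indexed by DFS number, e.g. for an 8-vertex CFG.
    std::vector<lt_vertex_state<V>> state(8);
    state[3].semi = 1;
    state[3].preds.push_back(2);
    return 0;
}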

@samolisov
Contributor Author

Maybe a check on a larger graph (up to 1000 or 2000-3000 nodes) is needed to ensure there is no regression for large inputs.

@jeremy-murphy
Contributor

Thanks for trying this change; it's a pity it didn't yield anything significant. I still think it's a better logical design, so I'm happy to proceed with it, although I'd like to make a few style changes.
For starters, I think we can just drop the set functions on the struct. More later.

@samolisov
Contributor Author

@jeremy-murphy Thank you for the suggestion. I've replaced every set_ method with a direct write to the corresponding field and removed the methods.
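
For illustration (hypothetical names, not the PR's actual code), the change amounts to dropping the setter and writing the field directly:

// Minimal before/after illustration of dropping the setters; names are
// hypothetical and do not necessarily match the PR.
struct vertex_state
{
    int semi = 0;
    // void set_semi(int s) { semi = s; }   // setters like this were removed
};

int main()
{
    vertex_state v;
    v.semi = 42; // direct field write instead of v.set_semi(42)
    return 0;
}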

Also, I added a benchmark for a huge (3000+ node) graph; on such a graph I see the following. The baseline (code from the develop branch):

Huge Inlined Function (vertex list)         275707 ns       275683 ns         2531
Huge Inlined Function (vertex vector)       236892 ns       236878 ns         2969

With the "cache-friendly" solution:

Huge Inlined Function (vertex list)         284871 ns       284855 ns         2495
Huge Inlined Function (vertex vector)       251233 ns       251218 ns         2783

So here we even see some performance degradation, of up to 3-6%.

Successfully merging this pull request may close these issues.

Why the implementation of Lengauer-Tarjan uses std::deque for a bucket?