GoogleBenchmark setup #2

Open
vincecr0ft opened this issue Aug 30, 2017 · 5 comments

Comments

@vincecr0ft
Member

Two files have been added: runBinnedTest.cxx and buildBinnedTest3.cxx. These are simple ROOT macros that have been provided to the ROOT Benchmarking team. Results of earlier tests can be seen here.

As can be seen, the run time goes up when running on multiple cores. This apparently remains true with even more cores, where jobs end up taking more than 12 hours.

As a first solution, it is proposed to remove lines 57 and 61 (the ones mentioning ActivateStatError()) and run on up to 8 cores.

Once the Google Benchmark code runs well on the KNL with proper scaling, this test can be extended to other systems (e.g. NLESC machines). In the meantime, the code to run the tests can be found here.
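
For reference, the kind of sweep described above (the same job timed on 1 up to 8 cores) could be sketched roughly as below. This is not the Google Benchmark setup itself, just a standard-library stand-in. `TimingPoint`, `sweep_cores` and the `run_fit` callback are hypothetical names, and how the fit is actually told to use a given number of cores is left to the callback.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// One measurement: number of cores used and the wall-clock time in seconds.
// (Hypothetical type, for illustration only.)
struct TimingPoint {
    int n_cores;
    double wall_seconds;
};

// Calls run_fit(n) once for every n in 1..max_cores and records the
// wall-clock time of each call. run_fit is an assumed callback that
// runs the job on n cores.
std::vector<TimingPoint> sweep_cores(const std::function<void(int)>& run_fit,
                                     int max_cores) {
    assert(max_cores >= 1);
    std::vector<TimingPoint> results;
    for (int n = 1; n <= max_cores; ++n) {
        const auto start = std::chrono::steady_clock::now();
        run_fit(n);
        const auto stop = std::chrono::steady_clock::now();
        results.push_back({n, std::chrono::duration<double>(stop - start).count()});
    }
    return results;
}
```

Calling `sweep_cores(run_fit, 8)` gives one timing per core count. A single call per core count is noisy, so a real benchmark would repeat each measurement.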

@egpbos
Member

egpbos commented Sep 4, 2017

/user/vcroft/public_html/forPatrick/outputAugust09

@egpbos
Member

egpbos commented Sep 7, 2017

Yesterday, we ran a simple test with a modified buildBinnedTest3.cxx that produces just 1 channel and no nuisance parameters. We benchmarked it on up to 8 cores, with and without CPU affinity activated. Results:
total_timing
Blue lines are the measured wall-clock times; purple lines are the theoretical "ideal scaling" times, i.e. the time the run would take if the runtime scaled ideally with the number of cores. The measured times do not follow that curve, which is a known problem for this type of likelihood and is being worked on.
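
To make the "ideal scaling" curve concrete, here is a minimal sketch of how it can be computed. It assumes that the ideal time on N cores is simply the single-core time divided by N. The function names are hypothetical and not taken from the benchmark code.

```cpp
#include <cassert>
#include <cstddef>

// Ideal wall-clock time on n_cores, assuming the single-core time
// divides evenly over the cores (no overhead, no serial part).
double ideal_time(double t_single_core, std::size_t n_cores) {
    assert(n_cores > 0);
    return t_single_core / static_cast<double>(n_cores);
}

// Parallel efficiency: ideal time divided by measured time.
// 1.0 means perfect scaling; lower values mean worse scaling.
double parallel_efficiency(double t_single_core, double t_measured,
                           std::size_t n_cores) {
    assert(t_measured > 0.0);
    return ideal_time(t_single_core, n_cores) / t_measured;
}
```

If the measured time goes up with the number of cores, as in the earlier tests, the efficiency drops faster than 1/N.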

@egpbos
Member

egpbos commented Sep 7, 2017

@vincecr0ft what are the other parameters in this run actually, N_samples, N_events and N_bins?

@egpbos
Member

egpbos commented Sep 7, 2017

Judging by the other set of workspaces, I guess it's 1 sample, 1000 events, 10 bins, is that correct?

@egpbos
Member

egpbos commented Sep 7, 2017

For completeness' sake, this plot compares the total and ideal wall-clock times (blue and red) to the CPU times. Green is the CPU time of the master process, purple is the sum of the CPU times of all MPFE processes, and yellow is the sum of green and purple (i.e. the total CPU time of the entire minimization process over multiple CPUs).

timing_total
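
The CPU time bookkeeping in the plot above amounts to the following sketch. `CpuTimes` and both function names are hypothetical and only illustrate how the purple and yellow curves are derived from the per-process CPU times. How those times are collected is not shown.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// CPU times in seconds for one minimization run (hypothetical layout).
struct CpuTimes {
    double master_s;                // master process (green)
    std::vector<double> workers_s;  // one entry per MPFE process
};

// Sum of the CPU times of all MPFE processes (purple).
double workers_total(const CpuTimes& t) {
    return std::accumulate(t.workers_s.begin(), t.workers_s.end(), 0.0);
}

// Total CPU time of the whole run: master plus all MPFE processes (yellow).
double total_cpu(const CpuTimes& t) {
    assert(t.master_s >= 0.0);
    return t.master_s + workers_total(t);
}
```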

2 participants