Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Range synchronization for histograms filled in parallel in auto-bin mode #902

Closed
wants to merge 8 commits into from

Conversation

gganis
Copy link
Member

@gganis gganis commented Aug 29, 2017

When filling histograms without limits in parallel a problem to be addressed is how to make sure that the ranges are compatible for the final merging.
This PR proposes a technique based on a static reference list of TAxis, kept as a static in TH1, filled/used by the different threads. The first thread calculates the TAxis ranges and saves it into the list, the others use it. The list is protected by a RW lock .
The logic is implemented in TH1::BufferEmpty and holds for TH{1,2,3}, the specificity of each TH{1,2,3} being moved to a set of new member functions called by TH1::BufferEmpty.

The change in TH1Merger is required to calculate the axis and dump the internal buffers when the internal buffersize has not yet been reached. This treatment can perhaps be improved to get the same result of the single thread case.

The patch also implements the hook for a call back function to implement the same functionality in the case of multi-processing. A patch with adaptation to multiproc will follow.

The tutorial mt301_fillHistAutoBin.C illustrates the usage with TThreadedObject .

NB: many of the changes in TH1.h come from clang-format-{3.8, 3.9, 4.0}

gganis added 4 commits August 29, 2017 12:36
To support the case where histograms without initial limits are filled
in threads without reaching the buffersize . Takes advantage of the
axes limits synchronization technique added in TH1.
For an easy way to printout number of bins and limits
This is implemented with a common reference list of list of TAxis for
multi-threaded applications, and with support for a call back to the
steering process for multi-process applications .

The new member fNameForRanges is used to uniquely identify internally
the relevant set of TAxis; at this purpose, the full name, including the
directories is used. In order to cope with the use of temporary
directories, for example, for threads (see TThreadedObject), a configurable
list of dir tags can be filtered out of the name.

The default list is "__TThreaded_dir_?" (used a regular expression), to cope with
TThreadedObject.
It can be changed or modified with the static

          TH1::SetStripOffDirs(const char *tags)

were tags is in the form "[+]newtags" ('+' means 'newtags' is added to the
existing ones). Or with env 'Hist.StripOffDirs' (same rule for the '+').
@gganis gganis requested review from couet and lmoneta as code owners August 29, 2017 10:57
@phsft-bot
Copy link
Collaborator

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

#include "TROOT.h"
#include "TCanvas.h"
#include <thread>
#include <iostream>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor comment: We could perhaps try to put stl headers at the end of ROOT includes in order not to mask defects in our headers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, forget about my comment. In principle macros must run without any includes: autoparsing should take care of inclusions. If this is not the case, either we forgot to select something (which means that auto-load/parse did not happen) or we have a bug in auto-loading/parsing...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Done.

c->cd(3);
fh3d->DrawClone();

gROOTMutex = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a remnant to be able to run more times in the same ROOT shell.
I will remove it, also because all our mt... tutorials suffer from the same problem, i.e. at the second run all threads get locked at TRoot::Append:

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fd50c469e42 in __GI___pthread_mutex_lock (mutex=0x3cd6060) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007fd50b096f58 in TPosixMutex::Lock (this=0x3cd6050) at /home/ganis/local/root/GIT/root/core/thread/src/TPosixMutex.cxx:75
#3 0x00007fd50b0881cb in TMutex::Lock (this=0x3d117e0) at /home/ganis/local/root/GIT/root/core/thread/src/TMutex.cxx:48
#4 0x00007fd50e0412a2 in TLockGuard::TLockGuard (this=0x7fd4fcd73100, mutex=0x3d117e0) at include/TVirtualMutex.h:77
#5 0x00007fd50dac67fe in TROOT::Append (this=0x7fd50dff3240 ROOT::Internal::GetROOT1()::alloc, obj=0x7fd4e00008c0, replace=false)
at /home/ganis/local/root/GIT/root/core/base/src/TROOT.cxx:1003
#6 0x00007fd501e1fca3 in TH1::Copy (this=0x48499e0, obj=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:2729
#7 0x00007fd501e3b82f in TH1F::Copy (this=0x48499e0, newth1=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:9426
#8 0x00007fd501e3b6ba in TH1F::TH1F (this=0x7fd4e00008c0, h=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:9411

@phsft-bot
Copy link
Collaborator

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

Put stl headers at the end of ROOT includes.

Comment out 'gROOTMutex = 0' which is a trick to enable re-running in the
same ROOT shell.
@phsft-bot
Copy link
Collaborator

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

Copy link
Member

@dpiparo dpiparo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gganis , yes this is an issue. You mean that running twice a macro you even load deadlocks like in this example below?

root [0] .L mt001_fillHistos.C
root [1] mt001_fillHistos()
(int) 0
root [2] mt001_fillHistos()

If yes, I can reproduce and @Axel-Naumann perhaps has a solution already...

#include "TROOT.h"
#include "TCanvas.h"
#include <thread>
#include <iostream>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, forget about my comment. In principle macros must run without any includes: autoparsing should take care of inclusions. If this is not the case, either we forgot to select something (which means that auto-load/parse did not happen) or we have a bug in auto-loading/parsing...

fgRefSync = new THashList;
TList *raxl = (TList *)fgRefSync->FindObject(onm);
if (raxl) {
needaxes = (SetRangesFromList(raxl) != 0) ? kTRUE : kFALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a tautology?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whay you say so. It depends on the return value of SetRangesFromList, right?

@@ -101,322 +104,380 @@ class TH1 : public TNamed, public TAttLine, public TAttFill, public TAttMarker {
Double_t *fIntegral; ///<!Integral of bins used by GetRandom
TVirtualHistPainter *fPainter; ///<!pointer to histogram painter
EBinErrorOpt fBinStatErrOpt; ///< option for bin statistical errors
void *fCallbackCtx; ///<!Context for the function to be called back
CallbackFunc_t fCallbackFunc; ///<!Function to be called back, for example to get ranges
TString fNameForRanges; ///<Name to be used to identidy the range
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a full picture of the changes yet, but does this need to be marked as persistent? TH1* have custom streamers anyway, but other new members were marked transient.

@@ -52,6 +54,7 @@ class TCollection;
class TVirtualFFT;
class TVirtualHistPainter;

typedef TList *(*CallbackFunc_t)(void *ctx, TList *buf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want fcn pointers for consistency or we can move to

using  CallbackFunc_t = std::function<TList*(void*,TList*)>

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think move to that. Thanks.

virtual void UpdateBinContent(Int_t bin, Double_t content);
virtual Double_t GetBinErrorSqUnchecked(Int_t bin) const { return fSumw2.fN ? fSumw2.fArray[bin] : RetrieveBinContent(bin); }
static void *fgCallbackCtx; ///<!Global context setting for the function to be called back
static CallbackFunc_t fgCallbackFunc; ///<!Global callback function setting, for example to get ranges
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For static members I would remove the transient mark as it is not really defined in this context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just copied from the existing statics. I can remove that. Thanks.

@@ -35,6 +35,9 @@ class TH2 : public TH1 {
Double_t fTsumwy2; //Total Sum of weight*Y*Y
Double_t fTsumwxy; //Total Sum of weight*X*Y

static void *fgCallbackCtx; ///<!Global context setting for the function to be called back
static CallbackFunc_t fgCallbackFunc; ///<!Global callback function setting, for example to get ranges

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments above :)

@@ -150,7 +159,7 @@ class TH3 : public TH1, public TAtt3D {
return h.DoProject2D(name, title, projX,projY, computeErrors, originalRange, useUF, useOF);
}

ClassDef(TH3,5) //3-Dim histogram base class
ClassDef(TH3,6) //3-Dim histogram base class
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: You only need to increase the version number is the list of persistent data member from this class or one of its base classes changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. That was too quick.
I have reverted the patch, making sure that none of the new members is persistent.
Thanks!

/// Print axis bins and ranges
void TAxis::Print(Option_t *) const
{
printf(" %s\t%s \tNbins= %d, \tmin= %g, \tmax=%g", GetName(), GetTitle(), GetNbins(), GetXmin(), GetXmax());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must use Printf.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that was a typo.
Corrected. Thanks!

@phsft-bot
Copy link
Collaborator

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

Making sure that no new member is persistent.
@phsft-bot
Copy link
Collaborator

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

@pcanal
Copy link
Member

pcanal commented Aug 29, 2017

I am a little concerned about the basic idea. If I understood correctly, there is a (unique) global registry where the histogram are identified based on their 'full path name' (beside the fact that GetNameForRanges seems both brittle and currently seems on first reading 'wrong'/'not-as-intended'). I see two major problems, one is that the 'full path name' may never be really unique i.e. it might be impossible to avoid synonyms that are semantically distincts ... a good example is two free standings (not attached to any directory) histograms that happens to have the same name in two distinct and independent part of the code (e.g. two CMSSW modules).

The other major problem is that it unnecessary tie (via that global mutex) all the 'parallel' histogram, this means that the overall scalability is inherently decreased.

Rather than a completely global state, wouldn't it make sense to have a wrapper object (for example TThreadedObject) be the holder of the lock and list for a single set of related histogram.
This would both reduce contention and guarantees that the histograms are really related.

A 3rd significant problem (but fixeable) is that the operation covered by the ReadWrite lock are not atomic (and/or trivial) and for the look of it could plausibly indirectly request the ROOT global lock and thus can lead to deadlocks (from some other code that hold the ROOT global locks and request the Write part of the ReadWrite lock).

A 4th deficiency is that once activated for one histogram it seems to apply to all histogram. i.e. as far as can tell if you one parallel histograms and 10,000 single-thread histograms, filling the single-thread histogram still has to go through the multi-thread registration/mechanism.

And that remind me, that another challenge for the 'unique registry' solution is to understand its scalability where reaching 10 to 100 thousands histograms.

Thanks,
Philippe.

@gganis
Copy link
Member Author

gganis commented Aug 29, 2017

Hi Philippe,

it might be impossible to avoid synonyms that are semantically distincts ... a good example is two free standings (not attached to any directory) histograms that happens to have the same name in two distinct and independent part of the code (e.g. two CMSSW modules).

Good point. I admit that did not really think to this case.

The other major problem is that it unnecessary tie (via that global mutex) all the 'parallel' histogram, this means that the overall scalability is inherently decreased.

Rather than a completely global state, wouldn't it make sense to have a wrapper object (for example TThreadedObject) be the holder of the lock and list for a single set of related histogram.
This would both reduce contention and guarantees that the histograms are really related.

I agree on this and TThreadedObject could be the place where to control this. The drawback is that we would then not have a solution outside TThreadedObject, i.e. we will have to find a way to force the use of TThreadedObject in MT cases.

A 3rd significant problem (but fixeable) is that the operation covered by the ReadWrite lock are not atomic (and/or trivial) and for the look of it could plausibly indirectly request the ROOT global lock and thus can lead to deadlocks (from some other code that hold the ROOT global locks and request the Write part of the ReadWrite lock).

Not sure to understand, you mean FindObject?
I will have a closer look.

A 4th deficiency is that once activated for one histogram it seems to apply to all histogram. i.e. as far as can tell if you one parallel histograms and 10,000 single-thread histograms, filling the single-thread histogram still has to go through the multi-thread registration/mechanism.

Ok, a solution 'per histogram' would also address this.

Thanks for going deep into it,
Gerri

@pcanal
Copy link
Member

pcanal commented Aug 29, 2017

Not sure to understand, you mean FindObject?
I will have a closer look.

FindObject might take the lock (or might not depending on the container and the implementation of the containee's function that are called). 'Warning' for sure (sometimes) request the lock as if I recall correctly it uses TClass inside. etc...

I.e. if you are not using the ROOT main lock, you must be very careful of what is inside the locked section ... and as always might it as 'small' as possible.

@pcanal
Copy link
Member

pcanal commented Aug 29, 2017

i.e. we will have to find a way to force the use of TThreadedObject in MT cases.

I am not sure what you mean :). We would be saying that if one want the internal implementation of MT histogram then you have to use this (or that) wrapper to enable it.

@phsft-bot
Copy link
Collaborator

Build failed on ubuntu14/native.
See console output.

Failing tests:

@phsft-bot
Copy link
Collaborator

Build failed on mac1012/native.
See console output.

Failing tests:

@phsft-bot
Copy link
Collaborator

Build failed on slc6/gcc49.
See console output.

Failing tests:

@phsft-bot
Copy link
Collaborator

Build failed on slc6/gcc62.
See console output.

Failing tests:

@phsft-bot
Copy link
Collaborator

Build failed on centos7/gcc49.
See console output.

Failing tests:

@gganis
Copy link
Member Author

gganis commented Aug 30, 2017

Hi,
After some more thinking, I believe we have to close this and rethink the all thing.
I found particularly tricky this point raised by Philippe:

a good example is two free standings (not attached to any directory) histograms that happens to have the same name in two distinct and independent part of the code (e.g. two CMSSW modules).

Supporting this case makes basically impossible to have an identifier for the histogram. In this moment I do not see how we can synchronize objects that we cannot somehow tag being together. In PROOF we somehow implicitly assumed that this could not happen (PROOF is not supporting it).

Rather than a completely global state, wouldn't it make sense to have a wrapper object (for example TThreadedObject) be the holder of the lock and list for a single set of related histogram.

This looked an appealing idea. However, it means that the member of a TThreadedObject has to know that is part of a TThreadedObject (which is not the case now) or that we should have a specialized TThreadedObject for histograms that does some settings on the histograms to steer the special behavior. And remains the fact that people will be forced to use a TThreadedObject (which may be ok).

Perhaps it is also worth to investigate if we can find an improved bin-finding algorithm that gives consistent binnings in the first place that can be merged. That would solve the problem at the roots.

Cheers,
Gerri

@gganis
Copy link
Member Author

gganis commented Aug 30, 2017

Changing strategy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants