-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Range synchronization for histograms filled in parallel in auto-bin mode #902
Conversation
To support the case where histograms without initial limits are filled in threads without reaching the buffersize . Takes advantage of the axes limits synchronization technique added in TH1.
For an easy way to printout number of bins and limits
This is implemented with a common reference list of list of TAxis for multi-threaded applications, and with support for a call back to the steering process for multi-process applications . The new member fNameForRanges is used to uniquely identify internally the relevant set of TAxis; at this purpose, the full name, including the directories is used. In order to cope with the use of temporary directories, for example, for threads (see TThreadedObject), a configurable list of dir tags can be filtered out of the name. The default list is "__TThreaded_dir_?" (used a regular expression), to cope with TThreadedObject. It can be changed or modified with the static TH1::SetStripOffDirs(const char *tags) were tags is in the form "[+]newtags" ('+' means 'newtags' is added to the existing ones). Or with env 'Hist.StripOffDirs' (same rule for the '+').
And its use with TThreadedObject .
Starting build on |
Build failed on slc6/gcc62. Failing tests: |
Build failed on centos7/gcc49. Failing tests: |
Build failed on mac1012/native. Failing tests: |
#include "TROOT.h" | ||
#include "TCanvas.h" | ||
#include <thread> | ||
#include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor comment: We could perhaps try to put stl headers at the end of ROOT includes in order not to mask defects in our headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, forget about my comment. In principle macros must run without any includes: autoparsing should take care of inclusions. If this is not the case, either we forgot to select something (which means that auto-load/parse did not happen) or we have a bug in auto-loading/parsing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Done.
c->cd(3); | ||
fh3d->DrawClone(); | ||
|
||
gROOTMutex = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a remnant to be able to run more times in the same ROOT shell.
I will remove it, also because all our mt... tutorials suffer from the same problem, i.e. at the second run all threads get locked at TRoot::Append:
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fd50c469e42 in __GI___pthread_mutex_lock (mutex=0x3cd6060) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007fd50b096f58 in TPosixMutex::Lock (this=0x3cd6050) at /home/ganis/local/root/GIT/root/core/thread/src/TPosixMutex.cxx:75
#3 0x00007fd50b0881cb in TMutex::Lock (this=0x3d117e0) at /home/ganis/local/root/GIT/root/core/thread/src/TMutex.cxx:48
#4 0x00007fd50e0412a2 in TLockGuard::TLockGuard (this=0x7fd4fcd73100, mutex=0x3d117e0) at include/TVirtualMutex.h:77
#5 0x00007fd50dac67fe in TROOT::Append (this=0x7fd50dff3240 ROOT::Internal::GetROOT1()::alloc, obj=0x7fd4e00008c0, replace=false)
at /home/ganis/local/root/GIT/root/core/base/src/TROOT.cxx:1003
#6 0x00007fd501e1fca3 in TH1::Copy (this=0x48499e0, obj=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:2729
#7 0x00007fd501e3b82f in TH1F::Copy (this=0x48499e0, newth1=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:9426
#8 0x00007fd501e3b6ba in TH1F::TH1F (this=0x7fd4e00008c0, h=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:9411
Build failed on ubuntu14/native. Failing tests: |
Build failed on slc6/gcc49. Failing tests: |
Starting build on |
Put stl headers at the end of ROOT includes. Comment out 'gROOTMutex = 0' which is a trick to enable re-running in the same ROOT shell.
Starting build on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gganis , yes this is an issue. You mean that running twice a macro you even load deadlocks like in this example below?
root [0] .L mt001_fillHistos.C
root [1] mt001_fillHistos()
(int) 0
root [2] mt001_fillHistos()
If yes, I can reproduce and @Axel-Naumann perhaps has a solution already...
#include "TROOT.h" | ||
#include "TCanvas.h" | ||
#include <thread> | ||
#include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, forget about my comment. In principle macros must run without any includes: autoparsing should take care of inclusions. If this is not the case, either we forgot to select something (which means that auto-load/parse did not happen) or we have a bug in auto-loading/parsing...
fgRefSync = new THashList; | ||
TList *raxl = (TList *)fgRefSync->FindObject(onm); | ||
if (raxl) { | ||
needaxes = (SetRangesFromList(raxl) != 0) ? kTRUE : kFALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a tautology?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whay you say so. It depends on the return value of SetRangesFromList, right?
hist/hist/inc/TH1.h
Outdated
@@ -101,322 +104,380 @@ class TH1 : public TNamed, public TAttLine, public TAttFill, public TAttMarker { | |||
Double_t *fIntegral; ///<!Integral of bins used by GetRandom | |||
TVirtualHistPainter *fPainter; ///<!pointer to histogram painter | |||
EBinErrorOpt fBinStatErrOpt; ///< option for bin statistical errors | |||
void *fCallbackCtx; ///<!Context for the function to be called back | |||
CallbackFunc_t fCallbackFunc; ///<!Function to be called back, for example to get ranges | |||
TString fNameForRanges; ///<Name to be used to identidy the range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have a full picture of the changes yet, but does this need to be marked as persistent? TH1* have custom streamers anyway, but other new members were marked transient.
@@ -52,6 +54,7 @@ class TCollection; | |||
class TVirtualFFT; | |||
class TVirtualHistPainter; | |||
|
|||
typedef TList *(*CallbackFunc_t)(void *ctx, TList *buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want fcn pointers for consistency or we can move to
using CallbackFunc_t = std::function<TList*(void*,TList*)>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think move to that. Thanks.
virtual void UpdateBinContent(Int_t bin, Double_t content); | ||
virtual Double_t GetBinErrorSqUnchecked(Int_t bin) const { return fSumw2.fN ? fSumw2.fArray[bin] : RetrieveBinContent(bin); } | ||
static void *fgCallbackCtx; ///<!Global context setting for the function to be called back | ||
static CallbackFunc_t fgCallbackFunc; ///<!Global callback function setting, for example to get ranges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For static members I would remove the transient mark as it is not really defined in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just copied from the existing statics. I can remove that. Thanks.
@@ -35,6 +35,9 @@ class TH2 : public TH1 { | |||
Double_t fTsumwy2; //Total Sum of weight*Y*Y | |||
Double_t fTsumwxy; //Total Sum of weight*X*Y | |||
|
|||
static void *fgCallbackCtx; ///<!Global context setting for the function to be called back | |||
static CallbackFunc_t fgCallbackFunc; ///<!Global callback function setting, for example to get ranges | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above :)
hist/hist/inc/TH3.h
Outdated
@@ -150,7 +159,7 @@ class TH3 : public TH1, public TAtt3D { | |||
return h.DoProject2D(name, title, projX,projY, computeErrors, originalRange, useUF, useOF); | |||
} | |||
|
|||
ClassDef(TH3,5) //3-Dim histogram base class | |||
ClassDef(TH3,6) //3-Dim histogram base class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: You only need to increase the version number is the list of persistent data member from this class or one of its base classes changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. That was too quick.
I have reverted the patch, making sure that none of the new members is persistent.
Thanks!
hist/hist/src/TAxis.cxx
Outdated
/// Print axis bins and ranges | ||
void TAxis::Print(Option_t *) const | ||
{ | ||
printf(" %s\t%s \tNbins= %d, \tmin= %g, \tmax=%g", GetName(), GetTitle(), GetNbins(), GetXmin(), GetXmax()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must use Printf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that was a typo.
Corrected. Thanks!
Starting build on |
Making sure that no new member is persistent.
Starting build on |
I am a little concerned about the basic idea. If I understood correctly, there is a (unique) global registry where the histogram are identified based on their 'full path name' (beside the fact that GetNameForRanges seems both brittle and currently seems on first reading 'wrong'/'not-as-intended'). I see two major problems, one is that the 'full path name' may never be really unique i.e. it might be impossible to avoid synonyms that are semantically distincts ... a good example is two free standings (not attached to any directory) histograms that happens to have the same name in two distinct and independent part of the code (e.g. two CMSSW modules). The other major problem is that it unnecessary tie (via that global mutex) all the 'parallel' histogram, this means that the overall scalability is inherently decreased. Rather than a completely global state, wouldn't it make sense to have a wrapper object (for example TThreadedObject) be the holder of the lock and list for a single set of related histogram. A 3rd significant problem (but fixeable) is that the operation covered by the ReadWrite lock are not atomic (and/or trivial) and for the look of it could plausibly indirectly request the ROOT global lock and thus can lead to deadlocks (from some other code that hold the ROOT global locks and request the Write part of the ReadWrite lock). A 4th deficiency is that once activated for one histogram it seems to apply to all histogram. i.e. as far as can tell if you one parallel histograms and 10,000 single-thread histograms, filling the single-thread histogram still has to go through the multi-thread registration/mechanism. And that remind me, that another challenge for the 'unique registry' solution is to understand its scalability where reaching 10 to 100 thousands histograms. Thanks, |
Hi Philippe,
Good point. I admit that did not really think to this case.
I agree on this and TThreadedObject could be the place where to control this. The drawback is that we would then not have a solution outside TThreadedObject, i.e. we will have to find a way to force the use of TThreadedObject in MT cases.
Not sure to understand, you mean FindObject?
Ok, a solution 'per histogram' would also address this. Thanks for going deep into it, |
FindObject might take the lock (or might not depending on the container and the implementation of the containee's function that are called). 'Warning' for sure (sometimes) request the lock as if I recall correctly it uses TClass inside. etc... I.e. if you are not using the ROOT main lock, you must be very careful of what is inside the locked section ... and as always might it as 'small' as possible. |
I am not sure what you mean :). We would be saying that if one want the internal implementation of MT histogram then you have to use this (or that) wrapper to enable it. |
Build failed on ubuntu14/native. Failing tests: |
Build failed on mac1012/native. Failing tests: |
Build failed on slc6/gcc49. Failing tests: |
Build failed on slc6/gcc62. Failing tests: |
Build failed on centos7/gcc49. Failing tests: |
Hi,
Supporting this case makes basically impossible to have an identifier for the histogram. In this moment I do not see how we can synchronize objects that we cannot somehow tag being together. In PROOF we somehow implicitly assumed that this could not happen (PROOF is not supporting it).
This looked an appealing idea. However, it means that the member of a TThreadedObject has to know that is part of a TThreadedObject (which is not the case now) or that we should have a specialized TThreadedObject for histograms that does some settings on the histograms to steer the special behavior. And remains the fact that people will be forced to use a TThreadedObject (which may be ok). Perhaps it is also worth to investigate if we can find an improved bin-finding algorithm that gives consistent binnings in the first place that can be merged. That would solve the problem at the roots. Cheers, |
Changing strategy |
When filling histograms without limits in parallel a problem to be addressed is how to make sure that the ranges are compatible for the final merging.
This PR proposes a technique based on a static reference list of TAxis, kept as a static in TH1, filled/used by the different threads. The first thread calculates the TAxis ranges and saves it into the list, the others use it. The list is protected by a RW lock .
The logic is implemented in TH1::BufferEmpty and holds for TH{1,2,3}, the specificity of each TH{1,2,3} being moved to a set of new member functions called by TH1::BufferEmpty.
The change in TH1Merger is required to calculate the axis and dump the internal buffers when the internal buffersize has not yet been reached. This treatment can perhaps be improved to get the same result of the single thread case.
The patch also implements the hook for a call back function to implement the same functionality in the case of multi-processing. A patch with adaptation to multiproc will follow.
The tutorial mt301_fillHistAutoBin.C illustrates the usage with TThreadedObject .
NB: many of the changes in TH1.h come from clang-format-{3.8, 3.9, 4.0}