Range synchronization for histograms filled in parallel in auto-bin mode #902

gganis · 2017-08-29T10:57:13Z

When histograms without predefined axis limits are filled in parallel, one problem to address is how to make sure that their ranges are compatible for the final merge.
This PR proposes a technique based on a reference list of TAxis, kept as a static member of TH1 and filled/used by the different threads. The first thread computes the TAxis ranges and saves them into the list; the other threads use them. The list is protected by a RW lock.
The logic is implemented in TH1::BufferEmpty and holds for TH{1,2,3}; the parts specific to each of TH{1,2,3} are moved to a set of new member functions called by TH1::BufferEmpty.
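
A minimal sketch of the idea in standard C++17, not the actual TH1 code: a process-wide map from a histogram key to its axis range, guarded by a reader/writer lock. The first thread that empties its buffer publishes the range it computed and later threads read it back. `RangeRegistry`, `AxisRange` and the min/max range rule are hypothetical stand-ins; in particular, the range rule is a simplification and not ROOT's limit finding.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the (nbins, xmin, xmax) triple held by a TAxis.
struct AxisRange {
   int nbins;
   double xmin;
   double xmax;
};

// Hypothetical process-wide registry keyed by a histogram identifier.
class RangeRegistry {
public:
   AxisRange GetOrCreate(const std::string &key, int nbins, const std::vector<double> &buffer)
   {
      {
         // Read path: many threads may look up a published range concurrently.
         std::shared_lock<std::shared_mutex> readLock(fMutex);
         auto it = fRanges.find(key);
         if (it != fRanges.end())
            return it->second;
      }
      // Not published yet: compute a candidate from this thread's buffer, outside any lock.
      AxisRange candidate = ComputeRange(nbins, buffer);
      // Write path: emplace does not overwrite, so if another thread published
      // first, its range wins and every thread ends up with the same axes.
      std::unique_lock<std::shared_mutex> writeLock(fMutex);
      return fRanges.emplace(key, candidate).first->second;
   }

private:
   // Simplified limit finding (not ROOT's): buffer min/max, widened so that
   // the maximum falls inside the last bin.
   static AxisRange ComputeRange(int nbins, const std::vector<double> &buffer)
   {
      assert(nbins > 0 && !buffer.empty());
      auto mm = std::minmax_element(buffer.begin(), buffer.end());
      double xmin = *mm.first;
      double xmax = *mm.second;
      double margin = (xmax > xmin) ? 0.01 * (xmax - xmin) : 1.0;
      return {nbins, xmin, xmax + margin};
   }

   std::shared_mutex fMutex;
   std::map<std::string, AxisRange> fRanges;
};
```

Computing the candidate outside the write lock keeps the exclusive section short, and a thread that loses the race simply adopts the range that was published first.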

The change in TH1Merger is required to compute the axes and flush the internal buffers when the buffer size has not yet been reached. This treatment could perhaps be improved to give the same result as in the single-threaded case.
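
To illustrate what flushing the buffers before the merge means, here is a sketch with a hypothetical `BufferedHist` type. Unlike the patch, which reuses the synchronized axis limits, it derives one common range from the union of all pending buffers; it assumes every input is still buffered and ignores under/overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal 1D histogram that keeps raw entries until its range is fixed.
struct BufferedHist {
   int nbins = 0;
   double xmin = 0.0, xmax = 0.0;
   std::vector<double> buffer; // x values not yet binned
   std::vector<double> counts; // bin contents, empty while still buffered

   bool IsBuffered() const { return counts.empty(); }

   // Bin all buffered entries with the given axis, then drop the buffer.
   void Flush(int n, double lo, double hi)
   {
      nbins = n;
      xmin = lo;
      xmax = hi;
      counts.assign(n, 0.0);
      for (double x : buffer) {
         if (x < lo || x >= hi)
            continue; // under/overflow is ignored in this sketch
         int bin = static_cast<int>((x - lo) / (hi - lo) * n);
         counts[std::min(bin, n - 1)] += 1.0;
      }
      buffer.clear();
   }
};

// Merge inputs that never reached their buffer size: derive one range from all
// pending entries, flush every input with it, then add the bin contents.
BufferedHist MergeBuffered(std::vector<BufferedHist> &inputs, int nbins)
{
   double lo = 0.0, hi = 0.0;
   bool first = true;
   for (const auto &h : inputs) {
      assert(h.IsBuffered() && "sketch assumes every input is still buffered");
      for (double x : h.buffer) {
         lo = first ? x : std::min(lo, x);
         hi = first ? x : std::max(hi, x);
         first = false;
      }
   }
   assert(!first && "sketch assumes at least one pending entry");
   hi += (hi > lo) ? 0.01 * (hi - lo) : 1.0; // keep the maximum inside the range

   BufferedHist out;
   out.Flush(nbins, lo, hi); // empty buffer: only sets the axis and zeroes the bins
   for (auto &h : inputs) {
      h.Flush(nbins, lo, hi);
      for (int i = 0; i < nbins; ++i)
         out.counts[i] += h.counts[i];
   }
   return out;
}
```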

The patch also implements a hook for a callback function, to provide the same functionality in the multi-process case. A patch adapting multiproc will follow.

The tutorial mt301_fillHistAutoBin.C illustrates the usage with TThreadedObject.

NB: many of the changes in TH1.h come from clang-format-{3.8, 3.9, 4.0}

This is implemented with a common reference list of TAxis lists for multi-threaded applications, and with support for a callback to the steering process for multi-process applications. The new member fNameForRanges is used internally to uniquely identify the relevant set of TAxis; for this purpose the full name, including the directories, is used. To cope with temporary directories, for example those used for threads (see TThreadedObject), a configurable list of directory tags can be filtered out of the name. The default list is "__TThreaded_dir_?" (used as a regular expression), to cope with TThreadedObject. It can be changed with the static TH1::SetStripOffDirs(const char *tags), where tags has the form "[+]newtags" ('+' means that 'newtags' is added to the existing ones), or with the env setting 'Hist.StripOffDirs' (same rule for the '+').
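
The name-stripping rule can be sketched as follows. `UpdateStripTags` and `StripDirs` are hypothetical helpers; the tags are assumed to be whitespace-separated ECMAScript regular expressions matched against whole path components, which may differ from how the patch interprets "__TThreaded_dir_?".

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for the "[+]newtags" rule: a leading '+' appends the
// new tags to the current list, otherwise the list is replaced.
// Tags are assumed to be separated by whitespace.
void UpdateStripTags(std::vector<std::string> &tags, std::string spec)
{
   bool append = !spec.empty() && spec[0] == '+';
   if (append)
      spec.erase(0, 1);
   else
      tags.clear();
   std::istringstream in(spec);
   for (std::string t; in >> t;)
      tags.push_back(t);
}

// Hypothetical helper: drop every '/'-separated component of fullName that
// fully matches one of the tag regular expressions.
std::string StripDirs(const std::string &fullName, const std::vector<std::string> &tags)
{
   std::string out;
   std::istringstream in(fullName);
   for (std::string part; std::getline(in, part, '/');) {
      bool drop = false;
      for (const auto &t : tags)
         if (std::regex_match(part, std::regex(t))) {
            drop = true;
            break;
         }
      if (drop)
         continue;
      if (!out.empty())
         out += '/';
      out += part;
   }
   return out;
}
```

For example, with the illustrative tag `__TThreaded_dir_[0-9]+`, `StripDirs("top/__TThreaded_dir_3/h1", tags)` yields `"top/h1"`, so the per-thread copies of a histogram map to the same key.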

phsft-bot · 2017-08-29T10:57:27Z

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON

phsft-bot · 2017-08-29T12:28:19Z

Build failed on slc6/gcc62.
See console output.

Failing tests:

phsft-bot · 2017-08-29T12:35:54Z

Build failed on centos7/gcc49.
See console output.

Failing tests:

phsft-bot · 2017-08-29T12:41:46Z

Build failed on mac1012/native.
See console output.

Failing tests:

dpiparo · 2017-08-29T12:42:33Z

tutorials/multicore/mt301_fillHistAutoBin.C

+#include "TROOT.h"
+#include "TCanvas.h"
+#include <thread>
+#include <iostream>


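Just a minor comment: we could perhaps put the STL headers after the ROOT includes, so as not to mask defects in our headers (a ROOT header that forgets an include it needs would otherwise still compile because the STL header was already included).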

Oh, forget about my comment. In principle, macros must run without any includes: autoparsing should take care of them. If this is not the case, either we forgot to select something (which means that auto-loading/parsing did not happen) or we have a bug in auto-loading/parsing...

Sure! Done.

dpiparo · 2017-08-29T12:43:35Z

tutorials/multicore/mt301_fillHistAutoBin.C

+   c->cd(3);
+   fh3d->DrawClone();
+
+   gROOTMutex = 0;


This is a leftover that allows the macro to be run several times in the same ROOT session.
I will remove it, also because all our mt... tutorials suffer from the same problem, i.e. on the second run all threads get stuck in TROOT::Append:

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fd50c469e42 in __GI___pthread_mutex_lock (mutex=0x3cd6060) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007fd50b096f58 in TPosixMutex::Lock (this=0x3cd6050) at /home/ganis/local/root/GIT/root/core/thread/src/TPosixMutex.cxx:75
#3 0x00007fd50b0881cb in TMutex::Lock (this=0x3d117e0) at /home/ganis/local/root/GIT/root/core/thread/src/TMutex.cxx:48
#4 0x00007fd50e0412a2 in TLockGuard::TLockGuard (this=0x7fd4fcd73100, mutex=0x3d117e0) at include/TVirtualMutex.h:77
#5 0x00007fd50dac67fe in TROOT::Append (this=0x7fd50dff3240 ROOT::Internal::GetROOT1()::alloc, obj=0x7fd4e00008c0, replace=false)
at /home/ganis/local/root/GIT/root/core/base/src/TROOT.cxx:1003
#6 0x00007fd501e1fca3 in TH1::Copy (this=0x48499e0, obj=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:2729
#7 0x00007fd501e3b82f in TH1F::Copy (this=0x48499e0, newth1=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:9426
#8 0x00007fd501e3b6ba in TH1F::TH1F (this=0x7fd4e00008c0, h=...) at /home/ganis/local/root/GIT/root/hist/hist/src/TH1.cxx:9411

phsft-bot · 2017-08-29T13:02:29Z

Build failed on ubuntu14/native.
See console output.

Failing tests:

phsft-bot · 2017-08-29T13:07:19Z

Build failed on slc6/gcc49.
See console output.

Failing tests:

phsft-bot · 2017-08-29T14:01:55Z

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON

phsft-bot · 2017-08-29T14:21:31Z

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON

dpiparo

Hi @gganis, yes, this is an issue. Do you mean that even a macro you have loaded deadlocks when run twice, as in the example below?

root [0] .L mt001_fillHistos.C
root [1] mt001_fillHistos()
(int) 0
root [2] mt001_fillHistos()

If so, I can reproduce it, and @Axel-Naumann perhaps already has a solution...

dpiparo · 2017-08-29T12:49:21Z

hist/hist/src/TH1.cxx

+               fgRefSync = new THashList;
+            TList *raxl = (TList *)fgRefSync->FindObject(onm);
+            if (raxl) {
+               needaxes = (SetRangesFromList(raxl) != 0) ? kTRUE : kFALSE;


Isn't this a tautology?

Not sure why you say so. It depends on the return value of SetRangesFromList, right?
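
One plausible reading of the question (an interpretation, not stated in the thread) is that the conditional operator is redundant, not that the result is constant: a comparison already yields a boolean. In this sketch the return value of SetRangesFromList is assumed to be a plain `int` named `status`.

```cpp
// 'status' stands for the value returned by SetRangesFromList (assumed int).
// Form used in the patch: conditional operator applied to a comparison.
bool NeedAxesVerbose(int status)
{
   return (status != 0) ? true : false;
}

// Equivalent form: the comparison already produces a bool.
bool NeedAxesDirect(int status)
{
   return status != 0;
}
```

Both functions return the same value for every input, so the result does depend on SetRangesFromList, as the reply says; only the `? kTRUE : kFALSE` part is superfluous.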

dpiparo · 2017-08-29T12:56:16Z

hist/hist/inc/TH1.h

@@ -101,322 +104,380 @@ class TH1 : public TNamed, public TAttLine, public TAttFill, public TAttMarker {
    Double_t     *fIntegral;        ///<!Integral of bins used by GetRandom
    TVirtualHistPainter *fPainter;  ///<!pointer to histogram painter
    EBinErrorOpt  fBinStatErrOpt;   ///< option for bin statistical errors
+    void *fCallbackCtx;             ///<!Context for the function to be called back
+    CallbackFunc_t fCallbackFunc;   ///<!Function to be called back, for example to get ranges
+    TString fNameForRanges;         ///<Name to be used to identify the range


I do not have a full picture of the changes yet, but does this need to be persistent? The TH1 classes have custom streamers anyway, but the other new members were marked transient.

dpiparo · 2017-08-29T12:57:52Z

hist/hist/inc/TH1.h

@@ -52,6 +54,7 @@ class TCollection;
 class TVirtualFFT;
 class TVirtualHistPainter;

+typedef TList *(*CallbackFunc_t)(void *ctx, TList *buf);


Do we want function pointers for consistency, or can we move to

using CallbackFunc_t = std::function<TList*(void*,TList*)>;

?

Yes, I think move to that. Thanks.
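
A sketch contrasting the two callback styles discussed here. `Payload` stands in for `TList` so that the example compiles without ROOT, and `AppendCtx` is a hypothetical callback.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-in for TList so that the sketch compiles without ROOT.
using Payload = std::vector<double>;

// Style in the patch: plain function pointer plus an opaque context pointer.
typedef Payload *(*CallbackFunc_t)(void *ctx, Payload *buf);

// Style suggested in the review: std::function with the same signature.
using CallbackFn = std::function<Payload *(void *, Payload *)>;

// Hypothetical callback: appends the double pointed to by ctx to the buffer.
Payload *AppendCtx(void *ctx, Payload *buf)
{
   assert(ctx != nullptr && buf != nullptr);
   buf->push_back(*static_cast<double *>(ctx));
   return buf;
}
```

A plain function pointer converts implicitly to the `std::function` form, so existing callbacks keep working, and a capturing lambda can carry its own state, which makes the `void *ctx` argument optional. The trade-off is that `std::function` may allocate and adds an indirect call.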

dpiparo · 2017-08-29T12:58:35Z

hist/hist/inc/TH1.h

-   virtual void     UpdateBinContent(Int_t bin, Double_t content);
-   virtual Double_t GetBinErrorSqUnchecked(Int_t bin) const { return fSumw2.fN ? fSumw2.fArray[bin] : RetrieveBinContent(bin); }
+    static void *fgCallbackCtx;           ///<!Global context setting for the function to be called back
+    static CallbackFunc_t fgCallbackFunc; ///<!Global callback function setting, for example to get ranges


For static members I would remove the transient mark, as it is not really meaningful in this context.

Yes, I just copied from the existing statics. I can remove that. Thanks.

dpiparo · 2017-08-29T12:59:52Z

hist/hist/inc/TH2.h

@@ -35,6 +35,9 @@ class TH2 : public TH1 {
   Double_t     fTsumwy2;         //Total Sum of weight*Y*Y
   Double_t     fTsumwxy;         //Total Sum of weight*X*Y

+   static void *fgCallbackCtx;           ///<!Global context setting for the function to be called back
+   static CallbackFunc_t fgCallbackFunc; ///<!Global callback function setting, for example to get ranges
+


See comments above :)

pcanal · 2017-08-29T14:27:22Z

hist/hist/inc/TH3.h

@@ -150,7 +159,7 @@ class TH3 : public TH1, public TAtt3D {
      return h.DoProject2D(name, title, projX,projY, computeErrors, originalRange, useUF, useOF);
   }

-   ClassDef(TH3,5)  //3-Dim histogram base class
+   ClassDef(TH3,6)  //3-Dim histogram base class


Note: you only need to increase the version number if the list of persistent data members of this class or of one of its base classes changes.

Right. That was too quick.
I have reverted the patch, making sure that none of the new members is persistent.
Thanks!

pcanal · 2017-08-29T14:27:35Z

hist/hist/src/TAxis.cxx

+/// Print axis bins and ranges
+void TAxis::Print(Option_t *) const
+{
+   printf(" %s\t%s \tNbins= %d, \tmin= %g, \tmax=%g", GetName(), GetTitle(), GetNbins(), GetXmin(), GetXmax());


This must use Printf.

Sure, that was a typo.
Corrected. Thanks!

phsft-bot · 2017-08-29T14:29:43Z

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON

phsft-bot · 2017-08-29T14:38:15Z

Starting build on centos7/gcc49, mac1012/native, slc6/gcc49, slc6/gcc62, ubuntu14/native with flags -Dvc=OFF -Dimt=ON -Dccache=ON

pcanal · 2017-08-29T14:53:48Z

I am a little concerned about the basic idea. If I understood correctly, there is a (unique) global registry where the histograms are identified by their 'full path name' (besides the fact that GetNameForRanges seems both brittle and, on first reading, 'wrong'/'not-as-intended'). I see two major problems. One is that the 'full path name' may never be truly unique, i.e. it might be impossible to avoid synonyms that are semantically distinct ... a good example is two free-standing histograms (not attached to any directory) that happen to have the same name in two distinct and independent parts of the code (e.g. two CMSSW modules).

The other major problem is that it unnecessarily ties together (via that global mutex) all the 'parallel' histograms, which means that the overall scalability is inherently decreased.

Rather than a completely global state, wouldn't it make sense to have a wrapper object (for example TThreadedObject) be the holder of the lock and list for a single set of related histograms?
This would both reduce contention and guarantee that the histograms are really related.
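
A minimal sketch of the per-set alternative, assuming a hypothetical `RangeSync` object owned by the wrapper (nothing in ROOT has this name). `std::call_once` guarantees that exactly one caller computes the range and that all callers observe it; because each set of related histograms has its own instance, there is no global lock and no name-based key.

```cpp
#include <functional>
#include <mutex>

// Hypothetical axis description used by the sketch.
struct AxisRange {
   int nbins;
   double xmin;
   double xmax;
};

// Hypothetical per-set synchronizer: one instance per group of related
// histograms, owned by the wrapper that creates them.
class RangeSync {
public:
   // Exactly one caller runs 'compute' (unless it throws); every caller, in any
   // thread, gets the same range back once that call has completed.
   AxisRange Get(const std::function<AxisRange()> &compute)
   {
      std::call_once(fOnce, [&] { fRange = compute(); });
      return fRange;
   }

private:
   std::once_flag fOnce;
   AxisRange fRange{0, 0.0, 0.0};
};
```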

A 3rd significant (but fixable) problem is that the operations covered by the ReadWrite lock are not atomic (and/or trivial) and, from the look of it, could plausibly indirectly request the ROOT global lock and thus lead to deadlocks (with some other code that holds the ROOT global lock and requests the Write part of the ReadWrite lock).

A 4th deficiency is that, once activated for one histogram, it seems to apply to all histograms, i.e. as far as I can tell, if you have one parallel histogram and 10,000 single-thread histograms, filling the single-thread histograms still has to go through the multi-thread registration mechanism.
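
A sketch of a per-histogram opt-in, with hypothetical `Hist` and `SharedRange` types: only histograms that were explicitly given a shared state take a lock, and that lock is shared with their own group only.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical shared state for one group of related histograms.
struct SharedRange {
   std::mutex mtx;
   bool set = false;
   double xmin = 0.0, xmax = 0.0;
};

// Hypothetical histogram with opt-in synchronization: a null 'sync' means the
// ordinary single-threaded path, which takes no lock at all.
struct Hist {
   std::vector<double> buffer;
   double xmin = 0.0, xmax = 0.0;
   std::shared_ptr<SharedRange> sync;

   void ComputeRange()
   {
      assert(!buffer.empty());
      auto mm = std::minmax_element(buffer.begin(), buffer.end());
      if (!sync) { // fast path: no registry, no mutex
         xmin = *mm.first;
         xmax = *mm.second;
         return;
      }
      // Only histograms of the same group contend for this mutex.
      std::lock_guard<std::mutex> lock(sync->mtx);
      if (!sync->set) { // first histogram of the group publishes its range
         sync->xmin = *mm.first;
         sync->xmax = *mm.second;
         sync->set = true;
      }
      xmin = sync->xmin;
      xmax = sync->xmax;
   }
};
```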

And that reminds me that another challenge for the 'unique registry' solution is to understand its scalability when reaching 10 to 100 thousand histograms.

Thanks,
Philippe.

gganis · 2017-08-29T15:36:17Z

Hi Philippe,

it might be impossible to avoid synonyms that are semantically distinct ... a good example is two free-standing histograms (not attached to any directory) that happen to have the same name in two distinct and independent parts of the code (e.g. two CMSSW modules).

Good point. I admit that I did not really think of this case.

The other major problem is that it unnecessarily ties together (via that global mutex) all the 'parallel' histograms, which means that the overall scalability is inherently decreased.

Rather than a completely global state, wouldn't it make sense to have a wrapper object (for example TThreadedObject) be the holder of the lock and list for a single set of related histograms?
This would both reduce contention and guarantee that the histograms are really related.

I agree with this, and TThreadedObject could be the place to control it. The drawback is that we would then not have a solution outside TThreadedObject, i.e. we would have to find a way to force the use of TThreadedObject in MT cases.

A 3rd significant (but fixable) problem is that the operations covered by the ReadWrite lock are not atomic (and/or trivial) and, from the look of it, could plausibly indirectly request the ROOT global lock and thus lead to deadlocks (with some other code that holds the ROOT global lock and requests the Write part of the ReadWrite lock).

I am not sure I understand; do you mean FindObject?
I will have a closer look.

A 4th deficiency is that, once activated for one histogram, it seems to apply to all histograms, i.e. as far as I can tell, if you have one parallel histogram and 10,000 single-thread histograms, filling the single-thread histograms still has to go through the multi-thread registration mechanism.

OK, a 'per histogram' solution would also address this.

Thanks for going deep into it,
Gerri

pcanal · 2017-08-29T15:52:43Z

I am not sure I understand; do you mean FindObject?
I will have a closer look.

FindObject might take the lock (or might not, depending on the container and on the implementation of the contained objects' functions that are called). 'Warning' certainly requests the lock (sometimes) since, if I recall correctly, it uses TClass internally, etc.

I.e. if you are not using the ROOT main lock, you must be very careful about what is inside the locked section ... and, as always, make it as 'small' as possible.
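
A sketch of keeping the locked section minimal, with hypothetical globals standing in for the registry lock and the ROOT global lock: only the lookup runs under `gRegistryLock`, and the call that may take the global lock runs after the guard is released, so this thread never waits for the global lock while holding the registry lock.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <string>

std::mutex gGlobalLock;   // hypothetical stand-in for a library-wide lock
std::mutex gRegistryLock; // protects gRegistry only
std::map<std::string, double> gRegistry;
int gWarnings = 0;

// Hypothetical stand-in for a call (such as a warning printer) that takes the global lock.
void WarnLocked(const std::string &)
{
   std::lock_guard<std::mutex> guard(gGlobalLock);
   ++gWarnings;
}

// Only the map lookup happens while gRegistryLock is held; the value is copied
// out and anything that may take another lock runs after the guard is released.
// This removes one half of the lock-order inversion described above.
std::optional<double> Lookup(const std::string &key)
{
   std::optional<double> value;
   {
      std::lock_guard<std::mutex> guard(gRegistryLock);
      auto it = gRegistry.find(key);
      if (it != gRegistry.end())
         value = it->second;
   }
   if (!value)
      WarnLocked("no range for " + key);
   return value;
}
```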

pcanal · 2017-08-29T15:55:04Z

i.e. we would have to find a way to force the use of TThreadedObject in MT cases.

I am not sure what you mean :). We would be saying that if one wants the internal implementation of MT histograms, then one has to use this (or that) wrapper to enable it.

phsft-bot · 2017-08-29T16:16:46Z

Build failed on ubuntu14/native.
See console output.

Failing tests:

projectroot.roottest.root.dataframe.roottest_root_dataframe_misc

phsft-bot · 2017-08-29T16:18:54Z

Build failed on mac1012/native.
See console output.

Failing tests:

projectroot.roottest.root.dataframe.roottest_root_dataframe_misc

phsft-bot · 2017-08-29T16:20:53Z

Build failed on slc6/gcc49.
See console output.

Failing tests:

projectroot.roottest.root.dataframe.roottest_root_dataframe_misc

phsft-bot · 2017-08-29T16:21:48Z

Build failed on slc6/gcc62.
See console output.

Failing tests:

projectroot.roottest.root.dataframe.roottest_root_dataframe_misc

phsft-bot · 2017-08-29T16:30:45Z

Build failed on centos7/gcc49.
See console output.

Failing tests:

projectroot.roottest.root.dataframe.roottest_root_dataframe_misc

gganis · 2017-08-30T08:16:05Z

Hi,
After some more thinking, I believe we have to close this and rethink the whole thing.
I found this point raised by Philippe particularly tricky:

a good example is two free-standing histograms (not attached to any directory) that happen to have the same name in two distinct and independent parts of the code (e.g. two CMSSW modules).

Supporting this case makes it basically impossible to have an identifier for the histogram. At the moment I do not see how we can synchronize objects that we cannot somehow tag as belonging together. In PROOF we implicitly assumed that this could not happen (PROOF does not support it).

Rather than a completely global state, wouldn't it make sense to have a wrapper object (for example TThreadedObject) be the holder of the lock and list for a single set of related histograms?

This looked like an appealing idea. However, it means that the member of a TThreadedObject has to know that it is part of a TThreadedObject (which is not the case now), or that we should have a specialized TThreadedObject for histograms that applies some settings to the histograms to steer the special behavior. And the fact remains that people will be forced to use a TThreadedObject (which may be OK).

Perhaps it is also worth investigating whether we can find an improved bin-finding algorithm that gives consistent, mergeable binnings in the first place. That would solve the problem at its root.
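
One possible direction, sketched under assumptions (it is not proposed in this form in the thread): if every histogram picks a bin width that is a power of two and places its edges on integer multiples of that width, any two independently chosen grids are nested, so the results can be merged by rebinning to the coarser width and extending the range. `AlignedBinning` and `Binning` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical binning description produced by the rule below.
struct Binning {
   double width;
   double lo;
   double hi;
   int nbins;
};

// Hypothetical rule: the width is a power of two (exact in binary floating
// point) and both edges are integer multiples of it. For two such grids the
// coarser width is a multiple of the finer one, so every coarse edge is also a
// fine edge and fine bins can be summed into coarse bins.
Binning AlignedBinning(double xmin, double xmax, int targetBins)
{
   assert(targetBins > 0 && xmax >= xmin);
   double span = (xmax > xmin) ? (xmax - xmin) : 1.0;
   // Power of two obtained by rounding log2(span / targetBins) up.
   double width = std::exp2(std::ceil(std::log2(span / targetBins)));
   double lo = std::floor(xmin / width) * width;
   // Upper edge is exclusive: make sure xmax falls inside the last bin.
   double hi = (std::floor(xmax / width) + 1) * width;
   return {width, lo, hi, static_cast<int>(std::lround((hi - lo) / width))};
}
```

The price is that the number of bins is only approximately the requested one; for example, `AlignedBinning(0.0, 10.0, 10)` gives 11 bins of width 1 over [0, 11).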

Cheers,
Gerri

gganis · 2017-08-30T15:16:47Z

Changing strategy

gganis added 4 commits August 29, 2017 12:36

hist TH1Merger: move up call to EmptyBuffer

fe096e5

To support the case where histograms without initial limits are filled in threads without reaching the buffer size. Takes advantage of the axis-limit synchronization technique added in TH1.

hist TAxis: add implementation of Print()

1668284

For an easy way to printout number of bins and limits

tutorials/multicore: add tutorial to illustrate histogram auto-binning

e05bcfd

And its use with TThreadedObject.

gganis requested review from couet and lmoneta as code owners August 29, 2017 10:57

dpiparo reviewed Aug 29, 2017

hist TH{1,2,3} : increase classDef version

4d49b86

tutorial/multicore : adjustments in mt301_fillHistAutoBin.C

a76d8ab

Put stl headers at the end of ROOT includes. Comment out 'gROOTMutex = 0' which is a trick to enable re-running in the same ROOT shell.

dpiparo reviewed Aug 29, 2017

pcanal reviewed Aug 29, 2017

hist TAxis: use Printf instead of printf

3bc6369

hist TH{1,2,3}: revert previous patch

62da717

Making sure that no new member is persistent.

gganis closed this Aug 30, 2017

phsft-bot mentioned this pull request Jun 4, 2020

[cxxmodules] Add initial implementation of the semantic GMI. #5094

Merged

phsft-bot mentioned this pull request Aug 26, 2020

Use FullName in TTree::GetLeaf. #6258

Merged

phsft-bot mentioned this pull request Jun 29, 2023

[RF] Add methods to create owning RooFit proxies via std::unique_ptr #12924

Merged
