Tutorial ABCD method with RooParamtricHist #1002
To generate your own input data, run:

```
python utils/produce_input_histograms_and_analyse.py
```
I think this should be python3 (unfortunately, some systems will still default to python 2, for which this won't work)
updated with python3 command in instructions
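A minimal sketch of how the scripts could fail early with a clear message when started under Python 2. The helper name `require_python3` is hypothetical and not part of the PR.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise a clear error when run under Python 2 (hypothetical helper)."""
    if version_info[0] < 3:
        raise RuntimeError(
            "This script needs Python 3; run it with `python3` (found %d.%d)"
            % (version_info[0], version_info[1])
        )


# Call once at the top of a script, before any Python-3-only code runs.
require_python3()
```

The message is built with `%` formatting so that the guard itself stays valid under Python 2.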
To run the workspace creation script:

```
python utils/create_workspace.py -m 1500
```
Same comment about python3.
But there also seems to be a small mismatch between this script (which looks for files under `./generated_histograms/`) and the `produce_input_histograms_and_analyse.py` script, which creates them in the current working directory.
I have harmonized where files are saved and read in the code, so this is now automatic and users do not have to specify the path.
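One way to keep the producer and the reader in agreement is to have both import the same path helper. The sketch below is an assumption about how that could look, not the code in the PR: the function names are hypothetical, and only the directory name `generated_histograms` comes from the review comment.

```python
from pathlib import Path

# Directory name mentioned in the review; everything else here is illustrative.
HIST_SUBDIR = "generated_histograms"


def histogram_dir(base_dir=None, create=False):
    """Return the shared histogram directory under base_dir (default: cwd)."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    out = base / HIST_SUBDIR
    if create:
        # The producing script creates the directory; readers only look it up.
        out.mkdir(parents=True, exist_ok=True)
    return out


def histogram_file(name, base_dir=None, create=False):
    """Return the full path of a histogram file inside the shared directory."""
    return histogram_dir(base_dir, create) / name
```

The producing script would call `histogram_file(name, create=True)` and the workspace script `histogram_file(name)`, so the location is defined in one place only.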
print ("Reading histogram: ", hist_nameC)
print ("Reading histogram: ", hist_nameD)
histA = input_file.Get(hist_nameA)
histA.SetDirectory(0)
I'm getting a crash here when running over the signal file, because it seems the histogram is saved as `A/h_sgn_mPhi_1500_A`, but the script is trying to read it as `A/h_sgn_A`.
solved issue
I'm still seeing the same issue. I don't see a change in either the name of the root histograms that are produced in the previous step or the ones that are checked for here. Did I miss an update, or maybe it didn't get committed?
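The crash discussed in this thread comes from calling `SetDirectory` on a null object after `Get` fails to find the key. A hedged sketch of a lookup that builds the name in one place and fails with a readable message is given below. The helper names are hypothetical, and the naming pattern is inferred only from the two names quoted above.

```python
def hist_name(process, region, mass=None):
    """Build '<region>/h_<process>[_mPhi_<mass>]_<region>' (pattern assumed)."""
    if region not in ("A", "B", "C", "D"):
        raise ValueError("unknown region: %r" % region)
    tag = process if mass is None else "%s_mPhi_%d" % (process, mass)
    return "%s/h_%s_%s" % (region, tag, region)


def get_hist(input_file, name):
    """Fetch a histogram and detach it from the file, or raise KeyError."""
    hist = input_file.Get(name)
    if not hist:  # a missing key yields a null (falsy) object
        raise KeyError("histogram %r not found in input file" % name)
    hist.SetDirectory(0)
    return hist


print(hist_name("sgn", "A", mass=1500))
```

With this, a mismatch between producer and reader shows up as a `KeyError` naming the missing histogram rather than an attribute error on a null object.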
```
python utils/create_datacards.py -m 1500
```
python3
done
The datacards can be combined then using the usual command:
I think for these commands, they will first have to change directory into `example_analysis/datacards/mPhi1500/`, so this should either be included in the tutorial, or the scripts modified to create them in the working directory.
I have added the commands.
I don't see these added either. Are they pushed?
Using the output ```higgsCombineTest.FitDiagnostics.mH1500.root```, one can run the script ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/mlfitNormsToText.py``` to get the predictions for the normalizations.
I think this script needs the `fitDiagnosticsTest.root` output file, rather than the one listed here. It might also be helpful to give the full command explicitly.
added in the tutorial text
I went through your comments and have modified the tutorial accordingly. First, in the generation of the files I have fixed the seed, so that we should all get the same results. Second, I noticed a small issue in the datacard generation code that was producing the large fitted r value you were seeing, and I fixed it. I have added the results one should get to the tutorial, along with the new plots. Moreover, as suggested, I have added the commands needed to reproduce each step.
Thanks Cesare! Some of the issues are fixed, but I still ran into a few problems when running through the tutorial that need to be fixed.
## Generate input data
<a id="inputs"></a>

The histograms for the $z$ observable in the different regions A,B,C,D can be produced using the ```produce_input_histograms_and_analyse.py``` script in ```utils/produce_input_histograms_and_analyse.py```. In the script the expected rates for different signal hypotheses (as a function of $\Phi$ mass $m_{\Phi} \in \{1500, 2000, 3000, 4000, 5000 \}$ GeV) and the background yields are specified, as well as the distributions in $x,y,z$ of the signals and backgrounds. In the following steps of the tutorial we will just consider one of the mass points generated, $m_{\Phi} = 1500$ GeV, but the same analysis can be run separatelly on other mass points as well. In $x,y$, the signal and the background are assumed to be distributed as multivariate gaussians, with the background centred at $(0,2,0.2)$ in $(x,y)$ while the signals centred in the upper-right corner of the plane ($x,y>0.5$). For the $z$ feature, the background and the signal distributions are sampled from an exponential, for the signal the tails of the exponential get enhanced with the mass parameter $m_{\Phi}$.
separatelly -> separately
done
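The generation step described in the tutorial paragraph above can be illustrated with a small standard-library toy. This is a sketch under stated assumptions, not the PR's script. It uses independent Gaussians in $x$ and $y$ (a diagonal-covariance special case), reads the background centre as $(0.2, 0.2)$, cuts at 0.5 in both variables, and takes A as the region with $x,y>0.5$. The width, the exponential scale and the bin edges are made up. The per-bin estimate $N_A = N_B N_C / N_D$ is the standard ABCD relation, assumed here to match the tutorial's convention.

```python
import random

REGIONS = ("A", "B", "C", "D")


def region(x, y, cut=0.5):
    """Assign an ABCD label from two cuts (labelling convention assumed)."""
    if x > cut and y > cut:
        return "A"
    if x > cut:
        return "B"
    if y > cut:
        return "C"
    return "D"


def toy_histograms(n_events, centre, sigma, z_scale, edges, seed=0):
    """Fill one z histogram per region; a fixed seed makes the toy reproducible."""
    rng = random.Random(seed)
    hists = {r: [0] * (len(edges) - 1) for r in REGIONS}
    for _ in range(n_events):
        x = rng.gauss(centre[0], sigma)
        y = rng.gauss(centre[1], sigma)
        z = rng.expovariate(1.0 / z_scale)  # exponential with mean z_scale
        for i in range(len(edges) - 1):
            if edges[i] <= z < edges[i + 1]:
                hists[region(x, y)][i] += 1
                break  # events beyond the last edge are dropped
    return hists


def abcd_prediction(hists):
    """Per-bin estimate of region A from the other three regions."""
    return [
        b * c / d if d > 0 else float("nan")
        for b, c, d in zip(hists["B"], hists["C"], hists["D"])
    ]


bkg = toy_histograms(100000, centre=(0.2, 0.2), sigma=0.4, z_scale=1.0,
                     edges=[0, 1, 2, 3, 4])
print("observed A: ", bkg["A"])
print("predicted A:", abcd_prediction(bkg))
```

Because $x$, $y$ and $z$ are sampled independently in this toy, the prediction should agree with the observed A yields within statistical fluctuations. Rerunning with the same seed reproduces the same histograms, which is the point of fixing the seed.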
</details>

Using the output ```fitDiagnosticsTest.mH1500.root```, one can run the script ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/mlfitNormsToText.py``` to get the predictions for the normalizations:
The output file in this case does not include the mass parameter; it's just `fitDiagnosticsTest.root`.
This is strange, since I get exactly that output file name when running this command: `combine -M FitDiagnostics combined_mPhi_1500_2018.txt -m 1500 --saveShapes --saveWithUncertainties --saveNormalizations`. I would expect the `.mH1500.` part, since the mass parameter is specified.
I have updated it; it is now consistent with the file name you get (I was running with an older version).
Are you sure? The files that contain the limit tree always include the mass parameter in their names (which is not relevant here), but they do not follow the same naming scheme as the fitDiagnostics file, which saves the shapes and RooFitResults. The `higgsCombineTest.FitDiagnostics.mH1500.root` file should also be produced, but it contains only the limit tree; the name of the `fitDiagnosticsTest.root` file does not depend on the mass value and uses only the `-n` argument.
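The two naming schemes described in this comment can be written down as a short sketch. The function names are hypothetical. The default name `Test` is inferred from the file names quoted in this thread, and the `%g` formatting of the mass is an assumption that reproduces `mH1500`.

```python
def limit_tree_filename(method, mass, name="Test"):
    """File holding the limit tree: name, method and mass all appear."""
    return "higgsCombine%s.%s.mH%g.root" % (name, method, mass)


def fit_diagnostics_filename(name="Test"):
    """File holding shapes and fit results: only the name appears."""
    return "fitDiagnostics%s.root" % name


print(limit_tree_filename("FitDiagnostics", 1500))
print(fit_diagnostics_filename())
```

Scripts that post-process the fit (normalizations, pre-fit and post-fit plots) should therefore be pointed at the second file, whose name is independent of `-m`.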
Moreover, you can run the script in ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/longexercise/postFitPlot.py``` to get pre-fit and post-fit plots in the signal region (in the combined datacard ```ch4```):

```
python3 $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/longexercise/postFitPlot.py --input_file fitDiagnosticsTest.mH1500.root --shape_type <shapes_type> --region <region>
```
Same filename issue as above.
Now changed; I was running with a slightly older version of combine, where the mass was added to the name of the file.
added uncommitted changes for create-workspace.py, updated documentation to match naming of files from most recent version of combine, updating results to match the ones from the latest version
Thanks Cesare! This is working for me now. I have one remaining minor comment. After that, I think this can be merged unless anyone else has comments.
It might be nice at some point to expand this to, e.g., show how to set up different mass points with a single datacard using keywords. But I don't think that those kinds of future developments should stand in the way of including this as is for now.
The datacards will be created in the directory ```example_analysis/datacards/mPhi1500/``` inside the tutorial directory. The datacards can be combined then using the usual command:
Can we just make sure to add a statement telling users to change into this directory (or add the `cd` command below, or both)?
I have updated the text as you suggested, adding an explicit instruction to enter the directory.
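For anyone driving these steps from a script rather than a shell, the directory change discussed in this thread can be sketched as below. The helper names are hypothetical. The directory layout `example_analysis/datacards/mPhi<mass>/` is taken from the tutorial text, with the mass formatted as an integer by assumption.

```python
import contextlib
import os
from pathlib import Path


def datacard_dir(mass, base="example_analysis/datacards"):
    """Directory where the datacards for one mass point are written."""
    return Path(base) / ("mPhi%d" % mass)


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        os.chdir(previous)


print(datacard_dir(1500))
```

The combination and fit commands would then run inside `with working_directory(datacard_dir(1500)):`, for example through `subprocess.run`. The exact command lines are not reproduced here because they are not part of this excerpt.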
Pull request to propose a small tutorial to illustrate the application of RooParametricHist for a per-bin ABCD method in Combine.
Material added in the PR: