koza4ok

This package contains skTMVA, a scikit-learn to TMVA converter. It saves a scikit-learn BDT model to a TMVA xml-file, so the model can be used directly from TMVA. Once the model is trained and converted, the scikit-learn library is no longer needed: classification can be performed with TMVA/ROOT alone. This is particularly useful within the ATLAS framework, where scikit-learn is not installed. A user can train the classifier with scikit-learn on a laptop and later use it in the ATLAS framework as a converted TMVA xml-file.

Dependencies

ROOT (with TMVA package)
NumPy
scikit-learn

Installation

Add the koza4ok root directory to your PYTHONPATH environment variable, or source the setup script:

> source setup_koza4ok.sh

skTMVA converter

To convert a BDT to a TMVA xml-file, call the following function in your Python code (see Example):

convert_bdt_sklearn_tmva(bdt, [('var1', 'F'), ('var2', 'F')], 'bdt_sklearn_to_tmva_example.xml')

where

- bdt is your trained scikit-learn model,
- [('var1', 'F'), ('var2', 'F')] is the input variable description for TMVA. It consists of variable names and their basic types (e.g. 'F' for float). Note that the order here must match the order of the columns in your NumPy array,
- 'bdt_sklearn_to_tmva_example.xml' is the output TMVA xml-file.
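
Because the variable order must match the column order of the training array, it can help to build the variable description from a single list of column names. The helper below is a minimal sketch, not part of koza4ok: make_tmva_variables and the column names are hypothetical, and it assumes all variables share one type code.

```python
def make_tmva_variables(column_names, type_code="F"):
    """Build the (name, type) list in the same order as the training columns.

    Hypothetical helper, not part of koza4ok. Assumes every variable
    has the same TMVA type code ('F' for float by default).
    """
    if len(set(column_names)) != len(column_names):
        raise ValueError("variable names must be unique")
    return [(name, type_code) for name in column_names]


# Assumed column order of the training array X:
# X[:, 0] holds var1 and X[:, 1] holds var2.
columns = ["var1", "var2"]
tmva_vars = make_tmva_variables(columns)
print(tmva_vars)
```

The resulting list can then be passed as the second argument of convert_bdt_sklearn_tmva, so the names and the array columns come from the same source.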

Supports: AdaBoost or Gradient Boosting decision trees for binary classification.

In High-Energy Physics terms, this means AdaBoost or Gradient Boosting BDTs for signal and background discrimination.

Example

You can play with our example. No input dataset is needed: the dataset is generated on the fly, with both signal and background following Gaussian distributions with different mean values (thanks to root_numpy, from which this part of the code is borrowed).
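
To illustrate what such a generated dataset looks like, here is a minimal standard-library sketch. It is not the code used in the example scripts: the function name, the means, the unit width, the number of features and the sample size are all assumptions chosen for illustration.

```python
import random


def make_toy_dataset(n_per_class=1000, n_features=2,
                     signal_mean=1.0, background_mean=-1.0, seed=42):
    """Generate Gaussian signal (label 1) and background (label 0) samples.

    Illustrative only: all default values are assumptions, and each
    feature is drawn independently with unit standard deviation.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((1, signal_mean), (0, background_mean)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


X, y = make_toy_dataset()
print(len(X), "events,", sum(y), "of them signal")
```

The rows of X are in the column order that the variable description passed to the converter has to follow.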

The example is divided into two parts:

Training and converting

Depending on the type of boosting you prefer, there are two scripts to try. Both train a BDT with scikit-learn, then save it to a TMVA xml-file and to a pickle file for scikit-learn:
- examples/bdt_sklearn_to_tmva_AdaBoost.py - AdaBoost
- examples/bdt_sklearn_to_tmva_Grad.py - Gradient Boosting

Validation

After the previous step, it is critical to ensure that scikit-learn and TMVA give the same classification predictions on a test dataset. The following script validates the converter:
- examples/validate_sklearn_to_tmva.py - builds two ROC curves: one from scikit-learn, by loading the BDT from the pickle file, and another from TMVA, by running the reader on the TMVA xml-file produced in the previous step

To run the example, change directory to the examples folder on the command line and run:

AdaBoost:

> python bdt_sklearn_to_tmva_AdaBoost.py  
> python -i validate_sklearn_to_tmva.py

Gradient Boosting:

> python bdt_sklearn_to_tmva_Grad.py  
> python -i validate_sklearn_to_tmva.py

Two files should be created: bdt_sklearn_to_tmva_example.pkl, which contains the trained BDT model, and bdt_sklearn_to_tmva_example.xml, which is the TMVA xml-file. validate_sklearn_to_tmva.py uses these two files to produce and compare the ROC curves from scikit-learn and TMVA respectively. A pop-up window will show the ROC-curve comparison; ideally, the two curves lie on top of one another.
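
Besides the visual overlay, the agreement can be summarised in one number by comparing the area under each ROC curve. The sketch below is not taken from validate_sklearn_to_tmva.py: roc_auc is a hypothetical helper and the scores are made-up values standing in for the outputs of the two classifiers. Since a ROC curve depends only on the ordering of the scores, two outputs that differ by a monotonic rescaling should give the same area.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise signal/background comparisons.

    Hypothetical helper. Counts the fraction of (signal, background) pairs
    in which the signal event has the higher score; ties count one half.
    """
    signal = [s for s, lab in zip(scores, labels) if lab == 1]
    background = [s for s, lab in zip(scores, labels) if lab == 0]
    if not signal or not background:
        raise ValueError("need at least one signal and one background event")
    wins = 0.0
    for s in signal:
        for b in background:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal) * len(background))


# Made-up scores for six test events (1 = signal, 0 = background).
labels = [1, 1, 1, 0, 0, 0]
sklearn_scores = [0.9, 0.4, 0.7, 0.2, 0.5, 0.1]
# Stand-in for the TMVA response: same ordering, different scale.
tmva_scores = [2.0 * s - 1.0 for s in sklearn_scores]

auc_sklearn = roc_auc(sklearn_scores, labels)
auc_tmva = roc_auc(tmva_scores, labels)
print("AUC difference:", abs(auc_sklearn - auc_tmva))
```

The pairwise loop is quadratic in the number of events, which is fine for a quick check on a small test sample but slow for large ones.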

Contacts

For any questions, suggestions or comments, please don't hesitate to contact me: https://web2.ph.utexas.edu/~ilchenko/index.html
