
Chapter 3 - Evaluation Tools

Part B - MultiSetEvaluator Basics and Usage


Table of Contents

MultiSetEvaluator

  1. Prepare the sets used for cross comparison
  2. Initialization
  3. Saving and Loading
  4. Available Analyzers
    1. Inter-Intra Analysis (raw statistics, distribution plots and KL/OA Plots)
    2. Hit, Velocity, Offset Analysis
  5. Compiling Results

MultiSetEvaluator Basics

The MultiSetEvaluator is a tool for comparing multiple sets of data against one another. It is useful for comparing the results of different algorithms, the results of the same algorithm under different parameters, or the results of different data sets.

The MultiSetEvaluator is very similar to the GrooveEvaluator, with the exception that it can compare multiple sets of data at once.

Warning The MultiSetEvaluator can be computationally expensive, so it is recommended to run it after training rather than during it.

Note All code used in this chapter is available here.

1. Prepare the sets used for cross comparison

The MultiSetEvaluator requires a dictionary of GrooveEvaluator instances. The keys of the dictionary are the names of the sets, and the values are the corresponding GrooveEvaluator instances.

from eval.GrooveEvaluator import load_evaluator
from eval.MultiSetEvaluator import MultiSetEvaluator

# prepare input data
eval_1 = load_evaluator("demos/GrooveEvaluator/examples/test_set_full_robust_sweep_29.Eval.bz2")
eval_2 = load_evaluator("demos/GrooveEvaluator/examples/test_set_full_colorful_sweep_41.Eval.bz2")

groove_evaluator_sets = {"Model 1": eval_1, "Model 2": eval_2, "Model 3": eval_2}  # eval_2 is reused here purely for demonstration

2. Initialization

The MultiSetEvaluator is initialized by passing the dictionary of GrooveEvaluator instances to the constructor.

msEvaluator = MultiSetEvaluator(
    groove_evaluator_sets={"Model 1": eval_1, "Model 2": eval_2, "Model 3": eval_2}
)

The MultiSetEvaluator can be set up so that only a subset of HVO_Sequence features is used for the comparison. This is done by passing a list of HVO_Sequence features to the ignore_feature_keys parameter.

Moreover, a number of flags are available to disable certain analyses included in the default mode of the MultiSetEvaluator. As a result, many methods can be called without specifying which analyses to perform; in these cases, the default mode is used.

msEvaluator = MultiSetEvaluator(
    groove_evaluator_sets={"Model 1": eval_1, "Model 2": eval_2, "Model 3": eval_2},
    ignore_feature_keys=["Statistical::NoI", "Statistical::Total Step Density"],
    need_pos_neg_hit_score_plots=True,
    need_velocity_distribution_plots=True,
    need_offset_distribution_plots=True,
    need_inter_intra_pdf_plots=True,
    need_kl_oa_plots=True
)

3. Saving and Loading

# dump MultiSetEvaluator
msEvaluator.dump("demos/MultiSetEvaluator/misc/inter_intra_evaluator.MSEval.bz2")

# load MultiSetEvaluator
from eval.MultiSetEvaluator import load_multi_set_evaluator
msEvaluator = load_multi_set_evaluator("demos/MultiSetEvaluator/misc/inter_intra_evaluator.MSEval.bz2")

4. Available Analyzers

4.1 Inter-Intra Analysis (raw statistics, distribution plots and KL/OA Plots)

Raw Statistics
# save statistics
msEvaluator.save_statistics_of_inter_intra_distances(dir_path="demos/MultiSetEvaluator/misc/multi_set_evaluator")
>>> Saved statistics of inter intra distances to:  demos/MultiSetEvaluator/misc/multi_set_evaluator/GT_Model 1_Model 2_inter_intra_statistics.csv
>>> Saved statistics of inter intra distances to:  demos/MultiSetEvaluator/misc/multi_set_evaluator/GT_Model 1_Model 3_inter_intra_statistics.csv
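
The saved CSVs can be inspected with standard tools. Below is a minimal sketch using pandas; the exact column layout of the generated files is an assumption here:

import pandas as pd

# load one of the generated statistics files (path taken from the log above)
stats = pd.read_csv(
    "demos/MultiSetEvaluator/misc/multi_set_evaluator/GT_Model 1_Model 2_inter_intra_statistics.csv")

# inspect the first few rows of the inter/intra distance statistics
print(stats.head())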

Distribution Plots
# save inter intra pdf plots
iid_pdfs_bokeh = msEvaluator.get_inter_intra_pdf_plots(
    filename="demos/MultiSetEvaluator/misc/multi_set_evaluator/iid_pdfs.html")

KL/OA Plots
# save kl oa plots
KL_OA_plot = msEvaluator.get_kl_oa_plots(filename="demos/MultiSetEvaluator/misc/multi_set_evaluator")

4.2 Hit, Velocity, Offset Analysis

Hit Score Plots
# get pos neg hit score plots
pos_neg_hit_score_plots = msEvaluator.get_pos_neg_hit_score_plots(
    filename="demos/MultiSetEvaluator/misc/multi_set_evaluator/pos_neg_hit_scores.html")

Velocity Distribution Plots
# get velocity distribution plots
velocity_distribution_plots = msEvaluator.get_velocity_distribution_plots(
    filename="demos/MultiSetEvaluator/misc/multi_set_evaluator/velocity_distributions.html")

Offset Distribution Plots
# get offset distribution plots
offset_distribution_plots = msEvaluator.get_offset_distribution_plots(
    filename="demos/MultiSetEvaluator/misc/multi_set_evaluator/offset_distributions.html")

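Each of the getters above returns a Bokeh object (the Tabs layouts shown in Section 5), so the plots can also be displayed or saved manually with Bokeh's own I/O helpers. A minimal sketch, assuming a standard Bokeh installation:

from bokeh.io import save, show

# display the velocity distribution tabs in a browser
show(velocity_distribution_plots)

# or write them to a standalone HTML file (output path is illustrative)
save(velocity_distribution_plots,
     filename="demos/MultiSetEvaluator/misc/multi_set_evaluator/velocity_distributions_manual.html")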

5. Compiling Results

Compile Bokeh Plots in a Dictionary
# get logging media
logging_media = msEvaluator.get_logging_media(identifier="Analysis X")
>>> logging_media
>>> {'pos_neg_hit_scores_plots': {'Analysis X': Tabs(id='66207', ...)},
     'offset_distribution_plots': {'Analysis X': Tabs(id='67274', ...)},
     'velocity_distribution_plots': {'Analysis X': Tabs(id='68341', ...)},
     'inter_intra_pdf_plots': {'Analysis X': Tabs(id='76918', ...)},
     'kl_oa_plots': {'Analysis X': Tabs(id='78057', ...)}}
Compile Some of the Bokeh Plots in a Dictionary
logging_media_partial = msEvaluator.get_logging_media(identifier="Analysis X", need_pos_neg_hit_score_plots=False)
>>> logging_media_partial
>>> {'offset_distribution_plots': {'Analysis X': Tabs(id='120757', ...)},
    'velocity_distribution_plots': {'Analysis X': Tabs(id='121824', ...)},
    'inter_intra_pdf_plots': {'Analysis X': Tabs(id='130401', ...)},
    'kl_oa_plots': {'Analysis X': Tabs(id='131540', ...)}}
Compile and Automatically Save Results
logging_media_and_saved = msEvaluator.get_logging_media(
    identifier="Analysis X",
    save_directory="demos/MultiSetEvaluator/misc/logging_media")

Compile for Logging in WandB and/or Automatically Save Results (with Bokeh Plots)
# get logging media for wandb
logging_media_wandb = msEvaluator.get_logging_media(
    identifier="Analysis X",
    save_directory="demos/MultiSetEvaluator/misc/logging_media",
    prepare_for_wandb=True, need_inter_intra_pdf_plots=False, need_kl_oa_plots=False,
    need_pos_neg_hit_score_plots=True, need_velocity_distribution_plots=True, need_offset_distribution_plots=True)
>>> logging_media_wandb
>>> {'pos_neg_hit_scores_plots': {'Analysis X': <wandb.sdk.data_types.html.Html at 0x12f4cad90>},
    'offset_distribution_plots': {'Analysis X': <wandb.sdk.data_types.html.Html at 0x133e9dfa0>},
    'velocity_distribution_plots': {'Analysis X': <wandb.sdk.data_types.html.Html at 0x1345db550>},
    'inter_intra_pdf_plots': {'Analysis X': <wandb.sdk.data_types.html.Html at 0x1345508e0>},
    'kl_oa_plots': {'Analysis X': <wandb.sdk.data_types.html.Html at 0x13526a610>}}
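
These wandb.Html objects can be passed directly to wandb's logging call. A minimal sketch, assuming an active run (the project name below is hypothetical):

import wandb

run = wandb.init(project="multiset-eval-demo")  # hypothetical project name

# flatten the nested {plot_type: {identifier: Html}} dictionary and log each plot
for plot_type, media_by_id in logging_media_wandb.items():
    for identifier, html_media in media_by_id.items():
        run.log({f"{plot_type}/{identifier}": html_media})

run.finish()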