Code for the experiments of the paper "Learning kernel tests without data splitting". https://arxiv.org/abs/2006.02286 which will be presented at NeurIPS2020.
The implementations of the methods as described in the paper are in the directory 'methods'.
We strongly suggest you to install the package in a separate virtual environment. You can create one by executing
python -m venv --copies my_venv
from the root of the project and then activate it by running
. my_venv/bin/activate
You can then install the package as usual with the help of pip
by calling
pip install .
or using the install
target in the Makefile
by simply running
make install
If you want perform a two sample test on your own samples X and Y you can use the function pvalue()
in
tests-wo-split/methods/pvalue
.
A simple test of the validity of our method is to see whether the p-values are uniformly distributed
under the null hypothesis (samples come from the same distribution).
import matplotlib.pyplot as plt
from tests_wo_split.methods.pvalue import pvalue
import numpy as np
runs = 1000
size = 1000
p = []
for i in range(runs):
x = np.random.normal(0,1, size=size)
y = np.random.normal(0,1, size=size)
p.append(pvalue(x=x, y=y))
plt.hist(p)
plt.show()
To reproduce our results of Figure 2 you can use the provided
Makefile
. Simply execute
make fig
from the root of the project. This will run all the experiments,
render the figure and leave it as evaluation.pdf
in the root of the
project. You can run the experiments in parallel by calling this task as
make -j4 fig
The number after -j
determines the number of parallel processes. It
should be possible to run at least 4 processes on an average laptop.
The default setting is to reproduce the experiments for d=6
and the dataset 'diff_var'. To exactly reproduce the upper right plot
of Figure 2, please set runs: 5000
in the config file config.yml
(this increases the execution time linearly!). In order to create the
other subplots, please un-comment the corresponding section of the
config file. To asses type-I errors, change the parameter 'hypothesis'
to 'null'.
To test the method on your own distributions P and Q, go to the file 'config.yml' and set 'dataset' to 'own_dataet'. Further please go to 'datasets/generate_data.py' and specify how to draw samples from your custom distribution.
Jonas Kübler, Empirical Inference Department - Max Planck Institute for Intelligent Systems
MIT License (see LICENSE.md)
© 2020, Max Planck Society - Max Planck Institute for Intelligent Systems