Learning kernel tests without data splitting

Code for the experiments of the paper "Learning kernel tests without data splitting" (https://arxiv.org/abs/2006.02286), which will be presented at NeurIPS 2020.

The implementations of the methods as described in the paper are in the directory 'methods'.

Installation

We strongly suggest installing the package in a separate virtual environment. You can create one by executing

python -m venv --copies my_venv

from the root of the project and then activate it by running

. my_venv/bin/activate

You can then install the package as usual with the help of pip by calling

pip install .

or using the install target in the Makefile by simply running

make install

Computing p-values

If you want to perform a two-sample test on your own samples X and Y, you can use the function pvalue() in tests_wo_split/methods/pvalue. A simple check of the validity of our method is to see whether the p-values are uniformly distributed under the null hypothesis (both samples come from the same distribution).

Example: uniform distribution of p-values

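import matplotlib.pyplot as plt
import numpy as np

from tests_wo_split.methods.pvalue import pvalue

runs = 1000  # number of independent two-sample tests
size = 1000  # number of samples drawn per distribution in each run

p = []
for _ in range(runs):
    # Null hypothesis: both samples come from the same standard normal.
    x = np.random.normal(0, 1, size=size)
    y = np.random.normal(0, 1, size=size)
    p.append(pvalue(x=x, y=y))

# Under the null, the histogram should look approximately flat.
plt.hist(p)
plt.show()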
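
A histogram only gives a visual impression. As a rough numerical complement, the sketch below computes the Kolmogorov-Smirnov distance between a list of p-values and the uniform distribution on [0, 1], together with the fraction of runs that reject at level alpha. The helper names ks_uniform_statistic and rejection_rate are illustrative and not part of this package, and the p-values here are stand-ins drawn uniformly at random so that the snippet runs without the package installed; in practice you would pass the list p collected in the example above.

```python
import random


def ks_uniform_statistic(pvalues):
    """Kolmogorov-Smirnov distance between the empirical CDF and Uniform(0, 1)."""
    p = sorted(pvalues)
    n = len(p)
    # The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted value.
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(p))


def rejection_rate(pvalues, alpha=0.05):
    """Fraction of runs rejecting at level alpha (empirical type-I error under the null)."""
    return sum(v <= alpha for v in pvalues) / len(pvalues)


# Stand-in p-values (assumption): replace with the list `p` from the example above.
rng = random.Random(0)
p = [rng.random() for _ in range(1000)]

d_stat = ks_uniform_statistic(p)
critical = 1.36 / len(p) ** 0.5  # asymptotic 5% critical value of the KS test
print(f"KS distance: {d_stat:.4f} (5% critical value: {critical:.4f})")
print(f"rejection rate at alpha=0.05: {rejection_rate(p):.3f}")
```

For a valid test, the KS distance should typically stay below the critical value and the rejection rate should be close to alpha.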

Reproducing Figure 2

To reproduce the results in Figure 2, you can use the provided Makefile. Simply execute

make fig

from the root of the project. This will run all the experiments, render the figure and leave it as evaluation.pdf in the root of the project. You can run the experiments in parallel by calling this task as

make -j4 fig

The number after -j determines the number of parallel processes. It should be possible to run at least 4 processes on an average laptop.

The default setting reproduces the experiments for d=6 and the dataset 'diff_var'. To reproduce the upper right plot of Figure 2 exactly, set runs: 5000 in the config file config.yml (execution time grows linearly with the number of runs). To create the other subplots, uncomment the corresponding section of the config file. To assess type-I errors, change the parameter 'hypothesis' to 'null'.
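
To get a feeling for the trade-off behind the number of runs, the following sketch computes the Monte Carlo standard error of an estimated rejection rate. It uses the standard binomial formula sqrt(r(1-r)/runs) and is not taken from this repository; the function name is illustrative.

```python
import math


def rejection_rate_std_error(rate, runs):
    """Standard error of a rejection rate estimated from `runs` independent tests."""
    return math.sqrt(rate * (1.0 - rate) / runs)


# Runtime grows linearly in `runs`, while the error shrinks only like 1/sqrt(runs).
for runs in (100, 1000, 5000):
    se = rejection_rate_std_error(0.05, runs)
    print(f"runs={runs:5d}  standard error at rate 0.05: {se:.4f}")
```

With 5000 runs, a type-I error of 0.05 is estimated with a standard error of roughly 0.003, so small deviations from the nominal level become visible.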

Your own dataset

To test the method on your own distributions P and Q, open 'config.yml' and set 'dataset' to 'own_dataet'. Then open 'datasets/generate_data.py' and specify how to draw samples from your custom distributions.
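
As an illustration of what such a sampler could look like, here is a minimal sketch. The interface of 'datasets/generate_data.py' is not shown in this README, so the function name draw_own_dataset, its arguments, and the value 'alternative' for the hypothesis are all assumptions; adapt them to the actual code. The example distributions (a standard normal P and a Q with inflated variance in the first coordinate) are likewise only placeholders.

```python
import numpy as np


def draw_own_dataset(size, d, hypothesis="alternative", seed=None):
    """Hypothetical sampler: returns X ~ P and Y ~ Q (or Y ~ P under the null)."""
    rng = np.random.default_rng(seed)
    # P: standard normal in d dimensions (placeholder choice).
    x = rng.normal(0.0, 1.0, size=(size, d))
    y = rng.normal(0.0, 1.0, size=(size, d))
    if hypothesis != "null":
        # Q: same as P, but with a larger standard deviation in the first coordinate.
        y[:, 0] *= 1.5
    return x, y


x, y = draw_own_dataset(size=1000, d=6, hypothesis="alternative", seed=0)
print(x.shape, y.shape)
```

Under the null, both samples are drawn from P, which is the setting needed to assess type-I errors.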

Author

Jonas Kübler, Empirical Inference Department - Max Planck Institute for Intelligent Systems

License

MIT License (see LICENSE.md)

