Skip to content
/ ITCA Public

Information-theoretic classification accuracy: a criterion that guides data-driven combination of ambiguous outcome labels in multi-class classification

License

Notifications You must be signed in to change notification settings

JSB-UCLA/ITCA

Repository files navigation

ITCA - Guide the ambiguous outcome labels combination for multi-class classification

ITCA (Information-theoretic classification accuracy) is a criterion that guides data-driven combination of ambiguous outcome labels in multi-class classification (see ITCA documentation for detailed guides).

Installation

Requirements:

Install from PyPI by running (in the command line):

pip install itca

Install from source code:

   git clone https://github.com/JSB-UCLA/ITCA.git
   cd ITCA
   python setup.py install

ITCA is easy to use.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from itca import itca, compute_y_dist, bidict, GreedySearch
# ===================  Data ========================
# `X` is the feature matrix, a 2D numpy array of size (n_smaples, n_features).
# `y_obs` is the observed labels, a 1D numpy array of size (n_samples, ) that takes values 
# in [0, 1, 2] (the observed classes number K0=3).
# The combination is represented by a bidirectional disctionary `bidict`.
# `bidict({0:0, 1:0, 2:1, 3:2})` indactes the mapping from the orignal labels to the combined labels.
true_combination = bidict({0:0, 1:0, 2:1, 3:2})
X1 = np.array([[0., 0.]]) + np.random.randn(200,  2)
X2 = np.array([[1.5, 1.5]]) + np.random.randn(200, 2)
X3 = np.array([[-1.5, 1.5]]) + np.random.randn(200, 2)
X = np.concatenate([X1, X2, X3])  # data matrix
y_true = np.concatenate([np.ones(200) * i for i in 
range(3)]).astype(int)            # true lables K^*=3
y_obs = true_combination.reverse_map(y_true) # observed labels K_0=4
#===========  Classsification algorithm =============
# `clf` can be any sklearn classifcation algorithm or any classifcation algorithm that implements 
# `clf.fit(X, y)` for fitting and `clf.predict(X)` for prediction.  
clf = LinearDiscriminantAnalysis()
# =================== Evaluate s-ITCA on the true combination ================
clf.fit(X, true_combination.map(y_obs))
y_pred = clf.predict(X)
itca(y_obs, y_pred, true_combination, compute_y_dist(y_obs))
# ============= Search class combination =============
gs = GreedySearch(class_type='ordinal')
gs.search(X, y_obs, clf, verbose=False, early_stop=True)
gs.selected # show the selected class combination
#>>>{0: 0, 1: 0, 2: 1, 3: 2}|ITCA=0.8807|

Please see the tutorial for more details.

Troubleshooting

For visualization, ITCA requires pygraphviz package. If you have trouble installing pygraphviz, please refer to pygraphviz for detailed installation guides. Please make sure that dot is in your PATH environment variable. If you are using Windows, please refer to Graphviz for detailed installation guides.

Citation

@article{JMLR:v23:21-1150,
  author  = {Chihao Zhang and Yiling Elaine Chen and Shihua Zhang and Jingyi Jessica Li},
  title   = {Information-theoretic Classification Accuracy: A Criterion that Guides Data-driven Combination of Ambiguous Outcome Labels in Multi-class Classification},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {341},
  pages   = {1--65},
  url     = {http://jmlr.org/papers/v23/21-1150.html}
}

Contribute

Contact

If you are having any issues, comments regarding this project, please feel free to contact [email protected]

About

Information-theoretic classification accuracy: a criterion that guides data-driven combination of ambiguous outcome labels in multi-class classification

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published