
Contains research paper experiments and algorithmic implementation of the Class Separation Transformation (CST) novel ML technique


richiebailey74/CST

CST

Class Separation Transformation (CST) is a novel machine learning technique that accomplishes the dual objective of significantly reducing the dimensionality of the feature space (down to one dimension) while separating classes as well as possible into "clusters" in that single dimension. CST does not preserve covariance or topology in the feature space; its only goal is to reduce dimensionality so that the resulting mapping provides optimal separation for classifiers. This addresses an issue many machine learning researchers face: the tradeoff between input space reduction and accuracy. With our technique, there is no tradeoff between these two properties. The technique is also extremely useful for explainability in ML, since the algorithm produces a learned transformation vector, which we call f, where each weight corresponds to a particular feature in the original input space. This can lead to a greater understanding of how particular attributes contribute to defining a sample's class, which can have innumerable use cases in many fields of research.
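To make the role of f concrete, here is a minimal sketch of how a per-feature weight vector could map samples to one dimension and be inspected for explainability. It rests on assumptions not stated in this README: that the transformation is applied as a dot product between each sample and f, and that larger absolute weights indicate more influential features. The function names, the toy data, and the hand-picked f are hypothetical. This is not the repository's API, and it does not show how CST learns f.

```python
import numpy as np


def project_to_1d(X, f):
    """Map each sample (row of X) to one scalar via a dot product with f.

    ASSUMPTION: the README only says f holds one weight per feature;
    applying it as a linear projection is an illustrative guess.
    """
    X = np.asarray(X, dtype=float)
    f = np.asarray(f, dtype=float)
    if X.ndim != 2 or f.shape != (X.shape[1],):
        raise ValueError("f must have one weight per feature column of X")
    return X @ f


def rank_features_by_weight(f, feature_names):
    """Return (name, weight) pairs sorted by absolute weight, largest first."""
    f = np.asarray(f, dtype=float)
    order = np.argsort(-np.abs(f), kind="stable")
    return [(feature_names[i], float(f[i])) for i in order]


# Toy data: four samples, three features, two classes (all values made up).
X = np.array([[1.0, 0.0, 2.0],
              [0.9, 0.1, 2.1],
              [0.0, 1.0, 0.0],
              [0.1, 0.9, 0.2]])
y = np.array([0, 0, 1, 1])

# Hand-picked weights for illustration only; NOT learned by CST.
f = np.array([1.0, -1.0, 0.5])
feature_names = ["gene_a", "gene_b", "gene_c"]  # hypothetical names

z = project_to_1d(X, f)                              # one scalar per sample
ranking = rank_features_by_weight(f, feature_names)  # weights as explanations
```

With this toy f, the two classes fall on opposite sides of zero in the single output dimension, which is the kind of separation the description above aims for. For real use, the class implementation in CST_implementation.py is the place to look.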

Base datasets used for the experiments in this repository can be found at https://treehousegenomics.soe.ucsc.edu/public-data/#tumor_v11_polyA

The zip file in the directory "CST/original_experiments_shell_scripts" contains the scripts that produced the results reported in the original CST publication. To reproduce these experiments, unzip the file and run the top-level scripts. The zip file in the directory "CST/notebook_developing" contains all the Jupyter notebooks that the coding contributors (Richie, Aisharjya, Aaditya) used as an interactive environment for exploring the data and the tasks at hand.

The code logic is all contained in the "CST/business_logic" package, which has a few key components. The CST_implementation.py file in the algorithmic_implementation package contains the class implementation of CST; anyone who wishes to use CST should use that class. A time- and space-optimized version of CST is available here: https://github.com/richiebailey74/HCST. To reproduce the experiments for the second CST publication (currently under review), execute generate_base_experiments.py and generate_hypertune_experiments.py in the "CST/business_logic" package. These files call on the "CST/business_logic/experiments" package, which uses data produced by the "CST/business_logic/data_preprocessing" package.

Generated figures are in the "CST/figures" directory. Place the downloaded data in the "CST/data" directory so that the top-level experiment scripts can load it (the data cannot be uploaded to GitHub because the files are too large).

Please direct any questions about CST to: [email protected]
