GB-FS

gbfs is a Python package for graph-based feature selection in machine learning. It provides two methods, GB-AFS and GB-BC-FS, each of which takes a graph-based approach to the feature selection problem.

Our Contributions

  • GB-AFS (Graph-Based Automatic Feature Selection): A method that automates feature selection for multi-class classification tasks, aiming to find the smallest set of features that remains effective for model training.

  • GB-BC-FS (Graph-Based Budget-Constrained Feature Selection): Currently in development, this method seeks to enhance feature selection by integrating budget constraints, ensuring the cost of each feature is considered.

Installation

gbfs has been tested with Python 3.10.

pip

$ pip install gbfs 

Clone from GitHub

$ git clone https://github.com/davidlevinwork/gbfs.git && cd gbfs
$ poetry install
$ poetry shell

Usage

GB-AFS

Initialization

To begin working with GB-AFS, the first step is to initialize the GB-AFS object:

from gbfs import GBAFS

gbafs = GBAFS(
    dataset_path="path/to/your/dataset.csv",
    separability_metric="your_separability_metric",
    dim_reducer_model="your_dimensionality_reduction_method",
    label_column="class",
)
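The `dataset_path` argument points to a CSV file, and `label_column` names the column that holds the class labels. To try the call above without a real dataset, a small synthetic file can be generated first. The following is a minimal sketch using only the standard library; the helper name `write_toy_dataset`, the feature column names `f0`, `f1`, ... and the file layout (one label column alongside numeric feature columns) are assumptions made for illustration, not something gbfs provides or requires.

```python
import csv
import random


def write_toy_dataset(path, n_rows=60, n_features=5, n_classes=3,
                      label_column="class", seed=0):
    """Write a small synthetic multi-class CSV (hypothetical helper).

    The file has numeric feature columns f0..fN followed by a label column.
    """
    rng = random.Random(seed)
    header = [f"f{i}" for i in range(n_features)] + [label_column]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for row_idx in range(n_rows):
            # Cycle through the class labels so every class is represented.
            label = row_idx % n_classes
            # Shift each feature by the label so the classes differ on average.
            features = [round(rng.gauss(label * (i + 1), 1.0), 4)
                        for i in range(n_features)]
            writer.writerow(features + [label])
    return header
```

The resulting path can then be passed as `dataset_path`, with `label_column="class"` matching the label column written by the helper.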

Feature-Selection

After initializing the GB-AFS object, you can move forward with the process of selecting features:

selected_features = gbafs.select_features()

print("Selected Feature Indices:", selected_features)
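Since the result is a list of indices, it is often convenient to translate it back into column names. The sketch below does this with the standard library. It assumes that the indices are zero-based positions among the feature columns in CSV order, with the label column excluded; this README does not state the convention, so check it against your own data. The helper name `feature_names_from_indices` is hypothetical.

```python
import csv


def feature_names_from_indices(csv_path, selected_indices, label_column="class"):
    """Map feature indices to CSV column names (hypothetical helper).

    Assumes indices count feature columns only, in file order,
    with the label column skipped.
    """
    with open(csv_path, newline="") as fh:
        header = next(csv.reader(fh))
    feature_columns = [name for name in header if name != label_column]
    return [feature_columns[i] for i in selected_indices]
```

For example, `feature_names_from_indices("path/to/your/dataset.csv", selected_features)` would return the names of the selected columns under the assumption stated above.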

Visualization

GB-AFS can also plot the selected features within the feature space, showing how they are distributed and how distinct they are:

gbafs.plot_feature_space()

GB-BC-FS

Initialization

To begin working with GB-BC-FS, the first step is to initialize the GB-BC-FS object:

from gbfs import GBBCFS

gbbcfs = GBBCFS(
    dataset_path="path/to/your/dataset.csv",
    separability_metric="your_separability_metric",
    dim_reducer_model="your_dimensionality_reduction_method",
    label_column="class",
    budget=20,
    alpha=0.5,
    epochs=100,
)

Feature-Selection

After initializing the GB-BC-FS object, you can move forward with the process of selecting features:

selected_features = gbbcfs.select_features()

print("Selected Feature Indices:", selected_features)
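When working with a budget, it can be useful to sanity-check a selection against your own cost figures. The sketch below is not part of gbfs: how the library represents feature costs is not described in this README, so the per-feature cost list, the helper names, and the reading of `budget` as an upper bound on the summed cost are all assumptions made for illustration.

```python
def total_feature_cost(selected_indices, feature_costs):
    """Sum the cost of each selected feature (hypothetical helper).

    feature_costs is a list indexed by feature position.
    """
    return sum(feature_costs[i] for i in selected_indices)


def within_budget(selected_indices, feature_costs, budget):
    """Return True when the selected features cost no more than the budget."""
    return total_feature_cost(selected_indices, feature_costs) <= budget


# Hypothetical costs for five features and a hypothetical selection.
feature_costs = [4.0, 7.5, 2.0, 9.0, 5.5]
selected = [0, 2, 4]
print("Total cost:", total_feature_cost(selected, feature_costs))
print("Within budget:", within_budget(selected, feature_costs, budget=20))
```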

Visualization

GB-BC-FS can also plot the selected features within the feature space, showing how they are distributed and how distinct they are:

gbbcfs.plot_feature_space()

Documentation

For more information on available commands and usage, refer to the documentation.

Contribution

Contributions to gbfs are welcome! If you encounter any issues or have suggestions for improvements, please open an issue.

Citation

If you use this code in your research, please cite:

@article{levin2024gb,
  title={GB-AFS: graph-based automatic feature selection for multi-class classification via Mean Simplified Silhouette},
  author={Levin, David and Singer, Gonen},
  journal={Journal of Big Data},
  volume={11},
  number={1},
  pages={79},
  year={2024},
  publisher={Springer}
}