Skip to content

enriquea/fsspark

 
 

Repository files navigation

Python application Python Package using Conda

fsspark


Feature selection in Spark

Description

fsspark is a python module to perform feature selection and machine learning based on spark. Pipelines written using fsspark can be divided roughly in four major stages: 1) data pre-processing, 2) univariate filters, 3) multivariate filters and 4) machine learning wrapped with cross-validation (Figure 1).

Feature Selection flowchart Figure 1. Feature selection workflow example implemented in fsspark.

Documentation

The package documentation describes the data structures and features selection methods implemented in fsspark.

Installation

  • pip
git clone https://github.com/enriquea/fsspark.git
cd fsspark
pip install . -r requirements.txt
  • conda
git clone https://github.com/enriquea/fsspark.git
cd fsspark
conda env create -f environment.yml
conda activate fsspark-venv
pip install . -r requirements.txt

Maintainers

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%