This repo tries to reproduce scikit-learn TF-IDF features across different machines.
Machine 1 (AKA aimore_machine)
Machine 2 (AKA marc_machine)
yes | conda create -n reproduce_tfidf python=3.8;
conda activate reproduce_tfidf;
pip install pip-tools;
pip-sync requirements.txt;
To download the data for the featurizer:
cd data/
dvc pull
There are two Python scripts in notebooks/:
Generate_features.py
: Creates the features from the data downloaded with dvc and saves them at data/features/
python notebooks/Generate_features.py
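As a rough illustration of what a TF-IDF featurizer computes, here is a minimal pure-Python sketch. It is not the content of Generate_features.py; the function name `tfidf` and the toy documents are hypothetical. It follows the documented defaults of scikit-learn's TfidfVectorizer (lowercasing, tokens of two or more word characters, smoothed idf, L2 normalisation), but it is not guaranteed to match scikit-learn's output bit for bit.

```python
import math
import re
from collections import Counter

# Default token pattern documented for TfidfVectorizer: 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf(documents):
    """Return (vocabulary, rows) where each row is an L2-normalised TF-IDF vector."""
    tokenized = [TOKEN_RE.findall(doc.lower()) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocabulary, rows


if __name__ == "__main__":
    vocab, rows = tfidf(["the cat sat", "the dog sat down"])
    print(vocab)
    print(rows[0])
```

The logarithm and the square root are the places where results could, in principle, differ in the last bits between machines or library builds, which is what this repo tries to check.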
Compare_features.py
: Compares two sets of features. Make sure to point it to the features you want to compare (change the paths inside the code)
python notebooks/Compare_features.py
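One possible way to compare features produced on two machines is sketched below. It is not the content of Compare_features.py; the helper names `file_sha256` and `compare_feature_rows` are hypothetical, and the sketch assumes the features have already been loaded as dense lists of float rows, since the on-disk format is not described here. Identical file hashes mean the files are byte-for-byte the same; when they differ, the numeric comparison shows how large the difference is.

```python
import hashlib
import math


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large feature files do not fill memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_feature_rows(rows_a, rows_b, abs_tol=1e-12):
    """Compare two dense feature matrices given as lists of rows of floats.

    Returns (same_shape, max_abs_diff, all_close).
    """
    same_shape = len(rows_a) == len(rows_b) and all(
        len(a) == len(b) for a, b in zip(rows_a, rows_b)
    )
    if not same_shape:
        return False, math.inf, False
    max_diff = 0.0
    for a, b in zip(rows_a, rows_b):
        for x, y in zip(a, b):
            max_diff = max(max_diff, abs(x - y))
    return True, max_diff, max_diff <= abs_tol


if __name__ == "__main__":
    machine_1 = [[0.5, 0.0], [0.25, 0.75]]
    machine_2 = [[0.5, 0.0], [0.25, 0.75]]
    print(compare_feature_rows(machine_1, machine_2))
```

The tolerance `abs_tol` is an arbitrary illustrative value; set it to 0 to require exact equality.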