This repository contains the code for the Genetic Engineering Attribution Challenge. The objective of this challenge is to predict the laboratory of origin for plasmid DNA sequences.
The competition had 1211 competitors, and our proposed CNN model ranked 14th on the private leaderboard with a score of 0.9128 on the test set.
Note: Please see the competition website for the data format.
-
Download data for the competition
-
Configure the data directory and other desired parameters in the utils/config.py file
-
Create n folds of the training data for K-fold Cross-Validation
python utils/create_fold.py
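As a rough illustration of what creating folds involves, the sketch below assigns each training sample to one of n folds. It is not the code in utils/create_fold.py: the function name `assign_folds`, the seed, and the round-robin assignment are assumptions made for this example, and the actual script may split the data differently (for example, stratified by lab).

```python
import random


def assign_folds(n_samples, n_folds=5, seed=42):
    """Return a list where entry i is the fold index (0..n_folds-1) of sample i.

    Hypothetical helper for illustration only; not taken from this repository.
    """
    indices = list(range(n_samples))
    # Shuffle so that fold membership does not depend on the row order.
    random.Random(seed).shuffle(indices)
    folds = [0] * n_samples
    for position, sample_idx in enumerate(indices):
        # Deal the shuffled samples round-robin so fold sizes differ by at most one.
        folds[sample_idx] = position % n_folds
    return folds


folds = assign_folds(n_samples=10, n_folds=5)
for k in range(5):
    # Fold k is held out for validation; the remaining folds are used for training.
    val_idx = [i for i, f in enumerate(folds) if f == k]
    train_idx = [i for i, f in enumerate(folds) if f != k]
    print(f"fold {k}: {len(train_idx)} train / {len(val_idx)} validation")
```

Each sample appears in exactly one validation fold, so every training sequence is used for validation once across the n runs.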
-
Train the model
python engine.py
-
Evaluate the model on the validation set of each of the n folds
python pred_val.py
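The sketch below shows one way a validation score could be computed, assuming the metric is top-10 accuracy (the true lab is among the 10 highest-scoring labels). That assumption is not stated in this README, so check the competition website for the official metric. The function name `top_k_accuracy` and the toy data are hypothetical and are not taken from pred_val.py.

```python
def top_k_accuracy(scores, labels, k=10):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one per sample.
    labels: list of true class indices, one per sample.
    """
    hits = 0
    for row, true_label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 3 samples and 3 classes (k is kept small to fit the toy data).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
acc_top1 = top_k_accuracy(scores, labels, k=1)
acc_top2 = top_k_accuracy(scores, labels, k=2)
print(acc_top1, acc_top2)
```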
-
Make predictions for the test set
python pred_test.py
-
Create submission file
python create_submission.py
Note: The submission script finds the top 10 labels and assigns equal probability among them to reduce the final file size.
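The idea in the note above can be sketched as follows. This is a minimal illustration, not the code in create_submission.py: the function name `top_k_uniform` is hypothetical, and the way ties and output formatting are handled here is an assumption.

```python
def top_k_uniform(scores, k=10):
    """Keep the k highest-scoring labels and give each probability 1/k.

    All other labels get probability 0, so each row still sums to 1.
    """
    k = min(k, len(scores))  # guard against rows with fewer than k labels
    top = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
    row = [0.0] * len(scores)
    for c in top:
        row[c] = 1.0 / k
    return row


# Toy example with 5 labels and k=2 (the note above uses k=10).
scores = [0.05, 0.40, 0.10, 0.30, 0.15]
row = top_k_uniform(scores, k=2)
print(row)
```

Because every row then contains only zeros and one repeated short value instead of a long float per label, the written file is presumably much smaller. If the score depends only on which labels fall in the top 10, as assumed here, flattening the probabilities would not change it.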
For more details, please contact:
- Seyedmostafa Sheikhalishahi: [email protected]
- Vevake Balaraman: [email protected]