The goal is to find the lab-of-origin for genetically engineered DNA with machine learning models

mostafaalishahi/Genetic_engineering_attribution_challenge_2020

Genetic Engineering Attribution Challenge

This repository contains the code for the Genetic Engineering Attribution Challenge. The objective of this challenge is to predict the laboratory of origin for plasmid DNA sequences.

The competition had 1211 competitors, and our proposed CNN model ranked 14th on the private leaderboard with a score of 0.9128 on the test set.

Note: See the competition website for the data format.

Executing the program

  1. Download data for the competition

  2. Configure the data directory and other desired parameters in the utils/config.py file

  3. Create n folds of the training data for k-fold cross-validation

    python utils/create_fold.py
    
  4. Train the model

    python engine.py
    
  5. Evaluate the model on the validation set of each of the n folds

    python pred_val.py
    
  6. Make predictions for test_set

    python pred_test.py
    
  7. Create submission file

    python create_submission.py
    

    Note: The submission script keeps the top 10 labels for each sequence and assigns them equal probability, which reduces the size of the final file.
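
The top-10 step described in the note can be illustrated with a minimal sketch. This is not the code in create_submission.py. The function name `top_k_equal_probability` and the example scores are hypothetical, and the sketch assumes the model outputs one score per candidate lab for each sequence.

```python
from typing import List, Sequence


def top_k_equal_probability(probs: Sequence[float], k: int = 10) -> List[float]:
    """Return a row in which the k highest-scoring labels each get 1/k.

    Hypothetical helper for illustration; all other labels get 0.0.
    """
    k = min(k, len(probs))
    # Indices of the k largest scores; ties go to the lower index.
    top = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    row = [0.0] * len(probs)
    for i in top:
        row[i] = 1.0 / k
    return row


# Made-up scores for one sequence over 12 candidate labs.
scores = [0.02, 0.30, 0.01, 0.05, 0.20, 0.03, 0.04, 0.10, 0.06, 0.07, 0.08, 0.04]
row = top_k_equal_probability(scores, k=10)
```

Each output row then holds only two distinct values (0.0 and 1/k), which are short when written as text. That is presumably where the file-size saving comes from.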

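The fold creation in step 3 can be sketched in the same spirit. This is an assumption about what such a script might do, not the contents of utils/create_fold.py. The function name `assign_folds`, the seed, and the row count are hypothetical, and it uses a plain seeded shuffle rather than stratifying by lab.

```python
import random
from typing import List


def assign_folds(n_rows: int, n_folds: int = 5, seed: int = 42) -> List[int]:
    """Give every row index a fold id in [0, n_folds) after a seeded shuffle."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    folds = [0] * n_rows
    # Deal the shuffled rows out round-robin so fold sizes differ by at most one.
    for position, row_index in enumerate(order):
        folds[row_index] = position % n_folds
    return folds


folds = assign_folds(n_rows=23, n_folds=5)

# For fold f, rows tagged f form the validation set; the rest are used for training.
splits = []
for f in range(5):
    val_idx = [i for i, fold in enumerate(folds) if fold == f]
    train_idx = [i for i, fold in enumerate(folds) if fold != f]
    splits.append((train_idx, val_idx))
```

With many labs and few sequences for some of them, a stratified split may be the better choice. The plain shuffle is used here only to keep the sketch short.
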
Contact

For more details, please contact:
