maalcantar/promoter_ML

Promoter discovery using natural language processing

This repo contains code for promoter discovery and biodiversity mining using natural language processing. This effort is being pursued through three specific aims:

  1. Aim 1: Develop a natural language processing-based model for promoter identification
  2. Aim 2: Extend model to identify inducible promoter sequences
  3. Aim 3: Experimentally validate promoter predictions

Promoter sequences were collected from three main databases: EPDnew, RegulonDB, and DBTBS.

Directory structure

data

All data files are found in and/or will be written to data/

  • data/DBTBS/
    • Contains raw data from DBTBS: Bacillus subtilis promoter database
  • data/EPDnew/
    • Contains raw data from EPDnew: Eukaryote promoter database (promoter data for 15 different organisms)
  • data/RegulonDB/
    • Contains raw data from RegulonDB: Escherichia coli promoter database
  • data/parsed_promoter_data/
    • Promoter data parsed from each database
  • data/20191114promoter_identification_ML_curation
    • Manually curated information on other state-of-the-art ML models for promoter prediction

src

All code is found in and/or will be written to src/ in either notebook or script form

Notebooks:

  • src/notebooks/20191125_promoter_database_parsing.ipynb
    • Notebook containing code for parsing promoter data

figs

All raw and edited figures will be written to figs/
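
The layout described above can be checked after cloning with a short script. This is only a minimal sketch and not part of the repository: the helper name `missing_dirs` and the `DOCUMENTED_DIRS` list are hypothetical. The list contains only the paths this README clearly names as directories. `data/20191114promoter_identification_ML_curation` is left out because the README does not say whether it is a directory or a file.

```python
from pathlib import Path

# Directories named in this README, relative to the repository root.
# (Hypothetical helper list; extend it if the layout changes.)
DOCUMENTED_DIRS = [
    "data/DBTBS",
    "data/EPDnew",
    "data/RegulonDB",
    "data/parsed_promoter_data",
    "src/notebooks",
    "figs",
]


def missing_dirs(repo_root):
    """Return the documented directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in DOCUMENTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; prints one line per absent directory.
    for d in missing_dirs("."):
        print(f"missing: {d}")
```

Run from the repository root, the script prints nothing when every listed directory is present.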
