Includes scripts and commands for generating datasets for the putamen rat-only model and the putamen multi-species model with the TACIT pipeline,
followed by permulation methods for three physical traits.
- multiSpeciesModelPrediction: Uses the trained putamen multi-species model to make predictions on OCRs of 224 mammals.
- negativeGCMatched: Generates a random GC-matched dataset as negatives using BiasAway.
- putamenMultiSpeciesModelDataset: Generates the training, validation, and evaluation sets used in the multi-species (rat, macaque, bat) putamen model.
- phylogeneticPermulation: Finds associations between predicted enhancer activities and physical traits, including longevity (maximum life span), total daily sleep, and vocal learning, with TACIT.
- ratModelDataset: Generates the training, validation, and evaluation sets used in the rat putamen model.
- ratPutamenSVM: Uses an SVM to provide insights for CNN hyperparameter tuning.
- humanPutamenDataset: Generates the training, validation, and evaluation sets used in the human putamen model.
- TACIT pipelines: automate data processing, including filtering, plotting, and matrix generation, with permulationList.py, plotPDist.R, bhCorrection.R, qValue.R, and filterNFasta.py.
- ATAC-seq helpers: automate mapping peaks from halLiftover and HALPER with halperMapPeak.sh; ortholog conversion for bat sequences uses batFormatConvert.sh.
- Model prediction interpretation: automate testing for a shift in the p-value distribution between predictions and the literature with pValShiftTest.sh and plotWilcoxon.py.
- Conda environment: use keras2-tf27.yml for CNN training and predictions; for other scripts, use hal.yml.
- Most scripts require filterPeakName.py from OCROrthologPrediction and convertChromNames.py from TACIT.
- Phylogenetic permulation requires TACIT.
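
The multiple-testing step in the TACIT pipelines list can be illustrated with a short sketch. This is not the code in bhCorrection.R; it assumes that script applies the standard Benjamini-Hochberg adjustment to permulation p-values, and the function name `benjamini_hochberg` and the example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative only: assumes the pipeline's correction step is the
    standard BH procedure; not taken from bhCorrection.R.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone and never exceed 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical permulation p-values for four OCRs.
    example = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(example, benjamini_hochberg(example)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

OCRs whose adjusted p-value falls below a chosen threshold would then be reported as associated with the trait; the threshold itself is a project decision not stated here.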