Python workflows #82

Merged 89 commits on Jan 25, 2023
4a8ed65
modisco-lite-integration
panushri25 Dec 10, 2022
0ae25d8
removing logs from main, only keeping in branch
panushri25 Dec 10, 2022
e956291
integrating automatic shift detection scripts
panushri25 Dec 11, 2022
8df9590
updating docker with the bioconda pybigwig fix
panushri25 Dec 11, 2022
12a60b3
removed warning and changed data type to ATAC and DNASE
panushri25 Dec 11, 2022
a32fa70
added data directory files
panushri25 Dec 11, 2022
ebe8a06
fixed indirect import for meme and some file naming
panushri25 Dec 11, 2022
cc7f068
removed warning and added tomtom installation
panushri25 Dec 11, 2022
5f9e476
added pdf output for modisco reports
panushri25 Dec 11, 2022
c9ffd95
changed ATAC_PE to ATAC
panushri25 Dec 11, 2022
37e1de7
shift only end when + and start when - on strand, prevents negative v…
panushri25 Dec 11, 2022
727684d
remove print statements
panushri25 Dec 11, 2022
f79a708
adding chrombpnet_makebigwig output to logs
panushri25 Dec 12, 2022
72e7d8c
changing tutorial directory
panushri25 Dec 12, 2022
567ea74
changing chrombpnet_wo_bias.h5 to chrombpnet_nobias.h5
panushri25 Dec 12, 2022
a8ab3a2
changing chrombpnet_wo_bias.h5 to chrombpnet_nobias.h5
panushri25 Dec 12, 2022
44672c4
fixing typos
panushri25 Dec 12, 2022
95683fc
added flags for input and fixed type in step 6
panushri25 Dec 12, 2022
6054e2d
moved tutorial to a different folder
panushri25 Dec 12, 2022
73d012d
updated shorted readme
panushri25 Dec 12, 2022
786bccf
testing model pdf display
panushri25 Dec 12, 2022
0f08aa9
new pdf chrombpnet image
panushri25 Dec 12, 2022
6c204ad
README update, fix syntax in train_chrombpnet workflow script, add co…
panushri25 Dec 12, 2022
07173ee
Added outline for two main workflow commands in README
panushri25 Dec 12, 2022
b4c8778
Added hyperlinks to input file formats
panushri25 Dec 12, 2022
5a2bec8
hyperlink fragment, tagaligns and full documentation
panushri25 Dec 12, 2022
c7c9adc
workflow to input fragment and tagaligns too
panushri25 Dec 12, 2022
10334ae
Documentation for bias model training added
panushri25 Dec 12, 2022
35a0d88
png files with better names
panushri25 Dec 13, 2022
3009973
added bias threshold factor
panushri25 Dec 13, 2022
d108146
module for reference data and associated references, version bump to 1.3
annashcherbina Dec 14, 2022
0f2a1d4
meme_motif_file typo fix
annashcherbina Dec 14, 2022
3939650
updated README to reference pip installation
annashcherbina Dec 14, 2022
20d9309
can't publish to pypi with git url in requirements.txt, commenting ou…
annashcherbina Dec 14, 2022
088c46c
fix typo
annashcherbina Dec 14, 2022
a2251dd
updated README to point to latest docker tag
annashcherbina Dec 14, 2022
3843509
making sure data text/tsv files get included in pypi release
annashcherbina Dec 14, 2022
b3f423d
fixing data reference in step5
annashcherbina Dec 15, 2022
f62c7b2
pwm args
annashcherbina Dec 15, 2022
7c2ce4a
reverting to original
panushri25 Dec 15, 2022
113d2a5
reverting to original and arguments compatibility and main args input
panushri25 Dec 15, 2022
f72b0ba
chrombpnet main and parser
panushri25 Dec 15, 2022
b9f4caa
changing how args are passed to main to enable function calls
panushri25 Dec 15, 2022
eda525d
removing old calls and adding only main chrombpnet
panushri25 Dec 15, 2022
b4c1a57
removed threshold from readme
panushri25 Dec 15, 2022
3ce66f0
workflow updates to enable input arguments
panushri25 Dec 15, 2022
fcc989e
fixed meme file path and moving a path
panushri25 Dec 16, 2022
338b783
epochs default changed back
panushri25 Dec 16, 2022
aa37f9a
changed directory creation step
panushri25 Dec 16, 2022
c996ddc
added notes on how to use bias threshold
panushri25 Dec 16, 2022
10b8474
fixed jsd metric to compare with worst case
panushri25 Dec 16, 2022
a0be8d9
Updated with python command line arguments
panushri25 Dec 16, 2022
262bfe5
Update README.md
panushri25 Dec 16, 2022
8f841f8
ifrag input format corrected and added comments
panushri25 Dec 16, 2022
4c6725a
generate summary reports
panushri25 Dec 19, 2022
99d8349
updated pipeline to generate reports
panushri25 Dec 19, 2022
fb0d361
added file prefix as an additional argument
panushri25 Dec 19, 2022
a315d8d
non peaks metrics for bias model pipeline
panushri25 Dec 19, 2022
94be5fc
restructuring to move pipelines to a different folder, added chrombpn…
panushri25 Dec 20, 2022
4a04ed6
separated reports for training, qc and pipeline
panushri25 Dec 20, 2022
9d4e909
added output prefix
panushri25 Dec 20, 2022
2f0f7cd
added gc matching to python command line
panushri25 Dec 20, 2022
20b75d5
added init file for scripting
panushri25 Dec 20, 2022
5bb3f3a
adding metrics evaluation
panushri25 Dec 20, 2022
f8e2259
adding script to generate splits
panushri25 Dec 20, 2022
baa6eab
removing tutorial and gc matched negatives from setup
panushri25 Dec 20, 2022
9226200
1.3 version fix
panushri25 Dec 20, 2022
705eb3e
removed counts interpret and modisco in the chrombpnet pipeline to re…
panushri25 Dec 21, 2022
22bc12e
reproducible gc matching nonpeaks with seed
panushri25 Dec 21, 2022
aabb4f1
downgrading weasyprint to not require recent versions of pango
annashcherbina Dec 22, 2022
748eb5e
added tn5 and dnase pwms to meme database
panushri25 Dec 22, 2022
f1adee7
fixing imports for modisco reports
panushri25 Dec 22, 2022
403388d
fixed stride to have -st
panushri25 Dec 22, 2022
97a049e
moved modisco html directory to evaluation
panushri25 Dec 22, 2022
85295a2
updated readme with latest command
panushri25 Jan 2, 2023
90d9e80
added input arguments link
panushri25 Jan 2, 2023
2770d0f
added input arguments link
panushri25 Jan 2, 2023
943264c
shorted meme tn5 motifs
panushri25 Jan 13, 2023
56d4e07
fixing tabs
panushri25 Jan 13, 2023
ba71212
removing redundant json file saving
panushri25 Jan 13, 2023
cfa0a28
added more detailed comments in the report, added warning for bias co…
panushri25 Jan 13, 2023
ebb3312
added html_prefix argument
panushri25 Jan 13, 2023
4a79a5f
increase bias model performance in peaks check, print statement to sh…
panushri25 Jan 13, 2023
929302d
removed findfont warnings, typos in printing, import fix
panushri25 Jan 14, 2023
14fdd37
updated modisco lite requirement that reduces findfont error
panushri25 Jan 14, 2023
4213b22
Fixed different syntax for uncorrected
panushri25 Jan 24, 2023
1c616b9
added html prefixing for bias model
panushri25 Jan 24, 2023
2887b56
updating setup file
panushri25 Jan 24, 2023
2b3bf6e
updated version to base
panushri25 Jan 25, 2023
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog

## Version - 1.3 - In progress - 2022-12-11
- Changed pipelines to use modisco-lite; the old modisco will soon be removed
- Added automatic shift detection scripts to the repo and integrated them with the pipeline
- modisco now outputs both HTML and PDF reports; the PDF can be shared with anyone
- Use the data types ATAC and DNASE; ATAC_PE and DNASE_SE are no longer used
- Simplifying workflows to include only two main workflows: chrombpnet_train_tf_model and chrombpnet_train_bias_model
- Restructuring the README: moving the tutorial to additional documentation, introducing an FAQ, and keeping only two pipelines (chrombpnet_train_tf_model, chrombpnet_train_bias_model)

## [Unreleased] - 2022-02-28
- Typo fix - (PR#31-36)
- We can now specify ylimit in marginal footprinting plots (PR#27)
10 changes: 3 additions & 7 deletions MANIFEST.in
@@ -1,8 +1,4 @@
include requirements.txt
include step1_download_bams_and_peaks.sh
include step2_make_bigwigs_from_bams.sh
include step3_get_background_regions.sh
include step4_train_bias_model.sh
include step5_interpret_bias_model.sh
include step6_train_chrombpnet_model.sh
include step7_interpret_chrombpnet_model.sh
include workflows/*
include tests/*
include chrombpnet/data/*
365 changes: 167 additions & 198 deletions README.md

Large diffs are not rendered by default.

180 changes: 180 additions & 0 deletions chrombpnet/CHROMBPNET.py
@@ -0,0 +1,180 @@
import chrombpnet.parsers as parsers
import os
from chrombpnet.data import DefaultDataFile, get_default_data_path
from chrombpnet.data import print_meme_motif_file
import chrombpnet.pipelines as pipelines
import copy
import pandas as pd
import logging
logging.getLogger('matplotlib.font_manager').disabled = True


# invoke pipeline modules based on command

def main():
    args = parsers.read_parser()

    if args.cmd == "pipeline" or args.cmd == "train":
        # exist_ok=False: fail rather than write into an existing output directory
        os.makedirs(os.path.join(args.output_dir,"logs"), exist_ok=False)
        os.makedirs(os.path.join(args.output_dir,"auxiliary"), exist_ok=False)
        os.makedirs(os.path.join(args.output_dir,"models"), exist_ok=False)
        os.makedirs(os.path.join(args.output_dir,"evaluation"), exist_ok=False)

        pipelines.chrombpnet_train_pipeline(args)

    elif args.cmd == "qc":
        os.makedirs(os.path.join(args.output_dir,"auxiliary"), exist_ok=False)
        os.makedirs(os.path.join(args.output_dir,"evaluation"), exist_ok=False)

        pipelines.chrombpnet_qc(args)

    elif args.cmd == "bias":
        if args.cmd_bias == "pipeline" or args.cmd_bias == "train":
            os.makedirs(os.path.join(args.output_dir,"logs"), exist_ok=False)
            os.makedirs(os.path.join(args.output_dir,"auxiliary"), exist_ok=False)
            os.makedirs(os.path.join(args.output_dir,"models"), exist_ok=False)
            os.makedirs(os.path.join(args.output_dir,"evaluation"), exist_ok=False)

            pipelines.train_bias_pipeline(args)

        elif args.cmd_bias == "qc":
            os.makedirs(os.path.join(args.output_dir,"auxiliary"), exist_ok=False)
            os.makedirs(os.path.join(args.output_dir,"evaluation"), exist_ok=False)

            pipelines.bias_model_qc(args)

        else:
            print("Command not found")

    elif args.cmd == "pred_bw":

        # at least one of the three models must be provided
        assert (args.bias_model is None) + (args.chrombpnet_model is None) + (args.chrombpnet_model_nb is None) < 3, "No input model provided!"
        import chrombpnet.evaluation.make_bigwigs.predict_to_bigwig as predict_to_bigwig

        predict_to_bigwig.main(args)

    elif args.cmd == "contribs_bw":

        import chrombpnet.evaluation.interpret.interpret as interpret
        interpret.main(args)
        import chrombpnet.evaluation.make_bigwigs.importance_hdf5_to_bigwig as importance_hdf5_to_bigwig
        if "counts" in args.profile_or_counts:
            args_copy = copy.deepcopy(args)
            args_copy.hdf5 = args_copy.output_prefix + ".counts_scores.h5"
            args_copy.output_prefix = args.output_prefix + ".counts_scores"

            importance_hdf5_to_bigwig.main(args_copy)
        if "profile" in args.profile_or_counts:
            args_copy = copy.deepcopy(args)
            args_copy.hdf5 = args_copy.output_prefix + ".profile_scores.h5"
            args_copy.output_prefix = args.output_prefix + ".profile_scores"

            importance_hdf5_to_bigwig.main(args_copy)

    elif args.cmd == "footprints":

        import chrombpnet.evaluation.marginal_footprints.marginal_footprinting as marginal_footprinting
        marginal_footprinting.main(args)

    elif args.cmd == "snp_score":

        import chrombpnet.evaluation.variant_effect_prediction.snp_scoring as snp_scoring
        snp_scoring.main(args)

    elif args.cmd == "modisco_motifs":
        import chrombpnet
        chrombpnet_src_dir = os.path.dirname(chrombpnet.__file__)
        meme_file=get_default_data_path(DefaultDataFile.motifs_meme)

        modisco_command = "modisco motifs -i {} -n 50000 -o {} -w 500".format(args.h5py,args.output_prefix+"_modisco.h5")
        os.system(modisco_command)
        # pass the packaged MEME file (meme_file); the undefined name meme_dir raised a NameError here
        modisco_command = "modisco report -i {} -o {} -m {}".format(args.output_prefix+"_modisco.h5",args.output_prefix+"_reports",meme_file)
        os.system(modisco_command)

        import chrombpnet.evaluation.modisco.convert_html_to_pdf as convert_html_to_pdf
        convert_html_to_pdf.main(args.output_prefix+"_reports/motifs.html",args.output_prefix+"_reports/motifs.pdf")

    elif args.cmd == "prep":

        if args.cmd_prep == "nonpeaks":

            assert(args.inputlen%2==0) # input length should be a multiple of 2

            os.makedirs(args.output_prefix+"_auxiliary/", exist_ok=False)

            from chrombpnet.helpers.make_gc_matched_negatives.get_genomewide_gc_buckets.get_genomewide_gc_bins import get_genomewide_gc
            get_genomewide_gc(args.genome,args.output_prefix+"_auxiliary/genomewide_gc.bed",args.inputlen, args.stride)

            # get gc content in peaks
            import chrombpnet.helpers.make_gc_matched_negatives.get_gc_content as get_gc_content
            args_copy = copy.deepcopy(args)
            args_copy.input_bed = args_copy.peaks
            args_copy.output_prefix = args.output_prefix+"_auxiliary/foreground.gc"
            get_gc_content.main(args_copy)

            # prepare candidate negatives

            exclude_bed = pd.read_csv(args.peaks, sep="\t", header=None)
            os.system("bedtools slop -i {peaks} -g {chrom_sizes} -b {flank_size} > {output}".format(peaks=args.peaks,
                                                                                                    chrom_sizes=args.chrom_sizes,
                                                                                                    flank_size=args.inputlen//2,
                                                                                                    output=args.output_prefix+"_auxiliary/peaks_slop.bed"))
            exclude_bed = pd.read_csv(args.output_prefix+"_auxiliary/peaks_slop.bed", sep="\t", header=None, usecols=[0,1,2])

            if args.blacklist_regions:
                os.system("bedtools slop -i {blacklist} -g {chrom_sizes} -b {flank_size} > {output}".format(blacklist=args.blacklist_regions,
                                                                                                            chrom_sizes=args.chrom_sizes,
                                                                                                            flank_size=args.inputlen//2,
                                                                                                            output=args.output_prefix+"_auxiliary/blacklist_slop.bed"))

                exclude_bed = pd.concat([exclude_bed,pd.read_csv(args.output_prefix+"_auxiliary/blacklist_slop.bed",sep="\t",header=None, usecols=[0,1,2])])

            exclude_bed.to_csv(args.output_prefix+"_auxiliary/exclude_unmerged.bed", sep="\t", header=False, index=False)
            os.system("bedtools sort -i {inputb} | bedtools merge -i stdin > {output}".format(inputb=args.output_prefix+"_auxiliary/exclude_unmerged.bed",
                                                                                              output=args.output_prefix+"_auxiliary/exclude.bed"))

            bedtools_command = "bedtools intersect -v -a {genomewide_gc} -b {exclude_bed} > {candidate_bed}".format(
                genomewide_gc=args.output_prefix+"_auxiliary/genomewide_gc.bed",
                exclude_bed=args.output_prefix+"_auxiliary/exclude.bed",
                candidate_bed=args.output_prefix+"_auxiliary/candidates.bed")
            os.system(bedtools_command)

            # get final negatives
            import chrombpnet.helpers.make_gc_matched_negatives.get_gc_matched_negatives as get_gc_matched_negatives
            args_copy = copy.deepcopy(args)
            args_copy.candidate_negatives = args.output_prefix+"_auxiliary/candidates.bed"
            args_copy.foreground_gc_bed = args.output_prefix+"_auxiliary/foreground.gc.bed"
            args_copy.output_prefix = args.output_prefix+"_auxiliary/negatives"

            get_gc_matched_negatives.main(args_copy)

            # pad the negatives out to ten columns with placeholder values
            negatives = pd.read_csv(args.output_prefix+"_auxiliary/negatives.bed", sep="\t", header=None)
            negatives[3]="."
            negatives[4]="."
            negatives[5]="."
            negatives[6]="."
            negatives[7]="."
            negatives[8]="."
            negatives[9]=1057
            negatives.to_csv(args.output_prefix+"_negatives.bed", sep="\t", header=False, index=False)

        elif args.cmd_prep == "splits":
            import chrombpnet.helpers.make_chr_splits.splits as splits
            splits.main(args)

        else:
            print("Command not found")

    else:
        print("Command not found")


if __name__=="__main__":
    main()
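
Review note on the `prep nonpeaks` and `modisco_motifs` branches: every external tool (`bedtools`, `modisco`) is launched with `os.system`, and the return value is discarded, so a failing command would only surface later when a missing or empty output file is read. A minimal sketch of one way to make such failures visible, assuming a hypothetical helper `run_shell` that is not part of this PR:

```python
import subprocess


def run_shell(command):
    """Hypothetical helper (not in the PR): run a shell command string, raise on failure."""
    # shell=True keeps pipes and redirections ("a | b > out") working,
    # which the command strings in the diff rely on.
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(command, shell=True, check=True)


def succeeded(command):
    """Return True if the command exited with status 0, False otherwise."""
    try:
        run_shell(command)
    except subprocess.CalledProcessError:
        return False
    return True
```

With a helper of this kind, each `os.system(...)` call could be swapped one for one. Because the commands are built by string formatting, paths containing spaces would still need quoting, for example with `shlex.quote`.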



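Review note on the dispatch in `main()`: it branches on `args.cmd` and, for the `bias` and `prep` commands, on a second-level attribute (`args.cmd_bias`, `args.cmd_prep`). `chrombpnet.parsers` is not shown in this diff, so the following is only a sketch of how such attributes can be produced with nested `argparse` subparsers; `build_parser`, `describe`, and the `-o/--output-dir` option are hypothetical names chosen for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser; the real one lives in chrombpnet.parsers (not shown)."""
    parser = argparse.ArgumentParser(prog="chrombpnet")
    # dest="cmd" stores the chosen top-level subcommand in args.cmd
    sub = parser.add_subparsers(dest="cmd", required=True)

    for name in ("pipeline", "train", "qc"):
        p = sub.add_parser(name)
        p.add_argument("-o", "--output-dir", dest="output_dir", required=True)

    # "bias" has its own second-level subcommands, stored in args.cmd_bias
    bias = sub.add_parser("bias")
    bias_sub = bias.add_subparsers(dest="cmd_bias", required=True)
    for name in ("pipeline", "train", "qc"):
        p = bias_sub.add_parser(name)
        p.add_argument("-o", "--output-dir", dest="output_dir", required=True)
    return parser


def describe(argv):
    """Return the (cmd, cmd_bias) pair a dispatcher like main() would branch on."""
    args = build_parser().parse_args(argv)
    # cmd_bias only exists on the namespace when the "bias" subcommand was used
    return args.cmd, getattr(args, "cmd_bias", None)
```

Under this layout, an unknown subcommand is rejected by `argparse` itself, so the trailing `print("Command not found")` branches would act only as a safety net.
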
2 changes: 0 additions & 2 deletions chrombpnet/__init__.py
@@ -1,2 +0,0 @@
print("WARNING: IF upgrading from v1.0 or v1.1 to v1.2, note that chrombpnet has undergone linting to generate a modular structure for release on pypi."
"Hard-coded script paths are no longer necessary. Please refer to the updated README to ensure your script calls are compatible with v1.2")
22 changes: 22 additions & 0 deletions chrombpnet/data/__init__.py
@@ -0,0 +1,22 @@
from importlib import resources
from enum import Enum

class DefaultDataFile(Enum):
    atac_ref_motifs = "ATAC.ref.motifs.txt"
    dnase_ref_motifs = "DNASE.ref.motifs.txt"
    motif_to_pwm_atac = "motif_to_pwm.ATAC.tsv"
    motif_to_pwm_dnase = "motif_to_pwm.DNASE.tsv"
    motif_to_pwm_tf = "motif_to_pwm.TF.tsv"
    motifs_meme = "motifs.meme.txt"


def get_default_data_path(default_data_file_entry):
    with resources.path("chrombpnet.data", default_data_file_entry.value) as f:
        data_file_path=f
    return data_file_path

def print_meme_motif_file():
    with resources.path("chrombpnet.data", DefaultDataFile.motifs_meme.value) as f:
        data_file_path=f
        print(f)
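
Review note on `get_default_data_path`: the path obtained inside the `with resources.path(...)` block is handed back to the caller and used after the context manager has exited. That works when the package is installed as ordinary files on disk, but the context manager is allowed to clean up a temporary copy on exit when the resource is not a real file (for example in a zipped install). A sketch of the same lookup with `importlib.resources.files` (available from Python 3.9), demonstrated on the standard-library `json` package because `chrombpnet.data` may not be installed where this runs; `resource_path` is a hypothetical helper, not part of this PR:

```python
from importlib import resources


def resource_path(package, filename):
    """Hypothetical helper: locate a file shipped inside an importable package."""
    # files() returns a Traversable; for packages installed as ordinary
    # directories it behaves like a pathlib.Path.
    return resources.files(package).joinpath(filename)


# Demonstrated on the standard-library json package, which ships decoder.py.
decoder = resource_path("json", "decoder.py")
```

For the package in this PR, the equivalent call would presumably be `resource_path("chrombpnet.data", DefaultDataFile.motifs_meme.value)`. If zipped installs need support, `resources.as_file` can wrap the result for as long as a real filesystem path is required.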

File renamed without changes.
File renamed without changes.
File renamed without changes.