Snakemake workflow #52
Merged
39 commits
0dde64c created snakefile (rnmitchell)
b7d759c cleaned up code and fixed rule running x2 (rnmitchell)
88eeb8c fixed double run error (rnmitchell)
85cadca updated code to accomodate EFM different outputs [skip ci] (rnmitchell)
f114939 fixed failing tests (rnmitchell)
1eff1e2 reformatted snakemake (rnmitchell)
835f352 fixed main error (rnmitchell)
c4c4986 fixing errors [skip ci] (rnmitchell)
5e34778 fixed str scripts (rnmitchell)
f638e21 fixed STR cli issue [skip ci] (rnmitchell)
656de31 added command for copy config file to working dir (rnmitchell)
fbd4841 updated manifest and setup.py [skip ci] (rnmitchell)
72ad24c added target for cli [skip ci] (rnmitchell)
9869f4d added option to change config file using cl arguments (rnmitchell)
52ea09c updated clis [skip ci] (rnmitchell)
9c32166 updating code to accomodate nocombine option [skip ci] (rnmitchell)
09e6faa updated test_repeat.py (rnmitchell)
5f87672 updated format tests [skip ci] (rnmitchell)
f468e57 updated test_marker.py [skip ci] (rnmitchell)
eeee192 fixed all tests for test_suite.py [skip ci] (rnmitchell)
913750a fixed error in test file [skip ci] (rnmitchell)
f78b560 updated snakefile to extract sampleIDs from format or annot output file (rnmitchell)
6a59b6e updated snakefile to assign correct output name for filter step [skip… (rnmitchell)
ea3efa1 update tests [skip ci] (rnmitchell)
ec99cd7 fixed format and removed separate from annot.py [skip ci] (rnmitchell)
82bc6ae removed separate from config for annotate step [skip ci] (rnmitchell)
176672e updated config file [skip ci] (rnmitchell)
effe222 remove .DS_Store file [skip ci] (rnmitchell)
d5af9bd updated README [skip ci] (rnmitchell)
7cb34b9 add error message for snps workflow, update setup.py [skip ci] (rnmitchell)
a0bbbba removed old argument from config.py [skip ci] (rnmitchell)
ea86844 changed all "annotate" to "convert" [skip ci] (rnmitchell)
26433e0 updated default config [skip ci] (rnmitchell)
1fec516 banishing annotate and anno in tests and test files [skip ci] (rnmitchell)
83ddecf missed files [skip ci] (rnmitchell)
bce9efc NotImplementedError (change to trigger a CI build)
d483f8e pinned pandas version and deselected snp tests (rnmitchell)
93f9ed3 Merge branch 'snakemake' of https://github.com/bioforensics/lusSTR in… (rnmitchell)
abbfc30 added missing test files (rnmitchell)
@@ -0,0 +1,23 @@
%YAML 1.2
---

## general settings
uas: True ## True/False; if run through UAS
sex: False ## True/False; include sex-chromosome STRs
output: "test/output_test" ## output file/directory name; Example: "test_030923"

##format settings
samp_input: "/Users/rebecca.mitchell/Documents/Human/lusSTR/lusSTR/tests/data/UAS_bulk_input/" ## input directory or sample

##annotate settings
kit: "forenseq" ## forenseq/powerseq
nocombine: False ## True/False; do not combine identical sequences (if using STRait Razor data)
separate: False ## True/False; create individual files for each sample

##filter settings
output_type: "efm" ## strmix/efm
profile_type: "evidence" ## evidence/reference
data_type: "ce" ## ce/ngs
info: True ## True/False; create allele information file
filter_sep: False ## True/False; for EFM only, if True will create individual files for samples; if False, will create one file with all samples
nofilters: False ## True/False; skip all filtering steps
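The inline comments in the config above list the allowed values for each setting. A minimal sketch of checking an already-parsed config dict against those values follows. `validate_config`, `ALLOWED`, and `BOOLEAN_KEYS` are hypothetical names used for illustration only; they are not part of lusSTR or this PR.

```python
# Hypothetical helper (not part of lusSTR): checks a parsed config dict
# against the allowed values listed in the config file's own comments.
ALLOWED = {
    "kit": {"forenseq", "powerseq"},
    "output_type": {"strmix", "efm"},
    "profile_type": {"evidence", "reference"},
    "data_type": {"ce", "ngs"},
}
BOOLEAN_KEYS = ["uas", "sex", "nocombine", "separate", "info", "filter_sep", "nofilters"]


def validate_config(config):
    """Return a list of problems; an empty list means every checked key is valid."""
    problems = []
    # Settings that must be one of a fixed set of strings.
    for key, choices in ALLOWED.items():
        value = config.get(key)
        if value not in choices:
            problems.append(f"{key}: expected one of {sorted(choices)}, got {value!r}")
    # Settings that must be real booleans, not strings such as "True".
    for key in BOOLEAN_KEYS:
        value = config.get(key)
        if not isinstance(value, bool):
            problems.append(f"{key}: expected True/False, got {value!r}")
    return problems


# Values copied from the config shown in this diff.
example = {
    "uas": True, "sex": False, "output": "test/output_test",
    "kit": "forenseq", "nocombine": False, "separate": False,
    "output_type": "efm", "profile_type": "evidence", "data_type": "ce",
    "info": True, "filter_sep": False, "nofilters": False,
}
problems = validate_config(example)
```

The boolean check matters because the Snakefile in this PR compares settings with `is True`, so a quoted value such as `"True"` would silently leave the corresponding flag off.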
@@ -0,0 +1,121 @@
import glob
import openpyxl
import os
import pandas as pd
from pathlib import Path
import re


configfile: "config.yaml"
output_name = config["output"]
input_name = config["samp_input"]
software = config["output_type"]
prof = config["profile_type"]
data = config["data_type"]
filter_sep = config["filter_sep"]


def get_sample_IDs(input, uas, output, software, separate):
    # UAS exports are Excel workbooks; other inputs are plain-text files
    file_ext = ".xlsx" if uas is True else ".txt"
    if software == "efm" and separate is False:
        # EFM without per-sample files: a single output named after the output prefix
        return os.path.basename(output)
    else:
        if uas is True:
            if os.path.isdir(input):
                # "[!~]" skips Excel lock files, whose names start with "~"
                files = glob.glob(os.path.join(input, f"[!~]*{file_ext}"))
            else:
                files = input
            ID_list = get_uas_ids(files)
        else:
            if os.path.isdir(input):
                files = glob.glob(os.path.join(input, f"[!~]*{file_ext}"))
            else:
                # wrap a single path in a list so the comprehensions below see one file
                files = [input]
            # strip the directory part (the original replaced the builtin `dir`, which raises TypeError)
            files = [os.path.basename(sub) for sub in files]
            ID_list = [sub.replace(file_ext, "") for sub in files]
    return ID_list


def get_uas_ids(files):
    samplelist = []
    if isinstance(files, list):
        for filename in sorted(files):
            # only "Sample Details" reports carry the sample ID
            if "Sample Details" not in filename:
                continue
            sampleID = parse_sample_details(filename)
            samplelist.append(sampleID)
    else:
        samplelist = parse_sample_details(files)
    return samplelist


def parse_sample_details(filename):
    file = openpyxl.load_workbook(filename)
    file_sheet = file["Autosomal STRs"]
    table = pd.DataFrame(file_sheet.values)
    # sample ID sits in the third row, second column; spaces become underscores
    sampleID = re.sub(" ", "_", table.iloc[2, 1])
    return sampleID


rule all:
    input:
        expand("{name}.csv", name=output_name),
        expand("{name}.txt", name=output_name),
        expand(
            "{outdir}/{samplename}_{prof_t}_{data_t}.csv", outdir=output_name,
            samplename=get_sample_IDs(input_name, config["uas"], output_name, software,
            filter_sep), prof_t=prof, data_t=data
        )


rule format:
    input:
        expand("{samp_input}", samp_input=input_name)
    output:
        expand("{name}.csv", name=output_name)
    params:
        uas="--uas" if config["uas"] is True else "",
        sex="--include-sex" if config["sex"] is True else ""
    shell:
        "lusstr format '{input}' -o {output} {params.uas} {params.sex}"


rule annotate:
    input:
        rules.format.output
    output:
        expand("{name}.txt", name=output_name)
    params:
        uas="--uas" if config["uas"] is True else "",
        sex="--include-sex" if config["sex"] is True else "",
        combine="--nocombine" if config["nocombine"] is True else "",
        separate="--separate" if config["separate"] is True else "",
        kit=config["kit"]
    shell:
        "lusstr annotate {input} -o {output} --kit {params.kit} {params.uas} {params.sex} "
        "{params.combine} {params.separate}"


rule filter:
    input:
        rules.annotate.output
    output:
        expand(
            "{outdir}/{samplename}_{prof_t}_{data_t}.csv", outdir=output_name,
            samplename=get_sample_IDs(input_name, config["uas"], output_name, software,
            filter_sep), prof_t=prof, data_t=data
        )
    params:
        output_type=config["output_type"],
        profile_type=config["profile_type"],
        data_type=config["data_type"],
        output_dir=config["output"],
        info="--info" if config["info"] is True else "",
        filter_sep="--separate" if config["filter_sep"] is True else "",
        filters="--no-filters" if config["nofilters"] is True else ""
    shell:
        "lusstr filter {input} -o {params.output_dir} --output-type {params.output_type} "
        "--profile-type {params.profile_type} --data-type {params.data_type} {params.info} "
        "{params.filters} {params.filter_sep}"
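The `filter` rule above derives both its output paths and its command-line flags from config values. A minimal sketch of that mapping in plain Python, outside Snakemake, may make it easier to see what a given config produces. `flag`, `filter_targets`, and `filter_command` are hypothetical names used only for illustration; the flags and the path pattern are the ones that appear in the rule itself.

```python
def flag(enabled, name):
    """Return the CLI flag only when the config value is literally True."""
    return name if enabled is True else ""


def filter_targets(outdir, sample_ids, profile_type, data_type):
    """Build the paths matching "{outdir}/{samplename}_{prof_t}_{data_t}.csv"."""
    # A single sample ID may arrive as a bare string (e.g. the EFM combined case).
    if isinstance(sample_ids, str):
        sample_ids = [sample_ids]
    return [f"{outdir}/{sid}_{profile_type}_{data_type}.csv" for sid in sample_ids]


def filter_command(config, annotate_output):
    """Assemble the `lusstr filter` command the rule's shell directive would run."""
    parts = [
        "lusstr", "filter", annotate_output,
        "-o", config["output"],
        "--output-type", config["output_type"],
        "--profile-type", config["profile_type"],
        "--data-type", config["data_type"],
        flag(config["info"], "--info"),
        flag(config["nofilters"], "--no-filters"),
        flag(config["filter_sep"], "--separate"),
    ]
    # Drop the empty strings left by disabled flags.
    return " ".join(p for p in parts if p)


# Values copied from the config in this PR.
config = {
    "output": "test/output_test", "output_type": "efm",
    "profile_type": "evidence", "data_type": "ce",
    "info": True, "filter_sep": False, "nofilters": False,
}
targets = filter_targets(config["output"], "output_test",
                         config["profile_type"], config["data_type"])
command = filter_command(config, "test/output_test.txt")
```

With the EFM, non-separate settings from the config, the only filter target is a single file named after the output prefix, which is why `get_sample_IDs` returns `os.path.basename(output)` in that branch.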