Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: remove pybedtools as a dependency. This slightly changes the interpretation of BED files. From now on, the end position of each region will not be included. #184

Merged
merged 15 commits into from
Feb 7, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion .readthedocs_conda_env.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@ dependencies:
- pytest
- pytest-cov
- numpy
- pybedtools
- matplotlib-base
- pysam
- scipy
Expand Down
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ RUN apt-get update && DEBIAN_FRONTEND="noninteractive" apt-get install -qqy \
unzip \
zlib1g-dev

RUN pip3 install pybedtools pyvcf scipy numpy statsmodels
RUN pip3 install pyvcf scipy numpy statsmodels

# Install samtools (needed to index reference fasta files)
RUN wget -O samtools-1.9.tar.bz2 https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
Expand Down
2 changes: 0 additions & 2 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -72,8 +72,6 @@ checkout the branch you're interested in, and run the following command from the
pip install --upgrade pip
pip install -e .

Note: required package :code:`pybedtools` requires zlib. If you receive an error about a missing file :code:`zlib.h`, you can install on Ubuntu using :code:`sudo apt-get install zlib1g-dev` or CentOS using :code:`sudo yum install zlib-devel`.

Note: make sure TRTools is not installed in the environment via a different method before installing from source. :code:`which dumpSTR` should return nothing

Tools
Expand Down
1 change: 0 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,6 @@ dependencies = [
"matplotlib",
"numpy",
"pandas",
"pybedtools",
"pysam",
"scikit-learn",
"scipy",
Expand Down
26 changes: 20 additions & 6 deletions trtools/dumpSTR/filters.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
import os

import numpy as np
from pybedtools import BedTool
from pysam import TabixFile, asBed

import trtools.utils.common as common
import trtools.utils.tr_harmonizer as trh
Expand Down Expand Up @@ -263,7 +263,7 @@ def LoadRegions(self, filename):
common.WARNING("Could not find tabix index %s.tbi"%filename)
self.pass_checks = False
return
self.regions = BedTool(filename)
self.regions = TabixFile(filename, parser=asBed())
def __call__(self, record: trh.TRRecord):
interval = "%s:%s-%s"%(record.chrom, record.pos,
record.pos + record.ref_allele_length)
Expand All @@ -272,10 +272,24 @@ def __call__(self, record: trh.TRRecord):
interval2 = interval.replace("chr","")
else: interval2 = "chr"+interval
# Try with and without chr
tb1 = self.regions.tabix_intervals(interval)
if tb1.count() > 0: return self.name
tb2 = self.regions.tabix_intervals(interval2)
if tb2.count() > 0: return self.name
# A ValueError or StopIteration indicates that there weren't any matching
# regions within the given interval
try:
next(self.regions.fetch(region=interval, multiple_iterators=True))
except ValueError:
pass
except StopIteration:
pass
else:
return self.name
try:
next(self.regions.fetch(region=interval2, multiple_iterators=True))
except ValueError:
pass
except StopIteration:
pass
else:
return self.name
return None
def filter_name(self):
return self.name
Expand Down
2 changes: 1 addition & 1 deletion trtools/dumpSTR/tests/test_dumpSTR.py
Original file line number Diff line number Diff line change
Expand Up @@ -653,7 +653,7 @@ def test_beagle_disallowed_call_filters(tmpdir, vcfdir):

"""
These tests run dumpSTR and compare its output
to output that has been generated by a pervious version of
to output that has been generated by a previous version of
dumpSTR and saved in the repo. The results are expected
to be identical.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,4 @@ FILTER:CALLRATE0.5 473
FILTER:HWE0.5 213
FILTER:HETLOW0.05 8005
FILTER:HETHIGH0.45 584
FILTER:foo_region 341
FILTER:foo_region 342
Original file line number Diff line number Diff line change
Expand Up @@ -439,7 +439,7 @@ chr21 14383027 GANGSTR_437028 TTTTGTTGTTGTTGTTG TGTTGTTGTTGTTGTTG . HETLOW0.05;f
chr21 14384512 GANGSTR_437029 CACTCCACTC CACTCCATTC . foo_region INFRAME_PGEOM=0.95;INFRAME_UP=0.01;INFRAME_DOWN=0.01;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0.01;OUTFRAME_DOWN=0.01;START=14384512;END=14384521;PERIOD=5;NSKIP=0;NFILT=0;BPDIFFS=0;DP=244;DSNP=0;DSTUTTER=0;DFLANKINDEL=5;AN=6;REFAC=4;AC=2;HRUN=2;HET=0.444444;HWEP=0.588477 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS:FILTER 0|1:0|0:1:0.5:81:0:0:4:53.04|27.96:0|0:69.99:-2.47:74:-0.35:0|57:0|53:PASS 0|0:0|0:1:1:81:0:0:1:40.50|40.50:0|0:22.26:0:.:0:0|62:0|62:PASS 0|1:0|0:1:0.5:82:0:0:0:60.16|21.84:0|0:51.35:-5.14:75:-0.23:0|54:0|54:PASS
chr21 14390199 GANGSTR_437030 TGGTGAGTGTGTGTGTGTGT TAGTGAGTGTGTGTGTGTGT . HETLOW0.05;foo_region INFRAME_PGEOM=0.95;INFRAME_UP=0.01;INFRAME_DOWN=0.01;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0.01;OUTFRAME_DOWN=0.01;START=14390205;END=14390218;PERIOD=2;NSKIP=0;NFILT=0;BPDIFFS=0;DP=222;DSNP=0;DSTUTTER=2;DFLANKINDEL=7;AN=6;REFAC=3;AC=3;HRUN=2;HET=0;HWEP=1 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS:FILTER 0|1:0|0:1:0.5:77:0:1:2:48.14|28.86:0|0:66.66:-1.65:63:-0.96:-2|1;0|57:0|53:PASS 0|1:0|0:1:0.5:67:0:1:4:54.58|12.42:0|0:5.56:-8.49:54:-0.65:-2|1;0|50:0|42:PASS 0|1:0|0:1:0.5:78:0:0:1:54.02|23.98:0|0:44.75:-3.64:64:-0.4:0|55:0|53:PASS
chr21 14394189 GANGSTR_437031 ACAAAAACAAACAAACAAAC AAAACAAACAAACAAAC . CALLRATE0.5;foo_region;NO_CALLS_REMAINING INFRAME_PGEOM=0.59;INFRAME_UP=0.02;INFRAME_DOWN=0;OUTFRAME_PGEOM=0.69;OUTFRAME_UP=0;OUTFRAME_DOWN=0.01;START=14394193;END=14394208;PERIOD=4;NSKIP=0;NFILT=3;BPDIFFS=-3;DP=0;DSNP=0;DSTUTTER=0;DFLANKINDEL=0;AN=0;REFAC=0;AC=0;HRUN=5;HET=-1;HWEP=-1 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS:FILTER .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:NOCALL .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:NOCALL .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:NOCALL
chr21 14405799 GANGSTR_437032 ACCCACCCACCC ACCCGCCCACCC . HWE0.5;HETHIGH0.45 INFRAME_PGEOM=0.95;INFRAME_UP=0;INFRAME_DOWN=0;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0;OUTFRAME_DOWN=0;START=14405799;END=14405810;PERIOD=4;NSKIP=0;NFILT=0;BPDIFFS=0;DP=342;DSNP=0;DSTUTTER=0;DFLANKINDEL=1;AN=6;REFAC=3;AC=3;HRUN=3;HET=0.5;HWEP=0.25 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS:FILTER 0|1:0|0:1:0.5:115:0:0:1:59.46|55.54:0|0:179.46:-0.11:104:-0.16:0|87:0|86:PASS 0|1:0|0:1:0.5:109:0:0:0:53.35|55.65:0|0:152.82:-0.04:98:-0.53:0|83:0|81:PASS 0|1:0|0:1:0.5:118:0:0:0:74.97|43.03:0|0:123.77:-2.66:104:0:0|82:0|78:PASS
chr21 14405799 GANGSTR_437032 ACCCACCCACCC ACCCGCCCACCC . HWE0.5;HETHIGH0.45;foo_region INFRAME_PGEOM=0.95;INFRAME_UP=0;INFRAME_DOWN=0;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0;OUTFRAME_DOWN=0;START=14405799;END=14405810;PERIOD=4;NSKIP=0;NFILT=0;BPDIFFS=0;DP=342;DSNP=0;DSTUTTER=0;DFLANKINDEL=1;AN=6;REFAC=3;AC=3;HRUN=3;HET=0.5;HWEP=0.25 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS:FILTER 0|1:0|0:1:0.5:115:0:0:1:59.46|55.54:0|0:179.46:-0.11:104:-0.16:0|87:0|86:PASS 0|1:0|0:1:0.5:109:0:0:0:53.35|55.65:0|0:152.82:-0.04:98:-0.53:0|83:0|81:PASS 0|1:0|0:1:0.5:118:0:0:0:74.97|43.03:0|0:123.77:-2.66:104:0:0|82:0|78:PASS
aryarm marked this conversation as resolved.
Show resolved Hide resolved
chr21 14407537 GANGSTR_437033 AACAACAACAACAAC . . HETLOW0.05 INFRAME_PGEOM=1;INFRAME_UP=0;INFRAME_DOWN=0.01;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0;OUTFRAME_DOWN=0;START=14407537;END=14407551;PERIOD=3;NSKIP=0;NFILT=0;DP=384;DSNP=0;DSTUTTER=1;DFLANKINDEL=0;AN=6;REFAC=6;HRUN=2;HET=0;HWEP=1;AC=0 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS:FILTER 0|0:0|0:1:1:120:0:0:0:60.00|60.00:0|0:.:0:.:0:0|80:0|78:PASS 0|0:0|0:1:1:134:0:1:0:67.00|67.00:0|0:.:0:.:0:-3|1;0|101:-3|1;0|98:PASS 0|0:0|0:1:1:130:0:0:0:65.00|65.00:0|0:.:0:.:0:0|93:0|90:PASS
chr21 14410775 GANGSTR_437034 AGGAGGAGGAGGAGG . . HETLOW0.05 INFRAME_PGEOM=0.95;INFRAME_UP=0;INFRAME_DOWN=0;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0;OUTFRAME_DOWN=0;START=14410775;END=14410789;PERIOD=3;NSKIP=0;NFILT=0;DP=291;DSNP=0;DSTUTTER=0;DFLANKINDEL=1;AN=6;REFAC=6;HRUN=2;HET=0;HWEP=1;AC=0 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS:FILTER 0|0:0|0:1:1:89:0:0:0:44.50|44.50:0|0:.:0:.:0:-2|1;0|60:0|59:PASS 0|0:0|0:1:1:104:0:0:0:52.00|52.00:0|0:.:0:.:0:0|80:0|76:PASS 0|0:0|0:1:1:98:0:0:1:49.00|49.00:0|0:.:0:.:0:0|66:0|61:PASS
chr21 14411793 GANGSTR_437035 AATAAATAAATAAATAAATAAATAAATA AAATAAATAAATAAATAAATAAATAAATA,AAATAAATAAATAAATAAATAAATAAATAAATAAATA . CALLRATE0.5;NO_CALLS_REMAINING INFRAME_PGEOM=0.58;INFRAME_UP=0.05;INFRAME_DOWN=0.01;OUTFRAME_PGEOM=0.27;OUTFRAME_UP=0.04;OUTFRAME_DOWN=0.03;START=14411793;END=14411820;PERIOD=4;NSKIP=0;NFILT=3;BPDIFFS=1,9;DP=0;DSNP=0;DSTUTTER=0;DFLANKINDEL=0;AN=0;REFAC=0;AC=0,0;HRUN=3;HET=-1;HWEP=-1 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS:FILTER .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:NOCALL .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:NOCALL .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:NOCALL
Expand Down
Loading