This pipeline was created to call consensus structural variants (SVs) from the SV calls of GDAN DLBCL groups.
- `samples_file` and `metadata` have to be generated.
- `ref` should be relinked to your local references.
- It is highly recommended that you install snakemake following the best practices at https://snakemake.readthedocs.io/en/stable/getting_started/installation.html
- Since I haven't set up a `setup.py` yet, you could just type the following after having installed snakemake:
pip install tabix numpy pandas pybedtools wgs_analysis
- You may want to adjust `CHROMS`, `SOURCES`, and `SAMPLES` to your liking.
- Now the fun part! A snakemake dry-run is highly recommended.
bash run_snakemake.sh
- The output BEDPE files created by this pipeline have the following columns:
  - `#chrom1`: chromosome of breakpoint 1
  - `start1`, `end1`: position of breakpoint 1, as a 0-based, half-open range (i.e. if a 1-based UCSC breakpoint is `chr1:1`, the breakpoint coordinate in the BEDPE is `chr1 0 1`)
  - `chrom2`, `start2`, `end2`: coordinates of breakpoint 2
  - `name`: sources of the SV, separated by `__`
  - `score`: number of sources of the SV
  - `strand1`, `strand2`: orientation of breakpoint 1 and 2, respectively
  - `type`: type of the SV, in {'DEL', 'DUP', 'INV', 'TRA'}
  - `gene1`, `gene2`: gene annotation of breakpoint 1 and 2
    - For `DEL` and `DUP`
      - For breakpoint 1, the overlapping or the closest downstream gene that is encompassed by the deletion/duplication (i.e. has overlapping coordinates with the breakpoint pair interval) is annotated
      - For breakpoint 2, the overlapping or the closest upstream gene that is encompassed by the deletion/duplication is annotated
    - For `INV` and `TRA`
      - If a breakpoint orientation is upstream (i.e. '+'), the overlapping or the closest upstream gene to the breakpoint is annotated
      - If a breakpoint orientation is downstream (i.e. '-'), the overlapping or the closest downstream gene to the breakpoint is annotated
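
If you want to read the output BEDPE files downstream, the column layout above could be parsed roughly as follows. This is only a minimal sketch and not part of the pipeline: it assumes the files are tab-separated with the 13 columns in the order listed, that the header line starts with `#`, and that each breakpoint is a single-base interval (as in the `chr1 0 1` example). The names `ConsensusSV`, `parse_bedpe_line`, and `read_bedpe`, as well as the caller and gene names in the example row, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Column order as listed above (assumed); '#chrom1' is written without the '#'.
COLUMNS = ["chrom1", "start1", "end1", "chrom2", "start2", "end2",
           "name", "score", "strand1", "strand2", "type", "gene1", "gene2"]
SV_TYPES = {"DEL", "DUP", "INV", "TRA"}


@dataclass
class ConsensusSV:
    chrom1: str
    pos1: int            # 1-based position of breakpoint 1
    chrom2: str
    pos2: int            # 1-based position of breakpoint 2
    sources: List[str]   # 'name' column split on '__'
    score: int           # number of sources, as reported in the file
    strand1: str
    strand2: str
    sv_type: str
    gene1: str
    gene2: str


def parse_bedpe_line(line: str) -> ConsensusSV:
    """Parse one data line of a consensus BEDPE file (assumed tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    if row["type"] not in SV_TYPES:
        raise ValueError(f"unknown SV type: {row['type']}")
    return ConsensusSV(
        chrom1=row["chrom1"],
        # 0-based half-open [start, end) -> 1-based position is start + 1
        pos1=int(row["start1"]) + 1,
        chrom2=row["chrom2"],
        pos2=int(row["start2"]) + 1,
        sources=row["name"].split("__"),
        score=int(row["score"]),
        strand1=row["strand1"],
        strand2=row["strand2"],
        sv_type=row["type"],
        gene1=row["gene1"],
        gene2=row["gene2"],
    )


def read_bedpe(lines: Iterable[str]) -> Iterator[ConsensusSV]:
    """Yield one record per data line, skipping the '#'-prefixed header and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_bedpe_line(line)


if __name__ == "__main__":
    # Hypothetical row: a deletion on chr1 reported by two made-up sources.
    example = "chr1\t0\t1\tchr1\t999\t1000\tcaller_a__caller_b\t2\t+\t-\tDEL\tGENE_A\tGENE_B\n"
    sv = parse_bedpe_line(example)
    print(sv.chrom1, sv.pos1, sv.chrom2, sv.pos2, sv.sources)
```

To read a whole file, something like `with open(path) as fh: records = list(read_bedpe(fh))` should work under the same assumptions.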