XenoSplit is a fast computational tool to detect the true origin of the graft RNA-Seq and DNA-Seq libraries prior to profiling of patient-derived xenografts (PDXs). To classify ambiguous reads from a PDX experiment, XenoSplit operates on host and graft BAM files and computes a "goodness of mapping" metric using CIGAR strings and edit distances. XenoSplit is compatible with Subread, Bowtie2, STAR, Subjunc, TopHat2 and BWA, and is freely available under the terms of GNU General Public License v2.0.
Download XenoSplit repository from GitHub in a chosen location:
git clone https://github.com/goknurginer/XenoSplit.git
This will create a library folder called XenoSplit in the current location.
Go into the XenoSplit folder and view the help page as below:
python xenosplit.py --help
or specify the location yourself by replacing path
in the following code with your home directory, i.e. user/Documents
.
python path/XenoSplit/xenosplit.py --help
This will output the following help page:
usage: xenosplit.py [-h] [--count] [--min MIN] [--out OUT]
[--aligner {subread,subjunc,star,tophat,bowtie,bwa}]
[--pairedEnd]
graft host
Compare two BAM files for the same reads and allocate them into new files
based on matches
positional arguments:
graft graft BAM file
host host BAM file
optional arguments:
-h, --help show this help message and exit
--count switch to reporting mode (default: False)
--min MIN minimum difference in matches for assignment to either
file (default: 1)
--out OUT graft output BAM file (default: graftOut.bam)
--aligner {subread,subjunc,star,tophat,bowtie,bwa}
aligner type (default: subread)
--pairedEnd switch to pairedEnd mode so the mapping scores will be
computed using pairs (default: False)
-
XenoSplit requires both mapped and unmapped reads to be reported in one BAM file. Therefore, while working with STAR and TopHat2 the following steps should be followed:
-
Warnings when aligning with STAR: STAR does not report the unmapped reads unless the option
--outSAMunmapped Within
is set during the alignment. Set this option during the alignment with STAR. -
Warnings when aligning with TopHat2: TopHat2 reports mapped (accepted_hits.bam) and unmapped (unmapped.bam) reads in separate files. Moreover, accepted_hits.bam file comprises secondary and supplementary reads. Prior to running XenoSplit, these reads should be removed and accepted_hits.bam and unmapped.bam should be merged. samtools can be used
-
to remove secondary and supplementary reads from accepted.bam files:
samtools view -h -F 256 -F 2048 accepted_hits.bam > accepted_hits_rm.bam
-
to concatenate
accepted_hits_rm.bam
andunmapped.bam
files:samtools merge -o output.bam accepted_hits_rm.bam unmapped.bam
-
-
-
At the moment XenoSplit only accepts BAM files that are sorted by read names (i.e., the QNAME field) rather than by chromosomal coordinates. samtools can be used to sort bam files by name as below:
samtools sort -n -o input.sorted.bam input.bam
If the samples are single-end:
python xenosplit.py --out graftOut.bam graft.bam host.bam
If the samples are paired-end, --pairedEnd
mode should be turned on:
python xenosplit.py --pairedEnd --out graftOut.bam graft.bam host.bam
If the samples are aligned with STAR then --aligner
should be specified as below:
python xenosplit.py --out graftOut.bam --aligner star graft.bam host.bam
XenoSplit can switch into reporting mode using --count
argument. In this mode, XenoSplit does not output the graftOut.bam
file, only prints out the graft and host "goodness of mapping" scores for each read on the screen. These scores can be saved in goodnessOfMapping.txt
file and further analysed. To save the scores the following code can be run:
python xenosplit.py --count graft.bam host.bam > goodnessOfMapping.txt
Bug reports and feature requests are welcome. Please follow the steps below to do so:
- Create an issue to discuss your idea
- Fork XenoSplit
- Create your feature branch with
git checkout -b new-feature
- Commit your changes with
git commit -am 'Add some feature'
- Push to the branch with
git push origin new-feature
- Create a new Pull Request