STAR mapping on slurm clusters in bags.
Requirements:
- python 3.4+
- STAR in your PATH
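To check both requirements before running, something like the following should work (the STAR install path below is only a placeholder):

# check the Python version (3.4 or later is needed)
python3 --version
# check that the STAR binary is found in your PATH
which STAR
# if it is not, add its folder to PATH, e.g. in your .bashrc (path is illustrative)
export PATH=/path/to/STAR/bin:$PATH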
git clone https://github.com/iosonofabio/bag_of_stars.git
Call bag of stars from the bos folder:
python bag_of_stars.py --genomeDir <your genome folder> --output <your output folder> --fastqFolder <your fastq root folder>
- The fastq root folder must contain subfolders, each with a read1 and a read2 file (see the example layout after this list).
- New subfolders with the same names will be made inside the output folder.
- The genome folder must contain STAR's hash files (e.g. SA, SAindex, Genome).
- STAR must be in your PATH; you can check your .bashrc to see which folders it adds.
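As an illustration of the expected layout (the sample names and folder names below are made up, not prescribed by the tool):

fastq_root/
    sample1/
        sample1_R1.fastq.gz
        sample1_R2.fastq.gz
    sample2/
        sample2_R1.fastq.gz
        sample2_R2.fastq.gz

Running bag of stars on this folder would create subfolders sample1 and sample2 inside the output folder, each containing the STAR output for that sample.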
python bag_of_stars.py --help
usage: bag_of_stars.py [-h] [--dry] --output OUTPUT [-n N]
                       [--genomeDir GENOMEDIR] [--local]
                       [--cpus-per-task CPUS_PER_TASK] [--mem MEM]
                       [--time TIME]
                       fastq_folder
STAR mapping in bags
required arguments:
  --fastqFolder INPUT   Parent folder of subfolders with 2 fastq.gz files in
                        each.
  --output OUTPUT       Parent folder for the output. For each input
                        subfolder, an output subfolder will be made
  --genomeDir GENOMEDIR
                        Folder with the STAR genome hash

optional arguments:
  -h, --help            show this help message and exit
  --dry                 Dry run
  -n N                  Number of samples per STAR call
  --local               Do not send to cluster, do everything locally
  --cpus-per-task CPUS_PER_TASK
                        Number of CPUs for each STAR call
  --mem MEM             RAM memory in MB for each STAR call
  --time TIME           Time limit on each group of STAR jobs (see slurm docs
                        for format info)
  --partition PART      String with comma-separated list of partitions on
                        Sherlock (e.g. 'quake,owners,normal'). Use quotes
  --htseq               Also perform `htseq-count` to count features
  --annotationFile FN   Path to the GTF file for htseq
  --delete-empty-BAM    Delete empty BAM files and remap them
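Putting the options together, a full cluster run might look like the sketch below; the resource numbers, partition list, and file paths are illustrative only and should be adapted to your cluster and data:

# submit mapping jobs in groups of 4 samples, 8 CPUs and 32 GB RAM each,
# then count features with htseq-count (paths and values are examples)
python bag_of_stars.py \
  --genomeDir /data/genomes/STAR_index \
  --output /data/star_out \
  --fastqFolder /data/fastq_root \
  -n 4 \
  --cpus-per-task 8 \
  --mem 32000 \
  --time 2:00:00 \
  --partition 'quake,owners,normal' \
  --htseq \
  --annotationFile /data/annotations/genes.gtf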