NanoStat is largely superseded by Cramino, a much faster alternative, and will most likely receive no further updates.
Calculate various statistics from a long read sequencing dataset in fastq, fasta, bam, or albacore/guppy sequencing summary format.
NanoStat is written for Python 3 and will not work in Python 2.7 or older.
pip install nanostat
or
conda install -c bioconda nanostat
NanoStat [-h] [-v] [-o OUTDIR] [-p PREFIX] [-n NAME] [-t N]
[--barcoded] [--readtype {1D,2D,1D2}]
(--fastq file [file ...] | --fasta file [file ...] | --summary file [file ...] | --bam file [file ...])
Calculate statistics of long read sequencing dataset.
General options:
-h, --help show the help and exit
-v, --version Print version and exit.
-o, --outdir OUTDIR Specify directory in which output has to be created.
-p, --prefix PREFIX Specify an optional prefix to be used for the output file.
-n, --name NAME Specify a filename/path for the output, stdout is the default.
-t, --threads N Set the allowed number of threads to be used by the script.
--tsv, Print the output in a tab-separated-values format
Input options.:
--barcoded Use if you want to split the summary file by barcode
--readtype {1D,2D,1D2}
Which read type to extract information about from summary. Options are 1D, 2D,
1D2
Input data sources, one of these is required.:
--fastq file [file ...]
Data is in one or more (compressed) fastq file(s).
--fasta file [file ...]
Data is in one or more (compressed) fasta file(s).
--summary file [file ...]
Data is in one or more (compressed) summary file(s) generated by albacore or guppy.
--bam file [file ...]
Data is in one or more sorted bam file(s).
EXAMPLES:
NanoStat --fastq reads.fastq.gz --outdir statreports
NanoStat --summary sequencing_summary1.txt sequencing_summary2.txt sequencing_summary3.txt --readtype 1D2
NanoStat --bam alignment.bam alignment2.bam
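To call NanoStat from a Python pipeline rather than from the shell, the command line can be assembled as a list and passed to `subprocess`. The sketch below is only an illustration: `build_nanostat_command` and `run_nanostat` are hypothetical helper names and are not part of NanoStat. It uses only options listed in the usage text above.

```python
import shutil
import subprocess


def build_nanostat_command(fastq_files, outdir, name=None, threads=None, tsv=False):
    """Assemble a NanoStat argument list (hypothetical helper, not part of NanoStat)."""
    cmd = ["NanoStat", "--fastq", *fastq_files, "--outdir", outdir]
    if name is not None:
        cmd += ["--name", name]  # output filename; stdout is the default
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if tsv:
        cmd.append("--tsv")  # tab-separated output
    return cmd


def run_nanostat(cmd):
    """Run the assembled command, failing early if NanoStat is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("NanoStat was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, check=True)
```

For example, `run_nanostat(build_nanostat_command(["reads.fastq.gz"], "statreports", name="stats.txt"))` corresponds to the first shell example above with an added `--name` option.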
General summary:
Active channels: 502
Mean read length: 8593.5
Mean read quality: 10.8
Median read length: 5168.0
Median read quality: 11.2
Number of reads: 408254
Read length N50: 15141
Total bases: 3508315665
Number, percentage and megabases of reads above quality cutoffs
>Q5: 406428 (99.6%) 3502.0Mb
>Q7: 395016 (96.8%) 3234.5Mb
>Q10: 305509 (74.8%) 2475.9Mb
>Q12: 87903 (21.5%) 422.9Mb
>Q15: 124 (0.0%) 0.1Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 16.2 (407; a803bcfc-9d7a-4a87-84e4-1a0296113700)
2: 16.2 (880; f5fee32a-9471-4a68-8697-a71887599757)
3: 16.1 (729; 3ea23a79-641e-41ab-bb5b-c22609977136)
4: 16.1 (1057; b0cef5fd-c5e1-4539-9591-b7376b2953e8)
5: 15.8 (841; 3d4f8075-6151-4147-bdc3-e5d53ff66084)
Top 5 longest reads and their mean basecall quality score
1: 255821 (6.8; 7d069f04-d4db-4f12-a1b9-c19d70993492)
2: 254573 (7.1; a245999b-de28-4720-a8c3-0d5cbb26e473)
3: 253711 (7.0; a84b106b-13d3-4bfa-b548-71a47c9032c3)
4: 245784 (7.0; 2a60ee11-8793-46c1-a3d9-667bc4e70405)
5: 245776 (7.1; 72a8cf33-75fd-4c07-8a4c-7516b690938b)
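For readers unfamiliar with the metrics in this report, the following sketch shows how a read length N50 and the "reads above quality cutoffs" table can be computed from per-read lengths and mean qualities. It is not NanoStat's own implementation. The function names are made up for illustration, and the strict `>` comparison is an assumption based on the `>Q5` labels in the output.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def reads_above_cutoffs(lengths, quals, cutoffs=(5, 7, 10, 12, 15)):
    """For each cutoff, return (cutoff, read count, percentage of reads, megabases)."""
    total = len(lengths)
    rows = []
    for cutoff in cutoffs:
        # Assumption: a read counts when its mean quality is strictly above the cutoff.
        kept = [length for length, qual in zip(lengths, quals) if qual > cutoff]
        rows.append((cutoff, len(kept), 100 * len(kept) / total, sum(kept) / 1e6))
    return rows


if __name__ == "__main__":
    # Toy data, not taken from the report above.
    lengths = [2, 3, 4, 5, 6]
    quals = [4.0, 6.0, 8.0, 11.0, 16.0]
    print("Read length N50:", n50(lengths))
    for cutoff, count, pct, mb in reads_above_cutoffs(lengths, quals):
        print(f">Q{cutoff}: {count} ({pct:.1f}%) {mb:.6f}Mb")
```

The percentages in the report follow the same arithmetic: 305509 of 408254 reads above Q10 is 74.8%.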
I welcome all suggestions, bug reports, feature requests and contributions. Please leave an issue or open a pull request. I usually respond within a day, and occasionally within a few days.
If you use this tool, please consider citing our publication.