read_sff
read_sff reads sequence entries from SFF files. Quality scores are converted to base 64 Phred type scores (like Illumina). The resulting records look like this:
SEQ_NAME: FQIBXOY01DRIMT
SEQ: TCAGTCATATTTTT...
SEQ_LEN: 279
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
SCORES: aaa`[[[_[NNNNNN...
X_POS: 1426
Y_POS: 1923
---
A negative value for any of the CLIP_ADAPTOR_* keys indicates that no adaptor was found.
read_sff does not work on gzipped files.
For more about the SFF format:
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=formats&m=doc&s=format#sff
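The score conversion described above can be reversed outside Biopieces. The following is a minimal Python sketch, assuming that "base 64" means each score is stored as the character with ASCII code score + 64. The helper name `decode_scores` is hypothetical and not part of Biopieces.

```python
def decode_scores(scores, offset=64):
    """Convert an encoded SCORES string to a list of integer Phred scores.

    Assumption: each character encodes one score as chr(score + offset).
    """
    return [ord(char) - offset for char in scores]


# Prefix of the SCORES value from the record above (truncated in the source).
print(decode_scores("aaa`[[[_[NNNNNN"))
# → [33, 33, 33, 32, 27, 27, 27, 31, 27, 14, 14, 14, 14, 14, 14]
```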
read_sff [options] -i <SFF file(s)>
[-? | --help] # Print full usage description.
[-i <files!> | --data_in=<files!>] # Comma-separated list of files or glob expression to read.
[-n <uint> | --num=<uint>] # Limit number of records to read.
[-m | --mask] # Mask sequence according to clipping information.
[-c | --clip] # Clip sequence according to clipping information.
[-I <file> | --stream_in=<file!>] # Read input stream from file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output stream to file - Default=STDOUT
[-v | --verbose] # Verbose output.
To read one entry from an SFF file, use read_sff with the -n switch (SEQ and SCORES truncated for brevity):
read_sff -n 1 -i test.sff
SEQ_NAME: FQIBXOY01DRIMT
SEQ: TCAGTCATATTTTT...
SEQ_LEN: 279
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
SCORES: aaa`[[[_[NNNNNN...
X_POS: 1426
Y_POS: 1923
---
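Records like the one above consist of `KEY: VALUE` lines closed by a `---` line, so they are easy to pick apart in a script. Below is a minimal Python sketch of such a parser, assuming that exact layout. The function name `parse_records` is hypothetical, and all values are kept as strings.

```python
def parse_records(lines):
    """Yield one dict per record from an iterable of 'KEY: VALUE' lines."""
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if line == "---":
            # A line of three dashes closes the current record.
            yield record
            record = {}
        elif line:
            # Split on the first ': ' only, so values may contain colons.
            key, _, value = line.partition(": ")
            record[key] = value
    if record:
        # Emit a trailing record that lacks the closing delimiter.
        yield record


example = """SEQ_NAME: FQIBXOY01DRIMT
SEQ_LEN: 279
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
---
"""
records = list(parse_records(example.splitlines()))
print(records[0]["SEQ_NAME"], int(records[0]["SEQ_LEN"]))
# → FQIBXOY01DRIMT 279
```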
Use the -m switch to soft mask the sequences according to the CLIP_QUAL information:
read_sff -n 1 -i test.sff -m
SEQ_NAME: FQIBXOY01DRIMT
SEQ: tcagTCATATTTTT...
SEQ_LEN: 279
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
SCORES: aaa`[[[_[NNNNNN...
X_POS: 1426
Y_POS: 1923
---
Or use the -c switch to clip the sequences according to the CLIP_QUAL information:
read_sff -n 1 -i test.sff -c
SEQ_NAME: FQIBXOY01DRIMT
SEQ: TCATATTTTT...
SEQ_LEN: 275
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
SCORES: [[[_[NNNNNN...
X_POS: 1426
Y_POS: 1923
---
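The difference between soft masking and clipping can be illustrated with a short Python sketch. This is not the read_sff implementation, and the coordinate convention is an assumption. The left value is treated as the number of leading bases outside the retained region, which matches the `tcag` prefix in the examples. The right value is treated as the 0-based index of the last retained base, which is a guess that the examples do not confirm. The function names and the right value of 11 are made up for illustration.

```python
def soft_mask(seq, left, right):
    """Lowercase the bases outside the retained window seq[left:right + 1]."""
    return seq[:left].lower() + seq[left:right + 1] + seq[right + 1:].lower()


def clip(seq, left, right):
    """Return only the retained window seq[left:right + 1]."""
    return seq[left:right + 1]


# Truncated SEQ prefix from the examples; right=11 is a made-up value
# chosen so that both ends of this short string are affected.
seq = "TCAGTCATATTTTT"
print(soft_mask(seq, 4, 11))  # → tcagTCATATTTtt
print(clip(seq, 4, 11))       # → TCATATTT
```

In the -c example above, SCORES loses the same four leading characters as SEQ, so the same slice would apply to both strings.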
[read_454]
[write_454]
Martin Asser Hansen - Copyright (C) - All rights reserved.
February 2011
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
read_sff is part of the Biopieces framework.