Martin Asser Hansen edited this page Oct 1, 2015 · 6 revisions

#summary Assemble ordered overlapping pair-end sequences in the stream.

Biopiece: assemble_pairs

Description

[assemble_pairs] assembles overlapping pair-end sequences into single sequences that are output to the stream; the original sequences are not output. Assembly works by progressively considering overlaps, starting from the maximum considered overlap set with the -p switch (default is the length of the shortest sequence) and decreasing to the minimum required overlap set with the -o switch (default 1). For each overlap, a percentage of mismatches can be allowed using the -m switch (default 5%).
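
The search described above can be sketched in Python. This is only an illustration under stated assumptions, not the actual Biopieces implementation: the function name find_overlap and its parameters are hypothetical, both sequences are assumed to be oriented on the same strand already, and the way the mismatch percentage is rounded is a guess.

```python
def find_overlap(seq1, seq2, overlap_min=1, overlap_max=None, mismatches_pct=5):
    """Return (overlap_len, hamming_dist) for the longest acceptable overlap,
    or None if no overlap satisfies the mismatch limit.

    Hypothetical helper: the end of seq1 is compared with the start of seq2.
    """
    shortest = min(len(seq1), len(seq2))
    # Mirrors the documented default: the length of the shortest sequence.
    if overlap_max is None or overlap_max > shortest:
        overlap_max = shortest

    # Try the largest overlap first and shrink towards the minimum.
    for overlap in range(overlap_max, overlap_min - 1, -1):
        tail = seq1[len(seq1) - overlap:]
        head = seq2[:overlap]
        dist = sum(a != b for a, b in zip(tail, head))
        # Assumption: mismatches are limited to a percentage of the overlap length.
        if dist * 100 <= mismatches_pct * overlap:
            return overlap, dist
    return None


if __name__ == "__main__":
    print(find_overlap("ACGTACGTTTGA", "TTGACCCC"))
```

The two values returned here correspond to what the documentation calls OVERLAP_LEN and HAMMING_DIST.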

Mismatches in the overlapping region are resolved so that the residue with the highest quality score is used in the assembled sequence. The quality scores are averaged in the overlapping region. The sequence of the overlapping region is output in upper case and the remainder in lower case.
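
A minimal Python sketch of this merge step follows. It is not the Biopieces code: the function name merge_pair is hypothetical, quality scores are assumed to be lists of integers, both reads are assumed to be on the same strand, and the tie-breaking rule (the first read wins) and the integer averaging are assumptions.

```python
def merge_pair(seq1, qual1, seq2, qual2, overlap):
    """Merge two reads where the last `overlap` residues of seq1 coincide
    with the first `overlap` residues of seq2 (hypothetical helper)."""
    left = len(seq1) - overlap

    # Non-overlapping part of the first read: lower case, scores unchanged.
    merged_seq = [seq1[:left].lower()]
    merged_qual = list(qual1[:left])

    for i in range(overlap):
        b1, q1 = seq1[left + i], qual1[left + i]
        b2, q2 = seq2[i], qual2[i]
        # Keep the residue with the higher score (assumption: ties go to read 1).
        merged_seq.append((b1 if q1 >= q2 else b2).upper())
        # Average the two scores (assumption: integer division).
        merged_qual.append((q1 + q2) // 2)

    # Non-overlapping part of the second read: lower case, scores unchanged.
    merged_seq.append(seq2[overlap:].lower())
    merged_qual.extend(qual2[overlap:])
    return "".join(merged_seq), merged_qual


if __name__ == "__main__":
    seq, qual = merge_pair("AACGT", [30, 30, 30, 10, 30],
                           "CTTGG", [20, 40, 30, 30, 30], overlap=3)
    print(seq, qual)
```

In the example call, the middle residue of the overlap differs between the reads (G with score 10 versus T with score 40), so the T from the second read is kept.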

Paired sequences must follow either the Illumina 1.5 scheme, where names end in /1 or /2, or the Illumina 1.8 scheme, where names contain a space followed by 1 or 2 and then a :. Furthermore, sequences must be in interleaved order in the stream; use [read_fastq] for that.
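
To make the two naming schemes concrete, here is a rough Python sketch of how a pair of names could be checked. The helper names pair_key and is_pair and the regular expressions are illustrative assumptions, not taken from the Biopieces source, and the example read names are made up.

```python
import re


def pair_key(name):
    """Return (base_name, mate) with mate 1 or 2, or None if the name
    matches neither scheme (hypothetical helper)."""
    # Illumina 1.5 style: the name ends in /1 or /2.
    m = re.search(r"^(.+)/([12])$", name)
    if m:
        return m.group(1), int(m.group(2))
    # Illumina 1.8 style: a space, then 1 or 2, then a colon.
    m = re.search(r"^(\S+) ([12]):", name)
    if m:
        return m.group(1), int(m.group(2))
    return None


def is_pair(name1, name2):
    """True if two consecutive names look like mates 1 and 2 of one pair."""
    k1, k2 = pair_key(name1), pair_key(name2)
    if k1 is None or k2 is None:
        return False
    # Interleaved order is assumed: mate 1 comes first, then mate 2.
    return k1[0] == k2[0] and (k1[1], k2[1]) == (1, 2)


if __name__ == "__main__":
    print(is_pair("read7#0/1", "read7#0/2"))
    print(is_pair("M01:1:FC:1:1101:15589:1332 1:N:0:1",
                  "M01:1:FC:1:1101:15589:1332 2:N:0:1"))
```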

The following keys are added to records with merged sequences:

  • OVERLAP_LEN - the length of the located overlap.
  • HAMMING_DIST - the number of mismatches in the assembly.

Usage

... | assemble_pairs [options]

Options

[-?         | --help]               #  Print full usage description.
[-m <uint>  | --mismatches=<uint>]  #  Allowed overlap mismatches in percent  -  Default=5
[-o <uint>  | --overlap_min=<uint>] #  Minimum overlap required               -  Default=1
[-p <uint>  | --overlap_max=<uint>] #  Maximum overlap considered             -  Default=(length of shortest sequence)
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file            -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to stream file            -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

If you have two pair-end sequence files with the Illumina 1.5 or 1.8 scheme of naming pairs then you can assemble these using [assemble_pairs] like this:

read_fastq -i in1.fq -j in2.fq | assemble_pairs | write_fastq -o out.fq -x

See also

[read_fastq]

[write_fastq]

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

March 2013

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

[assemble_pairs] is part of the Biopieces framework.

http://www.biopieces.org
