Script needed for assembly evaluation #36

jorvis · 2016-01-29T18:36:06Z

We need a script that simulates fragmented sequences from more-complete input sequences. This is perhaps best illustrated with a current use case.

We are aligning unsheared, paired-end reads to transcriptome assemblies to find real evidence supporting each transcript, and possibly to group transcripts further. We expect overlapping transcripts like these to be assembled together:

5'---------------------3'
               5'----------------------------------3'

But paired-end grouping might also be able to pull these together, even inserting Ns given a known library insert size, if read mate pairs span the gap between them:

5'---------------------3'
                                          5'----------------------------------3'

The proposed script would let me take a known set of transcripts and artificially fragment them, generating some fragments that overlap and others that are separated from one another. This could be controlled with user-configurable options such as:

--min_overlap_distance=-200
--max_overlap_distance=100
--fragmentation_factor=6

Note the negative value for --min_overlap_distance, which allows for the second case above, where sequence fragments do not overlap. With these options, the script would transform a FASTA file of 1000 sequences into one of around 6000 sequences, with neighboring fragments overlapping by up to 100 bp or separated by as much as 200 bp on their parent sequence.
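To make the intended behavior concrete, here is a minimal Python sketch of how fragment coordinates might be chosen for one sequence. It is one possible reading of the options, not a settled design: the function name `fragment_coords`, the uniform draw of overlap distances, the near-equal fragment lengths, and the fallback for short sequences are all assumptions made for illustration.

```python
import random


def fragment_coords(seq_len, factor, min_overlap, max_overlap, rng):
    """Return `factor` (start, end) intervals (0-based, half-open) on a sequence.

    Assumption: the overlap distance between neighboring fragments is drawn
    uniformly from [min_overlap, max_overlap]. A positive value means the
    fragments overlap by that many bases; a negative value leaves a gap.
    """
    # One overlap distance per junction between neighboring fragments.
    dists = [rng.randint(min_overlap, max_overlap) for _ in range(factor - 1)]

    # The fragment lengths must add up to the sequence length plus the total
    # overlap (or minus the total gap), so the last fragment ends at seq_len.
    total = seq_len + sum(dists)
    base, extra = divmod(total, factor)

    # Assumption: a sequence too short to fragment sensibly is returned whole.
    if base <= max(dists + [0]):
        return [(0, seq_len)]

    # Spread the total as evenly as possible across the fragments.
    lengths = [base + 1 if i < extra else base for i in range(factor)]

    coords = []
    start = 0
    for i, length in enumerate(lengths):
        end = start + length
        coords.append((start, end))
        if i < factor - 1:
            # Step back by the overlap, or forward past the gap if negative.
            start = end - dists[i]
    return coords


# Example with the option values from this issue.
rng = random.Random(0)
for start, end in fragment_coords(3000, 6, -200, 100, rng):
    print(start, end)
```

Fixed, near-equal fragment lengths keep the sketch short; a real script might randomize lengths as well.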

Data should be appended to the header descriptions in the product sequences to indicate their source and coordinates.
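As an illustration of the header annotation, the sketch below builds fragment records from a parent header, its sequence, and a list of coordinates. The header layout (`.fragN` suffix on the ID, then `source=`, `start=`, and `end=` keys with 1-based inclusive coordinates) and the function name `fragment_records` are hypothetical; the issue does not specify a format.

```python
def fragment_records(header, seq, coords):
    """Yield (new_header, subsequence) for each (start, end) interval.

    `coords` holds 0-based, half-open intervals on `seq`. The source ID and
    1-based inclusive coordinates are appended to the original description.
    """
    # Split the FASTA header into the ID and the free-text description.
    seq_id, _, desc = header.partition(" ")
    for n, (start, end) in enumerate(coords, start=1):
        parts = [f"{seq_id}.frag{n}"]
        if desc:
            parts.append(desc)
        # Hypothetical key=value layout for the source and coordinates.
        parts.append(f"source={seq_id} start={start + 1} end={end}")
        yield " ".join(parts), seq[start:end]


# Example: two overlapping fragments of a short toy sequence.
for new_header, subseq in fragment_records(
    "tx1 hypothetical protein", "ACGTACGTAC", [(0, 4), (3, 10)]
):
    print(f">{new_header}\n{subseq}")
```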

jorvis · 2016-01-29T18:54:00Z

As a side note, we should compare this with what the tool described here does: http://www.ncbi.nlm.nih.gov/pubmed/22962361

jorvis added the enhancement label Jan 29, 2016
