This repository has been archived by the owner on Mar 16, 2022. It is now read-only.

safe guard for memory error #6

Merged
pb-jchin merged 4 commits into PacificBiosciences:master from the LA4Falcon_Improvement branch
Aug 22, 2015

Conversation

pb-jchin

No description provided.

@pb-jchin
Author

Used this new code for NIST Trio Child assembly. It works on-par with the previous version but with much better I/O performance and less useless computation during the consensus stage. I am merging the code.

pb-jchin added a commit that referenced this pull request Aug 22, 2015
Used this new modification for NIST Trio Child assembly. It works on-par with the previous version but with much better I/O performance and less useless computation during the consensus stage. I am merging the code.
@pb-jchin pb-jchin merged commit 7ef40d8 into PacificBiosciences:master Aug 22, 2015
@pb-jchin pb-jchin deleted the LA4Falcon_Improvement branch August 22, 2015 18:22
@pb-cdunn

No objection here. I don't fully understand the change, so I didn't comment.

@pb-jchin
Author

The change reduces the amount of sequence data that LA4Falcon has to fetch and send to the consensus code. The basic problem is that when one sequences large or very repetitive genomes, there are sometimes many false-positive hits. For example, you can find some reads that have more than 50,000 reads aligned to them. They can't all be from the same genomic location. The idea is to use a simple predictor (overlap length - overhang length is used here) to select the hits that are more likely to be from the same genomic location, and to limit the total amount of sequence to fetch and output. This may solve some I/O bandwidth issues, and if the predictor is good, the results could in theory be even better. With this change, I was able to have 120 concurrent I/O processes reading the .raw_read.bps and feeding the computation to utilize the CPUs, with about 10% sys call overhead and an even smaller I/O wait. The older code wastes a lot of time on lseek sys calls, as it has to fetch many sequences even though most of them are not used in the end.

Also, ideally, I would use a priority queue to keep the top n-hits. It could be done later. The flat array + qsort solves the current problem.
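
For readers who want to see the shape of the idea, here is a minimal C sketch of the "flat array + qsort" selection described in this comment. It is not the LA4Falcon code from this pull request: the `hit_t` record, its field names, and the `select_top_hits` helper are all hypothetical, and the score is simply overlap length minus overhang length as stated above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one alignment hit (not DALIGNER's own struct). */
typedef struct {
    int read_id;      /* id of the read aligned to the target */
    int overlap_len;  /* length of the aligned region */
    int overhang_len; /* unaligned bases hanging off the overlap */
} hit_t;

/* Predictor from the discussion: overlap length minus overhang length. */
static int hit_score(const hit_t *h) {
    return h->overlap_len - h->overhang_len;
}

/* qsort comparator: higher score first, ties broken by read_id. */
static int cmp_hit_desc(const void *a, const void *b) {
    const hit_t *x = (const hit_t *)a;
    const hit_t *y = (const hit_t *)b;
    int sx = hit_score(x), sy = hit_score(y);
    if (sx != sy) return (sx < sy) - (sx > sy);
    return (x->read_id > y->read_id) - (x->read_id < y->read_id);
}

/* Sort the flat array in place and return how many leading hits to fetch.
 * Only hits[0 .. returned count - 1] would be read from disk afterwards. */
static size_t select_top_hits(hit_t *hits, size_t n, size_t max_hits) {
    assert(hits != NULL || n == 0);
    if (n > 1) qsort(hits, n, sizeof *hits, cmp_hit_desc);
    return n < max_hits ? n : max_hits;
}
```

A caller would collect all hits for one target read into the array, call `select_top_hits(hits, n, max_hits)`, and fetch sequences only for the returned number of leading entries, so the number of seeks is bounded by `max_hits` regardless of how many false-positive hits exist.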

@pb-jchin
Author

Oh, the MIN macro was not used in the end. We can remove it.

@pb-cdunn

Nice.

Also, ideally, I would use a priority queue to keep the top n-hits. It could be done later.

Maybe I'll try that myself someday.
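
The priority-queue variant discussed here could be sketched as a fixed-capacity min-heap of scores, where the root is always the weakest hit currently kept. This is only an illustration under assumptions, not code from the repository: `topn_t`, `topn_offer`, and the helper names are hypothetical, and only the integer score is stored to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded min-heap holding the best `cap` scores seen so far. */
typedef struct {
    int *score;  /* caller-provided storage with room for `cap` entries */
    size_t len;  /* number of scores currently stored */
    size_t cap;  /* maximum number of scores to keep (the "n" in top-n) */
} topn_t;

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Restore the heap property by moving entry i towards the root. */
static void sift_up(int *s, size_t i) {
    while (i > 0) {
        size_t p = (i - 1) / 2;
        if (s[p] <= s[i]) break;
        swap_int(&s[p], &s[i]);
        i = p;
    }
}

/* Restore the heap property by moving entry i towards the leaves. */
static void sift_down(int *s, size_t len, size_t i) {
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < len && s[l] < s[m]) m = l;
        if (r < len && s[r] < s[m]) m = r;
        if (m == i) break;
        swap_int(&s[m], &s[i]);
        i = m;
    }
}

/* Offer one score; returns 1 if it was kept, 0 if it was rejected. */
static int topn_offer(topn_t *h, int score) {
    assert(h != NULL && h->score != NULL);
    if (h->cap == 0) return 0;
    if (h->len < h->cap) {              /* heap not full yet: just insert */
        h->score[h->len] = score;
        sift_up(h->score, h->len);
        h->len++;
        return 1;
    }
    if (score <= h->score[0]) return 0; /* not better than the weakest kept */
    h->score[0] = score;                /* replace the weakest and re-heapify */
    sift_down(h->score, h->len, 0);
    return 1;
}
```

Compared with sorting the whole flat array, this keeps memory at `cap` entries and costs O(log cap) per hit, which would matter mostly for reads with tens of thousands of hits.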

pb-jchin added a commit to pb-jchin/FALCON-integrate that referenced this pull request Sep 7, 2015
…are done

summary:

The graph-to-layout code adds a new rule to reduce mis-assembly, see PacificBiosciences/FALCON#179
Initial raw read alignment hits are sorted by overlap length to get error correction reads more efficiently. See https://github.com/PacificBiosciences/DALIGNER/pull/, PacificBiosciences/DALIGNER#6
pb-cdunn pushed a commit to pb-cdunn/DALIGNER that referenced this pull request Dec 8, 2015
…vement

Used this new modification for NIST Trio Child assembly. It works on-par with the previous version but with much better I/O performance and less useless computation during the consensus stage. I am merging the code.
pb-jchin added a commit that referenced this pull request Dec 9, 2015
Used this new modification for NIST Trio Child assembly. It works on-par with the previous version but with much better I/O performance and less useless computation during the consensus stage. I am merging the code.
pacbbbbot pushed a commit that referenced this pull request Apr 15, 2017
…evelop

* commit '18f2fb2e33f37dc9a61d5584845041a7e6925069':
  stop including the dazzdb repo