This repository has been archived by the owner on Mar 16, 2022. It is now read-only.

Copied from cDNA_primer #17

Open
wants to merge 27 commits into
base: master


pb-cdunn commented Nov 4, 2015

cDNA_primer/pbtranscript-tofu/external_daligner/DALIGNER-d4aa4871122b35ac92e2cc13d9b1b1e9c5b5dc5c-ICEmod/LA4Ice.c

Christopher Dunn and others added 27 commits June 20, 2015 08:20
    daligner.c:696:17: warning: ‘broot’ may be used uninitialized in this function [-Wmaybe-uninitialized]
             free(broot);
    warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
but with filter_p.c deleted, instead using a macro in filter.c

old commits
-----------
Add option "-m" for outputting "m4"-like overlap summary data

compile DB.c to DB.so to be called from python using ctypes

python ctypes interface to read Dazzler Assembler read database and
overlap database
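
A minimal sketch of the ctypes pattern these commits rely on: load a shared library, declare a function's argument and return types, then call it from Python. The function signatures exported by DB.so are not shown in this thread, so the sketch uses the C standard library's `strlen` as a stand-in; with the library built above, the load line would presumably be `ctypes.CDLL("./DB.so")`. The wrapper name `c_strlen` is hypothetical.

```python
import ctypes
import ctypes.util

# Stand-in library: the C standard library (assumption for illustration only).
# For the commit above, the library to load would be the compiled DB.so instead.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C strlen() on a Python bytes object (hypothetical wrapper)."""
    return _libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"yeast.db"))
```

Declaring `argtypes` and `restype` up front matters most for functions that take or return pointers, which is the usual case when wrapping a database reader written in C.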

DBLA_to_falcon.py generates output from a read db and overlap db that
can be sent to falcon_sense.py for generating consensus. For example,

python DBLA_to_falcon.py yeast.1.las yeast.db | falcon_sense.py \
--min_cov 4 --output_multi --min_idt 0.70 --trim_size 10 \
--n_core 24 > preads.1.fa

a note as an example

the "3" should be "4"

DBLib should be DB.so

mmm... forget something, fixing it

experimental code to output data for using falcon_sense.py to generate
consensus.

fix various bugs for output format and reverse complement issues
allowing more flexible ends / hard code length threshold cut

add thresholds for the read length to control what reads get corrected

Two experimental scripts to drive the jobs from the output of HPCdaligner on an
SGE cluster

DAPipe.py: running the daligner jobs
LAPipe.py: sort / merge / doing error corrections

relax the condition for proper overlapping
the original one is too stringent for error correction

add experimental code for reducing data sizes

copied filter.c over filter_p.c, then copied change from other branch - cd

squash corresponding to merge commit ef58b19 plus "Cosmetic mods"

(remove -m option)

edited Makefile to compile LA4Falcon ERR

make proper overlapping detection more strict

Add DB2Falcon.c and modify Makefile accordingly

copied a correction from DAZZ_DB/DB2fasta.c - cd

add missing LA4Falcon as targets ERR

properly trimming the DB before dumping the sequences to get the id
mapping right.

change fopen to use mmap mode for reading

When a read is detected as contained within other reads, it will output "* *" and ignore the rest of the bread.
fc_consensus.py should catch it and skip processing the aread as a seed read. Since the error-corrected reads
will be contained within other reads, this should reduce the computation time for those reads, which have no impact
on the string graph construction.
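
A rough sketch of how a consumer could catch the "* *" marker and skip contained seed reads. This is not the actual fc_consensus.py code. The stream layout is an assumption made for illustration: one "id sequence" pair per line, the first line of a group being the seed (aread), and a hypothetical "+ +" line closing each group. Only the "* *" marker comes from the commit message above.

```python
from typing import Iterable, Iterator, List, Tuple

CONTAINED_MARKER = ("*", "*")  # marker described in the commit message
GROUP_END = ("+", "+")         # assumption: hypothetical end-of-group marker

Read = Tuple[str, str]  # (read_id, sequence)


def iter_seed_groups(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[Read]]]:
    """Yield (seed_id, seed_seq, supporting_reads) for non-contained seeds only."""
    group: List[Read] = []
    contained = False
    for line in lines:
        fields = tuple(line.split())
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        if fields == GROUP_END:
            # Emit the finished group unless its seed was flagged as contained.
            if group and not contained:
                seed_id, seed_seq = group[0]
                yield seed_id, seed_seq, group[1:]
            group, contained = [], False
        elif fields == CONTAINED_MARKER:
            contained = True  # drop this seed; nothing useful follows it
        elif not contained:
            group.append(fields)


if __name__ == "__main__":
    demo = ["r1 ACGT", "r2 ACGA", "+ +", "r3 GGCC", "* *", "+ +"]
    for seed_id, seed_seq, support in iter_seed_groups(demo):
        print(seed_id, seed_seq, len(support))
```

Skipping at parse time means a contained seed never reaches the consensus step, which is where the time saving described above would come from.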

add "s" option to skip contained reads

add some testing code to output alignment from Gene's aligner

Cosmetic mods (typos, etc.)

mapping read_id to daligner internal ID

Adapt FALCON specific code for the newer version of Daligner

modify the code for Gene's new struct

free the memory properly

must not put foo.h on command-line

This actually breaks clang (and on my mac, gcc is actually clang) with
this obscure error message:

    clang: error: cannot specify -o when generating multiple output files
* simpler _p
* relative to DAZZ_DB, for now
* README_PB
mobs will call this.
for generating consensus. This limits unnecessary I/O without much loss
in my test.  Some repeats might be lost but the overall efficiency for
getting the useful information is higher.
…vement

For a read that has many repeat hits, we only output the top n supporting reads sorted by overlap length. This reduces the I/O load of outputting many redundant sequences for highly repetitive sequences, with minimal effect on other regions of a genome.
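
The selection described above can be sketched as follows. This is an illustration in Python, not the C code from the commit; the function name `top_supporting_reads` and the `(read_id, overlap_length)` tuple layout are assumptions.

```python
import heapq
from typing import Iterable, List, Tuple

Overlap = Tuple[str, int]  # (read_id, overlap_length), hypothetical layout


def top_supporting_reads(overlaps: Iterable[Overlap], n: int) -> List[Overlap]:
    """Keep only the n overlaps with the greatest overlap length, longest first."""
    return heapq.nlargest(n, overlaps, key=lambda o: o[1])


if __name__ == "__main__":
    hits = [("a", 500), ("b", 1200), ("c", 800), ("d", 300)]
    print(top_supporting_reads(hits, 2))
```

`heapq.nlargest` avoids sorting the full hit list, which helps when a repetitive read has far more hits than the n that are kept.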
…OUNT to catch potentially longer overlaps if available
…vement

Used this new modification for NIST Trio Child assembly. It works on-par with the previous version but with much better I/O performance and less useless computation during the consensus stage. I am merging the code.
Not really MT bug. Simple buffer over-run caused one thread
to write into next. Root cause was logic error in dividing
hits into threads when nhits < NTHREADS.
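
To illustrate the kind of edge case described here, the sketch below divides `nhits` items among a fixed number of threads so that the ranges stay in bounds even when `nhits` is smaller than the thread count. It is written in Python for brevity and is not the actual fix applied in daligner; the function name `partition_hits` is hypothetical.

```python
from typing import List, Tuple


def partition_hits(nhits: int, nthreads: int) -> List[Tuple[int, int]]:
    """Return nthreads half-open (start, end) ranges that together cover range(nhits).

    When nhits < nthreads, the trailing threads get empty ranges instead of
    ranges that run past the end of the buffer.
    """
    if nthreads <= 0:
        raise ValueError("nthreads must be positive")
    if nhits < 0:
        raise ValueError("nhits must be non-negative")
    base, extra = divmod(nhits, nthreads)
    ranges = []
    start = 0
    for t in range(nthreads):
        size = base + (1 if t < extra else 0)  # first `extra` threads take one more
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    print(partition_hits(3, 4))
    print(partition_hits(10, 4))
```

Because every range is derived from a running `start` that never exceeds `nhits`, no thread can be handed a slice that overlaps its neighbour's.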

Fixes PacificBiosciences#9
Soon, we can drop VERY_VERBOSE altogether.
cDNA_primer/pbtranscript-tofu/external_daligner/DALIGNER-d4aa4871122b35ac92e2cc13d9b1b1e9c5b5dc5c-ICEmod/LA4Ice.c

pb-cdunn commented Nov 4, 2015

Liz (@Magdoll), This might need changes. I have no idea. But it definitely builds.


pb-cdunn commented Nov 4, 2015

Magdoll@


Magdoll commented Nov 4, 2015

thanks i'll give it a try

pb-cdunn force-pushed the master branch 4 times, most recently from 029bfa8 to 3f1b474 on December 15, 2015 13:49
pb-cdunn pushed a commit that referenced this pull request Mar 9, 2018
…nate-buffer to develop

* commit '87189252aa9b3c216e12aad904c86a545735d125':
  Minor fix to null-terminate buffer (properly, I think)

3 participants