Idea for speed optimization. #48

iamjli · 2017-09-23T18:40:03Z

Outlining one way to speed up vanilla Garnet. Won't implement now but probably worth looking into when we decide on a version of Garnet we like.

From our discussion last week, it appeared the only way to get accurate motif matches is to calculate the background from user-specified open chromatin regions and then run motif searching, all on the fly. We concluded we may have to compromise and just generate a motif file within windows around every TSS. But the motif matching algorithm is different than what I thought. Specifically, here are the steps:

1. Calculate a score threshold from the user-defined p-value and the background.
2. Scan sequences for any subsequences whose scores exceed this threshold. Note that the score is determined only by the subsequence and the PWM, not by the background.
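A minimal sketch of the two steps, assuming the PWM is stored as integer scores per column (ACGT order) and the background is a vector of four base probabilities. One common way to turn a p-value into a score threshold is to compute the exact null distribution of scores by dynamic programming over the PWM columns; that is the technique used here, not necessarily what Garnet or any particular motif tool does. Function names (`score_distribution`, `score_threshold`, `scan`) are hypothetical. Note that the background only enters step 1; `scan` never looks at it.

```python
import math
from collections import defaultdict

BASES = "ACGT"  # column order assumed for the PWM and the background


def score_distribution(pwm, background):
    """Exact distribution of PWM scores for a random k-mer drawn from the background.

    pwm: list of columns, each a list of 4 integer scores in ACGT order.
    background: list of 4 base probabilities in ACGT order (summing to 1).
    Returns {score: probability}.
    """
    dist = {0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        for total, prob in dist.items():
            for base_score, base_prob in zip(column, background):
                nxt[total + base_score] += prob * base_prob
        dist = dict(nxt)
    return dist


def score_threshold(pwm, background, pvalue):
    """Step 1: smallest score t with P(score >= t) <= pvalue under the background."""
    dist = score_distribution(pwm, background)
    tail = 0.0
    threshold = math.inf  # if even the best score is too common, nothing passes
    for score in sorted(dist, reverse=True):
        tail += dist[score]  # tail is now P(random score >= score)
        if tail > pvalue:
            break
        threshold = score
    return threshold


def scan(sequence, pwm, threshold):
    """Step 2: yield (position, score) for each window scoring at least `threshold`.

    The score depends only on the window and the PWM, not on the background.
    """
    k = len(pwm)
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k].upper()
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[BASES.index(base)] for col, base in zip(pwm, window))
        if score >= threshold:
            yield i, score


if __name__ == "__main__":
    pwm = [[2, -1, -1, -1], [-1, -1, 2, -1]]  # toy 2-column motif favouring "AG"
    bg = [0.25] * 4
    t = score_threshold(pwm, bg, 0.1)
    print(t, list(scan("CAGTAGATN", pwm, t)))  # 4 [(1, 4), (4, 4)]
```

Because the per-window score is fixed once the PWM is fixed, changing the background or p-value only moves `t`; the set of candidate windows and their scores stays the same.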

This means we can do much of the preprocessing work beforehand. We can generate a motif file with a low threshold; every time we run Garnet, we only have to calculate the thresholds we want and then filter the motif file against them. This would be very fast, though the motif file would be very large (probably tens of GB).
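A rough sketch of the precompute-then-filter idea, assuming the precomputed motif file can be loaded as `(motif_id, chrom, position, score)` tuples (a hypothetical layout) and that each run supplies a per-motif score threshold derived from the user's p-value and background. Hits are grouped per motif and sorted by score once, so each run only needs a binary search per motif to find the cutoff.

```python
import bisect
from collections import defaultdict


def build_index(hits):
    """One-time step: group low-threshold hits by motif, sorted by score.

    hits: iterable of (motif_id, chrom, position, score) tuples (hypothetical layout).
    Returns {motif_id: (sorted_scores, sorted_entries)}.
    """
    grouped = defaultdict(list)
    for motif_id, chrom, pos, score in hits:
        grouped[motif_id].append((score, chrom, pos))
    index = {}
    for motif_id, entries in grouped.items():
        entries.sort()
        index[motif_id] = ([entry[0] for entry in entries], entries)
    return index


def filter_hits(index, thresholds):
    """Per-run step: keep hits whose score is at least that motif's threshold.

    thresholds: {motif_id: score_threshold} computed for this run.
    """
    kept = []
    for motif_id, threshold in thresholds.items():
        if motif_id not in index:
            continue
        scores, entries = index[motif_id]
        start = bisect.bisect_left(scores, threshold)  # first score >= threshold
        kept.extend((motif_id, chrom, pos, score) for score, chrom, pos in entries[start:])
    return kept


if __name__ == "__main__":
    hits = [
        ("M1", "chr1", 100, 3.0),
        ("M1", "chr1", 250, 7.5),
        ("M2", "chr2", 40, 5.0),
        ("M1", "chr3", 10, 9.0),
        ("M2", "chr2", 90, 1.0),
    ]
    index = build_index(hits)
    print(filter_hits(index, {"M1": 7.5, "M2": 6.0}))
```

One caveat with this design: the file only contains hits above the low precompute threshold, so any run whose threshold falls below that value would silently miss hits. The precompute threshold needs to be at or below the lowest threshold any run could ask for, which is what drives the file size up.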
