Idea for speed optimization. #48

Open
iamjli opened this issue Sep 23, 2017 · 0 comments
Comments

@iamjli
Contributor

iamjli commented Sep 23, 2017

Outlining one way to speed up vanilla Garnet. Won't implement now but probably worth looking into when we decide on a version of Garnet we like.


From our discussion last week, it appeared that the only way to get accurate motif matches is to calculate the background from user-specified open chromatin regions and then run motif searching, all on the fly. We concluded we may have to compromise and just generate a motif file within windows around every TSS. But the motif matching algorithm is different than what I thought. Specifically, here are the steps:

  1. Calculate score threshold from user-defined p-value and background.
  2. Scan sequences for any subsequences whose score exceeds this threshold. Note that the score is determined only by the subsequence and the PWM, not by the background.
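
To make the two steps concrete, here is a minimal Python sketch. It is not Garnet's code or the actual motif scanner: the function names, the PWM layout (a list of per-position dicts of scores), and the brute-force enumeration of all k-mers are assumptions chosen for illustration, and the enumeration is only practical for short motifs. The point it shows is that the background enters only in `threshold_from_pvalue`, while `score` and `scan` never see it.

```python
from itertools import product

BASES = "ACGT"


def score(pwm, kmer):
    """Sum of per-position scores; depends only on the PWM and the k-mer."""
    return sum(pwm[i][base] for i, base in enumerate(kmer))


def threshold_from_pvalue(pwm, background, pvalue):
    """Lowest score t with P(score >= t) <= pvalue under `background`.

    Hypothetical helper: enumerates all 4**k k-mers, so it is only
    usable for short motifs. Returns inf if no score is rare enough.
    """
    scored = []
    for kmer in product(BASES, repeat=len(pwm)):
        prob = 1.0
        for base in kmer:
            prob *= background[base]
        scored.append((score(pwm, kmer), prob))
    scored.sort(reverse=True)  # highest scores first

    threshold = float("inf")
    tail = 0.0
    i = 0
    while i < len(scored):
        # Collect the probability mass of all k-mers tied at this score.
        current = scored[i][0]
        mass = 0.0
        j = i
        while j < len(scored) and scored[j][0] == current:
            mass += scored[j][1]
            j += 1
        if tail + mass > pvalue:
            break
        tail += mass
        threshold = current
        i = j
    return threshold


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    k = len(pwm)
    hits = []
    for start in range(len(sequence) - k + 1):
        s = score(pwm, sequence[start:start + k])
        if s >= threshold:
            hits.append((start, s))
    return hits
```

With a toy two-position PWM, changing the background moves the threshold but leaves the per-window scores untouched, which is what makes the preprocessing idea possible.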

This means we can do a lot of the preprocessing work beforehand. We can generate a motif file with a low threshold, and every time we run Garnet, we only have to calculate the thresholds we want and then filter the motif file based on them. This would be really fast, though the motif file would be very large (probably tens of GB).
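
A rough sketch of what the filtering step could look like, assuming the precomputed motif file is a tab-separated table with `motif` and `score` columns (the file layout, column names, and function name are hypothetical, not an existing Garnet format). It streams the file row by row, so a file of tens of GB would not need to fit in memory. One consequence worth guarding against: hits scoring below the cutoff used to build the file are not in it, so a requested threshold below that cutoff would silently return incomplete results.

```python
import csv


def filter_motif_hits(handle, thresholds, floor):
    """Yield precomputed hits whose score meets the per-motif threshold.

    handle     -- open text file of tab-separated hits with a header row
    thresholds -- dict mapping motif name to the score threshold for this run
    floor      -- the low cutoff that was used when the file was generated
    """
    for motif, cutoff in thresholds.items():
        if cutoff < floor:
            raise ValueError(
                f"threshold for {motif} ({cutoff}) is below the precomputed "
                f"cutoff ({floor}); the file cannot contain all matches"
            )
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        cutoff = thresholds.get(row["motif"])
        # Motifs without a threshold for this run are skipped entirely.
        if cutoff is not None and float(row["score"]) >= cutoff:
            yield row
```

Each run would then only need to recompute `thresholds` from the user's p-value and background before making one pass over the file.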
