Skip to content

Latest commit

 

History

History
31 lines (29 loc) · 2.06 KB

brainstorming.md

File metadata and controls

31 lines (29 loc) · 2.06 KB

Brainstorming!

Different ways to improve imeter:

  • account for degenerative sequences
  • account for – and + strands
  • set up a way to weigh the beginning of the intron vs the end of the intron (should work on finding a geometric decay in it)
    • would be calculated during counts, so like count += decay, total += decay
  • account for sequence repetition due to splice variants
  • decoding, putting things into libraries (proper program in argparse, put functions into some shared library)
    • trainer program
    • decoder program
  • scientific studies
  • controls? Experimental data and controls ( R squared value
  • do certain kmers predict long introns? Long introns will contribute more kmers, if one is more common in long kmers, will be disproportionately weighted.
    • could weight kmers on length of parent intron
    • could weight kmers based on composition of sequence? might be something good
  • use expression as a response variable, like a checker.
  • incorporate 4-5 fold cross val, eventually*
  • sort sequences at end, by score, - to +
  • we could score the kmers separately??
    • if we score without weight based on position in intron?
    • then after with weight? or developing a secondary metric?
      • like adding a second log-odds, across the single intron cutoff at an arbitrary(?) point, sortof the same math ? ? ?
    • is there a virtue in more stringent cutoffs? i.e, is there a virtue in making a deadzone, so to speak, where introns are ignored because they aren't 'distal' or 'proximal' enough?
    • in general, the approach relies on big data sets, is there something we could account for for the future for smaller ones (this is out of scope for a rotation, more out of curiosity)
    • split based on expression level?
      • could expression be a separate function? perhaps accounting for constitutive expression or high expression
      • what is the optimal way to split? there are many parameters
  • potential to experimentally determine best split options....
  • investigate python itertools combinations, or something to handle permutations, ie (ACTG, 5), then it makes all the possible combinations