Provide custom cohorts #162

mitalia · 2014-03-11T21:22:56Z

One problem with automated cohort generation is that end users have little control over cohorts. Samples that were loaded more than once, for example, can skew the numbers, and users have no obvious way to correct that themselves. This also affects #161, since the internal cohorts are updated constantly and are a moving target from one night to the next. Users should therefore be able to create their own cohorts with the following properties:

Cohorts can be created per user, per project, and globally.
Automatic cohort creation at load time should be eliminated.
Cohorts created by end users should be versioned, so that any version of a cohort can be used in a query and cohort updates don't affect ongoing analyses.
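
To make the versioning idea concrete, here is a minimal sketch in plain Python. It is not Varify's actual model: the `Cohort` class, its `scope` field, and the `publish` and `samples` methods are all hypothetical names chosen for illustration. The point it tries to show is that each published version is immutable, so an analysis that pins a version number keeps seeing the same samples after the cohort is updated.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass
class Cohort:
    """Hypothetical versioned cohort; an illustration, not Varify's model."""

    name: str
    scope: str  # assumed values: "user", "project", or "global"
    # Maps version number -> immutable set of sample IDs.
    versions: Dict[int, FrozenSet[str]] = field(default_factory=dict)

    def publish(self, sample_ids: Iterable[str]) -> int:
        """Store the given samples as a new version and return its number."""
        version = max(self.versions, default=0) + 1
        self.versions[version] = frozenset(sample_ids)
        return version

    def samples(self, version: Optional[int] = None) -> FrozenSet[str]:
        """Return the samples of a pinned version, or of the latest one."""
        if not self.versions:
            raise LookupError("cohort has no published versions")
        if version is None:
            version = max(self.versions)
        return self.versions[version]


# An ongoing analysis pins version 1 and is unaffected by version 2.
cohort = Cohort(name="controls", scope="project")
pinned = cohort.publish(["s1", "s2"])
cohort.publish(["s1", "s2", "s3"])
analysis_samples = cohort.samples(pinned)
```

Storing whole sample sets per version keeps lookups simple, at the cost of the storage growth discussed below.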

Making this work has some practical implications:

There should be a "grace period" before a cohort change is processed, to prevent unnecessary churn in the database while someone adds and removes samples.
Somehow, all the allele frequencies for all versions of the cohort will need to be stored. This could become unwieldy. One option is to age out older versions and archive them. This would allow users to bounce between the last several versions without issue, but old cohorts could be offloaded to a parallel table or offline storage.
Allele frequency calculations might need to happen outside the Varify application server (maybe even in the database?) to preserve performance. This may not be a problem we need to worry about yet.
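
The grace period and the aging-out of old versions could look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the 15-minute grace period, and the choice to keep three live versions are not settled anywhere in this issue.

```python
from typing import List, Tuple

GRACE_PERIOD_SECONDS = 15 * 60  # assumed value, not decided in this issue
KEEP_VERSIONS = 3               # assumed number of versions kept live


def ready_for_processing(last_edit: float, now: float,
                         grace: float = GRACE_PERIOD_SECONDS) -> bool:
    """True once the cohort has gone unedited for the whole grace period."""
    return now - last_edit >= grace


def split_for_archive(versions: List[int],
                      keep: int = KEEP_VERSIONS) -> Tuple[List[int], List[int]]:
    """Split version numbers into (live, archived), keeping the newest ones live."""
    ordered = sorted(versions)
    if keep <= 0:
        return [], ordered
    return ordered[-keep:], ordered[:-keep]


# Timestamps are seconds; one minute after the last edit is still too soon.
process_now = ready_for_processing(last_edit=0.0, now=60.0)
live, archived = split_for_archive([1, 2, 3, 4, 5])
```

Each further edit resets `last_edit`, so a burst of adds and removes triggers one recalculation instead of many. The archived versions would be the candidates for a parallel table or offline storage.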

mitalia added the type:feature label Mar 11, 2014

naegelyd added the version:future label Mar 12, 2014
