Cannot always reconstruct denominator and group_by parameters #17
Comments
I think deciles-charts should probably assume the columns in a measure table are
but it's not that simple. Why?
This is the gift that keeps giving, isn't it? 😆 What if |
It's not that simple; but for different reasons. We can ignore how an input file is read. What's important is which columns are dropped and how rows are grouped. In all cases:
I'm unsure whether it's possible to determine the denominator column reliably: that is, to not mistake it for another column. However, as @ccunningham101 has pointed out, we don't need to: instead, we need to drop rows where the value of the |
I think that dropping rows where the value of the
To recap: our aim is to drop rows where the value of the denominator column is zero. Reliably determining the denominator column is troublesome. Whilst we could assume the columns in a measure table are
this assumption wouldn't hold in all cases. However, if the value of the denominator column is zero, then the value of the
Behaviour against real data isn't any different to behaviour against dummy data: at least, there's nothing in |
It's non-trivial to identify the denominator column without the associated Measure instance. It's much easier to test the value column for inf, which pandas returns when a non-zero value is divided by zero (zero divided by zero gives NaN instead). If we test value for inf, then we can also remove the _get_denominator helper function, which reduces our use of DataFrame.attrs. Fixes #17
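The approach in the commit message above can be sketched as follows. This is a minimal illustration, not the code from the fix itself: the helper name drop_zero_denominator_rows and the example table are assumptions made for the sketch. Only the value column is inspected, so the denominator column never has to be identified.

```python
import numpy as np
import pandas as pd


def drop_zero_denominator_rows(measure_table: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose value is +inf or -inf.

    Hypothetical helper: pandas yields inf when a non-zero numerator is
    divided by a zero denominator, so testing value avoids having to work
    out which column is the denominator.
    """
    is_inf = np.isinf(measure_table["value"])
    return measure_table.loc[~is_inf].reset_index(drop=True)


# Illustrative table; the column names follow the layout in the issue.
measure_table = pd.DataFrame(
    {
        "numerator": [1, 2, 0],
        "denominator": [4, 0, 0],
        "date": ["2021-01-01", "2021-01-01", "2021-01-01"],
    }
)
measure_table["value"] = measure_table["numerator"] / measure_table["denominator"]

cleaned = drop_zero_denominator_rows(measure_table)
```

Note that a row with a zero numerator and a zero denominator has a value of NaN rather than inf, so this test alone does not drop it; such rows would need a separate check (for example, DataFrame.dropna on the value column) if they should also be excluded.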
We cannot always recreate groupby (see opensafely-actions/deciles-charts#17), so we cannot rely only on the length of groupby to decide whether to do a groupby plot or a normal plot. Instead, check whether the word "total" is in the measure file name.
This issue was reported by @ccunningham101. Thanks, Christine!
deciles-charts assumes the columns in a measure table are: [group_by, ]numerator, denominator, value, date. This assumption allows deciles-charts to reconstruct the denominator and group_by parameters passed to Measure.[1] However, it doesn't always hold.
In the following example, the assumption holds because cohort-extractor drops duplicate columns from a measure table.
group_by columns are dropped before the denominator column; and the denominator column is dropped before the numerator column.
However, in the following example, the assumption doesn't hold. Why?
I think this is because although group_by=["population"] should be the same as group_by=None, it isn't. (Note that group_by=["population"] and group_by="population" are equivalent.)
Footnotes
[1] deciles-charts doesn't use the group_by parameter, so _get_group_by should be removed.
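To make the positional assumption from the issue description concrete, here is a minimal sketch of reconstructing the parameters from column order alone. The function name reconstruct_parameters and the column names (practice, has_sbp_event) are hypothetical and are not taken from deciles-charts. The sketch assumes the layout [group_by, ]numerator, denominator, value, date.

```python
import pandas as pd


def reconstruct_parameters(measure_table: pd.DataFrame):
    """Guess (denominator, group_by) purely from column positions.

    Hypothetical helper: assumes the columns are
    [group_by, ...], numerator, denominator, value, date.
    """
    columns = list(measure_table.columns)
    if len(columns) < 4:
        raise ValueError("expected at least numerator, denominator, value, date")
    # The last four columns are taken to be fixed; anything before is group_by.
    *group_by, _numerator, denominator, _value, _date = columns
    return denominator, group_by


# A table that follows the assumed layout (illustrative column names).
table = pd.DataFrame(
    columns=["practice", "has_sbp_event", "population", "value", "date"]
)
denominator, group_by = reconstruct_parameters(table)
```

The guess is only as good as the layout. If a column has been dropped from the table, the positions shift, and the function returns a plausible but wrong answer without raising an error. This is why the thread moves towards testing the value column instead.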