-
Notifications
You must be signed in to change notification settings - Fork 0
Conversation
Time aggregation is now a parameter passed in during Indicator creation, based on a list of allowed indicator aggregations. Some indicators only support yearly or daily aggregation, but most support yearly or monthly. This dramatically cuts down on the number of individual indicators, but is a huge shift in the Indicator design. The final implementation is likely to shift as we adjust for actual usage and try to incorporate MaxConsecutiveDaysIndicator to being aggregation-agnostic.
Code looks good at a high level. I'd give the entire thing another once over to verify all docstrings and descriptive text are updated. I left comments in a few places but I almost certainly missed more. I don't think the IndicatorParams rebase for this is going to be terrible. Just a lot of renaming/reshuffling. |
@@ -30,7 +46,9 @@ class Indicator(object): | |||
available_units = (None,) | |||
parameters = None | |||
|
|||
def __init__(self, city, scenario, models=None, years=None, | |||
monthly_range_config = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A docstring on what this means would be helpful.
|
||
|
||
class YearlyCountIndicator(YearlyAggregationIndicator): | ||
class CountIndicator(Indicator): | ||
""" Class to count days on which a condition is met. | ||
|
||
Essentially a specialized version of the YearlyAggregationIndicator where all values count as 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update docstring
'(Default 1st) percentile of observations from 1960 to 1995') | ||
class ExtremeColdEvents(CountUnitsMixin, CountIndicator): | ||
label = 'Extreme Cold Events' | ||
description = ('Total number of times per year daily minimum temperature is below the specified' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of many description strings that still refers to the time aggregation in the description. We should probably reword these description strings to remove those references, or update them.
Something like: "Total number of times per aggregation period that the daily minimum temperature is below the specified percentile..."?
filters = None | ||
conditions = None | ||
agg_function = F |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline docs for all of these new parameters would be helpful.
|
||
Caches the resulting range config as a class attribute. | ||
""" | ||
if cls.monthly_range_config is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like any part of generating the monthly range config relies on the Indicator class object. It might make sense to move this to some other utility class so that it only needs to be created once, rather than once for each indicator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good thought, should help keep distinct concepts encapsulated in their own area as well
data = indicator.calculate() | ||
self.assertEqual(data, self.test_models_filter_equals) | ||
|
||
def test_unit_conversion_definitions(self): | ||
""" Some sanity checks for unit conversion class attributes """ | ||
self.assertIn(self.indicator_class.default_units, self.indicator_class.available_units) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the indicators working for you in the lab? I'm getting no graphs and errors like d3.js:7679 Error: <path> attribute d: Expected number, "MNaN,74.366125287…".
for all of them.
@@ -77,6 +81,10 @@ class YearlyDrySpells(CountUnitsMixin, YearlySequenceIndicator): | |||
variables = ('pr',) | |||
raw_condition = 'pr = 0' | |||
|
|||
def row_group_key(self, row): | |||
""" Key function for groupby to use to break input stream into chunks.""" | |||
return (row['data_source__model'], row['data_source__year']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be simplified away if
- The version in
YearlyMaxConsecutiveDaysIndicator
were moved up intoYearlySequenceIndicator
aggregate
below were revised to match the one inYearlyMaxConsecutiveDaysIndicator
except for the calculation of thevalue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow what we're going for... if we're moving row_group_key
to YearlySequenceIndicator
, what is being simplified away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now YearlySequenceIndicator
has two direct children (YearlyDrySpells
and YearlyMaxConsecutiveDaysIndicator
), both of which have a row_group_key
method. But with some changes to YearlyDrySpells.aggregate
to make it more like YearlyMaxConsecutiveDaysIndicator.aggregate
(which would be good regardless to make the similarities and differences show up more clearly), row_group_key
could be defined only once, on YearlySequenceIndicator
.
MonthlyAggregationIndicator, MonthlyCountIndicator, | ||
DailyRawIndicator, BasetempIndicatorMixin) | ||
from .abstract_indicators import (Indicator, CountIndicator, BasetempIndicatorMixin, | ||
YearlyMaxConsecutiveDaysIndicator, YearlySequenceIndicator) | ||
from .unit_converters import (TemperatureUnitsMixin, PrecipUnitsMixin, DaysUnitsMixin, | ||
CountUnitsMixin, TemperatureDeltaUnitsMixin) | ||
|
||
|
||
########################## | ||
# Yearly indicators |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second the call for a general inline documentation review.
agg_function = Avg | ||
default_units = 'in/year' | ||
agg_function = Sum | ||
default_units = 'in/day' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't Avg
correct? I may be failing once again to reason clearly about this particular indicator, but summing rates across different timescales and treating them as in/day regardless seems wrong.
Also this is now overriding default_units
with the same value that's set in the mixin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed #166 in part for this, but the issue is that TotalPrecipitation
shouldn't be returning a rate, it should be returning a total. Using units to give an average over a timespan that matches the aggregation means that the numeric results match the desired value, but the quantity they represent is wrong... 1 in/month is the same quantity as 12 in/year, so a year of constant 1 in/month rainfall should be represented as 144 in/year.
in/day
is a closer approximation that gives workable values for both monthly and yearly aggregation, under the idea that it's the sum of precipitation "per day in the aggregation". Ideally we'd resolve this by reporting our precipitation total indicators using simple mass per area or linear distance, but I didn't want to unilaterally make that distinction for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Pending the refactor in #166, though, with Avg
you get an answer that's technically correct, if confusing and not intuitive, and you can override the units to match your aggregation period if you want a number you can think of as just "inches".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Technically correct" is a bit tricky since the thing that we're trying to do is fundamentally flawed. One option gives correct numerical results with a misleading unit, another gives quantitatively misleading results with a "correct" unit.
I don't think Avg
is workable for this PR, though, since it only works if the unit corresponds with the aggregation. We could invest the effort to make the unit change dynamically with the time aggregation, but if we are going to invest additional effort here for this PR I'd prefer just go ahead and implement #166 (It's a really straightforward change, only notable part is updating unit tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you're more bothered by the mismatch between the indicator names and what they have been doing (returning rates) whereas I'm more bothered by the mismatch that Sum
causes between the units they say they're returning and the units they're actually returning.
I.e.
looks worse than
to me because the numbers+units label seems like it makes a stronger claim than the indicator name does, and that's the claim that's misleading in the Sum
case.
But it doesn't matter provided we do #166.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and did #166 here, so it should just read kg/m^2
or mm
or in
. I left precipitation rates alone for the raw daily indicator and in case we come across a situation where rate is the conceptually preferable way of representing precipitation.
Also, I tried to make the sequence indicators work for monthly aggregation, assigning streaks to the month in which they start. But I was not successful. I'm sure there's a way, but on the other hand I'm not sure monthly streaks are a particularly useful indicator anyway. Might be worth taking another run at it for seasonal. |
Yeah, I did some looking at the lab before filing this but it looks like the lab assumes that indicators have a 1-to-1 correlation with their time aggregation, which this breaks. I didn't see a trivial way to bring it up to line, so this PR would break the lab without a corresponding PR there to fix it :( |
- Abstract monthly range logic to external class - Documentation enhancements
4758f76
to
fa6423e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Seeing if making this a "review comment" enables threading.)
Re the Lab breakage, the super-quick fix is to add || 'yearly'
at https://github.com/azavea/climate-change-lab/blob/develop/src/app/services/chart.service.ts#L63
Not worth committing, probably. Hopefully azavea/climate-change-lab#119 won't be bad. But useful for seeing this in action.
Looks like Github dislikes threaded comments 😬 Adding Also, with the focus on documentation, I'm thinking of stripping the boilerplate |
Oops, should have been more specific about where to add the hack. Changing that line to
worked for me. 👍 to removing the boilerplate documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precip sorted!
This all looks great.
# Precipitation is stored per-second, and we want a total for all days in the aggregation, | ||
# so we need to multiple each day's value by 86400 to get the total for that day and then | ||
# sum the results | ||
expression = F('pr') * SECONDS_PER_DAY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indicator_name = 'total_precipitation' | ||
time_aggregation = 'monthly' | ||
units = 'kg/m^2' | ||
test_indicator_rcp85_equals = {'2000-01': {'avg': 3024000, 'min': 2592000, 'max': 3456000}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did the test condition for the TotalPrecipitation indicators (both yearly/monthly) change? Does it make sense that both aggregation levels have the same totals and responses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test condition changed because it was previously looking for kg/m^2/s
, which isn't an option anymore... we're giving the total rainfall, which is the rainfall per day summed across the whole year, and since the raw data is in rainfall per second everything had to be scaled up by 84600.
And unfortunately upon inspection it does make sense that the monthly and yearly have the same values... every single one of our test data points is set for January 1st. We need a lot more test data to start to get diverging results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Thanks for clarifying.
Overview
Refactors time aggregation to be determined by configuration and parameter instead of defined as part of the indicator. This reduces the need for redundant yearly or monthly aggregated indicators and means additional aggregations like quarterly can be added without needing to redefine all existent indicators.
Notes
This appears to break the Climate Change Lab by breaking the assumption that indicators have a 1-to-1 correlation with their time aggregation. Now that aggregation is a configurable parameter the lab may need some refactoring to allow for indicators to either be monthly or yearly or otherwise.
Because of the use of raw SQL, indicators that measure consecutive days that fit a criteria are yearly only:
These indicators were yearly only previously, so this limitation should be acceptable for the time being.
In order to make TotalPrecipitation work with configurable time aggregation, it's been changed to report in
in/day
and to report the sum of daily precipitation instead of the average. This should be equivalent (The average being the sum divided by number of values, and in this case we divided the unit reported by number of days in a year), and promotes an opportunity to simplify our handling of precipitation by removing the/day
suffix and introducing linear distance measurements for precipitation, with "per aggregation period" implied.With this simplification, we may be able to consolidate indicators further by introducing configurable value aggregation for HighTemperature, LowTemperature, and Precipitation indicators, and making them have all available time aggregations allowed.
Testing Instructions
Closes #149 and #166