Refactor Time Aggregation #162

rmartz · 2016-12-19T18:01:24Z

Overview

Refactors time aggregation to be determined by configuration and parameter instead of defined as part of the indicator. This reduces the need for redundant yearly or monthly aggregated indicators and means additional aggregations like quarterly can be added without needing to redefine all existent indicators.

Notes

This appears to break the Climate Change Lab by breaking the assumption that indicators have a 1-to-1 correlation with their time aggregation. Now that aggregation is a configurable parameter the lab may need some refactoring to allow for indicators to either be monthly or yearly or otherwise.

Because of the use of raw SQL, indicators that measure consecutive days that fit a criteria are yearly only:

YearlyMaxConsecutiveDryDays
YearlyDrySpells
HeatWaveDurationIndex
These indicators were yearly only previously, so this limitation should be acceptable for the time being.

In order to make TotalPrecipitation work with configurable time aggregation, it's been changed to report in in/day and to report the sum of daily precipitation instead of the average. This should be equivalent (The average being the sum divided by number of values, and in this case we divided the unit reported by number of days in a year), and promotes an opportunity to simplify our handling of precipitation by removing the /day suffix and introducing linear distance measurements for precipitation, with "per aggregation period" implied.

With this simplification, we may be able to consolidate indicators further by introducing configurable value aggregation for HighTemperature, LowTemperature, and Precipitation indicators, and making them have all available time aggregations allowed.

Testing Instructions

All unit tests should still pass
Run API queries for any indicator, with any combination of parameters. They should still return correct data.

Closes #149 and #166

Time aggregation is now a parameter passed in during Indicator creation, based on a list of allowed indicator aggregations. Some indicators only support yearly or daily aggregation, but most support yearly or monthly. This dramatically cuts down on the number of individual indicators, but is a huge shift in the Indicator design. The final implementation is likely to shift as we adjust for actual usage and try to incorporate MaxConsecutiveDaysIndicator to being aggregation-agnostic.

CloudNiner · 2016-12-19T20:12:39Z

Code looks good at a high level.

I'd give the entire thing another once over to verify all docstrings and descriptive text are updated. I left comments in a few places but I almost certainly missed more.

I don't think the IndicatorParams rebase for this is going to be terrible. Just a lot of renaming/reshuffling.

CloudNiner · 2016-12-19T19:42:34Z

django/climate_change_api/indicators/abstract_indicators.py

@@ -30,7 +46,9 @@ class Indicator(object):
    available_units = (None,)
    parameters = None

-    def __init__(self, city, scenario, models=None, years=None,
+    monthly_range_config = None


A docstring on what this means would be helpful.

CloudNiner · 2016-12-19T19:44:59Z

django/climate_change_api/indicators/abstract_indicators.py

-
-
-class YearlyCountIndicator(YearlyAggregationIndicator):
+class CountIndicator(Indicator):
    """ Class to count days on which a condition is met.

    Essentially a specialized version of the YearlyAggregationIndicator where all values count as 1


Update docstring

CloudNiner · 2016-12-19T19:47:47Z

django/climate_change_api/indicators/indicators.py

-                   '(Default 1st) percentile of observations from 1960 to 1995')
+class ExtremeColdEvents(CountUnitsMixin, CountIndicator):
+    label = 'Extreme Cold Events'
+    description = ('Total number of times per year daily minimum temperature is below the specified'


This is one of many description strings that still refers to the time aggregation in the description. We should probably reword these description strings to remove those references, or update them.

Something like: "Total number of times per aggregation period that the daily minimum temperature is below the specified percentile..."?

CloudNiner · 2016-12-19T19:58:34Z

django/climate_change_api/indicators/abstract_indicators.py

    filters = None
+    conditions = None
+    agg_function = F


Inline docs for all of these new parameters would be helpful.

CloudNiner · 2016-12-19T20:04:14Z

django/climate_change_api/indicators/abstract_indicators.py

+
+        Caches the resulting range config as a class attribute.
+        """
+        if cls.monthly_range_config is None:


It doesn't look like any part of generating the monthly range config relies on the Indicator class object. It might make sense to move this to some other utility class so that it only needs to be created once, rather than once for each indicator.

That's a good thought, should help keep distinct concepts encapsulated in their own area as well

CloudNiner · 2016-12-19T20:10:41Z

django/climate_change_api/indicators/tests/test_indicators.py

        data = indicator.calculate()
        self.assertEqual(data, self.test_models_filter_equals)

+    def test_unit_conversion_definitions(self):
+        """ Some sanity checks for unit conversion class attributes """
+        self.assertIn(self.indicator_class.default_units, self.indicator_class.available_units)


KlaasH

Are the indicators working for you in the lab? I'm getting no graphs and errors like d3.js:7679 Error: <path> attribute d: Expected number, "MNaN,74.366125287…". for all of them.

KlaasH · 2016-12-20T06:35:54Z

django/climate_change_api/indicators/indicators.py

@@ -77,6 +81,10 @@ class YearlyDrySpells(CountUnitsMixin, YearlySequenceIndicator):
    variables = ('pr',)
    raw_condition = 'pr = 0'

+    def row_group_key(self, row):
+        """ Key function for groupby to use to break input stream into chunks."""
+        return (row['data_source__model'], row['data_source__year'])


I think this could be simplified away if

The version in YearlyMaxConsecutiveDaysIndicator were moved up into YearlySequenceIndicator

aggregate below were revised to match the one in YearlyMaxConsecutiveDaysIndicator except for the calculation of the value

I don't think I follow what we're going for... if we're moving row_group_key to YearlySequenceIndicator, what is being simplified away?

Right now YearlySequenceIndicator has two direct children (YearlyDrySpells and YearlyMaxConsecutiveDaysIndicator), both of which have a row_group_key method. But with some changes to YearlyDrySpells.aggregate to make it more like YearlyMaxConsecutiveDaysIndicator.aggregate (which would be good regardless to make the similarities and differences show up more clearly), row_group_key could be defined only once, on YearlySequenceIndicator.

KlaasH · 2016-12-20T12:48:16Z

django/climate_change_api/indicators/indicators.py

-                                  MonthlyAggregationIndicator, MonthlyCountIndicator,
-                                  DailyRawIndicator, BasetempIndicatorMixin)
+from .abstract_indicators import (Indicator, CountIndicator, BasetempIndicatorMixin,
+                                  YearlyMaxConsecutiveDaysIndicator, YearlySequenceIndicator)
 from .unit_converters import (TemperatureUnitsMixin, PrecipUnitsMixin, DaysUnitsMixin,
                              CountUnitsMixin, TemperatureDeltaUnitsMixin)


 ##########################
 # Yearly indicators


I second the call for a general inline documentation review.

KlaasH · 2016-12-20T13:26:12Z

django/climate_change_api/indicators/indicators.py

-    agg_function = Avg
-    default_units = 'in/year'
+    agg_function = Sum
+    default_units = 'in/day'


Isn't Avg correct? I may be failing once again to reason clearly about this particular indicator, but summing rates across different timescales and treating them as in/day regardless seems wrong.

Also this is now overriding default_units with the same value that's set in the mixin.

I filed #166 in part for this, but the issue is that TotalPrecipitation shouldn't be returning a rate, it should be returning a total. Using units to give an average over a timespan that matches the aggregation means that the numeric results match the desired value, but the quantity they represent is wrong... 1 in/month is the same quantity as 12 in/year, so a year of constant 1 in/month rainfall should be represented as 144 in/year.

in/day is a closer approximation that gives workable values for both monthly and yearly aggregation, under the idea that it's the sum of precipitation "per day in the aggregation". Ideally we'd resolve this by reporting our precipitation total indicators using simple mass per area or linear distance, but I didn't want to unilaterally make that distinction for this PR.

Agreed. Pending the refactor in #166, though, with Avg you get an answer that's technically correct, if confusing and not intuitive, and you can override the units to match your aggregation period if you want a number you can think of as just "inches".

"Technically correct" is a bit tricky since the thing that we're trying to do is fundamentally flawed. One option gives correct numerical results with a misleading unit, another gives quantitatively misleading results with a "correct" unit.

I don't think Avg is workable for this PR, though, since it only works if the unit corresponds with the aggregation. We could invest the effort to make the unit change dynamically with the time aggregation, but if we are going to invest additional effort here for this PR I'd prefer just go ahead and implement #166 (It's a really straightforward change, only notable part is updating unit tests).

I guess you're more bothered by the mismatch between the indicator names and what they have been doing (returning rates) whereas I'm more bothered by the mismatch that Sum causes between the units they say they're returning and the units they're actually returning.

I.e.

looks worse than

to me because the numbers+units label seems like it makes a stronger claim than the indicator name does, and that's the claim that's misleading in the Sum case.

But it doesn't matter provided we do #166.

I went ahead and did #166 here, so it should just read kg/m^2 or mm or in. I left precipitation rates alone for the raw daily indicator and in case we come across a situation where rate is the conceptually preferable way of representing precipitation.

KlaasH · 2016-12-20T13:32:13Z

Also, I tried to make the sequence indicators work for monthly aggregation, assigning streaks to the month in which they start. But I was not successful. I'm sure there's a way, but on the other hand I'm not sure monthly streaks are a particularly useful indicator anyway. Might be worth taking another run at it for seasonal.

rmartz · 2016-12-20T17:00:29Z

@KlaasH

Are the indicators working for you in the lab? I'm getting no graphs and errors like d3.js:7679 Error: attribute d: Expected number, "MNaN,74.366125287…". for all of them.

Yeah, I did some looking at the lab before filing this but it looks like the lab assumes that indicators have a 1-to-1 correlation with their time aggregation, which this breaks. I didn't see a trivial way to bring it up to line, so this PR would break the lab without a corresponding PR there to fix it :(

- Abstract monthly range logic to external class - Documentation enhancements

…ically

KlaasH

(Seeing if making this a "review comment" enables threading.)
Re the Lab breakage, the super-quick fix is to add || 'yearly' at https://github.com/azavea/climate-change-lab/blob/develop/src/app/services/chart.service.ts#L63

Not worth committing, probably. Hopefully azavea/climate-change-lab#119 won't be bad. But useful for seeing this in action.

rmartz · 2016-12-20T21:18:48Z

Looks like Github dislikes threaded comments 😬

Adding || 'yearly' didn't seem to take for me, was there anything else you needed to do?

Also, with the focus on documentation, I'm thinking of stripping the boilerplate averaged across all requested models from the indicators it appears on, since that hasn't been strictly true since we supported min/max reporting.

KlaasH · 2016-12-20T21:36:56Z

Oops, should have been more specific about where to add the hack. Changing that line to

let timeFormat = this.timeOptions[indicator.time_aggregation || 'yearly'];

worked for me.

👍 to removing the boilerplate documentation.

KlaasH

Precip sorted!
This all looks great.

KlaasH · 2016-12-20T22:07:27Z

django/climate_change_api/indicators/indicators.py

+    # Precipitation is stored per-second, and we want a total for all days in the aggregation,
+    # so we need to multiple each day's value by 86400 to get the total for that day and then
+    # sum the results
+    expression = F('pr') * SECONDS_PER_DAY


CloudNiner · 2016-12-21T20:34:30Z

django/climate_change_api/indicators/tests/test_indicators.py

+    indicator_name = 'total_precipitation'
+    time_aggregation = 'monthly'
+    units = 'kg/m^2'
+    test_indicator_rcp85_equals = {'2000-01': {'avg': 3024000, 'min': 2592000, 'max': 3456000}}


Why did the test condition for the TotalPrecipitation indicators (both yearly/monthly) change? Does it make sense that both aggregation levels have the same totals and responses?

The test condition changed because it was previously looking for kg/m^2/s, which isn't an option anymore... we're giving the total rainfall, which is the rainfall per day summed across the whole year, and since the raw data is in rainfall per second everything had to be scaled up by 84600.

And unfortunately upon inspection it does make sense that the monthly and yearly have the same values... every single one of our test data points is set for January 1st. We need a lot more test data to start to get diverging results.

Gotcha. Thanks for clarifying.

Refactor Time Aggregation

rmartz assigned CloudNiner and KlaasH Dec 19, 2016

rmartz requested review from CloudNiner and KlaasH December 19, 2016 18:01

CloudNiner mentioned this pull request Dec 19, 2016

Indicator Parameters #158

Merged

CloudNiner reviewed Dec 19, 2016

View reviewed changes

KlaasH reviewed Dec 20, 2016

View reviewed changes

rmartz mentioned this pull request Dec 20, 2016

Revisit Precipitation Units #166

Closed

Reed Martz added 9 commits December 20, 2016 14:21

Update unit tests to specify time aggregation

f3a29c1

Add time_aggregation and valid_aggregations to Indicator response

650d3fc

Add documentation for time_aggregation parameter

a046c74

Remove references to implied aggregation from indicator descriptions

3309c22

Tweak TotalPrecipitation to give better monthly values

c7f7e25

Remove references to year from descriptions

908e40e

PR review changes

78a6e7a

- Abstract monthly range logic to external class - Documentation enhancements

Abstract unit conversion tests so all combinations are tested automat…

8b27963

…ically

Refactor MonthRanges to simplify output format

fa6423e

rmartz force-pushed the feature/refactor-aggregations branch from 4758f76 to fa6423e Compare December 20, 2016 19:24

CloudNiner mentioned this pull request Dec 20, 2016

Update lab to handle new time_aggregation parameter azavea/climate-change-lab#119

Closed

Simplify YearlySequenceIndicator logic

df9faad

KlaasH reviewed Dec 20, 2016

View reviewed changes

Reed Martz added 2 commits December 20, 2016 16:42

Introduce static units for total precipitation

de9be08

Remove outdated boilerplate from indicator descriptions

61a2a0d

KlaasH approved these changes Dec 20, 2016

View reviewed changes

CloudNiner reviewed Dec 21, 2016

View reviewed changes

CloudNiner approved these changes Dec 21, 2016

View reviewed changes

rmartz merged commit 245c780 into develop Dec 21, 2016

rmartz deleted the feature/refactor-aggregations branch December 21, 2016 21:24

rmartz mentioned this pull request Dec 29, 2016

Remove raw daily indicators #199

Merged

rbreslow unassigned CloudNiner Feb 22, 2021

nanotubing unassigned KlaasH Apr 7, 2023

ddohler pushed a commit that referenced this pull request Aug 21, 2023

Merge pull request #162 from azavea/feature/refactor-aggregations

b37470b

Refactor Time Aggregation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor Time Aggregation #162

Refactor Time Aggregation #162

rmartz commented Dec 19, 2016 •

edited

Loading

CloudNiner commented Dec 19, 2016

CloudNiner Dec 19, 2016

CloudNiner Dec 19, 2016

CloudNiner Dec 19, 2016

CloudNiner Dec 19, 2016

CloudNiner Dec 19, 2016

rmartz Dec 20, 2016

CloudNiner Dec 19, 2016

KlaasH left a comment

KlaasH Dec 20, 2016

rmartz Dec 20, 2016

KlaasH Dec 20, 2016

KlaasH Dec 20, 2016

KlaasH Dec 20, 2016

rmartz Dec 20, 2016 •

edited

Loading

KlaasH Dec 20, 2016

rmartz Dec 20, 2016 •

edited

Loading

KlaasH Dec 20, 2016

rmartz Dec 20, 2016

KlaasH commented Dec 20, 2016 •

edited

Loading

rmartz commented Dec 20, 2016

KlaasH left a comment

rmartz commented Dec 20, 2016

KlaasH commented Dec 20, 2016

KlaasH left a comment

KlaasH Dec 20, 2016

CloudNiner Dec 21, 2016

rmartz Dec 21, 2016

CloudNiner Dec 21, 2016

Refactor Time Aggregation #162

Refactor Time Aggregation #162

Conversation

rmartz commented Dec 19, 2016 • edited Loading

Overview

Notes

Testing Instructions

CloudNiner commented Dec 19, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KlaasH left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rmartz Dec 20, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rmartz Dec 20, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KlaasH commented Dec 20, 2016 • edited Loading

rmartz commented Dec 20, 2016

KlaasH left a comment

Choose a reason for hiding this comment

rmartz commented Dec 20, 2016

KlaasH commented Dec 20, 2016

KlaasH left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rmartz commented Dec 19, 2016 •

edited

Loading

rmartz Dec 20, 2016 •

edited

Loading

rmartz Dec 20, 2016 •

edited

Loading

KlaasH commented Dec 20, 2016 •

edited

Loading