Skip to content
This repository has been archived by the owner on Aug 28, 2023. It is now read-only.

Refactor Time Aggregation #162

Merged
merged 13 commits into from
Dec 21, 2016
Merged

Refactor Time Aggregation #162

merged 13 commits into from
Dec 21, 2016

Conversation

rmartz
Copy link
Contributor

@rmartz rmartz commented Dec 19, 2016

Overview

Refactors time aggregation to be determined by configuration and parameter instead of defined as part of the indicator. This reduces the need for redundant yearly or monthly aggregated indicators and means additional aggregations like quarterly can be added without needing to redefine all existent indicators.

Notes

This appears to break the Climate Change Lab by breaking the assumption that indicators have a 1-to-1 correlation with their time aggregation. Now that aggregation is a configurable parameter the lab may need some refactoring to allow for indicators to either be monthly or yearly or otherwise.

Because of the use of raw SQL, indicators that measure consecutive days that fit a criteria are yearly only:

  • YearlyMaxConsecutiveDryDays
  • YearlyDrySpells
  • HeatWaveDurationIndex
    These indicators were yearly only previously, so this limitation should be acceptable for the time being.

In order to make TotalPrecipitation work with configurable time aggregation, it's been changed to report in in/day and to report the sum of daily precipitation instead of the average. This should be equivalent (The average being the sum divided by number of values, and in this case we divided the unit reported by number of days in a year), and promotes an opportunity to simplify our handling of precipitation by removing the /day suffix and introducing linear distance measurements for precipitation, with "per aggregation period" implied.

With this simplification, we may be able to consolidate indicators further by introducing configurable value aggregation for HighTemperature, LowTemperature, and Precipitation indicators, and making them have all available time aggregations allowed.

Testing Instructions

  • All unit tests should still pass
  • Run API queries for any indicator, with any combination of parameters. They should still return correct data.

Closes #149 and #166

Time aggregation is now a parameter passed in during Indicator creation,
based on a list of allowed indicator aggregations. Some indicators only
support yearly or daily aggregation, but most support yearly or monthly.

This dramatically cuts down on the number of individual indicators, but
is a huge shift in the Indicator design. The final implementation is
likely to shift as we adjust for actual usage and try to incorporate
MaxConsecutiveDaysIndicator to being aggregation-agnostic.
@CloudNiner
Copy link
Contributor

Code looks good at a high level.

I'd give the entire thing another once over to verify all docstrings and descriptive text are updated. I left comments in a few places but I almost certainly missed more.

I don't think the IndicatorParams rebase for this is going to be terrible. Just a lot of renaming/reshuffling.

@@ -30,7 +46,9 @@ class Indicator(object):
available_units = (None,)
parameters = None

def __init__(self, city, scenario, models=None, years=None,
monthly_range_config = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring on what this means would be helpful.



class YearlyCountIndicator(YearlyAggregationIndicator):
class CountIndicator(Indicator):
""" Class to count days on which a condition is met.

Essentially a specialized version of the YearlyAggregationIndicator where all values count as 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update docstring

'(Default 1st) percentile of observations from 1960 to 1995')
class ExtremeColdEvents(CountUnitsMixin, CountIndicator):
label = 'Extreme Cold Events'
description = ('Total number of times per year daily minimum temperature is below the specified'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of many description strings that still refers to the time aggregation in the description. We should probably reword these description strings to remove those references, or update them.

Something like: "Total number of times per aggregation period that the daily minimum temperature is below the specified percentile..."?

filters = None
conditions = None
agg_function = F
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline docs for all of these new parameters would be helpful.


Caches the resulting range config as a class attribute.
"""
if cls.monthly_range_config is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like any part of generating the monthly range config relies on the Indicator class object. It might make sense to move this to some other utility class so that it only needs to be created once, rather than once for each indicator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good thought, should help keep distinct concepts encapsulated in their own area as well

data = indicator.calculate()
self.assertEqual(data, self.test_models_filter_equals)

def test_unit_conversion_definitions(self):
""" Some sanity checks for unit conversion class attributes """
self.assertIn(self.indicator_class.default_units, self.indicator_class.available_units)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@KlaasH KlaasH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the indicators working for you in the lab? I'm getting no graphs and errors like d3.js:7679 Error: <path> attribute d: Expected number, "MNaN,74.366125287…". for all of them.

@@ -77,6 +81,10 @@ class YearlyDrySpells(CountUnitsMixin, YearlySequenceIndicator):
variables = ('pr',)
raw_condition = 'pr = 0'

def row_group_key(self, row):
""" Key function for groupby to use to break input stream into chunks."""
return (row['data_source__model'], row['data_source__year'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be simplified away if

  • The version in YearlyMaxConsecutiveDaysIndicator were moved up into YearlySequenceIndicator
  • aggregate below were revised to match the one in YearlyMaxConsecutiveDaysIndicator except for the calculation of the value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I follow what we're going for... if we're moving row_group_key to YearlySequenceIndicator, what is being simplified away?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now YearlySequenceIndicator has two direct children (YearlyDrySpells and YearlyMaxConsecutiveDaysIndicator), both of which have a row_group_key method. But with some changes to YearlyDrySpells.aggregate to make it more like YearlyMaxConsecutiveDaysIndicator.aggregate (which would be good regardless to make the similarities and differences show up more clearly), row_group_key could be defined only once, on YearlySequenceIndicator.

MonthlyAggregationIndicator, MonthlyCountIndicator,
DailyRawIndicator, BasetempIndicatorMixin)
from .abstract_indicators import (Indicator, CountIndicator, BasetempIndicatorMixin,
YearlyMaxConsecutiveDaysIndicator, YearlySequenceIndicator)
from .unit_converters import (TemperatureUnitsMixin, PrecipUnitsMixin, DaysUnitsMixin,
CountUnitsMixin, TemperatureDeltaUnitsMixin)


##########################
# Yearly indicators
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second the call for a general inline documentation review.

agg_function = Avg
default_units = 'in/year'
agg_function = Sum
default_units = 'in/day'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't Avg correct? I may be failing once again to reason clearly about this particular indicator, but summing rates across different timescales and treating them as in/day regardless seems wrong.

Also this is now overriding default_units with the same value that's set in the mixin.

Copy link
Contributor Author

@rmartz rmartz Dec 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #166 in part for this, but the issue is that TotalPrecipitation shouldn't be returning a rate, it should be returning a total. Using units to give an average over a timespan that matches the aggregation means that the numeric results match the desired value, but the quantity they represent is wrong... 1 in/month is the same quantity as 12 in/year, so a year of constant 1 in/month rainfall should be represented as 144 in/year.

in/day is a closer approximation that gives workable values for both monthly and yearly aggregation, under the idea that it's the sum of precipitation "per day in the aggregation". Ideally we'd resolve this by reporting our precipitation total indicators using simple mass per area or linear distance, but I didn't want to unilaterally make that distinction for this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Pending the refactor in #166, though, with Avg you get an answer that's technically correct, if confusing and not intuitive, and you can override the units to match your aggregation period if you want a number you can think of as just "inches".

Copy link
Contributor Author

@rmartz rmartz Dec 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Technically correct" is a bit tricky since the thing that we're trying to do is fundamentally flawed. One option gives correct numerical results with a misleading unit, another gives quantitatively misleading results with a "correct" unit.

I don't think Avg is workable for this PR, though, since it only works if the unit corresponds with the aggregation. We could invest the effort to make the unit change dynamically with the time aggregation, but if we are going to invest additional effort here for this PR I'd prefer just go ahead and implement #166 (It's a really straightforward change, only notable part is updating unit tests).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you're more bothered by the mismatch between the indicator names and what they have been doing (returning rates) whereas I'm more bothered by the mismatch that Sum causes between the units they say they're returning and the units they're actually returning.

I.e.
super_rainy
looks worse than
not_so_rainy
to me because the numbers+units label seems like it makes a stronger claim than the indicator name does, and that's the claim that's misleading in the Sum case.

But it doesn't matter provided we do #166.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and did #166 here, so it should just read kg/m^2 or mm or in. I left precipitation rates alone for the raw daily indicator and in case we come across a situation where rate is the conceptually preferable way of representing precipitation.

@KlaasH
Copy link
Contributor

KlaasH commented Dec 20, 2016

Also, I tried to make the sequence indicators work for monthly aggregation, assigning streaks to the month in which they start. But I was not successful. I'm sure there's a way, but on the other hand I'm not sure monthly streaks are a particularly useful indicator anyway. Might be worth taking another run at it for seasonal.

@rmartz
Copy link
Contributor Author

rmartz commented Dec 20, 2016

@KlaasH

Are the indicators working for you in the lab? I'm getting no graphs and errors like d3.js:7679 Error: attribute d: Expected number, "MNaN,74.366125287…". for all of them.

Yeah, I did some looking at the lab before filing this but it looks like the lab assumes that indicators have a 1-to-1 correlation with their time aggregation, which this breaks. I didn't see a trivial way to bring it up to line, so this PR would break the lab without a corresponding PR there to fix it :(

Copy link
Contributor

@KlaasH KlaasH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Seeing if making this a "review comment" enables threading.)
Re the Lab breakage, the super-quick fix is to add || 'yearly' at https://github.com/azavea/climate-change-lab/blob/develop/src/app/services/chart.service.ts#L63

Not worth committing, probably. Hopefully azavea/climate-change-lab#119 won't be bad. But useful for seeing this in action.

@rmartz
Copy link
Contributor Author

rmartz commented Dec 20, 2016

Looks like Github dislikes threaded comments 😬

Adding || 'yearly' didn't seem to take for me, was there anything else you needed to do?

Also, with the focus on documentation, I'm thinking of stripping the boilerplate averaged across all requested models from the indicators it appears on, since that hasn't been strictly true since we supported min/max reporting.

@KlaasH
Copy link
Contributor

KlaasH commented Dec 20, 2016

Oops, should have been more specific about where to add the hack. Changing that line to

let timeFormat = this.timeOptions[indicator.time_aggregation || 'yearly'];

worked for me.

👍 to removing the boilerplate documentation.

Copy link
Contributor

@KlaasH KlaasH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precip sorted!
This all looks great.

# Precipitation is stored per-second, and we want a total for all days in the aggregation,
# so we need to multiple each day's value by 86400 to get the total for that day and then
# sum the results
expression = F('pr') * SECONDS_PER_DAY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
about_the_right_amount_of_rain

indicator_name = 'total_precipitation'
time_aggregation = 'monthly'
units = 'kg/m^2'
test_indicator_rcp85_equals = {'2000-01': {'avg': 3024000, 'min': 2592000, 'max': 3456000}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the test condition for the TotalPrecipitation indicators (both yearly/monthly) change? Does it make sense that both aggregation levels have the same totals and responses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test condition changed because it was previously looking for kg/m^2/s, which isn't an option anymore... we're giving the total rainfall, which is the rainfall per day summed across the whole year, and since the raw data is in rainfall per second everything had to be scaled up by 84600.

And unfortunately upon inspection it does make sense that the monthly and yearly have the same values... every single one of our test data points is set for January 1st. We need a lot more test data to start to get diverging results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for clarifying.

@rmartz rmartz merged commit 245c780 into develop Dec 21, 2016
@rmartz rmartz deleted the feature/refactor-aggregations branch December 21, 2016 21:24
ddohler pushed a commit that referenced this pull request Aug 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove redundancy between yearly and monthly indicators
3 participants