-
Notifications
You must be signed in to change notification settings - Fork 0
Conversation
b292f28
to
cd28b65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Here are a few notes from looking at the code. I'm having VM issues again, so I'll try it out once I get them resolved.
for param in self.PARAMS: | ||
value = self._params.get(param, None) | ||
validation_fn = getattr(self, 'validate_' + param) | ||
clean_value = validation_fn(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be friendlier to add errors to a list and raise at the end so that we don't keep some errors secret until they've cleared the previous one. On the other hand sometimes all the errors is more than you need, but if we don't expect too many interdependent errors then all at once is probably better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They way you suggest is generally better I think, as it presents all errors to the user at once. It's also the way DRF does it with their validators, so there would be some consistency there.
If we stick with the bulk of what's in this PR already, I'll switch over.
def _check_is_string(self, param, value): | ||
""" Private helper method to raise ValidationError if value is not a string """ | ||
if not isinstance(value, basestring): | ||
raise ValidationError('{} must be a string comma separated list'.format(param)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusing error message. It can just be a single value, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll clean this up a bit.
class PercentileIndicatorParams(IndicatorParams): | ||
""" Extend IndicatorParams with a non-optional 'percentile' parameter """ | ||
|
||
PARAMS = IndicatorParams.PARAMS + ('percentile',) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it's simple, a method would probably be better. Could be a class method. If we ever end up with a multi-level class hierarchy, having it be
@classmethod
def params(cls):
return super(ThisClassName, cls).params + ('percentile',)
would make that smoother (because no parent class name hard-coded outside class declaration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't tested this yet, but left a couple thoughts on different options to consider. If we're good with this way then I'll boot up my CC API box and give it a good run through.
def validate_years(self, value): | ||
return self._check_is_string('years', value) if value is not None else None | ||
|
||
def validate_agg(self, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These feel kind of boilerplate-y... since we're using self inspection to detect these functions, we might be able to refactor the parameters themselves into independent classes that handle assignment, validation and default values invisibly internally (Like how fields in a Django model work).
except ValueError: | ||
raise ValidationError('percentile param must be an integer') | ||
if int_value < 1 or int_value > 99: | ||
raise ValidationError('percentile param must be within the range (0-100)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can be inclusive in the range, for instance a 100th percentile limit might mean filter for values that are greater than any value ever seen previously, which could be a sensible thing to ask for in certain situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, makes sense. Same for the 0th percentile.
@@ -0,0 +1,95 @@ | |||
|
|||
class ValidationError(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Django already provides its own ValidationError
exception, might be better to rename this to avoid confusion or switch to using theirs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left this separate so we could specifically distinguish a params.ValidationError
. Maybe a different name is appropriate?
return value | ||
|
||
|
||
class PercentileIndicatorParams(IndicatorParams): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kind of on the fence about having too many one-stop-shop classes that define specific combinations of features internally as a black box. Object inheritance is pretty handy, but for things like this I tend to feel that giving the end user the ability to choose which tools they want is more friendly than giving them a pre-defined box that has the tools we think they'll want baked in.
The alternative would be to build parameter detection and handling in an abstract indicator (sub?)class and requiring the indicators define which additionally parameters they want individually. It's a tradeoff between flexibility and risking repeated configuration, but in this case I think parameters have the chance to be sufficiently specific to their individual uses that the extra flexibility could be worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not super sold on this strategy here, although when I first looked at doing what you describe in your comment above (param classes like the Django field classes), I ran into some trouble with the specifics that would potentially make them unwieldy.
I'll play around with this a bit more this am to see if I can come up with something better.
df44184
to
44b97c6
Compare
Updated with a new commit that does some refactoring which should better align with the suggestions made. Some additional context from the new commit's log:
There's still the weirdness that some IndicatorParam instances may need indicator specific context. Worked around this for now by adding the Indicator class method that essentially acts as an IndicatorParams factory, and could be overridden by subclasses. IndicatorParams also retains the non-optional init arguments. All ears on a better way to get around this. Side effect of this refactor is that IndicatorParam instances can be self-documenting, so I added returning the IndicatorParam help text to the API response on 400 errors. Seems good. Note that I forced pushed to keep the refactor to a single commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new setup looks good. A hierarchy of IndicatorParams
classes doesn't seem so onerous when there's zero boilerplate required in the subclasses.
|
||
""" | ||
value = value if value is not None else self.default | ||
if value is None and self.default is None and self.required: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking self.default is None
is redundant given the previous line, right?
Should not validate the IndicatorParams | ||
|
||
""" | ||
return cls.params_class(cls.default_units, cls.available_units) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about passing cls
and reading the properties off of it in IndicatorParams.__init__
? It seems squirrelly, but this mandatory-positional-arguments thing is not pretty either. I feel like passing the class doesn't change the essentials of what's happening but removes one of the moving parts and would make adding further context-sensitive parameters easier. I also am not seeing a way to do this statically,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmartz offered the same suggestion. The problem is that then two things happen. One, it makes the relatively simple thing you're trying to decouple still depend on the complex thing, which makes test setup and instantiation far more complicated, and makes the simple thing (IndicatorParams) have a lot of context about things it doesn't care about. Two, it's very dangerously close to a circular dependency because the Indicator creates an instance of IndicatorParams, and at the same time IndicatorParams requires Indicator to initialize.
The design as it stands now is very very close to the Django Model+ModelField/DRF Serializer+SerializerField designs which are all doing the same core things. In the Serializer design, the solution is for the parent class (IndicatorParams) in our case, to take an open-ended 'context' dict. I could explore that, rather than the required positional parameters route.
I left it as two required positional parameters for now because those two params are required for proper functioning of the IndicatorParams so it makes sense to not hide them in a context dict, when in reality you'd always have to provide them. Maybe the solution here is to add context=None
to the function signature and docstring now, even though its not used, to guide future changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to look at the units from the other direction? Perhaps rather than have the units parameter read state from the indicator about what units it should allow, we create a small set of parallel fixed-scope parameters for each type... like TemperatureUnitsParameter
, PrecipitationUnitsParameter
, and maybe others as needed, and the indicator use those to define what its units are and what converter it should create?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, makes sense. context
sounds good as a way to make it more flexible, but maybe it's better to stop being overprotective of future-us and let them decide whether to add that when it's needed or do something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmartz potentially. It would require more subclassing of the IndicatorParams by default than there is now, as I think there would need to be an IndicatorParams subclass for each unit group. I'm inclined to hold off for now and revisit later if it becomes more of an issue.
2313222
to
1bec444
Compare
Updated with a rebase against develop, which brings in the DegreeDay indicators. At this point will probably wait to merge this until #162 is merged as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's too much scope creep for this PR, but do we want to add defined parameters to the indicator serialization output? It would be helpful for the lab to know which indicators support which parameters, but I can totally see that being something we save for another task later on.
|
||
BASETEMP_PARAM_DOCSTRING = "The base temperature used to calculate the daily difference for degree days summations. See 'basetemp_units' for a discussion of the units this value uses." | ||
|
||
BASETEMP_UNITS_PARAM_DOCSTRING = "Units for the value of the 'basetemp' parameter. If not provided, the API assumes the basetemp units match whatever is provided to the 'units' parameter." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the advantage for having these set out at the top of the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None in particular, I could define them on whichever IndicatorParams class uses them instead. Didn't want them to be inline strings in the description field (where they're used) because it looks odd and was super unwieldy.
""" Extend IndicatorParams with a non-optional 'percentile' parameter """ | ||
percentile = IndicatorParam('percentile', | ||
description=PERCENTILE_PARAM_DOCSTRING, | ||
required=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not want this to be optional? It used to be hardcoded to 99th percentile before #146, makes sense to me to let it default to that if not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, I don't think that was entirely intentional. I can swap these indicators back to providing a default for this parameter.
It also looks like I made the DegreeDays basetemp param required as well. That should definitely default to 65F, so set the default there to basetemp=65&basetemp_units=F
re: Adding the parameters to the Indicator serialization...I will definitely add that. It should be trivial to do since the parameter serialization methods already exist. |
1bec444
to
a5339a3
Compare
Hey, its me again. Updated once more with a rebase against develop to include the time aggregation parameter. Also updated:
Tests should continue to pass, this could use one more look before merging. The commit history is kind of a mess due to the many rebases, so I'll probably rebase this down to one commit with a detailed log message before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! A couple minor comments, only thing that really stands out to me is telling Swagger about basetemp_units
, if we get that added then this would be perfect 👌
|
||
BASETEMP_PARAM_DOCSTRING = "The base temperature used to calculate the daily difference for degree days summations. Defaults to 65. See the 'basetemp_units' for a discussion of the units this value uses." | ||
|
||
BASETEMP_UNITS_PARAM_DOCSTRING = "Units for the value of the 'basetemp' parameter. Defaults to 'F'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swagger doesn't know about this guy, he still thinks basetemp
should include the unit as part of the string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is fallout from now defining our documentation in two places. Hopefully we'll be able to consolidate this once we start building out the documentation stuff I'm working on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the docstring in views.py regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I figured this was a case of things being caught in the middle but thanks for tossing it there for now :)
param_class.validate(value) | ||
|
||
def to_dict(self): | ||
return [c.to_dict() for c in self._get_params_classes()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make this a dictionary we could replace a few higher-level fields like default_units
and let them be accessed by [..].parameters.units.default
, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, potentially. Not sure if that would be more trouble than its worth.
Opened #173 to address this down the road.
|
||
|
||
class Percentile5IndicatorParams(IndicatorParams): | ||
""" Extend IndicatorParams with a non-optional 'percentile' parameter """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still says non-optional here
|
||
|
||
class Percentile95IndicatorParams(IndicatorParams): | ||
""" Extend IndicatorParams with a non-optional 'percentile' parameter """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto to non-optional
percentile = IndicatorParam('percentile', | ||
description=PERCENTILE_5_PARAM_DOCSTRING, | ||
required=False, | ||
default=95, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PERCENTILE_5_PARAM_DOCSTRING
says "Defaults to 5"
if the IndicatorParam in question has no run-time dependencies. | ||
|
||
""" | ||
models = IndicatorParam('models', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to give this and the years
parameter default values? The description says "If not provided, defaults to all models.", but the JSON serialization says default=null
which feels a little confusing. Maybe have the serializer omit the default
parameter if it's unset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that works for now, I'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #171 to address this further.
@@ -524,18 +536,19 @@ class MonthlyCoolingDegreeDaysTestCase(IndicatorTests, TestCase): | |||
indicator_class = indicators.CoolingDegreeDays | |||
indicator_name = 'cooling_degree_days' | |||
time_aggregation = 'monthly' | |||
parameters = { | |||
units = 'C' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on this 👍
self.time_aggregation = IndicatorParam('time_aggregation', | ||
description=TIME_AGGREGATION_PARAM_DOCSTRING, | ||
required=False, | ||
default='yearly', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaulting to 'yearly' is a problem for the raw data indicators, since they don't support yearly aggregation. There is a argument to just move the daily aggregation into their yearly/monthly counterparts and get rid of them, though, so it may end up not being worth being concerned about here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good point. What should we default to then?
Maybe we should consider removing the daily 'raw data' indicators, since they don't provide additional value over the existing raw data endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm down for getting rid of them. I've played with the idea of changing AverageHighTemperature
, MaxHighTemperature
, etc, to all be HighTemperature
, LowTemperature
, Precipitation
, and the average, max, min, etc, would all be defined by parameter. That would cut down on the number of indicators we need to maintain even further and would give a good use for the daily aggregation if we want to keep it.
Edit We could plan to do that as part of #164 if wanted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Sounds good. Opened #172 to address removing the daily indicators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is looking very nice.
A few comments and questions, some overlapping with what Reed just said. The one bug I'm seeing is that providing an agg
parameter doesn't work (it starts returning empty objects for all data points).
|
||
TIME_AGGREGATION_PARAM_DOCSTRING = "Time granularity to group data by for result structure. Valid aggregations depend on indicator. Can be 'yearly', 'monthly' or 'daily'. Defaults to 'yearly'." | ||
|
||
UNITS_PARAM_DOCSTRING = "Units in which to return the data. Defaults to Imperial units (Fahrenheit for temperature indicators and inches per day for precipitation)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now just "inches" for precip. Or inches per aggregation period, but it's probably clearer not to write that...
|
||
|
||
class ExtremeHeatEvents(CountUnitsMixin, CountIndicator): | ||
label = 'Extreme Heat Events' | ||
description = ('Total number of times per period daily maximum temperature exceeds the ' | ||
'specified (Default 99th) percentile of observations from 1960 to 1995') | ||
'specified percentile of observations from 1960 to 1995') | ||
params_class = Percentile95IndicatorParams |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a discussion around changing the definition of "extreme" from 99th to 95th? Is that more standard? Intuitively, conditions expected on roughly 18 days per year don't seem to rise to the level of "extreme".
For one reference point, Philly's numbers (in F and inches/day):
tasmax 95th: 89.1, tasmax 99th: 93.7
pr 95th: 0.72, pr 99th: 1.36
tasmin 5th: 18.3, tasmin 1st: 9.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops. No. Lets just leave it at 1/99 for now. Will update.
if the IndicatorParam in question has no run-time dependencies. | ||
|
||
""" | ||
models = IndicatorParam('models', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These base params ('models', 'years', 'agg') do have default values (all, all, 'min, max, avg' respectively), and they say so in the documentation in the view and in their "description" strings, but now that the parameters are generating their own documentation and there's a default
property in the objects they return, that seems like the most natural place for people to look, and it's currently not reflecting the actual defaults.
Not sure how much bother would be needed to make them show up and how much we're willing to accept, but it would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a bit more effort to have years/models handle the special case of an 'all' option. I'll probably leave this as is for now, using Reed's suggestion that we don't display default
if its not set. I attempted to set the default for agg to be 'min,max,avg' and things broke. I expect that is related to the issue you noted below.
Can add issue to better handle the years/models param. Wasn't a well exposed issue until now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #171 to address this further.
description=YEARS_PARAM_DOCSTRING, | ||
required=False, | ||
validators=None) | ||
agg = IndicatorParam('agg', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, when I try to override agg
I get a whole lot of nothing. years
and models
seem to work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll take a look. I'd expect a test to be failing here, but that doesn't appear to be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#158 (comment) TODO: Fix agg param so we can give it a static default of 'min,max,avg' to match the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks great!
Think that should do it for the recent round of comments. Opened issues for related items that don't fit as well in the scope of this (already large) PR. |
439ca70
to
f5da68a
Compare
Overview: Breaks out the definition, documentation and validation of Indicator parameters to a separate class. Like other parts of Indicator, the params_class is provided to each indicator and used to validate them. Any class which needs a different set of IndicatorParams is free to override with its own definition. IndicatorParams contains class or instance defined IndicatorParam classes, which encapsulate definition and validation logic for each parameter. IndicatorParam is self-contained and self-documenting and provides extensible validation via an API similar to the one used in DRF via the serializer classes. Because IndicatorParams are self-documenting, their definition is returned as part of the API response at `/api/indicator/` See params.py for more documentation about IndicatorParams specifics IndicatorParams notes: Updates the agg param to have a self-documented default. The agg param has no e2e test, opened #174 to address this separately. PercentileIndicatorParams notes: This extra parameter remains unchanged. DegreeDayIndicatorParams notes: Handles the DegreeDay indicators via extra 'basetemp' and 'basetemp_units' parameters. Replaces the singular param `basetemp=<value>[K|C|F]` with the two parameters above so that each parameter is only responsible for one thing, which hopefully reduces confusion for users. Fixed a bug in the MonthlyCoolingDegreeDaysIndicatorTestCase where the test case was not performing the correct conversion from basetemp_units.
f5da68a
to
39931f2
Compare
Overview
Breaks out the definition, documentation and validation of Indicator parameters to a separate class. Like other parts of Indicator, the params_class is provided to each indicator and used to validate them. Any class which needs a different set of IndicatorParams is free to override with its own definition.
See params.py for more documentation about IndicatorParams specifics
Demo
None. The application should continue to function as before.
Notes
The 'percentile' parameter for the extreme events indicators is no longer optional, now that the API throws clear errors if its not provided.
I'm sure there are other things I didn't consider. Happy to consider alternative approaches.
Testing Instructions
Closes #148