Add input checks for interpolator (and other?) classes #190

mnwhite · 2018-08-31T19:21:47Z

As is, the __init__ methods for the interpolator classes in HARK.interpolation do absolutely no input checking, which can lead to bizarre behavior and errors when illegal / incorrect inputs are passed. Input checks can be added fairly easily. These could include:

Type checking: Throw an error if something is not an array that should be, etc.
Size checking: Make sure that passed arrays have the right size relative to each other.
Other stuff: E.g. for LinearInterp make sure that xSearchFunc (etc) are correct if specified.

The text was updated successfully, but these errors were encountered:

pkofod · 2019-04-25T20:01:49Z

For any potential "sprinters", here's what the problem is:

Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from HARK.interpolation import LinearInterp
>>> x = [1,2,3]
>>> y = [3, 4]
>>> linterp = LinearInterp(x, y)
>>> linterp(1.1)
array(3.1)
>>> linterp(2.1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pkofod/anaconda3/lib/python3.6/site-packages/HARK/interpolation.py", line 55, in __call__
    return (self._evaluate(z.flatten())).reshape(z.shape)
  File "/home/pkofod/anaconda3/lib/python3.6/site-packages/HARK/interpolation.py", line 816, in _evaluate
    return self._evalOrDer(x,True,False)[0]
  File "/home/pkofod/anaconda3/lib/python3.6/site-packages/HARK/interpolation.py", line 777, in _evalOrDer
    y = (1.-alpha)*self.y_list[i-1] + alpha*self.y_list[i]
IndexError: index 2 is out of bounds for axis 1 with size 2

What happens is that the LinearInterp object is allowed to be constructed even if x and y are of different lengths. This causes hard to find errors, because a potential user may not realize why on earth linterp(1.1) works but linterp(2.1) doesn't. We should fail early and hard here.

Basically, to close this issue, someone should add a type check on the inputs and are the sizes correct. @mnwhite may want to chime in on the classes and methods where this might be relevant.

keithblaha · 2019-05-08T15:26:16Z

Type checking: Throw an error if something is not an array that should be, etc.

For this issue, you can actually use a type checker such as mypy (http://mypy-lang.org/) in order to validate typing. A type checker is just a program which can analyze your code and tell you if your declared types are incorrect. It is much nicer than runtime validation for a number of reasons:

performance: adding runtime validation makes your program slower, using a static analysis tool such as mypy instead allows you to pay that cost at "compile time" instead
readability: type annotations are much easier to read and understand than runtime validation code, allowing the function signature to convey the necessary information so that the actual body of the function doesn't need to be read to understand it
writeability: type annotations are much faster and easier to write, and less error-prone, than writing runtime validation code
workflow: using a type checker allows type errors to surface much more quickly than if you had to wait until you ran the code to find out that you passed the wrong type, allowing for a better workflow

The hitch here would be that python3.5 introduces native syntax for type annotations, and to be compatible with prior versions (e.g. python2) a clunky comment-based syntax can be used. I believe your plan is to drop support for python2 soon-ish, so it could be worth waiting until then and only supporting python3.5+ and then using the native syntax for it (https://www.python.org/dev/peps/pep-0484/).

A simple example to make this all more concrete, consider these two functions which do the same thing:

# runtime validation
def head(my_list):
    if not isinstance(my_list, list) or any(not isinstance(x, int) for x in my_list):
        raise TypeError('my_list must be a list of ints')
    return my_list[0] if len(my_list) else None

# type checker validation using native syntax
def head(my_list: List[int]) -> Optional[int]:
    return my_list[0] if len(my_list) else None

For larger numbers of parameters and more complex types, this can massively reduce the amount of code you need to write while also removing a large potential for bugs while simultaneously making it much easier for other people to understand how to use the code (when I read the bottom example, I only need to read the def line to understand the types of the parameters and what the returned type is). An Optional means that the value could be None, so I know that when I call head I may have to expect and handle that case.

Size checking: Make sure that passed arrays have the right size relative to each other.

There are a lot of approaches to this, but the simplest could be some assert statements at the top of the function, e.g.

def head(my_list: List[int]) -> int:
    assert len(my_list), 'my_list must not be empty'
    return my_list[0]

Instead of asserting you can also raise a more specific exception type. Decorators can also help with readability, it would be simple to implement a decorator such as non_empty to do the validation:

@non_empty('my_list')
def head(my_list: List[int]) -> int:
    return my_list[0]

Now we can promise that head returns int instead of Optional[int] because it will raise an exception instead of returning None when my_list is empty. Decorators could be made to validate all kinds of common cases in a utility library, e.g. @equal_len('list_a', 'list_b') and used throughout your codebase. This consistency can make it easy for people to know that when they call a non_empty-decorated function, they need to either validate what they are passing in or be prepared to handle a raised exception.

Sorry if this is an information overload- I'm happy to talk through any aspects in more detail if you're interested. There are a lot of tradeoffs to consider when looking at typing and validation, especially when it comes to a community project where you want to ease accessibility by reducing new concepts contributors would have to learn to be productive.

shaunagm added the PyCon sprints label Apr 25, 2019

shaunagm assigned pkofod Apr 25, 2019

MridulS mentioned this issue May 6, 2019

add length checks in interpolation #267

Merged

keithblaha mentioned this issue May 8, 2019

Add non_empty argument validator #282

Merged

shaunagm added Tag: Sprints and removed PyCon sprints labels Jul 3, 2019

MridulS self-assigned this Jul 30, 2020

MridulS closed this as completed Aug 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add input checks for interpolator (and other?) classes #190

Add input checks for interpolator (and other?) classes #190

mnwhite commented Aug 31, 2018 •

edited

Loading

pkofod commented Apr 25, 2019

keithblaha commented May 8, 2019

Add input checks for interpolator (and other?) classes #190

Add input checks for interpolator (and other?) classes #190

Comments

mnwhite commented Aug 31, 2018 • edited Loading

pkofod commented Apr 25, 2019

keithblaha commented May 8, 2019

mnwhite commented Aug 31, 2018 •

edited

Loading