Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add input checks for interpolator (and other?) classes #190

Closed
mnwhite opened this issue Aug 31, 2018 · 2 comments
Closed

Add input checks for interpolator (and other?) classes #190

mnwhite opened this issue Aug 31, 2018 · 2 comments
Assignees

Comments

@mnwhite
Copy link
Contributor

mnwhite commented Aug 31, 2018

As is, the __init__ methods for the interpolator classes in HARK.interpolation do absolutely no input checking, which can lead to bizarre behavior and errors when illegal / incorrect inputs are passed. Input checks can be added fairly easily. These could include:

  1. Type checking: Throw an error if something is not an array that should be, etc.

  2. Size checking: Make sure that passed arrays have the right size relative to each other.

  3. Other stuff: E.g. for LinearInterp make sure that xSearchFunc (etc) are correct if specified.

@pkofod
Copy link
Contributor

pkofod commented Apr 25, 2019

For any potential "sprinters", here's what the problem is:

Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from HARK.interpolation import LinearInterp
>>> x = [1,2,3]
>>> y = [3, 4]
>>> linterp = LinearInterp(x, y)
>>> linterp(1.1)
array(3.1)
>>> linterp(2.1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pkofod/anaconda3/lib/python3.6/site-packages/HARK/interpolation.py", line 55, in __call__
    return (self._evaluate(z.flatten())).reshape(z.shape)
  File "/home/pkofod/anaconda3/lib/python3.6/site-packages/HARK/interpolation.py", line 816, in _evaluate
    return self._evalOrDer(x,True,False)[0]
  File "/home/pkofod/anaconda3/lib/python3.6/site-packages/HARK/interpolation.py", line 777, in _evalOrDer
    y = (1.-alpha)*self.y_list[i-1] + alpha*self.y_list[i]
IndexError: index 2 is out of bounds for axis 1 with size 2 

What happens is that the LinearInterp object is allowed to be constructed even if x and y are of different lengths. This causes hard to find errors, because a potential user may not realize why on earth linterp(1.1) works but linterp(2.1) doesn't. We should fail early and hard here.

Basically, to close this issue, someone should add a type check on the inputs and are the sizes correct. @mnwhite may want to chime in on the classes and methods where this might be relevant.

@keithblaha
Copy link

Type checking: Throw an error if something is not an array that should be, etc.

For this issue, you can actually use a type checker such as mypy (http://mypy-lang.org/) in order to validate typing. A type checker is just a program which can analyze your code and tell you if your declared types are incorrect. It is much nicer than runtime validation for a number of reasons:

  • performance: adding runtime validation makes your program slower, using a static analysis tool such as mypy instead allows you to pay that cost at "compile time" instead
  • readability: type annotations are much easier to read and understand than runtime validation code, allowing the function signature to convey the necessary information so that the actual body of the function doesn't need to be read to understand it
  • writeability: type annotations are much faster and easier to write, and less error-prone, than writing runtime validation code
  • workflow: using a type checker allows type errors to surface much more quickly than if you had to wait until you ran the code to find out that you passed the wrong type, allowing for a better workflow

The hitch here would be that python3.5 introduces native syntax for type annotations, and to be compatible with prior versions (e.g. python2) a clunky comment-based syntax can be used. I believe your plan is to drop support for python2 soon-ish, so it could be worth waiting until then and only supporting python3.5+ and then using the native syntax for it (https://www.python.org/dev/peps/pep-0484/).

A simple example to make this all more concrete, consider these two functions which do the same thing:

# runtime validation
def head(my_list):
    if not isinstance(my_list, list) or any(not isinstance(x, int) for x in my_list):
        raise TypeError('my_list must be a list of ints')
    return my_list[0] if len(my_list) else None

# type checker validation using native syntax
def head(my_list: List[int]) -> Optional[int]:
    return my_list[0] if len(my_list) else None

For larger numbers of parameters and more complex types, this can massively reduce the amount of code you need to write while also removing a large potential for bugs while simultaneously making it much easier for other people to understand how to use the code (when I read the bottom example, I only need to read the def line to understand the types of the parameters and what the returned type is). An Optional means that the value could be None, so I know that when I call head I may have to expect and handle that case.

Size checking: Make sure that passed arrays have the right size relative to each other.

There are a lot of approaches to this, but the simplest could be some assert statements at the top of the function, e.g.

def head(my_list: List[int]) -> int:
    assert len(my_list), 'my_list must not be empty'
    return my_list[0]

Instead of asserting you can also raise a more specific exception type. Decorators can also help with readability, it would be simple to implement a decorator such as non_empty to do the validation:

@non_empty('my_list')
def head(my_list: List[int]) -> int:
    return my_list[0]

Now we can promise that head returns int instead of Optional[int] because it will raise an exception instead of returning None when my_list is empty. Decorators could be made to validate all kinds of common cases in a utility library, e.g. @equal_len('list_a', 'list_b') and used throughout your codebase. This consistency can make it easy for people to know that when they call a non_empty-decorated function, they need to either validate what they are passing in or be prepared to handle a raised exception.

Sorry if this is an information overload- I'm happy to talk through any aspects in more detail if you're interested. There are a lot of tradeoffs to consider when looking at typing and validation, especially when it comes to a community project where you want to ease accessibility by reducing new concepts contributors would have to learn to be productive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants