
TestingScientificCodes

Gurhar Khalsa edited this page May 18, 2022 · 7 revisions

How to Test Scientific/Numerical Codes

Advice for scientists who want to write automated tests.

Motivation

Although automated testing has exploded in the software community, it hasn't really taken off in science. It may be perceived as a "nice to have", but it really is a "must have". Software has a tendency to stick around, and without tests, it gets more and more brittle to the point where only a few people know how to maintain and extend it. This is not a path to scientific advancement.

Testing is an empirical science in its simplest form. Each test case proposes a hypothesis, a method for validating the hypothesis, and an expected result. The difference from the physical sciences is that when a hypothesis is disconfirmed, you have two paths: fix the code or fix the test. Scientific codes, unlike our universe, do not have fixed laws. All parts of software are fungible.

This last point is crucial. When laws can change on both sides of the hypothesis, you need a process in addition to the scientific method to ensure codes remain stable. The first and most important one is continuous integration and validation, so that disconfirmations are caught and repaired before the code becomes invalid.

Another major reason to write automated tests is to leverage computers at what they are good at: known, repetitive tasks. Each software change can typically be validated in seconds without any attention, and if done right, without your involvement: source repositories notify test runners, which in turn notify you of disconfirmations.

Finally, automated tests enforce good design, which means you spend less time coding. Good design makes code easier to understand (tests are executable documentation) and to reuse (design for testability). This makes it easier to share libraries between scientists. Less is more.

Start Small

We have built on pytest to eliminate much of the boilerplate in testing. This allows you to write simple tests. For example, let's use the pkexample.EMA module, which computes an exponential moving average. Here's a simple test case:

def test_ema():
    e = pkexample.EMA(4)
    assert 5.0 == e.compute(5.0), \
        'First value sets initial average'

You initialize the EMA with a length of 4 and set its initial value with a call to compute.
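The actual source of pkexample.EMA isn't shown on this page, but a minimal sketch consistent with the behavior the tests assert, assuming the conventional smoothing factor alpha = 2 / (length + 1), might look like:

```python
class EMA(object):
    """Minimal sketch of an exponential moving average.

    The real pkexample.EMA may differ in detail; the smoothing
    factor alpha = 2 / (length + 1) is an assumption, chosen
    because it reproduces the values asserted in this page.
    """

    def __init__(self, length):
        # Constrain length as described in "Assert Constraints" below
        assert length > 0, 'length must be greater than 0'
        self.alpha = 2.0 / (length + 1)
        self.average = None

    def compute(self, value):
        if self.average is None:
            # The first value seeds the average
            self.average = value
        else:
            self.average += self.alpha * (value - self.average)
        return self.average
```

With a length of 4, the first compute(5.0) returns 5.0, matching the test above.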

With pytest you can use an ordinary assert; on failure, pytest reports the values in the failed expression, so no special assertion helpers are needed.

You run the test with:

pykern test tests/pkexample_test.py

You could also run this particular test with py.test tests/pkexample_test.py. You'll want to get used to using pykern test, because it sets up the configuration environment properly, which you'll need for more complex tests. Note also that if you are using Bivio's emacs macros, you can run the test with \C-c \C-m.

The output looks like:

py.test pkexample_test.py
================================= test session starts =================================
platform linux2 -- Python 2.7.10, pytest-2.9.0, py-1.4.31, pluggy-0.3.1
rootdir: /home/vagrant/src/radiasoft/pykern, inifile:
plugins: xdist-1.13.1
collected 1 items

pkexample_test.py .

============================== 1 passed in 0.00 seconds ===============================

The dot (.) after pkexample_test.py tells you it ran one test case, which equates to one function call (test_ema in this case).

Assert Constraints

You can't have an EMA with a negative length, so the EMA class constrains length to be greater than 0. Let's test that:

    with pytest.raises(AssertionError):
        pkexample.EMA(0)

This is good enough. You don't need to test a bunch of different numbers. It's the boundary condition (0) that's important.
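As a self-contained illustration of this boundary-testing pattern, here is a sketch with a stand-in class (the constructor guard mirrors the constraint described above; the real pkexample.EMA implementation may differ):

```python
import pytest


class EMA(object):
    """Stand-in for pkexample.EMA, used only to illustrate
    testing a constructor constraint at its boundary."""

    def __init__(self, length):
        # The constraint under test: length must be positive
        assert length > 0, 'length must be greater than 0'
        self.length = length


def test_length_boundary():
    # Only the boundary case (0) needs testing, not a bunch
    # of different negative numbers
    with pytest.raises(AssertionError):
        EMA(0)
    # The smallest valid length constructs fine
    EMA(1)
```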

Limit cases

Tests tend to accumulate cases as you run into problems. Here, you are unlikely to run into a problem with negative numbers being passed to EMA, or with the condition defining the constraint being wrong. If you do, you'll add a case at that time, not before.

We'll add a test case to make sure the formula is right, since this is the meat of the module:

    assert 7.0 == e.compute(10.0), \
        'Adding a value to the series changes the average'
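The expected value of 7.0 can be checked by hand, assuming the conventional smoothing factor alpha = 2 / (length + 1) (an assumption about pkexample's formula, chosen because it reproduces the asserted value):

```python
# Worked arithmetic for the assertion above (alpha convention
# is an assumption, not confirmed by pkexample's source)
length = 4
alpha = 2.0 / (length + 1)              # 0.4
previous = 5.0                          # average after the first compute(5.0)
new = previous + alpha * (10.0 - previous)
print(new)                              # 7.0
```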

There are many other cases we can think of, e.g. passing an int, passing None, testing that self.average gets updated, or calling compute multiple times. Again, there's almost no point in doing this. Yes, it would improve the quality of the test, but there's a cost: brittleness.

Going back to the concept of a software universe, we want to avoid rigidity in the tests as well as in the code. The more test cases you have, the harder it is to change the code. In the case of an EMA, we are talking about a fixed mathematical concept, but that's not true of the majority of the code you'll be writing. Fixed mathematical concepts are probably already coded, so you don't have to rewrite them anyway.

Advancing the Art

We should think of ways to simplify testing. For example, in RadTrack, AnalyticCalc1_test uses YAML to define expected inputs and outputs, e.g.

CriticalEnergyWiggler:
  -
    Input:
      Bx: 0.0
      By: 0.865
      Gam: 5870.925
    Expect: 4.478e+03

There's a simple parser in AnalyticCalc1_test.py, which reads this file and calls functions in the order they are defined in the file. The results are compared with a fudge factor (1e-300) so we can avoid floating-point arithmetic anomalies.
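A sketch of such a data-driven harness might look like the following. The cases are shown as an already-parsed dict (the real test loads them from the YAML file); the function name, the run_cases helper, and the relative tolerance are illustrative assumptions, not RadTrack's actual API:

```python
import math

# Already-parsed stand-in for the YAML file above; the real
# harness would load this with a YAML parser.
CASES = {
    'CriticalEnergyWiggler': [
        {'Input': {'Bx': 0.0, 'By': 0.865, 'Gam': 5870.925},
         'Expect': 4.478e+03},
    ],
}


def run_cases(funcs, cases, rel_tol=1e-6):
    """Call each named function on each Input and compare to Expect.

    funcs maps case names to callables; rel_tol is an illustrative
    relative tolerance, not the fudge factor AnalyticCalc1_test uses.
    """
    for name, case_list in cases.items():
        func = funcs[name]
        for case in case_list:
            actual = func(**case['Input'])
            assert math.isclose(actual, case['Expect'], rel_tol=rel_tol), \
                '{}: expected {}, got {}'.format(name, case['Expect'], actual)
```

Adding a case then means editing data, not code, which keeps the tests from becoming rigid.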

We can consider including this approach in PyKern if we want. Or maybe there's a better way. By writing more tests, we'll see what needs simplifying, and we can refactor the testing infrastructure to make tests even easier to write.
