TestingScientificCodes
Advice for scientists who want to write automated tests.
While automated testing has exploded in the software community, it hasn't really taken off in science. While it may be perceived as a "nice to have", it really is a "must have". Software has a tendency to stick around, and without tests, it gets more and more brittle to the point where only a few people know how to maintain and extend it. This is not a path to scientific advancement.
Testing is an empirical science in its simplest form. Each test case proposes a hypothesis, a method for validating the hypothesis, and an expected result. The difference from the physical sciences is that when a hypothesis is disconfirmed, you have two paths: fix the code or fix the test. Scientific codes, unlike our universe, do not have fixed laws. All parts of software are fungible.
This last point is crucial. When laws can change on both sides of the hypothesis, you need a process in addition to the scientific method to ensure codes remain stable. The first and most important is continuous integration and validation, so that disconfirmations are caught and repaired before the code becomes invalid.
Another major reason to write automated tests is to leverage computers at what they are good at: known, repetitive tasks. Each software change can typically be validated in seconds without any attention, and if done right, without your involvement: source repositories notify test runners, which in turn notify you of disconfirmations.
Finally, automated tests enforce good design, which means you spend less time coding. Good design makes code easier to understand (tests are executable documentation) and to reuse (design for testability). This makes it easier to share libraries between scientists. Less is more.
We have built on pytest to eliminate much of the boilerplate in testing. This allows you to write simple tests. For example, let's use the pkexample.EMA module, which computes an exponential moving average. Here are a few examples of simple test cases from pykern/tests/pkexample_test.py:
import pytest


def test_ema_compute():
    from pykern import pkexample

    ema = pkexample.EMA(4)
    ema.compute(20.)
    assert 16. == ema.compute(10.), \
        'Adding a value to the series changes the average'


def test_ema_init():
    from pykern import pkexample

    assert 5.0 == pkexample.EMA(1).compute(5.), \
        'First value sets initial average'


def test_ema_init_deviance():
    from pykern import pkexample

    with pytest.raises(AssertionError) as e:
        pkexample.EMA(0)
    assert 'must be greater' in str(e.value)
The first case initializes the EMA with a length of 4 and sets its initial value with a call to compute.
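To make the examples concrete, here is a minimal sketch of what a class like pkexample.EMA could look like. This is an illustration, not the real pykern implementation: the smoothing factor 2 / (length + 1) is an inference, chosen because it reproduces the expected values in the test cases above.

```python
class EMA:
    """Sketch of an exponential moving average over a notional series length.

    Illustrative only; the real pkexample.EMA may differ. The smoothing
    factor 2 / (length + 1) is inferred from the test values above.
    """

    def __init__(self, length):
        assert length > 0, 'length must be greater than 0'
        self._alpha = 2.0 / (length + 1.0)
        self.average = None

    def compute(self, value):
        # The first value sets the initial average; subsequent values
        # are blended in with weight alpha.
        if self.average is None:
            self.average = value
        else:
            self.average = self._alpha * value + (1.0 - self._alpha) * self.average
        return self.average
```

With this definition, EMA(4) seeded with 20. returns 16. after compute(10.), matching the first test case above.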
You run the test with:
pykern test tests/pkexample_test.py
If you want to run a specific test case, use pykern test <path to python test file> case=<pattern to match the case name>. For example, to run just test_ema_compute, you could run:
pykern test tests/pkexample_test.py case=com
You could also run these test cases with py.test tests/pkexample_test.py. You'll want to get used to using pykern test, because it sets up the configuration environment properly, which you'll need for more complex tests. (Note also that if you are using Bivio's emacs macros, you can run the test with \C-c \C-m.)
The output looks like:
py.test tests/pkexample_test.py
=========================== test session starts ===========================
platform linux -- Python 3.7.2, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/vagrant/src/radiasoft/pykern
plugins: anyio-3.5.0
collected 3 items
tests/pkexample_test.py ... [100%]
=========================== 3 passed in 0.22s =============================
The dots (.) after tests/pkexample_test.py tell you it ran 3 test cases, each of which corresponds to one function call (the test_ema_* functions, in this case).
You can't have an EMA with a negative or zero length, so the EMA class constrains length to be greater than 0. Let's test that:
from pykern import pkunit

with pkunit.pkexcept(AssertionError):
    pkexample.EMA(0)
(The pykern.pkunit module contains useful operations for unit tests.)
This is good enough. You don't need to test a bunch of different numbers. It's the boundary condition (0) that's important.
One of the things that happens with tests is that you add cases as you run into problems. In this case, you are unlikely to run into a problem with negative numbers being passed to EMA and/or the condition defining the constraint being incorrect. If you do, you'll add a case at that time, not before.
We'll add a test case to make sure the formula is right, since this is the meat of the module:
pkunit.pkeq(
    7.0,
    e.compute(10.0),
    'Adding a value to the series changes the average'
)
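As a sanity check, the expected value 7.0 is consistent with the standard EMA recurrence, under two assumptions not shown in the snippet: that e was created as EMA(4) and seeded with a first value of 5.0, and that the smoothing factor is 2 / (length + 1) (the same inference that reproduces the other test values in this document).

```python
# Worked arithmetic behind the expected 7.0. Both the seed value and the
# smoothing factor are assumptions inferred from the surrounding examples.
length = 4
alpha = 2.0 / (length + 1)          # 0.4
prior_average = 5.0                 # assumed first value fed to compute()
new_value = 10.0
updated = alpha * new_value + (1.0 - alpha) * prior_average
print(updated)                      # 7.0
```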
There are many other cases we can think of, e.g. passing
an int
, None
, testing self.average
gets updated,
or calling compute
multiple times. Again, there's almost
no point in doing this. Yes, it would improve the quality
of the test, but there's a cost: brittleness.
Going back to the concept of a software universe, we want to avoid rigidity in the tests as well as the code. The more test cases you have, the harder it is to change the code. In the case of an EMA, we are talking about a fixed mathematical concept, but that's not the case with a majority of the code you'll be writing. Fixed mathematical concepts are probably already coded so you don't have to rewrite them anyway.
We should think of ways to simplify testing. For example, in RadTrack, AnalyticCalc1_test uses YAML to define expected inputs and outputs, e.g.
CriticalEnergyWiggler:
  - Input:
      Bx: 0.0
      By: 0.865
      Gam: 5870.925
    Expect: 4.478e+03
There's a simple parser in AnalyticCalc1_test.py, which reads this file and calls functions in the order they are defined in the file. The results are compared with a fudge factor (1e-300) so we can avoid floating-point arithmetic anomalies.
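A data-driven runner in this spirit can be sketched as follows. Everything here is illustrative: wiggle_sum is a made-up stand-in for the real AnalyticCalc1 functions, the case data is written as the Python dict a YAML parser would produce, and the tolerance is an arbitrary choice, not the one AnalyticCalc1_test actually uses.

```python
def wiggle_sum(Bx, By, Gam):
    # Hypothetical stand-in for a function under test; the real
    # AnalyticCalc1 computations are different.
    return Bx + By * Gam

CASES = {
    # Parsed form of a YAML file like the one above: each key names a
    # function, each entry gives Input kwargs and an Expect value.
    'WiggleSum': [
        {'Input': {'Bx': 0.0, 'By': 0.865, 'Gam': 5870.925},
         'Expect': 5.0784e+03},
    ],
}

FUNCTIONS = {'WiggleSum': wiggle_sum}

def run_cases(cases, rtol=1e-3):
    for name, entries in cases.items():
        func = FUNCTIONS[name]
        for entry in entries:
            actual = func(**entry['Input'])
            expect = entry['Expect']
            # Compare with a relative tolerance to sidestep
            # floating-point arithmetic anomalies.
            assert abs(actual - expect) <= rtol * abs(expect), \
                f'{name}: expected {expect}, got {actual}'

run_cases(CASES)
```

Adding a case is then just adding a YAML entry; no new test code is needed, which keeps the test suite from becoming rigid as the list of validated inputs grows.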
We can consider including this approach in PyKern if we want, or maybe there's a better way. By writing more tests, we'll see what needs simplifying, and we can refactor the testing infrastructure to make it even easier to write tests.