TestingScientificCodes
Advice for scientists who want to write automated tests.
While automated testing has exploded in the software community, it hasn't really taken off in science. While it may be perceived as a "nice to have", it really is a "must have". Software has a tendency to stick around, and without tests, it gets more and more brittle to the point where only a few people know how to maintain and extend it. This is not a path to scientific advancement.
Testing is an empirical science in its simplest form. Each test case proposes a hypothesis, a method for validating the hypothesis, and an expected result. The difference from the physical sciences is that when a hypothesis is disconfirmed, you have two paths: fix the code or fix the test. Scientific codes, unlike our universe, do not have fixed laws. All parts of software are fungible.
This last point is crucial. When laws can change on both sides of the hypothesis, you need a process in addition to the scientific method to ensure codes remain stable. The first and most important is continuous integration and validation, so that disconfirmations are caught and repaired before the code becomes invalid.
Another major reason to write automated tests is to leverage computers at what they are good at: known, repetitive tasks. Each software change can typically be validated in seconds without any attention, and if done right, without your involvement: source repositories notify test runners, which in turn notify you of disconfirmations.
Finally, automated tests enforce good design, which means you spend less time coding. Good design makes code easier to understand (tests are executable documentation) and to reuse (design for testability). This makes it easier to share libraries between scientists. Less is more.
We have built on pytest to eliminate much of the boilerplate in testing. This allows you to write simple tests. For example, let's use the pkexample.EMA module, which computes an exponential moving average. Here are a few examples of simple test cases from pykern/tests/pkexample_test.py:
import pytest


def test_ema_compute():
    from pykern import pkexample

    ema = pkexample.EMA(4)
    ema.compute(20.)
    assert 16. == ema.compute(10.), \
        'Adding a value to the series changes the average'


def test_ema_init():
    from pykern import pkexample

    assert 5.0 == pkexample.EMA(1).compute(5.), \
        'First value sets initial average'


def test_ema_init_deviance():
    from pykern import pkexample

    with pytest.raises(AssertionError) as e:
        pkexample.EMA(0)
    assert 'must be greater' in str(e.value)
The first case initializes the EMA with a length of 4 and sets its initial value with a call to compute.
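To make the examples concrete, here is a minimal sketch of what a class like pkexample.EMA could look like. This is an illustration, not the real pykern implementation: the smoothing factor 2 / (length + 1) is an inference, chosen because it reproduces the expected values in the test cases above.

```python
class EMA:
    """Sketch of an exponential moving average over a notional series length.

    Illustrative only; the real pkexample.EMA may differ. The smoothing
    factor 2 / (length + 1) is inferred from the test values above.
    """

    def __init__(self, length):
        assert length > 0, 'length must be greater than 0'
        self._alpha = 2.0 / (length + 1.0)
        self.average = None

    def compute(self, value):
        # The first value sets the initial average; subsequent values
        # are blended in with weight alpha.
        if self.average is None:
            self.average = value
        else:
            self.average = self._alpha * value + (1.0 - self._alpha) * self.average
        return self.average
```

With this definition, EMA(4) seeded with 20. returns 16. after compute(10.), matching the first test case above.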
You run the test with:
pykern test tests/pkexample_test.py
If you want to run a specific test case, use pykern test <path to python test file> case=<pattern to match the case name>. For example, to run just test_ema_compute, you could run:
pykern test tests/pkexample_test.py case=com
You could also run these test cases with py.test tests/pkexample_test.py. You'll want to get used to using pykern test, because it sets up the configuration environment properly, which you'll need for more complex tests. (Note also that if you are using Bivio's emacs macros, you can run the test with \C-c \C-m.)
The output looks like:
py.test tests/pkexample_test.py
=========================== test session starts ===========================
platform linux -- Python 3.7.2, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/vagrant/src/radiasoft/pykern
plugins: anyio-3.5.0
collected 3 items
tests/pkexample_test.py ... [100%]
=========================== 3 passed in 0.22s =============================
The dots (.) after tests/pkexample_test.py tell you it ran 3 test cases, each of which corresponds to one function call (the test_ema_* functions, in this case).
You can't have an EMA with a negative or zero length, so the EMA class constrains length to be greater than 0. Let's test that:
from pykern import pkunit

with pkunit.pkexcept(AssertionError):
    pkexample.EMA(0)
(The pykern.pkunit module contains useful operations for unit tests.)
This is good enough. You don't need to test a bunch of different numbers. It's the boundary condition (0) that's important.
One of the things that happens with tests is that you add cases as you run into problems. In this case, you are unlikely to run into a problem with negative numbers being passed to EMA and/or the condition defining the constraint being incorrect. If you do, you'll add a case at that time, not before.
We'll add a test case to make sure the formula is right, since this is the meat of the module:
pkunit.pkeq(
    7.0,
    e.compute(10.0),
    'Adding a value to the series changes the average'
)
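As a sanity check, the expected value 7.0 is consistent with the standard EMA recurrence, under two assumptions not shown in the snippet: that e was created as EMA(4) and seeded with a first value of 5.0, and that the smoothing factor is 2 / (length + 1) (the same inference that reproduces the other test values in this document).

```python
# Worked arithmetic behind the expected 7.0. Both the seed value and the
# smoothing factor are assumptions inferred from the surrounding examples.
length = 4
alpha = 2.0 / (length + 1)          # 0.4
prior_average = 5.0                 # assumed first value fed to compute()
new_value = 10.0
updated = alpha * new_value + (1.0 - alpha) * prior_average
print(updated)                      # 7.0
```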
There are many other cases we can think of, e.g. passing
an int
, None
, testing self.average
gets updated,
or calling compute
multiple times. Again, there's almost
no point in doing this. Yes, it would improve the quality
of the test, but there's a cost: brittleness.
Going back to the concept of a software universe, we want to avoid rigidity in the tests as well as the code. The more test cases you have, the harder it is to change the code. In the case of an EMA, we are talking about a fixed mathematical concept, but that's not the case with a majority of the code you'll be writing. Fixed mathematical concepts are probably already coded so you don't have to rewrite them anyway.
We should think of ways to simplify testing. For example, in RadTrack, AnalyticCalc1_test uses YAML to define expected inputs and outputs, e.g.
CriticalEnergyWiggler:
  - Input:
      Bx: 0.0
      By: 0.865
      Gam: 5870.925
    Expect: 4.478e+03
There's a simple parser in AnalyticCalc1_test.py, which reads this file and calls functions in the order they are defined in the file. The results are compared with a fudge factor (1e-300) so we can avoid floating-point arithmetic anomalies.
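A data-driven runner in this spirit can be sketched as follows. Everything here is illustrative: wiggle_sum is a made-up stand-in for the real AnalyticCalc1 functions, the case data is written as the Python dict a YAML parser would produce, and the tolerance is an arbitrary choice, not the one AnalyticCalc1_test actually uses.

```python
def wiggle_sum(Bx, By, Gam):
    # Hypothetical stand-in for a function under test; the real
    # AnalyticCalc1 computations are different.
    return Bx + By * Gam

CASES = {
    # Parsed form of a YAML file like the one above: each key names a
    # function, each entry gives Input kwargs and an Expect value.
    'WiggleSum': [
        {'Input': {'Bx': 0.0, 'By': 0.865, 'Gam': 5870.925},
         'Expect': 5.0784e+03},
    ],
}

FUNCTIONS = {'WiggleSum': wiggle_sum}

def run_cases(cases, rtol=1e-3):
    for name, entries in cases.items():
        func = FUNCTIONS[name]
        for entry in entries:
            actual = func(**entry['Input'])
            expect = entry['Expect']
            # Compare with a relative tolerance to sidestep
            # floating-point arithmetic anomalies.
            assert abs(actual - expect) <= rtol * abs(expect), \
                f'{name}: expected {expect}, got {actual}'

run_cases(CASES)
```

Adding a case is then just adding a YAML entry; no new test code is needed, which keeps the test suite from becoming rigid as the list of validated inputs grows.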
We can consider including this approach in PyKern if we want, or maybe there's a better way. By writing more tests, we'll see what needs simplifying, and we can refactor the testing infrastructure to make it even easier to write tests.