
[Bug]: Some integration tests are non-deterministic #2833

Closed
jsbrittain opened this issue Mar 29, 2023 · 6 comments · Fixed by #2844
Labels
bug Something isn't working

Comments

@jsbrittain
Contributor

PyBaMM Version

23.2 [develop branch, commit 8232ac4]

Python Version

3.9.16

Describe the bug

At least one integration test (test_compare_SPMe_silicon_graphite) is non-deterministic, which allows the test (and hence the integration test suite) to fail for reasons that may be unrelated to recent code changes.

After encountering the issue during a pull request, I used the bash script below to examine the outcome of test_compare_SPMe_silicon_graphite over 100 iterations. The output log shows that the test failed 3 times. Rerunning the script produced no failures, suggesting that the failure rate is low (likely around 1-2%); however, since the integration tests are currently run over 3 OSs and 2 Python versions, and this may not be the only affected test, the chance of encountering a failure in practice is noticeably higher.

Steps to Reproduce

#!/usr/bin/env bash

# Re-run the flaky test 100 times and count how often it fails
FAILED=0
for RUN in {1..100}
do
    printf "Run: %d\n" $RUN
    printf "Failed: %d\n" $FAILED
    python -m unittest tests/integration/test_models/test_full_battery_models/test_lithium_ion/test_compare_outputs_two_phase.py -k test_compare_SPMe_silicon_graphite
    # A non-zero exit status means the test (or the solver) failed on this run
    if [ $? -ne 0 ]
    then
        let FAILED++
    fi
done

printf "Number failed out of 100: %d\n" $FAILED

Relevant log output

(snippet)

...

Run: 94
Failed: 2
At t = 490.558 and h = 1.78065e-07, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 193.043 and h = 6.85492e-08, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 401.911 and h = 3.24251e-14, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 104.614 and h = 9.76741e-14, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 104.614 and h = 9.72049e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 32.5421 and h = 2.01133e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 32.5421 and h = 5.97405e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 14.5241 and h = 2.43781e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 5.51504 and h = 1.34886e-16, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.827541 and h = 8.71606e-14, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.827541 and h = 1.72565e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.827545 and h = 2.08591e-14, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.241604 and h = 1.53753e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.241604 and h = 1.29083e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.0951226 and h = 2.96869e-16, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0 and h = 6.36755e-12, the error test failed repeatedly or with |h| = hmin.
E
======================================================================
ERROR: test_compare_SPMe_silicon_graphite (tests.integration.test_models.test_full_battery_models.test_lithium_ion.test_compare_outputs_two_phase.TestCompareOutputsTwoPhase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/casadi_solver.py", line 669, in _run_integrator
    casadi_sol = integrator(
  File "/home/jsb/repos/pybamm-team/PyBaMM/venv/lib/python3.9/site-packages/casadi/casadi.py", line 13453, in __call__
    return self.call(kwargs)
  File "/home/jsb/repos/pybamm-team/PyBaMM/venv/lib/python3.9/site-packages/casadi/casadi.py", line 12324, in call
    return _casadi.Function_call(self, *args)
RuntimeError: .../casadi/interfaces/sundials/idas_interface.cpp:591: IDASolve returned "IDA_ERR_FAIL". Consult IDAS documentation.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jsb/repos/pybamm-team/PyBaMM/tests/integration/test_models/test_full_battery_models/test_lithium_ion/test_compare_outputs_two_phase.py", line 169, in test_compare_SPMe_silicon_graphite
    self.compare_outputs_two_phase_silicon_graphite(model_class)
  File "/home/jsb/repos/pybamm-team/PyBaMM/tests/integration/test_models/test_full_battery_models/test_lithium_ion/test_compare_outputs_two_phase.py", line 148, in compare_outputs_two_phase_silicon_graphite
    sol2 = sim.solve(t_eval, inputs={"x": 0.1})
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/simulation.py", line 626, in solve
    self._solution = solver.solve(self.built_model, t_eval, **kwargs)
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/base_solver.py", line 839, in solve
    new_solution = self._integrate(
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/casadi_solver.py", line 288, in _integrate
    current_step_sol = self._solve_for_event(current_step_sol)
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/casadi_solver.py", line 443, in _solve_for_event
    dense_step_sol = self._run_integrator(
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/casadi_solver.py", line 675, in _run_integrator
    raise pybamm.SolverError(error.args[0])
pybamm.expression_tree.exceptions.SolverError: .../casadi/interfaces/sundials/idas_interface.cpp:591: IDASolve returned "IDA_ERR_FAIL". Consult IDAS documentation.

...

Number failed out of 100: 3
@jsbrittain added the bug (Something isn't working) label on Mar 29, 2023
@jsbrittain
Contributor Author

My suggestion would be either to fix the parameters / random seed / etc. of the test so that it produces a consistent result, or to embrace the stochasticity and retain / expand this approach in a separate set of robustness tests that could be run periodically (i.e. separately from pull requests). In the longer term, the latter approach could be considered as part of infrastructure discussions (e.g. #2315).

@martinjrobins
Contributor

That's interesting! The solver shouldn't be random at all; it's an entirely deterministic model and solver... Is something being written to / loaded from disk during the test that is changing the outcome?

@brosaplanella
Member

I think the initial states for the algebraic equations are perturbed with some noise; it might come from there.
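
(For illustration only: a schematic sketch of the kind of perturbation being described, not PyBaMM's actual implementation. Adding a small random factor to the algebraic initial guess means each solve starts from a slightly different point, so the root finder can occasionally fail to converge.)

import numpy as np

# Hypothetical illustration: perturb an algebraic initial guess y0_alg by a
# small multiplicative noise term. Each run draws different noise, so the
# solver's starting point (and occasionally its convergence) differs per run.
y0_alg = np.array([0.5, 0.3, 0.8])  # example initial guess
noise = 1e-3 * np.random.standard_normal(y0_alg.shape)
y0_alg_perturbed = y0_alg * (1 + noise)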

@rtimms
Contributor

rtimms commented Mar 30, 2023

See #2356. IMO perturb_algebraic_initial_conditions should be False by default, since it is somewhat unexpected behaviour. It's fine as an option to try if people have convergence issues.
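
(A hypothetical sketch of what opting out of the perturbation might look like. The flag name comes from #2356, but exactly where it is exposed — solver argument, solver attribute, or settings object — is an assumption here, not confirmed API.)

import pybamm

model = pybamm.lithium_ion.SPMe()
solver = pybamm.CasadiSolver(mode="safe")
# Assumption: the perturbation flag can be set on the solver instance;
# consult #2356 / the solver source for where it actually lives.
solver.perturb_algebraic_initial_conditions = False
sim = pybamm.Simulation(model, solver=solver)
sol = sim.solve([0, 3600])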

@jsbrittain
Contributor Author

@rtimms Ah, that's really helpful to know! I suspected something similar, but wasn't sure whether it was due to Casadi or PyBaMM (a quick check reveals a handful of other places in PyBaMM where random numbers are also used, including some test routines).

I've just rerun a quick set of tests on test_compare_SPMe_silicon_graphite, this time fixing the random seed from within the test function, and this gives consistent convergence on every run. It seems the tests can be made more reliable without needing to change the behaviour of the user / production code at this point.

Although this raises further questions about the role of random factors in testing, and whether perturbations should be turned on or off by default, those feel like broader issues for discussion. For now, can I suggest that I hunt down any tests that call code with random elements and simply fix their random seeds? This should prevent tests from failing for reasons unrelated to recent codebase changes.
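
(A minimal sketch of what "fixing the random seed from within the test function" could look like, assuming the perturbation draws from NumPy's global RNG; the actual test class layout in PyBaMM may differ.)

import unittest

import numpy as np


class TestCompareOutputsTwoPhase(unittest.TestCase):
    def setUp(self):
        # Seed NumPy's global RNG so any random perturbation applied during
        # the solve is identical on every run, making the test deterministic.
        np.random.seed(42)

    def test_compare_SPMe_silicon_graphite(self):
        # ... existing test body unchanged ...
        pass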

@rtimms
Contributor

rtimms commented Mar 30, 2023

@jsbrittain thanks, sounds like a sensible approach!
