
[Bug]: Some integration tests are non-deterministic #2833

Closed
jsbrittain opened this issue Mar 29, 2023 · 6 comments · Fixed by #2844
Labels
bug Something isn't working

Comments

@jsbrittain
Contributor

PyBaMM Version

23.2 [develop branch, commit 8232ac4]

Python Version

3.9.16

Describe the bug

At least one integration test (test_compare_SPMe_silicon_graphite) is non-deterministic, which allows the test (and hence the integration test suite) to fail for reasons that may be unrelated to recent code changes.

After encountering the issue during a pull request, I used the bash script below to examine the outcome of test_compare_SPMe_silicon_graphite over 100 iterations. The output log shows that the test failed 3 times. Rerunning the script produced no failures, suggesting that the failure rate is low (likely around 1-2%); however, since the integration tests are currently run over 3 OSs and 2 Python versions, and this may not be the only affected test, the chance of encountering a failure in practice is noticeably higher.

Steps to Reproduce

#!/usr/bin/env bash

# Re-run the flaky test 100 times and count how often it fails
FAILED=0
for RUN in {1..100}
do
    printf "Run: %d\n" $RUN
    printf "Failed: %d\n" $FAILED
    python -m unittest tests/integration/test_models/test_full_battery_models/test_lithium_ion/test_compare_outputs_two_phase.py -k test_compare_SPMe_silicon_graphite
    # A non-zero exit status means the test (or the solver) failed on this run
    if [ $? -ne 0 ]
    then
        let FAILED++
    fi
done

printf "Number failed out of 100: %d\n" $FAILED

Relevant log output

(snippet)

...

Run: 94
Failed: 2
At t = 490.558 and h = 1.78065e-07, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 193.043 and h = 6.85492e-08, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 401.911 and h = 3.24251e-14, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 104.614 and h = 9.76741e-14, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 104.614 and h = 9.72049e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 32.5421 and h = 2.01133e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 32.5421 and h = 5.97405e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 14.5241 and h = 2.43781e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 5.51504 and h = 1.34886e-16, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.827541 and h = 8.71606e-14, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.827541 and h = 1.72565e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.827545 and h = 2.08591e-14, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.241604 and h = 1.53753e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.241604 and h = 1.29083e-13, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0.0951226 and h = 2.96869e-16, the corrector convergence failed repeatedly or with |h| = hmin.
At t = 0 and h = 6.36755e-12, the error test failed repeatedly or with |h| = hmin.
E
======================================================================
ERROR: test_compare_SPMe_silicon_graphite (tests.integration.test_models.test_full_battery_models.test_lithium_ion.test_compare_outputs_two_phase.TestCompareOutputsTwoPhase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/casadi_solver.py", line 669, in _run_integrator
    casadi_sol = integrator(
  File "/home/jsb/repos/pybamm-team/PyBaMM/venv/lib/python3.9/site-packages/casadi/casadi.py", line 13453, in __call__
    return self.call(kwargs)
  File "/home/jsb/repos/pybamm-team/PyBaMM/venv/lib/python3.9/site-packages/casadi/casadi.py", line 12324, in call
    return _casadi.Function_call(self, *args)
RuntimeError: .../casadi/interfaces/sundials/idas_interface.cpp:591: IDASolve returned "IDA_ERR_FAIL". Consult IDAS documentation.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jsb/repos/pybamm-team/PyBaMM/tests/integration/test_models/test_full_battery_models/test_lithium_ion/test_compare_outputs_two_phase.py", line 169, in test_compare_SPMe_silicon_graphite
    self.compare_outputs_two_phase_silicon_graphite(model_class)
  File "/home/jsb/repos/pybamm-team/PyBaMM/tests/integration/test_models/test_full_battery_models/test_lithium_ion/test_compare_outputs_two_phase.py", line 148, in compare_outputs_two_phase_silicon_graphite
    sol2 = sim.solve(t_eval, inputs={"x": 0.1})
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/simulation.py", line 626, in solve
    self._solution = solver.solve(self.built_model, t_eval, **kwargs)
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/base_solver.py", line 839, in solve
    new_solution = self._integrate(
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/casadi_solver.py", line 288, in _integrate
    current_step_sol = self._solve_for_event(current_step_sol)
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/casadi_solver.py", line 443, in _solve_for_event
    dense_step_sol = self._run_integrator(
  File "/home/jsb/repos/pybamm-team/PyBaMM/pybamm/solvers/casadi_solver.py", line 675, in _run_integrator
    raise pybamm.SolverError(error.args[0])
pybamm.expression_tree.exceptions.SolverError: .../casadi/interfaces/sundials/idas_interface.cpp:591: IDASolve returned "IDA_ERR_FAIL". Consult IDAS documentation.

...

Number failed out of 100: 3
@jsbrittain added the bug (Something isn't working) label on Mar 29, 2023
@jsbrittain
Contributor Author

My suggestion would be either to fix the parameters / random seed / etc. of the test so that it produces a consistent result, or to embrace the stochasticity and retain / expand this approach in a separate set of robustness tests that could be run periodically (i.e. separately from pull requests). In the longer term, the latter approach could be considered as part of infrastructure discussions (e.g. #2315).

@martinjrobins
Contributor

That's interesting! The solver shouldn't be random at all; it's an entirely deterministic model and solver... Is something being written to / loaded from disk during the test that is changing the outcome?

@brosaplanella
Member

I think the initial states for the algebraic equations are perturbed with some noise; it might come from there.
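
(For illustration only: a schematic sketch of the kind of perturbation being described, not PyBaMM's actual implementation. Adding a small random factor to the algebraic initial guess means each solve starts from a slightly different point, so the root finder can occasionally fail to converge.)

import numpy as np

# Hypothetical illustration: perturb an algebraic initial guess y0_alg by a
# small multiplicative noise term. Each run draws different noise, so the
# solver's starting point (and occasionally its convergence) differs per run.
y0_alg = np.array([0.5, 0.3, 0.8])  # example initial guess
noise = 1e-3 * np.random.standard_normal(y0_alg.shape)
y0_alg_perturbed = y0_alg * (1 + noise)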

@rtimms
Contributor

rtimms commented Mar 30, 2023

See #2356. IMO perturb_algebraic_initial_conditions should be False by default, since it is somewhat unexpected behaviour. It's fine as an option to try if people have convergence issues.
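
(A hypothetical sketch of what opting out of the perturbation might look like. The flag name comes from #2356, but exactly where it is exposed — solver argument, solver attribute, or settings object — is an assumption here, not confirmed API.)

import pybamm

model = pybamm.lithium_ion.SPMe()
solver = pybamm.CasadiSolver(mode="safe")
# Assumption: the perturbation flag can be set on the solver instance;
# consult #2356 / the solver source for where it actually lives.
solver.perturb_algebraic_initial_conditions = False
sim = pybamm.Simulation(model, solver=solver)
sol = sim.solve([0, 3600])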

@jsbrittain
Contributor Author

@rtimms Ah, that's really helpful to know! I suspected something similar, but wasn't sure whether it was due to Casadi or PyBaMM (a quick check reveals a handful of other places in PyBaMM where random numbers are also used, including some test routines).

I've just rerun a quick set of tests on test_compare_SPMe_silicon_graphite, this time fixing the random seed from within the test function, and this gives consistent convergence on every run. It seems the tests can be made more reliable without needing to change the behaviour of the user / production code at this point.

Although this raises further questions about the role of random factors in testing, and whether perturbations should be turned on or off by default, those feel like broader issues for discussion. For now, can I suggest that I hunt down any tests that call code with random elements and simply fix their random seeds? This should prevent tests from failing for reasons unrelated to recent codebase changes.
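
(A minimal sketch of what "fixing the random seed from within the test function" could look like, assuming the perturbation draws from NumPy's global RNG; the actual test class layout in PyBaMM may differ.)

import unittest

import numpy as np


class TestCompareOutputsTwoPhase(unittest.TestCase):
    def setUp(self):
        # Seed NumPy's global RNG so any random perturbation applied during
        # the solve is identical on every run, making the test deterministic.
        np.random.seed(42)

    def test_compare_SPMe_silicon_graphite(self):
        # ... existing test body unchanged ...
        pass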

@rtimms
Contributor

rtimms commented Mar 30, 2023

@jsbrittain thanks, sounds like a sensible approach!
