Scheduled tests failing while running example tests on python 3.8 (Ubuntu) #3415
Example tests failing while testing half-cell.ipynb - https://github.com/pybamm-team/PyBaMM/actions/runs/6427005783/job/17452030411#step:15:682

Comments
I have noticed this on my PRs as well, but it appears to be intermittent. In one of my PRs it failed on Python 3.11 instead of 3.8 as in your example. I have not been able to reproduce it locally either.
I have also noticed it failing on 3.9, but now it is consistently failing on 3.8. Potentially a result of #3198.
I can confirm it is random. If I run this code 3-4 times in the notebook then I get a failure locally.

```python
import pybamm
import matplotlib.pyplot as plt

model_with_degradation = pybamm.lithium_ion.DFN({
    "working electrode": "positive",
    "SEI": "reaction limited",  # SEI on both electrodes
    "SEI porosity change": "true",
    "particle mechanics": "swelling and cracking",
    "SEI on cracks": "true",
    "lithium plating": "partially reversible",
    "lithium plating porosity change": "true",  # alias for "SEI porosity change"
})
param_GrSi = pybamm.ParameterValues("OKane2022_graphite_SiOx_halfcell")
param_GrSi.update({"SEI reaction exchange current density [A.m-2]": 1.5e-07})
var_pts = {"x_n": 1, "x_s": 5, "x_p": 7, "r_n": 1, "r_p": 30}
exp_degradation = pybamm.Experiment(
    ["Charge at 0.3C until 1.5 V", "Discharge at 0.3C until 0.005 V"]
)
sim3 = pybamm.Simulation(
    model_with_degradation,
    parameter_values=param_GrSi,
    experiment=exp_degradation,
    var_pts=var_pts,
)
sol3 = sim3.solve()

t = sol3["Time [s]"].entries
V = sol3["Voltage [V]"].entries
plt.figure()
plt.plot(t, V)
plt.xlabel("Time [s]")
plt.ylabel("Voltage [V]")
plt.show()
```

I am running Python 3.11 on a Mac M2 laptop.
@agriyakhetarpal, @arjxn-py The solver may be sensitive to the initial state. I will try to dig deeper later today.
I remember there have been similar issues with ...
I will pull down the branch and test on my Linux machine as well.
Here is what I can confirm from seeing it personally.

Failures from my laptops:

Failures in CI:
I think I have an adjustment that will fix it. I wrote a stress-testing script to check whether the random failures are gone. It will take a few hours to make sure everything is OK. I will make a PR tomorrow if it works.
I tried adjusting the var_pts and solver settings. So far I have not been able to stabilize the notebook, so I am going to set this aside for now. I was testing by running the code I posted above in a loop and counting how many times it failed; all the settings I tried had 6-10% failure rates over 1000 runs of the code.
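For reference, a minimal sketch of that kind of stress-test loop. This is an illustration rather than the exact script used; the `run_once` helper and the run count are assumptions, and it simply mirrors the reproduction snippet posted above, treating any raised error as a failure.

```python
# Hypothetical stress-test harness (not the exact script used): rebuild and
# solve the half-cell simulation in a loop and count failures to estimate the
# intermittent failure rate. Parameters mirror the reproduction snippet above.
import pybamm

def run_once():
    model = pybamm.lithium_ion.DFN({
        "working electrode": "positive",
        "SEI": "reaction limited",
        "SEI porosity change": "true",
        "particle mechanics": "swelling and cracking",
        "SEI on cracks": "true",
        "lithium plating": "partially reversible",
        "lithium plating porosity change": "true",
    })
    params = pybamm.ParameterValues("OKane2022_graphite_SiOx_halfcell")
    params.update({"SEI reaction exchange current density [A.m-2]": 1.5e-07})
    experiment = pybamm.Experiment(
        ["Charge at 0.3C until 1.5 V", "Discharge at 0.3C until 0.005 V"]
    )
    sim = pybamm.Simulation(
        model,
        parameter_values=params,
        experiment=experiment,
        var_pts={"x_n": 1, "x_s": 5, "x_p": 7, "r_n": 1, "r_p": 30},
    )
    sim.solve()

n_runs, failures = 1000, 0
for _ in range(n_runs):
    try:
        run_once()
    except Exception:  # count any raised error (e.g. pybamm.SolverError) as a failure
        failures += 1
print(f"{failures}/{n_runs} runs failed ({100 * failures / n_runs:.1f}%)")
```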
It's good to know that this failure is random, at least, and that it occurs only in the example tests and not the unit or integration tests (or does it? I haven't noticed yet). I labelled this issue as high priority so that it can be looked at and fixed before the stable release.
It was agreed in the dev meeting on 30/10/23 that we will wait for the next release before tackling this.
I pushed some experiments for this on this branch by running the half-cell models notebook 500 times in a row in a workflow (I tried running them 1000 times first, but that timed out miserably on GitHub Actions runners after six hours 😅). As we discussed in the meeting, it looks like the CasADi solver is susceptible to the initial conditions and fails to converge intermittently for currently unexplained reasons. While I cannot comment on the use of the solver given my inexperience with battery models, setting a fixed random seed at the time of instantiating the Simulation objects appears to avoid the failures. The runs are here:

1. Without any changes: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/6709365330. Across Python 3.8–3.11, with 500 runs on each for a total of 2000, there were 341 failures. These results are in line with @kratman's failure rate as noted above, though somewhat higher: 341/2000 puts the failure probability close to 0.17. This is without any adjustments to the notebook or the involved parameter values.
2. With all Simulation objects initialised with a fixed random seed: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/6709408022. Zero failures on any Python version; all 2000 runs passed.

The logs have been saved to a file and are available as downloadable artifacts on both of the above links. Given this stochastic behaviour, it is reasonable to assume that the Python version is not relevant here, so we can pool all the results into one statistically significant experiment; from that it should be fine to infer that the probability of encountering two failures in two consecutive runs of the code is close to 3% (we would obviously need to run more experiments, and this value will fluctuate as the number of runs increases to, say, 10,000 and onwards). I can write a PR to implement these changes, but I would appreciate some advice and clarifications first as to what they should look like and what sort of API changes would be required (e.g., should we provide this as an internal function?). Also, it looks like the name of the object changes: instead of being ...
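A quick sanity check on the 3% figure above, assuming the failures are independent between runs and the rate does not depend on the Python version:

```python
# Pooled failure rate from the first workflow run (341 failures out of 2000),
# and the resulting probability of two consecutive failures, assuming
# independence between runs.
failures, runs = 341, 2000
p_fail = failures / runs  # ≈ 0.1705
p_two = p_fail ** 2       # ≈ 0.029, i.e. roughly 3%
print(f"p(fail) ≈ {p_fail:.4f}, p(two consecutive failures) ≈ {p_two:.4f}")
```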
I think I would make the fixed-random-seed aspect a member function that gets called from "inside" when the solver gets set to CasADi. That solves your wrapping issue and makes it something that is easy to yank out if we find a better fix.
Thanks @kratman, that sounds very reasonable, and doing that indeed solved the wrapping issue. It is a minimal change at best, so it can be replicated later on across any other moving parts which tend to fail at random. I have pushed some changes to my branch and will stress-test the notebook with each Python version one last time (another workflow run passed with zero failures across 2000 runs). I am self-assigning this and I will open a PR soon.
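For illustration, here is a minimal sketch of the idea under discussion. It uses a standalone wrapper rather than the internal member function that was actually implemented, and the name `solve_with_fixed_seed` and the seed value are hypothetical; only `pybamm.Simulation.solver`, `pybamm.CasadiSolver`, and `numpy.random.seed` are existing APIs.

```python
# Hypothetical sketch (not PyBaMM's actual implementation): fix NumPy's global
# random seed just before solving whenever the CasADi solver is in use, so any
# randomised initial guesses are reproducible between runs.
import numpy as np
import pybamm


def solve_with_fixed_seed(simulation, seed=42, **solve_kwargs):
    # Only intervene for the CasADi solver, which is the one observed to fail
    # intermittently; other solvers are left untouched.
    if isinstance(simulation.solver, pybamm.CasadiSolver):
        np.random.seed(seed)
    return simulation.solve(**solve_kwargs)
```

Usage would be `solve_with_fixed_seed(sim3)` in place of `sim3.solve()` in the notebook; the actual change instead calls an equivalent private method from inside `Simulation` when the CasADi solver is selected, as suggested above.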