
[Roadmap area] Solver improvements #3910

Open
brosaplanella opened this issue Mar 20, 2024 · 14 comments
@brosaplanella
Member

I'd like to continue making pybamm easier to work with for the parameter inference libraries like PyBOP (https://github.com/pybop-team/PyBOP). At the moment I think this includes:

Originally posted by @martinjrobins in #3839 (comment)

@rtimms
Contributor

rtimms commented Jun 7, 2024

Not solver, but this is related to speed-up: #4058

@martinjrobins
Contributor

martinjrobins commented Jul 15, 2024

This is my attempt at a roadmap for PyBaMM's solvers, comments / suggestions welcome

Sensitivities

  • Sensitivities not currently usable in experiments, add this functionality
    • Requires propagating sensitivities through events
    • Casadi will have this functionality in Autumn ‘24; integrate it into pybamm once available
  • Automatic differentiation (forwards + backwards) through solvers using Jax. This builds on existing work wrapping idaklu solver as a Jax operation.
  • Higher order gradients (e.g. hessians)
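The forward-sensitivity items above can be illustrated with a toy example (a sketch only — explicit Euler on a scalar ODE, not PyBaMM's or IDA's actual sensitivity machinery): the state y and its sensitivity s = dy/dp are integrated side by side, which is the same "propagate sensitivities through the solve" idea, minus events.

```python
def solve_with_forward_sensitivity(p, y0, t_end, n_steps=10_000):
    """Integrate dy/dt = -p*y with explicit Euler, propagating the
    forward sensitivity s = dy/dp alongside the state.
    The sensitivity ODE is ds/dt = (df/dy)*s + df/dp = -p*s - y."""
    dt = t_end / n_steps
    y, s = y0, 0.0  # s(0) = dy0/dp = 0, since y0 does not depend on p
    for _ in range(n_steps):
        y_new = y + dt * (-p * y)
        s_new = s + dt * (-p * s - y)
        y, s = y_new, s_new
    return y, s

p, y0, t_end = 0.5, 1.0, 2.0
y, s = solve_with_forward_sensitivity(p, y0, t_end)
# Analytic reference: y(t) = y0*exp(-p*t), dy/dp = -t*y0*exp(-p*t)
```

Events complicate this because the sensitivity of the event time itself must be accounted for when restarting the integration, which is the missing piece for experiments.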

Multithreaded/GPU support:

  • merge in multithreaded Idaklu solver code from I4087-multiprocessing #4260
  • once the casadi solver has events support: use map to parallelise over inputs
  • gpu support in casadi or idaklu
    • most likely we can do this for the idaklu MLIR solver once IREE supports 64-bit floats
    • or casadi, if they are planning this, but there is no sign of that so far

Post Processing optimisation and refactoring

  • Solution objects (and associated ProcessedVariables objects): clean up the code, refactor so casadi dependency is reduced, and optimise the postprocessing computation for speed of execution

Solver refactoring

Solver documentation

  • Create a “high performance pybamm” user guide to explain how to use the solvers.

WASM build

  • Navigate lack of threading support through build system changes
  • Compile successfully with Emscripten builds of OpenBLAS, SuiteSparse + KLU enabled, and SUNDIALS + IDAS enabled
  • Implement a "Lab" using JupyterLite where PyBaMM runs in the browser and can then be tried out by the community
  • Profile it under Emscripten/WASM to see where things stand, and report any findings

@MarcBerliner
Member

MarcBerliner commented Jul 31, 2024

Here are my thoughts on improving the solver. Please let me know if you have comments or questions @martinjrobins and others.

Solver options

Initialization

Time stepping

In PyBaMM, the solver currently stops at all values in the t_eval vector. It seems that we use t_eval and set some dt for a few distinct purposes:

  1. To enforce time-dependent discontinuities within a single step, like with a drive cycle
  2. To set the data collection frequency (as in the period experiment kwarg)
  3. To set a small dt to force better solver convergence (i.e., take smaller time steps)

These three reasons for setting a dt are all valid, but stopping the simulation at all time steps can have a drastic impact on the adaptive time stepping and performance. For example, consider a full C/10 discharge with

  • a. t_eval = [0, 36000] (i.e., no stops)
  • b. t_eval = np.arange(0, 36060, 60) (pybamm default)

If we compare the solver stats,

  • Number of steps: a. 165 vs b. 715
  • Number of residual calls: a. 288 vs b. 823
  • Number of linear solver setups: a. 28 vs b. 91
  • Number of nonlinear iterations: a. 286 vs b. 821
  • DAE integration time: a. 25.5 ms vs b. 97.1 ms

Even though we solve the same system, the dense t_eval b. is nearly 4x slower! To address these issues, I propose the following changes that align the time-stepping options with Julia's differential equation solvers (see Output Control):

  • (Breaking) By default, save every t and y determined by IDA's adaptive time stepping algorithm. This eliminates issues like the one above with the C/10 discharge. We can accurately post-interpolate the solution onto specific times with IDA's Hermite interpolator. This is a huge benefit for parameter estimation because we will always obtain the full solution accuracy regardless of the data collection frequency.
  • (Non-breaking) Change the description of t_eval to match Julia's tstops: "Denotes extra times that the time-stepping algorithm must step to. This should be used to help the solver deal with discontinuities and singularities, since stepping exactly at the time of the discontinuity will improve accuracy." With this option, drive cycles in 1. still work.
  • (Non-breaking) Add a solver option that matches Julia's saveat: "Denotes specific times to save the solution at, during the solving phase. The solver will save at each of the timepoints in this array in the most efficient manner available to the solver." This addresses the data collection frequency in 2. without negatively affecting performance, since it interpolates the solution with IDAGetDky().
  • (Non-breaking) Discourage modifying t_eval for performance reasons and instead encourage modifying the appropriate solver options (rtol, atol, dt_max, etc.), which addresses 3.
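The saveat proposal can be sketched in plain NumPy (hypothetical helper, standing in for IDA's Hermite interpolator / IDAGetDky(); not the actual IDAKLU code): the solver saves states and derivatives at its adaptive internal steps, and user-requested output times are filled in by cubic Hermite interpolation afterwards, so the output grid never forces extra solver stops.

```python
import numpy as np

def hermite_interpolate(t_steps, y_steps, yp_steps, t_query):
    """Evaluate a piecewise cubic Hermite interpolant at t_query, using
    solver-saved states y and derivatives yp at the (adaptive) internal
    time steps t_steps. This mimics post-interpolating dense solver
    output onto user-requested 'saveat' times."""
    t_query = np.asarray(t_query, dtype=float)
    # locate the interval [t_i, t_{i+1}] containing each query time
    idx = np.clip(np.searchsorted(t_steps, t_query, side="right") - 1,
                  0, len(t_steps) - 2)
    t0, t1 = t_steps[idx], t_steps[idx + 1]
    h = t1 - t0
    s = (t_query - t0) / h
    # standard cubic Hermite basis functions
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return (h00 * y_steps[idx] + h10 * h * yp_steps[idx]
            + h01 * y_steps[idx + 1] + h11 * h * yp_steps[idx + 1])

# "solver" output: y = exp(-t), saved at a few irregular internal steps
t_steps = np.array([0.0, 0.3, 0.9, 1.4, 2.0])
y_steps = np.exp(-t_steps)
yp_steps = -np.exp(-t_steps)       # dy/dt at the saved steps
saveat = np.linspace(0.0, 2.0, 9)  # user-requested output times
y_saveat = hermite_interpolate(t_steps, y_steps, yp_steps, saveat)
```

Because the interpolant uses both y and y', its accuracy matches the solver's own local order, so the saved output is as accurate as stopping at every requested time, without the extra steps.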

@martinjrobins
Contributor

Yeah, I agree that it would be great to get rid of IDASetStopTime, or at least to reduce the number of times we use it. Would this play nicely with the casadi solver?

@MarcBerliner
Member

Would this play nicely with the casadi solver?

If we can access IDA's internal time steps via Casadi, I think we can do most of this.

@agriyakhetarpal
Member

Notes from the PyBaMM developer meeting on 05/08/2024, as part of the work on PyBaMM running in WASM:

  • CasADi can be compiled with the Emscripten toolchain starting with the upcoming v3.6.6 and will be available in Pyodide with the next patch or minor version (either v0.26.3 or v0.27.0).
    • uses NumPy for computations, so no floating point imprecisions were noticed (so far)
  • Compiling IDAS doesn't take too much effort; linking with KLU could be difficult. I'm unsure if IREE is available/possible. Reference BLAS/LAPACK implementations should be available and configurable.
  • Best way to proceed with this is to compile PyBaMM without the IDAKLU solver for now (i.e., as a pure Python package) and provide pybamm.CasadiSolver() through a Pyodide instance, which can then be used in the docs for the example notebooks or other in-browser uses (more or less, JupyterLite)
    • ship a pure Python wheel (with BUILD_IDAKLU set to OFF) in addition to platform-specific wheels, since pip always chooses the most specific wheel available

@martinjrobins
Contributor

martinjrobins commented Aug 6, 2024

@agriyakhetarpal: FYI, I've compiled IDA with KLU using Emscripten in the past; see this repo: https://github.com/martinjrobins/diffeq-runtime. However, I agree that focusing on the casadi solver for now is the best approach, since you already have it compiled and in Pyodide.

@agriyakhetarpal
Member

Thanks for the resource, @martinjrobins! I see you've built a static lib for KLU and also used NO_LAPACK – should work well. I'll tinker with the settings and maybe I can reduce the binary size a bit :)

@MarcBerliner
Member

To improve the speed of our ODE models, I'd like to add CVODE from SUNDIALS to our suite of C solvers in addition to IDA. I have a few questions about the implementation, and I'd like to hear your thoughts @jsbrittain, @martinjrobins, @pipliggins, and others.

  • In C++, we can make a base solver class and derived classes for IDA and CVODE. The solver code structure for both will be very similar, so this will help with code reuse. However, this will cause a headache for some of the existing PRs.
  • The CasadiSolver automatically determines if the system is an ODE or a DAE and passes it to the appropriate solver. I think this approach also makes sense for our C solvers. If we do take this approach, then...
  • One minor issue is that the IDAKLUSolver name will no longer be accurate. Since we plan on streamlining the IDAKLU installation process and changing the default away from CasadiSolver, maybe we can also rename IDAKLUSolver to SundialsSolver or something. This might make the "new and improved default solver" announcement splashier (cc @valentinsulzer @rtimms).
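The proposed dispatch could look roughly like this (a Python sketch of the structure only — the real implementation would be a C++ class hierarchy wrapping SUNDIALS, and every class name here is hypothetical; the toy RK2 integrator merely stands in for CVODE):

```python
from abc import ABC, abstractmethod
import numpy as np

class SundialsSolverBase(ABC):
    """Hypothetical base class sketching the proposed IDA/CVODE split."""
    @abstractmethod
    def solve(self, rhs, y0, t_eval):
        ...

class CVODELikeSolver(SundialsSolverBase):
    """Stand-in for a CVODE wrapper: explicit ODEs y' = f(t, y)."""
    def solve(self, rhs, y0, t_eval):
        # toy fixed-step Heun (RK2) integrator standing in for CVODE
        y = np.asarray(y0, dtype=float)
        out = [y.copy()]
        for t0, t1 in zip(t_eval[:-1], t_eval[1:]):
            h = t1 - t0
            k1 = rhs(t0, y)
            k2 = rhs(t0 + h, y + h * k1)
            y = y + 0.5 * h * (k1 + k2)
            out.append(y.copy())
        return np.array(out)

class IDALikeSolver(SundialsSolverBase):
    """Stand-in for an IDA wrapper: implicit DAEs F(t, y, y') = 0."""
    def solve(self, rhs, y0, t_eval):
        raise NotImplementedError("DAE path not sketched here")

def make_solver(is_dae):
    """Dispatch to the right backend, as CasadiSolver does for ODE vs DAE."""
    return IDALikeSolver() if is_dae else CVODELikeSolver()

solver = make_solver(is_dae=False)
sol = solver.solve(lambda t, y: -y, [1.0], np.linspace(0, 1, 101))
```

The shared base class keeps the common machinery (setup, options, output handling) in one place, while the ODE/DAE decision stays a one-line dispatch.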

And just FYI, I don't plan on starting this work for at least a couple of weeks.

@agriyakhetarpal
Member

agriyakhetarpal commented Nov 20, 2024

Hi @martinjrobins, an update: I've made a substantial amount of progress recently in compiling PyBaMM + the IDAKLU solver for WASM, and I'm at one of the last stages – finishing up and getting it to compile fully so that we have a shareable wasm32 wheel (and hopefully one that computes correctly, which I can't verify until we actually have a wheel at hand).

I've linked a PR with my progress above. The blocker, after #4260 (which can be closed now since it was superseded, I think?) and its successor #4449 that was merged recently, is that I need to figure out how to turn off OpenMP/threading support completely at build time – via a CMake argument or build-time flag, an environment variable of sorts, or a hardcoded define I can patch – so that the number of available threads is fixed at one, the solver compiles through the NVector_Serial pathway, and Emscripten never attempts to compile the OpenMP pathway; in other words, making OpenMP optional in the compilation procedure (please see the error logs in CI for context). Could you give me any pointers, in case I'm missing anything? As far as I understand, we aren't exposing setup_opts.num_threads anywhere to do this, so changes to our build scripts and compilation will be required to turn off threads entirely. I can use OMP_NUM_THREADS to control the parallelisation as you mentioned in #4087 (comment), but that is only accessible at runtime, once we already have wheels, and not at build time.

That aside, PyBaMM without the IDAKLU solver, i.e., with just the CasadiSolver and as a pure Python wheel, as you suggested in #3910 (comment), should work now 🎉 We have almost all the dependencies available for PyBaMM in Pyodide, including all of the compiled ones. It should just be a matter of deploying it with JupyterLite and my custom Pyodide distribution.

@martinjrobins
Contributor

Hi @agriyakhetarpal. I would define a new variable in CMakeLists.txt (e.g. PYBAMM_OPENMP) that, if equal to ON (which should also be the default), adds a compile definition via https://cmake.org/cmake/help/latest/command/add_compile_definitions.html (e.g. also called PYBAMM_OPENMP). This will be available at compile time, so you can do things like:

#ifdef PYBAMM_OPENMP
  #include <nvector/nvector_openmp.h>
#else
  #include <nvector/nvector_serial.h>
#endif

Then you need to go through the code and switch off any OpenMP-related code using this compile definition.

Remember also to switch off the searching for and linking against OpenMP in CMakeLists.txt (lines 147–169).
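On the CMake side, the idea could look roughly like this (a sketch only; the idaklu target name and the exact OpenMP linking lines are assumptions, not PyBaMM's actual build configuration):

```cmake
# Hypothetical sketch: an option to disable OpenMP for Emscripten builds
option(PYBAMM_OPENMP "Build the IDAKLU solver with OpenMP support" ON)
if(PYBAMM_OPENMP)
  # makes #ifdef PYBAMM_OPENMP visible at compile time
  add_compile_definitions(PYBAMM_OPENMP)
  find_package(OpenMP REQUIRED)
  target_link_libraries(idaklu PRIVATE OpenMP::OpenMP_CXX)
endif()
```

A WASM build would then configure with -DPYBAMM_OPENMP=OFF, skipping both the compile definition and the OpenMP search/link step.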

@martinjrobins
Contributor

looking forward to trying out pybamm in the browser, let me know when you have a deployment :)

@agriyakhetarpal
Member

Thanks for the ideas, @martinjrobins; I did have something like that in mind! I'll put together a PR for this and add another patch. I'll also need changes from #4449 for it, so it might be better for me to wait until we have a new release with #4598. I'll consider moving my repository to our GitHub organisation once the deployments are up and working.

[Note to self] some more changes that still need to be made:

  • make pooch optional: while pooch is a pure Python package, we also need to interact with the in-browser file system, and PyBaMM won't work outside JupyterLite (which offers its own file system implementation) without doing this – it might work with a folder mounted via NodeFS (but then we need pooch to save to that folder).
  • make platformdirs optional and disable posthog-based telemetry
  • patch away the pybamm.BatchStudy class under WASM because it uses multiprocessing – the import multiprocessing statement by itself does not break Pyodide, because we've patched it to do nothing, but acquiring a Pool will break, and we should fail with a helpful error message.
  • Fail early if someone sets num_threads to more than 1 at runtime when initialising an IDAKLUSolver() object, so that we avoid the internal Emscripten traceback that comes with a crash
  • Run test suite under WASM with out-of-tree builds and see if there are any threading/multiprocessing issues to catch
  • Possibly upstream PyBaMM's recipe to the Pyodide repo once the patches are minimal (it does help that I help maintain it :D).
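The "fail early" item above could be sketched like this (a hypothetical helper, not PyBaMM's actual code; under Pyodide, sys.platform reports "emscripten", which is what the guard keys on):

```python
import sys

def check_num_threads(num_threads, platform=sys.platform):
    """Refuse num_threads > 1 under WASM, where OpenMP threading is
    unavailable, instead of crashing inside Emscripten later.
    Hypothetical guard; 'platform' is parameterised for testability."""
    if num_threads > 1 and platform == "emscripten":
        raise RuntimeError(
            "IDAKLUSolver does not support num_threads > 1 when running "
            "in the browser (WASM builds are single-threaded)"
        )
    return num_threads
```

A check like this would run in the solver's __init__, turning a low-level Emscripten crash into an actionable Python error.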

Most of these changes can very well be upstreamed; it would only help us in the future and I wouldn't need to maintain these patches.

@agriyakhetarpal
Member

Additional notes for further reference: we don't have a recipe for SUNDIALS in Pyodide yet – one is coming soon, but it will be for the latest version (7.1.1) rather than the version we use (6.5.0), so I've had to create a custom one myself and patch out or exclude the Fortran bits however we can, because we don't have a relatively stable Fortran-to-WASM compiler (flang is starting to offer support from LLVM 19).

The same goes for SuiteSparse: Pyodide has v5.11.0 (for SparseQR; it does not compile KLU so far), but PyBaMM uses 6.0.3 (side note: we're getting quite behind – it might be worth testing the latest 7.8.3 for our conventional targets). That version lacks some of the "build system convenience" changes, such as being able to pass -DSUITESPARSE_ENABLE_PROJECTS or -DUSE_FORTRAN=NO, so I've compiled AMD, COLAMD, BTF, and KLU by going into their directories one by one, in a pattern similar to what we do in install_KLU_Sundials.py. It would be nice not to have to maintain this too much, either.

For CasADi, we can't simply use the PyPI/non-PyPI wheels, because we need the CasADi shared objects as side modules for the IDAKLU shared object to link against. Again, it might be nice to compile it some time as a native setuptools Extension (this would give us the added benefit of speeding up some of the Windows builds, for which we use vcpkg).
