Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unstable simulations are not caught by buildkite, LES_driven_SCM actually breaking #574

Closed
ilopezgp opened this issue Nov 18, 2021 · 2 comments · Fixed by #575
Closed

Unstable simulations are not caught by buildkite, LES_driven_SCM actually breaking #574

ilopezgp opened this issue Nov 18, 2021 · 2 comments · Fixed by #575
Assignees
Labels
bug Something isn't working help wanted 👋 Extra attention is needed

Comments

@ilopezgp
Copy link
Contributor

ilopezgp commented Nov 18, 2021

I just realized that the MSE buildkite tests can pass without the simulation reaching the end time. This situation is frequently seen in the LES_driven_SCM test. Examples in closed PRs,

I attach an example of this behavior from the buildkite of PR #564. This issue has two components,

  • The current integration tests are fault tolerant, they shouldn't be.
  • The default LES_driven_SCM case breaks.

Screen Shot 2021-11-17 at 11 49 06 PM

@ilopezgp ilopezgp added bug Something isn't working help wanted 👋 Extra attention is needed labels Nov 18, 2021
@jakebolewski
Copy link

jakebolewski commented Nov 18, 2021

I think for this you probably want some sort of state dump upon aborting (ex. simulation time, number of steps, run parameters). Then the CI can see if this abort state dump file exists for a particular run and raises an proper error exit code at the end. This will help with calibration as well because the error will have some context with it's state so you don't have to do the extra work of trying to figure out which parameters caused the abort to happen at which time.

bors bot added a commit that referenced this issue Nov 18, 2021
575: Add post-run tests, error if we do not run to t_max r=costachris a=charleskawczynski

Closes #574.

Co-authored-by: Charles Kawczynski <[email protected]>
@bors bors bot closed this as completed in 74f799e Nov 19, 2021
@charleskawczynski
Copy link
Member

Hmm, #575 adds a test on the ode integrator that we've reach t_max, and adds post-run tests (which ensure NaN is not in the solution). But maybe we just need the NaNs check? Alternatively we could return the a success flag and throw the error after the state dump / comparison / mse computations. Yeah, that's probably more helpful. Will open a PR.

bors bot added a commit that referenced this issue Nov 19, 2021
581: Error on aborted simulations after solution export r=charleskawczynski a=charleskawczynski

This PR moves the error on early aborted simulations to _after_ the solution is exported. This way we can more easily look at the solution / data to see what went wrong. [Discussed here](#574 (comment)).

I'm hoping that this will shed light on #577.

Co-authored-by: Charles Kawczynski <[email protected]>
bors bot added a commit that referenced this issue Nov 19, 2021
581: Error on aborted simulations after solution export r=charleskawczynski a=charleskawczynski

This PR moves the error on early aborted simulations to _after_ the solution is exported. This way we can more easily look at the solution / data to see what went wrong. [Discussed here](#574 (comment)).

I'm hoping that this will shed light on #577.

Co-authored-by: Charles Kawczynski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted 👋 Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants