Confusing error if old dmp files are around #235

Open
dschwoerer opened this issue Mar 28, 2022 · 5 comments

Comments

@dschwoerer
Contributor

Rerunning a simulation with fewer processes, in a directory that still contains the dump files from the earlier run, results in this error when loading the data:

ValueError: Each run directory does not contain an equal number of output files. If the parallelization scheme of your simulation changed partway-through, then please load each directory separately and concatenate them along the time dimension with xarray.concat().

I am only loading one simulation, so this error does not seem to be correct.

Preferably, the old data would simply be ignored, as it is with collect:
https://github.com/boutproject/boutdata/blob/master/boutdata/collect.py#L279
https://github.com/boutproject/boutdata/blob/master/boutdata/collect.py#L310

Would you be happy for a PR?
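
A minimal sketch of the "ignore old files" idea, for illustration only. It assumes the expected process count is already known (for example from `NXPE * NYPE` read out of file 0, which is roughly what the linked `collect` lines appear to do) and simply drops any `BOUT.dmp.<n>.nc` whose index is too high. The helper name `select_current_dump_files` is hypothetical and not part of any existing API.

```python
import re
from pathlib import Path

# Matches per-process dump files such as "BOUT.dmp.12.nc".
_DMP_PATTERN = re.compile(r"^BOUT\.dmp\.(\d+)\.nc$")


def select_current_dump_files(paths, nprocs):
    """Return the dump files with process index < nprocs, ordered by index.

    `nprocs` is an assumption of this sketch: the caller is expected to
    obtain it from the metadata of file 0 (e.g. NXPE * NYPE).
    Files with a higher index are treated as leftovers and ignored.
    """
    indexed = {}
    for path in map(Path, paths):
        match = _DMP_PATTERN.match(path.name)
        if match:
            indexed[int(match.group(1))] = path

    # Every process of the current run must have written a file.
    missing = [i for i in range(nprocs) if i not in indexed]
    if missing:
        raise FileNotFoundError(f"Missing dump files for processes {missing}")

    return [indexed[i] for i in range(nprocs)]
```

With eight files on disk and `nprocs=4`, only `BOUT.dmp.0.nc` to `BOUT.dmp.3.nc` would be returned; the remaining four are skipped without an error.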

@johnomotani
Collaborator

Yes, PR would be welcome. This case was not considered when the existing error message was written!

@dschwoerer
Contributor Author

Hmm, changing the error message is fairly easy, but ignoring files without breaking the current behaviour is, I guess, essentially impossible.

If we could assume just a single simulation per folder, that would make things easier.
But right now we support BOUT.dmp.{0,1}.nc to be two different simulations. Would it be ok to break this behaviour?

@johnomotani
Collaborator

> If we could assume just a single simulation per folder, that would make things easier.
> But right now we support BOUT.dmp.{0,1}.nc to be two different simulations. Would it be ok to break this behaviour?

I quite often want to have something like boutdata{0,1,2}.nc in the same directory, with different simulations in each (squashed) file, and I'd like to be able to have that keep working.

I wouldn't be against having BOUT.dmp.*.nc be treated as a special case though. In hindsight, it's unfortunate that I chose BOUT.dmp.nc as the default output file name in squashoutput(). If anyone names consecutive restarts of a simulation as BOUT.dmp.{0,1}.nc, etc., they'd be crazy though, so I think printing a warning whenever this special-case workaround is triggered would be enough.
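
One way the special case could look, as a rough sketch rather than an actual implementation: detect when every matched file follows the `BOUT.dmp.<n>.nc` pattern, treat the set as a single simulation, and warn that this is happening. Both function names below are hypothetical.

```python
import re
import warnings

# Per-process dump file names, e.g. "BOUT.dmp.0.nc".
_PER_PROCESS_DUMP = re.compile(r"^BOUT\.dmp\.\d+\.nc$")


def is_per_process_dump_set(filenames):
    """True if the list is non-empty and every name looks like BOUT.dmp.<n>.nc."""
    names = list(filenames)
    return bool(names) and all(_PER_PROCESS_DUMP.match(n) for n in names)


def treat_as_single_simulation(filenames):
    """Decide whether the files are handled as one simulation.

    Returns True (and warns) for BOUT.dmp.<n>.nc sets; any other naming,
    such as boutdata0.nc, keeps the existing one-file-per-simulation meaning.
    """
    if is_per_process_dump_set(filenames):
        warnings.warn(
            "Treating BOUT.dmp.*.nc as per-process output of a single "
            "simulation; rename the files if they are separate runs.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Files such as `boutdata0.nc` and `boutdata1.nc` do not match the pattern, so they would continue to be treated as separate simulations with no warning.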

@dschwoerer
Contributor Author

Is it possible to check for a run_id - and depending on that decide which to merge?

@johnomotani
Collaborator

> Is it possible to check for a run_id - and depending on that decide which to merge?

That's a good idea! Think it should work, and we can fall back to an error if run_id doesn't exist (e.g. older output files).
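
The run_id approach might be sketched as follows. This is illustrative only: it assumes the `run_id` of each file has already been read into a dict (with `None` for files that lack one), and the function name `group_by_run_id` is hypothetical. Files sharing a `run_id` would be merged as one simulation, and a missing `run_id` falls back to an error.

```python
from collections import defaultdict


def group_by_run_id(run_ids):
    """Group file paths by run_id.

    `run_ids` maps file path -> run_id string, or None when the file has
    no run_id (e.g. older output files). How the run_id is read from each
    file is left to the caller and is an assumption of this sketch.
    """
    missing = sorted(path for path, rid in run_ids.items() if rid is None)
    if missing:
        # Fall back to an error, since the files cannot be told apart.
        raise ValueError(
            f"Cannot decide which files belong together, no run_id in: {missing}"
        )

    groups = defaultdict(list)
    for path, rid in sorted(run_ids.items()):
        groups[rid].append(path)
    return dict(groups)
```

Each resulting group would then be loaded as one simulation, while files from different groups stay separate.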
