
lock cache file when computing #6

Open
matt-long opened this issue Nov 20, 2019 · 2 comments

Comments

@matt-long
Collaborator

To avoid a race condition when computing in parallel, we should add a "lock" feature for the cache file: if a lock file is detected, wait until it is removed before reading the cache. We should also wrap cache creation in a try/except block so that lock files are cleaned up if there is a failure.
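
A minimal sketch of that idea, assuming a POSIX-style atomic file create; `cache_lock`, the poll interval, and the timeout are hypothetical names and values, not existing API:

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def cache_lock(cache_path, poll_interval=0.5, timeout=300.0):
    """Hold an exclusive lock file alongside ``cache_path`` (sketch only)."""
    lock_path = cache_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL creates the lock file atomically and fails
            # if another process already holds it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            break
        except FileExistsError:
            # Lock file detected: wait until it is removed.
            if time.monotonic() > deadline:
                raise TimeoutError(f"timed out waiting for {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Clean up the lock file even if the computation fails.
        os.remove(lock_path)
```

A writer would then do `with cache_lock("cache.nc"): ds.to_netcdf("cache.nc")`, and readers would acquire the same lock before opening the file.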

@andersy005
Member

@matt-long
Per xarray's open_dataset() documentation, xarray handles the locking for the user:

lock (False or duck threading.Lock, optional) – Resource lock to use when reading data from disk. Only relevant when using dask or another form of parallelism. By default, appropriate locks are chosen to safely read and write files with the currently active dask scheduler.

Should we still come up with our own approach? If so, do you have a good test case that I can use as a reference?
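
For reference, a rough sketch of handing `open_dataset()` an explicit lock per the docstring quoted above; the file name is made up, and `SerializableLock` is dask's pickle-able lock:

```python
import xarray as xr
from dask.utils import SerializableLock

# If lock is omitted, xarray picks an appropriate lock for the active dask
# scheduler (per the docstring above); passing one explicitly overrides it.
ds = xr.open_dataset("cache.nc", lock=SerializableLock(), chunks={})
```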

@andersy005
Member

andersy005 commented Nov 21, 2019

When it comes to writing, my understanding is that:

  • ds.to_netcdf() is serial by default: a single process writes to the file while the others sit idle.
  • By default, the compute parameter of to_netcdf() is set to True, which guarantees that the write is both synchronous and serial (see the sketch below).
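
A small sketch contrasting the two modes, with a made-up dataset (the deferred case assumes dask is installed):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"x": ("t", np.arange(10.0))}).chunk({"t": 5})

# Default: compute=True blocks until the file is fully written
# (synchronous, single writer).
ds.to_netcdf("cache_sync.nc")

# compute=False returns a dask.delayed.Delayed; nothing is written until
# .compute() is called.
delayed = ds.to_netcdf("cache_deferred.nc", compute=False)
delayed.compute()
```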

Let me know if I am missing something.
