
Switch to shared Lock (SerializableLock if possible) for reading/writing #1179

Merged
6 commits merged into pydata:master on Jan 4, 2017

Conversation

@shoyer
Member

shoyer commented Dec 22, 2016

Fixes #1172

The serializable lock will be useful for dask.distributed or multi-processing
(xref #798, #1173, among others).
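
For context, a minimal sketch (not part of this PR) of the property that matters here: dask.utils.SerializableLock behaves like an ordinary lock but survives pickling, whereas a plain threading.Lock does not, so task graphs that carry one can be shipped to other processes or to distributed workers.

import pickle
import threading

from dask.utils import SerializableLock

lock = SerializableLock()
# Round-trips through pickle; locks that share the same token map back to the
# same underlying threading.Lock after deserialization within a process.
lock2 = pickle.loads(pickle.dumps(lock))

try:
    pickle.dumps(threading.Lock())
except TypeError as err:
    print(err)  # e.g. "can't pickle thread.lock objects"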

shoyer mentioned this pull request on Dec 22, 2016
@mrocklin
Contributor

Is there a clear fail case we can use as a test to demonstrate the value here?

@shoyer
Member Author

shoyer commented Dec 23, 2016

@mrocklin We could update our dask-distributed integration tests to avoid lock=False, but that will have to wait for the next dask release before those tests can pass on CI.

@rabernat
Contributor

rabernat commented Jan 3, 2017

> Is there a clear fail case we can use as a test to demonstrate the value here?

I have found a fail case related to distributed: attempting to use to_netcdf() with a dask.distributed client fails because the threading.Lock() can't be serialized. A SerializableLock would overcome this problem.

Consider this example:

import dask.array as da
from distributed import Client
import xarray as xr

def create_and_store_dataset():
    shape = (10000, 1000)
    chunks = (1000, 1000)
    data = da.zeros(shape, chunks=chunks)
    ds = xr.DataArray(data).to_dataset()
    # to_netcdf holds a threading.Lock internally, which cannot be pickled
    ds.to_netcdf('test_dataset.nc')
    print("Success!")

create_and_store_dataset()   # succeeds: runs in the local threaded scheduler
client = Client()            # switch to the distributed scheduler
create_and_store_dataset()   # fails: the graph (including the lock) must be pickled

The first call succeeds, while the second fails with TypeError: can't pickle thread.lock objects.

When using the distributed client, I can successfully call .store on the underlying dask array if I pass lock=SerializableLock().
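
A rough sketch of what that .store workaround looks like (the file and variable names here are placeholders):

import dask.array as da
import netCDF4
from dask.utils import SerializableLock

shape = (10000, 1000)
chunks = (1000, 1000)
data = da.zeros(shape, chunks=chunks)

# Create the target netCDF variable up front.
nc = netCDF4.Dataset('test_dataset_store.nc', mode='w')
nc.createDimension('x', shape[0])
nc.createDimension('y', shape[1])
var = nc.createVariable('data', 'f8', ('x', 'y'))

# A SerializableLock can be pickled along with the task graph, so this also
# works while a distributed Client is active, unlike a plain threading.Lock.
data.store(var, lock=SerializableLock())
nc.close()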

@mrocklin
Contributor

mrocklin commented Jan 3, 2017

Dask 0.13.0 has been released

@shoyer
Member Author

shoyer commented Jan 3, 2017

I will update the xarray/dask-distributed integration tests and submit this later today. @rabernat, this should solve your issues with to_netcdf.

@shoyer
Member Author

shoyer commented Jan 4, 2017

Hmm. It looks like we need dask 0.13 in conda to make the distributed integration tests pass.

@mrocklin
Contributor

mrocklin commented Jan 4, 2017

It's up now on conda-forge if you're interested in switching over.

shoyer merged commit 21a792d into pydata:master on Jan 4, 2017
@shoyer
Member Author

shoyer commented Jan 4, 2017

In it goes. We're using conda-forge for all Travis-CI builds now.
