Trying to run open_virtual_dataset in parallel #95
Comments
Multithreading does not work with HDF5/netCDF4. There's a process-level lock in the HDF5 C library, so you only get serial access. Use the `processes` scheduler instead.
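For reference, a minimal sketch of that switch (assuming `vds_lazy_list` is the list of `dask.delayed` calls to `open_virtual_dataset` from the issue body):

```python
import dask

# The threaded scheduler is serialized by the HDF5 lock; the "processes"
# scheduler gives each worker process its own copy of the HDF5 library state.
(vds_list,) = dask.compute(vds_lazy_list, scheduler="processes")
```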
Ah thanks! Running that now and will report back.
```python
vds_list_parallel_processes = dask.compute(vds_lazy_list, scheduler='processes')
```

also looks pretty disappointing.

Is there something on the fsspec level that locks this? I have to move on from this for now, but I would try running this with local netCDFs first to take fsspec out of the picture.
Running

(EDIT: I executed the two cells below independently and the timing only covers the latter.)

```python
import s3fs

fs = s3fs.S3FileSystem(anon=True)
local_files = [file.split('/')[-1] for file in files]
for file, local_file in zip(files, local_files):
    fs.get_file(file, local_file)
```

```python
%%time
vds_lazy_list_local = [
    open_virtual_dataset(f, filetype=FileType.netcdf4, indexes={})
    for f in local_files
]
vds_list_parallel_processes_local = dask.compute(vds_lazy_list_local, scheduler='processes')
```

was faster!
Thanks for trying this @jbusecke! Presumably that timing includes the download from S3 as well?
Oh sorry, this was not posted clearly; I have edited the post above.
lol, now it makes a lot more sense.
I am trying to build on #93 and reduce the time needed to create the reference file.
My setup is the following:
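(A rough sketch of the setup; the bucket/prefix and the `FileType` import path are assumptions, not the exact ones used.)

```python
import dask
import s3fs
from virtualizarr import open_virtual_dataset
# NOTE: the FileType import path has moved between virtualizarr versions;
# this location is an assumption.
from virtualizarr.kerchunk import FileType

fs = s3fs.S3FileSystem(anon=True)
# Illustrative bucket/prefix; the real dataset lives elsewhere.
files = ["s3://" + path for path in fs.glob("s3://some-bucket/some-prefix/*.nc")]
```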
To establish a baseline I read each file in serial:
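(Roughly along these lines; a sketch, not the exact cell.)

```python
%%time
# Serial baseline: open each remote file one after another.
vds_list_serial = [
    open_virtual_dataset(f, filetype=FileType.netcdf4, indexes={})
    for f in files
]
```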
I then naively tried to wrap open_virtual_dataset with dask.delayed:
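(Something like the following; `vds_lazy_list` is the name referenced in the comments above.)

```python
import dask

# Wrap each call so the opens can, in principle, run concurrently.
vds_lazy_list = [
    dask.delayed(open_virtual_dataset)(f, filetype=FileType.netcdf4, indexes={})
    for f in files
]
# dask.compute defaults to the threaded scheduler for delayed objects.
(vds_list,) = dask.compute(vds_lazy_list)
```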
But I am seeing no speedup at all.

I am not sure if there is anything wrong with the way I am setting this problem up that prevents parallelism, but I would be curious to hear others' opinions on this.