HDF5/Tensorflow compatibility issues with NetCDF4 file saving #242
Hey @willgraf, update on this. I was going to submit an issue to the netCDF4 package maintainers, but I can't recreate the issue outside of Docker. If I set up a virtualenv, pip install the requirements.txt file from the current docker build, and run the code, it doesn't generate the above error. However, attempting this within a Jupyter notebook from docker does prompt the error. Is there something docker-specific that could be causing this, or a hidden dependency that isn't captured in the requirements.txt file that's leading to the issue? Here are the steps to reproduce the error-free version outside of docker:
I'm using the following requirements.txt file:
Hi Will, I think I figured out the issue: if I load xarray first, I don't get the error, even if I then load deepcell afterwards. I think the default library that gets used as the backend depends on which package is initialized first. I'm going to close this for now, since the workaround of loading the packages in a different order is acceptable.
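For reference, a minimal sketch of that import-order workaround; the array contents and filename are placeholders rather than the original code:

```python
import numpy as np
import xarray as xr   # importing xarray first initializes its netCDF/HDF5 backend

import deepcell       # noqa: F401  (loading deepcell/tensorflow afterwards is fine)

data = xr.DataArray(np.zeros((2, 64, 64), dtype="float32"),
                    dims=["fovs", "rows", "cols"])
data.to_netcdf("example.nc", format="NETCDF4")  # no HDF error with this import order
```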
So now that I'm going to be adding a multiplexed applications model, this issue has come up again. Here's a summary of what I've discovered so far:
The above save command fails with an HDF error. The workaround I discovered was that if the save command comes before tensorflow is imported, it works, and so do subsequent save/load commands:
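A rough sketch of that workaround with placeholder data and paths (the original snippet isn't reproduced here):

```python
import numpy as np
import xarray as xr

data = xr.DataArray(np.zeros((2, 64, 64), dtype="float32"),
                    dims=["fovs", "rows", "cols"])

# Saving before tensorflow has ever been imported succeeds...
data.to_netcdf("example_cohort.nc", format="NETCDF4")

import tensorflow as tf  # noqa: E402,F401

# ...and subsequent save/load calls continue to work afterwards.
reloaded = xr.open_dataarray("example_cohort.nc")
reloaded.to_netcdf("example_cohort_copy.nc", format="NETCDF4")
```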
However, the really strange thing is that both versions work outside docker! Specifically, if I make a virtualenv and pip install all the same requirements, I don't have any issues executing the first example. This is with the same packages added to the requirements.txt file.
Seems like others have run into this issue as well. It could be a scipy engine vs. netcdf4 engine issue, and another issue I found suggests the file format matters. Which format are you using?
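One way to check is to pin the engine explicitly when writing; a hypothetical example (file names and variable are placeholders, `engine=` is a standard `to_netcdf` argument):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"segmentation": (("rows", "cols"), np.zeros((64, 64), dtype="float32"))})

# The scipy engine only handles the classic netCDF-3 formats...
ds.to_netcdf("classic.nc", engine="scipy", format="NETCDF3_64BIT")

# ...while the HDF5-based NETCDF4 format requires the netcdf4 (or h5netcdf) engine.
ds.to_netcdf("hdf5_backed.nc", engine="netcdf4", format="NETCDF4")
```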
So I had been using NETCDF3_64BIT before, but it would fail on files that were around 3 GB. It now appears to be working for files up to 10 GB. Not sure why that's the case, but that's more than enough for our typical use case. I'll close this (again) for now. Thanks for the help! If it crops back up, I may have to investigate not installing with pip, which appears to be what triggers these issues.
The current set of requirements in the docker image is not compatible with the newest scipy backend for saving data in xarray. In particular, if I add xarray==0.12.1 and netcdf4==1.4.2 as requirements, rebuild the image, and run the code below, the legacy format (NETCDF3_64BIT) works, but the newest version (NETCDF4) does not.
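The original script isn't reproduced here, but the failing pattern was roughly the following (array shape, dimension names, and paths are placeholders):

```python
import numpy as np
import xarray as xr
import deepcell  # pulls in tensorflow, which also links against HDF5

segmentation_data = xr.DataArray(
    np.zeros((1, 1024, 1024, 2), dtype="float32"),
    dims=["fovs", "rows", "cols", "channels"],
)

segmentation_data.to_netcdf("cohort_legacy.nc", format="NETCDF3_64BIT")  # works
segmentation_data.to_netcdf("cohort.nc", format="NETCDF4")               # fails with an HDF error
```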
This is an issue because NETCDF3_64BIT only supports saving files that are 4 GB or less. The segmentation channels for a typical MIBI cohort take up more space than this.
A current workaround is just to save the cohort into multiple distinct data files and run each of them separately through deepcell.
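A sketch of that splitting workaround, assuming the cohort is stored as an xarray dataset with a `fovs` dimension (names and chunk size are placeholders):

```python
import xarray as xr

cohort = xr.open_dataset("full_cohort.nc")

# Write the cohort out in slices along the fov dimension so that each
# file stays safely under the netCDF-3 size limit.
fovs_per_file = 10
for i in range(0, cohort.sizes["fovs"], fovs_per_file):
    chunk = cohort.isel(fovs=slice(i, i + fovs_per_file))
    chunk.to_netcdf(f"cohort_part_{i // fovs_per_file}.nc", format="NETCDF3_64BIT")
```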
The error message I get is below: