Passing file objects to netCDF4.Dataset doesn't work #295
Comments
We would also love to have this, but unfortunately, I don't think it's an easy fix (as it would require delving into the netcdf C library). Hopefully @jswhit can elaborate. |
scipy.io.netcdf is a pure python module that reads and writes netcdf-3 formatted files directly. netcdf4-python is a python interface to the netcdf C library, and can handle the netcdf HDF5-based file format. There's no way for the C library to utilize a python file object. |
I doubt there's "no way", but I don't have a well-defined sense of how difficult it would be (or who actually knows enough to do it). There is, for example, a C side API for working with Python file objects: https://docs.python.org/2/c-api/file.html |
I should have said "impossible without extensive modifications to the HDF5 and netCDF C libs". I suppose as a workaround we could dump the bytes from the open file object to a temp file, and then pass the name of that temp file to the netCDF C lib. |
Indeed, this is the simplest way to solve this problem. But I would say that sort of solution belongs in user code, not this library. |
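The temp-file workaround described above could look roughly like the following in user code. This is only a sketch: `copied_to_tempfile` and `list_variables` are hypothetical helper names, not part of netCDF4, and the `.nc` suffix is an arbitrary choice.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def copied_to_tempfile(fobj, suffix=".nc"):
    """Copy a readable binary file-like object to disk and yield the path."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:
            # Stream in chunks so the whole file need not fit in memory.
            shutil.copyfileobj(fobj, tmp)
        yield tmp.name
    finally:
        # Remove the copy once the caller is done with it.
        os.remove(tmp.name)


def list_variables(fobj):
    """Return the variable names of a netCDF file given as a file object."""
    from netCDF4 import Dataset  # imported lazily; requires netCDF4

    with copied_to_tempfile(fobj) as path:
        with Dataset(path) as nc:
            return list(nc.variables)
```

The dataset has to be used and closed inside the `with` block, because the temporary copy is deleted on exit. For the bzipped files from the original report, something like `list_variables(bz2.BZ2File('MODIS.nc4.bz2'))` should then work, at the cost of one extra copy on disk.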
maybe I'm missing something here, but a filehandle has a .name attribute, so could the code to work around this (and, by extension, a way to fix the issue in the python layer) look like:
? It's not especially pretty, but at least it enables expected behaviour to be preserved. |
@marqh that would work for this example, but in general a python file object only needs to adhere to the file API -- it need not be an actual file on disk (e.g., it could be a |
Hi everyone - I'd love to see a fix for this. It's coming up quite regularly in "map reduce" world (i.e. Hadoop, Spark, Dask) where we want to be able to pass file objects around and read them quickly, that is without dumping to disk. Is there anything on the horizon that might help out with this? |
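To illustrate why relying on `.name` is fragile, here is a small sketch of a check that user code could run before falling back to another approach. `on_disk_path` is a hypothetical helper name, not something netCDF4 provides.

```python
import io
import os


def on_disk_path(fobj):
    """Return fobj.name if it is a path to an existing file, else None."""
    name = getattr(fobj, "name", None)
    # Some file objects have no name; others report an int file descriptor.
    if isinstance(name, (str, bytes, os.PathLike)) and os.path.isfile(name):
        return os.fspath(name)
    return None


# An in-memory stream has no .name at all, so there is nothing to pass on.
print(on_disk_path(io.BytesIO(b"not a real file")))  # None
```

Even when a path comes back, it is only safe to hand it to `Dataset` if the bytes on disk are the same bytes the file object yields. For a decompressing wrapper such as the bzipped files in the original report, the file on disk holds compressed data, so the name alone is not enough.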
netCDF-C last fall gained support for reading directly from an in-memory buffer that contains the bytes of a netCDF file. It's been on my TODO list to expose this in the Cython wrappers here, but I haven't gotten to it. That's probably your best bet--it's not a file-like object, but at least you wouldn't have to have a file on disk any more. |
great - thanks for the update |
@dopplershift any update on this? |
As the original creator of this issue, I am pleased to see it is still alive. I am still very interested, although more for the reasons described by @niallrobinson. I believe the in-memory buffer solution could solve things. To clarify, would we be able to pass a BytesIO object? |
yup - still actively thinking/worrying about this ;) |
It's still on my todo list, but it hasn't bubbled to the top. I'll try to squeeze it in sooner rather than later (since I don't think it's that hard), but can't make any promises (especially before the AMS annual meeting in January). I don't see
import bz2
from netCDF4 import Dataset
bz2_fobj = bz2.BZ2File('MODIS.nc4.bz2')
nc4 = Dataset(bz2_fobj.read())
Would that serve the use cases mentioned here? |
An interface that accepts file images in the form of The driver of performance is the number of memory copies. With |
Here's the documentation for the netCDF-C routine ( |
@jswhit nice!!! I'm going to try seeing if I can get this to work in a fork. Update, created linked PR, unfortunately |
update for others on this thread, in master you can now open a file from memory (not released to pypi yet unfortunately) |
@thehesiod can you show an example please. I am interested to use this with pyfilesystem2, e.g. webdav, ftp direct access. |
You should be able to use: Dataset('myname', memory=fobj.read()) There was a problem with |
The name must still be non-empty, I believe |
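Applied to the bzipped files from the original report, the `memory=` keyword shown above might be used as in this sketch. The helper names `read_bz2_bytes` and `open_bz2_dataset` and the label `'inmemory.nc'` are illustrative assumptions; only `Dataset(name, mode='r', memory=...)` comes from this thread.

```python
import bz2


def read_bz2_bytes(path):
    """Decompress a .bz2 file fully into memory and return the raw bytes."""
    with bz2.open(path, "rb") as fobj:
        return fobj.read()


def open_bz2_dataset(path, name="inmemory.nc"):
    """Open a bzipped netCDF file without writing a decompressed copy to disk."""
    # Imported lazily; requires a netCDF4 build that supports `memory`.
    from netCDF4 import Dataset

    # `name` is only a label here, but it should be a non-empty string.
    return Dataset(name, mode="r", memory=read_bz2_bytes(path))
```

A call such as `open_bz2_dataset('MODIS.nc4.bz2')` would then replace the system-command workaround. The whole decompressed file is held in memory, so this suits files that fit comfortably in RAM.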
@ReimarBauer if you are still interested, here is the solution where I used pyfilesystem2 to read zipped netcdf files:
from fs.zipfs import ZipFS
import xarray as xr
import netCDF4
new_zip = ZipFS("results.zip")
# Read the archive member into memory; avoid shadowing the built-in `bytes`.
nc_bytes = new_zip.getbytes(u'one_file_within_zip.nc')
nc4_ds = netCDF4.Dataset('name', mode='r', memory=nc_bytes)
# Wrap the open dataset so xarray can read it.
store = xr.backends.NetCDF4DataStore(nc4_ds)
ds = xr.open_dataset(store) |
Hello, I am retrieving a BytesIO object from a REST API response and I would like to read directly the Dataset from it without having to first write the object on disk. Is there a way to do this? |
@mir-una Just like the other ones above:
data_bytes = response.read()
nc4_ds = netCDF4.Dataset('name', mode='r', memory=data_bytes) |
@dopplershift thank you but I cannot figure it out, I am using the requests package, read() does not seem to be a method supported... I am doing the following: |
@mir-una Using BytesIO is unnecessary, try:
response = requests.get(my_url, params=token, stream=True)
y = Dataset('name', mode='r', memory=response.content) |
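If the data may arrive in different forms (raw bytes, a BytesIO, or some other readable object), a small normalizing step in user code can hide the difference. The following is a sketch under that assumption; `to_bytes` and `dataset_from_memory` are hypothetical names, and `'inmemory.nc'` is just a non-empty label.

```python
import io


def to_bytes(source):
    """Return the raw bytes held by a bytes-like, BytesIO, or readable object."""
    if isinstance(source, (bytes, bytearray, memoryview)):
        return bytes(source)
    if isinstance(source, io.BytesIO):
        # getvalue() returns the full contents regardless of stream position.
        return source.getvalue()
    if hasattr(source, "read"):
        # Generic file-like object: reads from the current position onward.
        return source.read()
    raise TypeError(f"cannot get bytes from {type(source).__name__}")


def dataset_from_memory(source, name="inmemory.nc"):
    """Open a netCDF dataset from in-memory data of various kinds."""
    # Imported lazily; requires a netCDF4 build that supports `memory`.
    from netCDF4 import Dataset

    return Dataset(name, mode="r", memory=to_bytes(source))
```

With requests, `dataset_from_memory(response.content)` would cover the case above, and a BytesIO taken from an API response could be passed in directly.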
I'm trying to read into a Dataset from memory as per the docs, but it's not working. I tried 2.7 and 3.7 and get the same error
code:
netCDF4 version '1.5.0' a |
Can you post the file here? (attach to ticket as a gzipped tar file?) |
Also, what version of netcdf-c are you using? (you can check by looking at the |
Thanks @jswhit you solved my issue over here It was version |
I am trying to port some code from scipy.io.netcdf_file to netCDF4.Dataset. I have encountered an issue which is pretty significant for me. netCDF4.Dataset expects a string as its argument and is unable to accept an open file object. The issue can be seen in the following code
The second-to-last line raises
This may seem like an unnecessary feature (why not just pass the filename directly), but the problem is that I have a large archive of bzipped netcdf files on disk. The way I usually read them is
If I can't do this with netCDF4, I will have to design a clumsy workaround involving system commands to manually unzip the files.
I considered trying to add this feature myself, but then I realized that the whole library was written in C. Hopefully you will consider adding support for reading file objects.