HDF error when trying to write Dataset read with rasterio to NetCDF #2535

Closed
loicdtx opened this issue Nov 1, 2018 · 17 comments · Fixed by #7671
Labels
plan to close (May be closeable, needs more eyeballs), topic-backends

Comments

loicdtx commented Nov 1, 2018

I'm getting an HDF error when trying to write a Dataset read from a GeoTIFF (rasterio backend) to NetCDF. See the reproducible example below:

import urllib.request
import tempfile
import os

import xarray as xr

path = tempfile.gettempdir()
url = 'https://earthexplorer.usgs.gov/browse/gisready/landsat_8/LC08_L1TP_026047_20180110_20180119_01_T1.zip'
filename = os.path.join(path, url.split('/')[-1])
nc_name = os.path.join(path, 'landsat_rgb.nc')

# Download the file if it does not exist (11 MB)
if not os.path.isfile(filename):
    urllib.request.urlretrieve(url, filename)

# Read the RGB file using the rasterio backend
rgb_name = '/'.join(['/vsizip', filename,
                     os.path.basename(filename).split('.')[-2] + '.tif'])
ds = xr.open_rasterio(rgb_name)
ds = ds.to_dataset('band').rename({1:'blue', 2:'green', 3:'red'})
print(ds)

# <xarray.Dataset>
# Dimensions:  (x: 7611, y: 7761)
# Coordinates:
#   * y        (y) float64 2.193e+06 2.193e+06 2.193e+06 ... 1.961e+06 1.960e+06
#   * x        (x) float64 3.732e+05 3.732e+05 3.733e+05 ... 6.015e+05 6.015e+05
# Data variables:
#     blue     (y, x) uint8 ...
#     red      (y, x) uint8 ...
#     green    (y, x) uint8 ...
# Attributes:
#     transform:   (30.0, 0.0, 373185.0, 0.0, -30.0, 2193315.0)
#     crs:         +init=epsg:32614
#     res:         (30.0, 30.0)
#     is_tiled:    1
#     nodatavals:  (nan, nan, nan)


# Write to netcdf
ds.to_netcdf(nc_name)

Output of xr.show_versions()

python -c "import xarray as xr; xr.show_versions()"

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.9
pandas: 0.23.4
numpy: 1.15.3
scipy: None
netCDF4: 1.4.2
h5netcdf: None
h5py: None
Nio: None
zarr: 2.2.0
cftime: 1.0.2.1
PseudonetCDF: None
rasterio: 1.0.9
iris: None
bottleneck: None
cyordereddict: None
dask: 0.20.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 40.5.0
pip: 18.1
conda: None
pytest: None
IPython: 7.1.1
sphinx: None

jhamman commented Nov 5, 2018

@loicdtx - thanks for the report. I just tried this using xarray master and didn't get an error, so you could try that and see if it works for you. Since our last release, we've had a fairly significant refactor of the IO backends in xarray.

loicdtx commented Nov 5, 2018

Hi @jhamman, I just tried what you suggested. Apparently it's a scipy vs. netCDF4 thing: it works with scipy but not with netCDF4, on both master HEAD and the latest stable release.

jhamman commented Nov 5, 2018

Interesting. I'm not seeing any difference when using scipy, netcdf4, or h5netcdf.

(by the way, thank you for the reproducible example)

loicdtx commented Nov 5, 2018

Yes, that's interesting... are you using a different OS? Let me know if there's anything else I can help with.

shoyer commented Nov 6, 2018

Exactly what error message do you see?

loicdtx commented Nov 6, 2018

@shoyer, below is the full traceback. I installed everything with pip:

mktmpenv -p python3
pip install xarray numpy netcdf4
pip install rasterio

Traceback (most recent call last):
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/api.py", line 724, in to_netcdf
    unlimited_dims=unlimited_dims, compute=compute)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/core/dataset.py", line 1179, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/common.py", line 374, in store
    unlimited_dims=unlimited_dims)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 406, in set_variables
    super(NetCDF4DataStore, self).set_variables(*args, **kwargs)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/common.py", line 413, in set_variables
    self.writer.add(source, target)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/common.py", line 272, in add
    target[...] = source
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 48, in __setitem__
    data[key] = value
  File "netCDF4/_netCDF4.pyx", line 4648, in netCDF4._netCDF4.Variable.__setitem__
  File "netCDF4/_netCDF4.pyx", line 4913, in netCDF4._netCDF4.Variable._put
  File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rasterio_reprex.py", line 41, in <module>
    ds.to_netcdf(nc_name)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/core/dataset.py", line 1254, in to_netcdf
    compute=compute)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/api.py", line 729, in to_netcdf
    store.close()
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 474, in close
    ds.close()
  File "netCDF4/_netCDF4.pyx", line 2276, in netCDF4._netCDF4.Dataset.close
  File "netCDF4/_netCDF4.pyx", line 2260, in netCDF4._netCDF4.Dataset._close
  File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

jhamman commented Nov 6, 2018

Is this the same error you get with xarray/master?

fmaussion commented:

I can reproduce this on my machine (also Linux). This is going to be hard to track down, though.

loicdtx commented Nov 6, 2018

@jhamman, yes, same error message when installing from master

ghost commented Dec 3, 2018

I have a similar problem when importing rasterio in the same script (without even using it for anything). This fails with an HDF error:

import xarray as xa
import numpy as np
#import netCDF4
import rasterio

ds = xa.Dataset()
ds['z'] = (('y', 'x'), np.zeros((100, 100), np.float32))
print(ds)
ds.to_netcdf('test.nc')
ds.close()

with xa.open_dataset('test.nc') as ds:
    print(ds)

If I import netCDF4 before rasterio, it works fine (uncomment line 3). This is probably an issue with rasterio somehow.
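
For reference, a minimal sketch of that import-order workaround (same script as above; the only change is that netCDF4 is imported first):

import netCDF4  # imported first purely so its bundled HDF5 library gets loaded; not otherwise used
import rasterio  # safe to import afterwards

import numpy as np
import xarray as xa

ds = xa.Dataset()
ds['z'] = (('y', 'x'), np.zeros((100, 100), np.float32))
ds.to_netcdf('test.nc')  # no HDF error with this import order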

I installed everything with pip:

$ pip install Cython
$ pip install netCDF4 xarray rasterio numpy

From pip freeze:

affine==2.2.1
attrs==18.2.0
cftime==1.0.3
Click==7.0
click-plugins==1.0.4
cligj==0.5.0
Cython==0.29.1
netCDF4==1.4.2
numpy==1.15.4
pandas==0.23.4
pyparsing==2.3.0
python-dateutil==2.7.5
pytz==2018.7
rasterio==1.0.11
six==1.11.0
snuggs==1.4.2
xarray==0.11.0

ghost commented Dec 10, 2018

It seems this is not a problem with xarray, but with rasterio and netCDF4 alone. This also fails:

import rasterio
import netCDF4

with netCDF4.Dataset('test.nc', mode='w') as ds:
    ds.createDimension('x')
    ds.createVariable('foo', float, dimensions=('x',))
    print(ds)

Commenting out import rasterio removes the HDF error. I’ll report this to rasterio.

shoyer commented Dec 10, 2018

This looks like a binary incompatibility between the wheels for rasterio and netCDF4 on PyPI.

Good alternatives (for now) would be building from source or using conda-forge.
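
For example, a fresh conda-forge environment along these lines sidesteps the mismatched wheels (environment name and package list are illustrative):

conda create -n xr-rio -c conda-forge python xarray rasterio netcdf4
conda activate xr-rio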

ChristianF88 commented:

Hi there,

I just wanted to let you know that I get the same error with the following script, which uses neither xarray nor rasterio, so those are most likely not the problem.

import datetime as dt
import os
import re

import numpy as np
from netCDF4 import Dataset
from PIL import Image
from tqdm import tqdm


class Gif2NetCDF():
    def __init__(self, netcdf_name, netcdf_destfolder, gif_folder,
                 gif_filter_pattern=None, detailled_conversion=True):
        """

        Parameters
        ----------
        netcdf_name : str
            specifying the name of the netcdf file to write.

        netcdf_destfolder : str
            specifying the folder where the netcdf file is supposed to be created

        gif_folder : str
            specifying the folder that contains the gif files that are supposed to be written to a netcdf

        gif_filter_pattern : str
            specifying a re pattern for filtering the files in the directory, so you end up with the gif files you want.
            The specified string is passed to re.match, which is checked against each file in the provided gif_folder


        Examples
        --------

        ## defining variables
        import netCDF4
        netcdf_name = "2016-07.nc"
        netcdf_destfolder = "./example/radar"
        giffolder = "./example/radar/2016-07/"

        ## create Instance
        cdf = Gif2NetCDF(netcdf_name, netcdf_destfolder, giffolder,gif_filter_pattern)
        ## write all (filtered) gifs in folder to netcdf file
        cdf.writeCDF()
        """

        ## creating global vars

        self.netcdf_name = netcdf_name
        self.netcdf_destfolder = netcdf_destfolder
        self.giffolder = gif_folder
        self.refilterpattern = gif_filter_pattern
        self.detailled_conversion = detailled_conversion

        self.netcdfFP = os.path.join(netcdf_destfolder, netcdf_name)

        # preparing the coordinate vectors
        self._lat_range = np.arange(479500, -160500, step=-1000)  # north
        self._long_range = np.arange(255500, 965500, step=1000)  # east

        # preparing time origin
        self.time_origin = dt.datetime.strptime("1970-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")
        
        self.raincodes = np.array([....]) # this array is quite large, so I left it out... Which unfortunately means the code does not run... If anybody needs it please let me know

        return

    def list_gifs(self):
        self._gifs = os.listdir(self.giffolder)
        return self._gifs

    def filter_gifs(self):
        self._gifs = [file for file in self._gifs if re.match(self.refilterpattern, file)]
        return self._gifs

    def addDimensions(self):

        # adds dimensions to empty netcdf file
        self.latD = self.netcdfFile.createDimension('lat', self._lat_range.shape[0])  # north-south
        self.lonD = self.netcdfFile.createDimension('lon', self._long_range.shape[0])  # east-west
        self.timeD = self.netcdfFile.createDimension('time', None)

        return

    def addVariables(self):

        ## creating variables
        self.latV = self.netcdfFile.createVariable("chy", np.float32, ("lat",), complevel=9, zlib=True)  # north-south
        self.lonV = self.netcdfFile.createVariable("chx", np.float32, ("lon",), complevel=9, zlib=True)  # east-west
        self.timeV = self.netcdfFile.createVariable("time", np.float64, ("time",), complevel=9, zlib=True)

        self.rainV = self.netcdfFile.createVariable("rain", np.float32, ("time", "lat", "lon"), complevel=9, zlib=True,
                                                    fill_value=-100)

        ## adding units
        self.latV.units = "meters"
        self.lonV.units = "meters"

        self.timeV.units = "seconds"
        self.timeV.calendar = "standard"

        self.rainV.units = "millimeter/hour"

        ## adding longname
        self.latV.long_name = "swiss northing CH1903"
        self.lonV.long_name = "swiss easting CH1903"
        self.timeV.long_name = "seconds since 1970-01-01 00:00:00"

        self.rainV.long_name = "precipitation intensity forecast"

        return

    def addDescription(self):

        self.netcdfFile.description = """..."""

        self.netcdfFile.history = """Created: {}""".format(dt.datetime.now().strftime("%Y-%m-%d %H:%M"))
        self.netcdfFile.source = '...'
        return

    def _write_static_dims(self):
        self.latV[:] = self._lat_range
        self.lonV[:] = self._long_range
        return

    def _write_time(self, file, datetime=None):

        if datetime is None:
            datestr = re.findall(r"\.([0-9]+)\.gif", file)[0]
            date = dt.datetime.strptime(datestr, "%Y%m%d%H%M")
        else:
            date = datetime

        seconds = (date - self.time_origin).total_seconds()
        current_size = self.timeV.size
        self.timeV[current_size] = seconds
        
        return

    def gif2array(self, file):
    
        xpix = 0
        ypix = 76
        n_pixel_x = 710 + xpix
        n_pixel_y = 640 + ypix

        gif = np.array(Image.open(file))[ypix:n_pixel_y, xpix:n_pixel_x].astype("float64")
        for idx, raincode in enumerate(self.raincodes):
            gif[gif == idx] = raincode[3]
        return gif

    def _write_rain(self, file):
        array = self.gif2array(os.path.join(self.giffolder, file))
        idx = self.rainV.shape[0] - 1
        self.rainV[idx, :, :] = array
        return

    def writeCDF(self):
        self.netcdfFile = Dataset(self.netcdfFP, 'w', format='NETCDF4_CLASSIC')
        try:
            giflist = self.list_gifs()
            if self.refilterpattern is not None:
                fgiflist = self.filter_gifs()

            self.addDimensions()
            self.addVariables()
            self.addDescription()
            self._write_static_dims()

            for file in tqdm(self._gifs):
                self._write_time(file)
                self._write_rain(file)
                
        except Exception:
            self.netcdfFile.close()
            raise

        self.netcdfFile.close()

        return

Error

Traceback (most recent call last):
.
.
.
  File "C:\Users\foerstch\AppData\Local\Programs\Python\Python37\lib\site-packages\archiving\radar.py", line 358, in _write_rain
    idx = self.rainV.shape[0] - 1
  File "netCDF4\_netCDF4.pyx", line 4031, in netCDF4._netCDF4.Variable.shape.__get__
  File "netCDF4\_netCDF4.pyx", line 3369, in netCDF4._netCDF4.Dimension.__len__
  File "netCDF4\_netCDF4.pyx", line 1857, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

Does anybody have some advice on how to fix this?

Thanks a bunch!
Christian

dcherian commented Jun 7, 2019

An HDF error usually means a corrupt file. Does ncdump -h work on your file?

I've found that sometimes ncdump -h will succeed but there will still be some corrupt data, which results in an error when you try to load variable values.
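
For anyone unfamiliar, a quick sketch of that check (file name is a placeholder):

ncdump -h landsat_rgb.nc        # header only: dimensions, variables, attributes
ncdump -v blue landsat_rgb.nc   # also dumps one variable's values, forcing a real read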

ChristianF88 commented:

Well, the error occurs while writing the file. So if the reason is that the file is corrupt, then netCDF4 is corrupting it... right? The problem is rather that it does not even finish writing the file.

FYI: the error does not occur at the same file every time.

Have a good weekend!! ;)

alpha-beta-soup commented:

I see a similar error when using xr.open_rasterio; a workaround seems to be to change the order in which my datasets are opened. Example:

import xarray as xr
ds = xr.open_dataset('/data/someFile.nc') # netcdf
m = xr.open_rasterio('/data/otherFile.tif') # geotif
# Everything is happy

# But in the reverse order:
import xarray as xr
m = xr.open_rasterio('/data/otherFile.tif') # geotif
ds = xr.open_dataset('/data/someFile.nc') # netcdf
# Results in the following error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/file_manager.py", line 186, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/lru_cache.py", line 42, in __getitem__
    value = self._cache[key]
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/data/someFile.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 420, in open_dataset
    filename_or_obj, group=group, lock=lock, **backend_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/netCDF4_.py", line 335, in open
    autoclose=autoclose)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/netCDF4_.py", line 293, in __init__
    self.format = self.ds.data_model
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/netCDF4_.py", line 344, in ds
    return self._acquire()
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/netCDF4_.py", line 338, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/file_manager.py", line 174, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/file_manager.py", line 192, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2291, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1855, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'/data/someFile.nc'
>>> xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6 (default, Sep 12 2018, 18:26:19) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
python-bits: 64
OS: Linux
OS-release: 4.15.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.12.3
pandas: 0.25.1
numpy: 1.13.3
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.26
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: None
IPython: None
sphinx: None

And I have the HDF5_USE_FILE_LOCKING environment variable set to FALSE.
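
A sketch of how that variable is typically set from Python; it has to be in the environment before the HDF5 library is first loaded, so it must come before any import that pulls in netCDF4 or h5py (assuming that is the case here):

import os
os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'  # must happen before libhdf5 is initialised

import xarray as xr  # imports that end up loading HDF5 come after the variable is set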

FabianHofmann added a commit to PyPSA/atlite that referenced this issue Jan 21, 2021
cutout: Work around binary wheel incompatibility of netCDF4 and rasterio (refers to pydata/xarray#2535, rasterio/rasterio-wheels#12)

kmuehlbauer commented:

Just a heads up: I can't reproduce this with the latest packages from conda-forge, neither with xr.open_rasterio nor with xr.open_dataset using engine="rasterio". Maybe the issue got resolved upstream.
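
A minimal sketch of the two code paths mentioned (file name is illustrative; engine="rasterio" assumes rioxarray is installed, which registers that engine, since xr.open_rasterio is deprecated in recent xarray versions):

import xarray as xr

da = xr.open_rasterio('landsat_rgb.tif')                    # legacy accessor, deprecated
ds = xr.open_dataset('landsat_rgb.tif', engine='rasterio')  # current path via rioxarray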

headtr1ck added the "plan to close" label (May be closeable, needs more eyeballs) Mar 26, 2023