
Preparation of v0.2 #20

Merged: 356 commits merged into master from the v0.2 branch on Jan 21, 2021

Conversation

@coroa (Member) commented Jun 12, 2019

The first official version of atlite is getting closer and there is a bunch of good stuff coming.

Changes

The main change is that a cutout corresponds to a single NetCDF file for the whole cutout period, which is fully accessible as an xarray Dataset at cutout.data (a short usage sketch follows the list below).

  1. This makes it possible to iterate over the data in customizable slices: cutout.wind(shapes=countries, turbine="Vestas_V90_3MW") uses months as in the previous version, while e.g. cutout.wind(..., windows='Y') uses years. windows can be anything that pd.Grouper understands, i.e. 'D', 'M', 'Y' or even '2D'. windows=False makes it possible to apply the conversion function to the data as a whole. It should also be possible to choose windows compatible with a particular time zone, to avoid the re-averaging that was necessary for heat_demand.

  2. The data for cutouts is now grouped into different features (from the ERA-5 dataset):

    features = {
        'height': ['height'],
        'wind': ['wnd100m', 'roughness'],
        'influx': ['influx_toa', 'influx_direct', 'influx_diffuse', 'influx', 'albedo'],
        'temperature': ['temperature', 'soil_temperature'],
        'runoff': ['runoff']
    }

    It's possible to prepare a cutout only for a subset of the available features: cutout.prepare(['runoff', 'wind']). One can always extend the cutout by running prepare again.

  3. One can load a cutout fully into memory using cutout.data.load() (or cutout.data.wnd100m.load() and cutout.data.roughness.load() for wind only), which should fully supersede @euronion's caching from Introduce dataset caching and outsource wind speed extrapolation (#9).

  4. It's easy to get a subset of a cutout via atlite's own sel function, which forwards the selection to the underlying xarray data: cutout.sel(time="2012-01") or cutout.sel(time="2012-07", bounds=german_shape.bounds).
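
A minimal sketch tying these points together (the file name, bounds and exact values are illustrative assumptions based on this discussion, not prescribed by this PR):

import atlite

# A cutout is now a single NetCDF file covering the whole period (ERA5 module).
cutout = atlite.Cutout("germany-2012", module="era5",
                       bounds=(5.0, 47.0, 16.0, 56.0), time="2012")

# Prepare only a subset of the available features; rerunning prepare extends the cutout.
cutout.prepare(["wind", "runoff"])

# Convert wind speeds in yearly windows instead of the default monthly slices.
cf_wind = cutout.wind(turbine="Vestas_V90_3MW", windows="Y")

# Load everything into memory and take a temporal subset.
cutout.data.load()
january = cutout.sel(time="2012-01")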

Open questions

  • config.py has been completely removed; instead, one has to provide the necessary paths explicitly when creating new cutouts. In addition we could allow reading in a config file like ~/.atlite.config or some such?
  • Should data cleaning methods be moved into datasets (i.e. surface roughness <= 0. → 0.002; see the sketch after this list)? I think that would be a good idea! Are there objections?
  • When data is read in as dask arrays, it is not mutable in the conversion functions, leading to exceptions. We can either change everything to copy-on-change (i.e. use clip) or catch the error and raise a more helpful message telling the user to prepare/load the dataset first? Related to Parallelised calculations using dask. #30. To be conservative, we load dask arrays before they are passed to the conversion functions.
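
For the data-cleaning question, a minimal copy-on-change sketch (the function name and placement are assumptions, not a decided design):

import xarray as xr

def clean_roughness(ds: xr.Dataset) -> xr.Dataset:
    # Replace non-positive surface roughness by 0.002 without mutating the
    # original (possibly dask-backed) arrays: .where returns a new array.
    roughness = ds["roughness"].where(ds["roughness"] > 0.0, 0.002)
    return ds.assign(roughness=roughness)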

Remaining TODOs

  • Add a .sel method to produce a view on a subset of the data
  • Mailing list for announcements
  • Release notes (done in documentation branch)
  • Migration instructions
  • Examples should be set up to show warnings: [v0.2] Deprecation warnings are ignored by default #27. (done in documentation branch)
  • Other datasets:
    • sarah
    • cordex
    • ncep
    • efas
  • Merge documentation branch

I'm happy for everyone who wants to test the new version, provide feedback, or help with the documentation or the remaining todos! @leonsn @nworbmot @FabianHofmann @schlott @fneum

@coroa (Member, Author) commented Jun 18, 2019

@euronion : Do you have time to test whether this branch works for you?

The easiest invocation to get a cutout for wind generation covering Germany now is something along the lines of:

germanshapefile = ...
cutout = atlite.Cutout("<cutoutname>", bounds=germanshapefile.buffer(0.2).bounds, time="2012", module="era5")
cutout.prepare(["wind"])

This generates a <cutoutname>.nc file in the current directory containing wind speed and surface roughness in hourly resolution. This file is opened and fully accessible as cutout.data. You can load everything into memory using cutout.data.load().

The previously used wind generation function cutout.wind (like the other conversion functions) now additionally understands a windows argument. Using windows=False applies the conversion to the whole time series at once, the default windows='M' works on monthly slices, and windows='Y' works on yearly slices.
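
For illustration, the three modes side by side (turbine name reused from earlier in this thread; a sketch rather than a definitive API reference):

cf_monthly = cutout.wind(turbine="Vestas_V90_3MW")                 # default: windows='M'
cf_yearly  = cutout.wind(turbine="Vestas_V90_3MW", windows='Y')    # yearly slices
cf_full    = cutout.wind(turbine="Vestas_V90_3MW", windows=False)  # whole time series at once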

@euronion (Collaborator) commented Jun 18, 2019

I'd love to, but I can't spare sufficient resources for it at the moment.
I estimate I can look at it in ~2 weeks at the earliest and then do some testing work.

At the same time I would work on

  • an update of the documentation for the official upcoming version, including
  • some examples to follow along.

I really hope you can wait that long :)

@coroa (Member, Author) commented Jun 18, 2019

@euronion We're not under any particular time pressure right now. I'd like to merge this around mid-July. Would be great if you can integrate your changes. I think it would be ideal to work on pull requests against this branch!

@euronion (Collaborator) commented:

@coroa Yes, of course, only PRs against this branch. Mid-July sounds good.

@coroa (Member, Author) commented Jun 18, 2019

Atlite's current version 0.0.2 is now available from PyPI as well as conda-forge. The version prepared in this branch will be tagged as 0.1 (sic!) as soon as I merge this branch (the branch name will stay v0.2 for the time being).

@coroa changed the title from "Preparation of v0.2" to "Preparation of v0.1" on Jun 18, 2019
@euronion (Collaborator) commented:

Hi @coroa,
other things on my side took longer than expected. I started working on the changes this week;
expect some PRs from me at the end of this week or the start of next week :)

@euronion (Collaborator) commented:

Cross-checking:
Dropping support for Python 2.X is understandable, but was it intended to also drop support for <3.6?
I came across the use of f-strings which are not supported on lower Python versions.

In any case, I think best practice is to include the Python version requirement in setup.py, e.g. with

python_requires='>=3.6'
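
A minimal setup.py sketch with that requirement (the remaining metadata here is illustrative, not atlite's actual setup.py):

from setuptools import setup, find_packages

setup(
    name="atlite",
    packages=find_packages(),
    python_requires=">=3.6",  # f-strings need Python 3.6 or newer
)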

@coroa (Member, Author) commented Jul 23, 2019

Semi-intended. I don't want to miss out on f-strings anymore, and even Debian stable these days already ships with Python 3.7, so requiring 3.6 seems fair to me. Is there any reason to support Python 3.5 or lower?

Making the requirement explicit in setup.py and the conda-forge recipe should be done, true.

@euronion (Collaborator) commented Jul 23, 2019

Mainly just asking.
The only ad-hoc reason would be that you might want to apply the same restrictions on the Python version for atlite as for e.g. pypsa.

  • Add python version requirement >= 3.6 to setup.py.

@euronion (Collaborator) commented Jul 29, 2019

The windows=... option does not return an error after changing base_string to str and seems to work as intended (good).
But I do not understand its purpose; I would need another explanation / example / documentation for it.

@coroa (Member, Author) commented Jul 30, 2019

windows is an argument available to any function wrapped by a @requires_windowed decorator. It allows you to choose how to traverse the cutout.data dataset.

Omitting it is equivalent to passing windows="M". This will break cutout.data into monthly chunks, so that the for-loop in cutout.convert_and_aggregate iterates over slices of a month. Internally, the windows argument is converted to an iterator by code similar to the following:

windows = xr.core.groupby.DatasetGroupBy(self.data, self.data['time'], grouper=pd.Grouper(freq="M"))._iter_grouped()

over which the convert_and_aggregate function then iterates and calls the conversion functions.

Other allowed strings are "2M" for slices of two months, "D" for days, and "Y" for years. If you supply an integer, it is used to feed the bins argument of xr.core.groupby.DatasetGroupBy; for instance windows=2 splits the time axis in two. We will have to experiment with what good choices are.
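
In terms of the public xarray API, the frequency-based windows roughly correspond to iterating over a resampled dataset (a sketch of the idea only, not atlite's internal code):

# Roughly what windows="M" does: iterate over monthly slices of cutout.data
for label, month in cutout.data.resample(time="M"):
    # a conversion function would be applied to each monthly slice `month`
    print(label, dict(month.sizes))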

This grouping mechanism can be switched off using windows=False. For a regular dataset as loaded from a cutout NetCDF file, xarray will then try to convert the whole data in one go. If your memory is big enough and you want to do a lot of repetitions it's probably best to preload the wind data

for feature in cutout.dataset_module.features["wind"]:
    cutout.data[feature].load()

and convert it in one go

cutout.wind(turbine=..., windows=False)

What needs to be investigated a bit further is the possibility to use dask automatically for the heavy lifting:
cutout.data.chunk(time=744) will split the dataset into approximately monthly chunks, and then cutout.wind(turbine=..., windows=False) will use dask to do the wind conversion for these monthly chunks. With the right configuration of dask this should even happen in parallel on multiple processors.
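
A sketch of that dask-driven variant (the dict form of .chunk is used for compatibility with older xarray versions, and assigning the chunked dataset back onto the cutout is my assumption, not a confirmed part of the API):

# 31 days * 24 h = 744 hourly steps, i.e. roughly monthly chunks
cutout.data = cutout.data.chunk({"time": 744})

# windows=False then lets dask drive the conversion chunk by chunk,
# potentially in parallel with a suitable dask scheduler
generation = cutout.wind(turbine="Vestas_V90_3MW", windows=False)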

@euronion (Collaborator) commented:

What the windows argument does is becoming somewhat clear.
Still, I do not see much benefit in or reason for it.
Except for cosmetic changes (the resolution of the progress bar) and (maybe a slight) performance difference, it does not do much?

Preloading features has become really easy and convenient with the change.
xarray and dask also do a nice job of keeping parts of cutout.data loaded automatically (not all of it though, so preloading still gives a few seconds of extra performance for repeated calculations).

As a universal and really simple solution for the chunk sizes, using the new auto keyword from dask could work.
I.e. what I (successfully) tried was

cutout.data.chunk({dim:'auto' for dim in cutout.data.dims})

I doubt we will find a better universal solution for the chunk sizes. This has been a long-standing issue for xarray and dask and depends on the cutout size (spatial and temporal) as well as configuration, hardware setup and use case.
But that should not come as a surprise; it is always an issue when optimising parallelisation.

E.g.: I was playing around with it a bit today, and the best I could do was 8 s/iteration without dask, ~18 s with dask and chunk={'time': 2500}, and ~11 s with the auto-chunk above.

@coroa (Member, Author) commented Jul 30, 2019

The windows machinery enabled switching out the data backend completely, while keeping compatibility with non-dask-ready conversion functions.

@coroa (Member, Author) commented Jul 30, 2019

I'd be open to throwing it out in a separate PR in which we transition everything to dask and don't incur huge performance losses in the process.

The bottleneck will be the conversion of the pv module. When I originally tried to implement it using dask only, a couple of years ago, I regularly broke dask in the process, which is why the original atlite version finally iterated through separate files. It's possible that dask has improved enough by now, but we will have to clock and measure it; until then we'll need windows.

@euronion (Collaborator) commented:

No, let's have the windows feature included in the upcoming version.
The idea is powerful and should be the way to go in the future.

Not knowing what went wrong back then with dask, I'd say we could try again. For this use-case dask does not always seem to be the most performant choice. As you say: if we try it, we should also clock and measure it.

@coroa changed the title from "Preparation of v0.1" to "Preparation of v0.2" on Aug 8, 2019
FabianHofmann and others added 27 commits June 16, 2020 15:37
* ci: use mamba instead of conda

* follow up, add comment [skip travis]

* follow up

* follow up, fix conda activate

* ci: playaround, remove conda specifications
* irradiation.py: replace .clip by .where due to new numpy/dask incompatibility

* follow up, only apply .where where necessary
* enable ci on windows

* data.py: use TemporaryDirectory instead of mkdtemp

* data.py revert last commit, try now with wrapper

* fix travis env for windows machines

* follow up: write pip and pytest dependencies in env file

* env: add libspatialindex to requirements

* travis: reintroduce strict channel order due to installation problems on windows
* introduce Cutout.grid
make Cutout.grid_cells and Cutout.grid_coordinates deprecated

* follow up

* adjust plotting example

* update release notes

* test_creation.py adjust test

* test: tiny fix up

* add crs to Cutout.grid

* follow up: add comment [skip travis]

* release notes: fix typo [skip travis]
* Rename projection to crs

Follows pyproj in nomenclature. See https://pyproj4.github.io/pyproj/stable/gotchas.html#upgrading-to-pyproj-2-from-pyproj-1 .

* environment: Remove channel pinning

Channel pinning has been superseded by strict channel_priority as
proposed at https://conda-forge.org/docs/user/tipsandtricks.html.

* gis: Add grid_cell_areas function to compute areas of grid cells

* cutout: Fix forgotten conversion

* gis: Improve grid_cell_areas

* remove area calculation due to geopandas implementation

* update release notes

* gis.py: revise imports

Co-authored-by: Fabian <[email protected]>
* gebco: Extract and resample data from GEBCO using rasterio

* tiny fixup of inversed y-axis and data array accessing

* fix numeric tags

Co-authored-by: Fabian <[email protected]>
* * add warning for ignoring cutoutparams if cutout already exists
* reintroduce Cutout.prepared

* follow up
cutout.py make prepared features more secure
* cutout.py add merge function
pytest add merge test

* cutout.py: when data is passed and path is non-existent, write out file
path in cutout.merge and cutout.sel has to be non-existent

* adjust docstrings

* revert second last commit, add cutout.to_file function

* revert unneeded assert

* follow up: update docstrings [skip travis]
solar_position.py: safer/cleaner approach for chunking
* convert.py catch case of no layout given

* convert.py: restructure convert_and_aggregate for correctly handling all input combinations

* test: pv add rounding to assert
@FabianHofmann merged commit 12846ce into master on Jan 21, 2021
@FabianHofmann (Contributor) commented:

finally :)

@FabianHofmann deleted the v0.2 branch on March 3, 2021 20:46