
ERA5 CDS requests which return a mixture of ERA5 and ERA5T data #190

Closed
lukas-rokka opened this issue Nov 5, 2021 · 5 comments · Fixed by #261

Comments

@lukas-rokka

Description

As reported here, CDS returns a dataset with an extra expver dimension when a request spans a date range that contains both ERA5 and preliminary ERA5T data. Atlite does not handle this.

expver 1: ERA5 data that has been quality-checked.
expver 5: ERA5T preliminary data (the last three months).

Expected Behavior

The expver dimension should be removed.

A bonus would be to report the date when the ERA5T (expver 5) data starts, as this data might be updated or corrected later.
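For the bonus item, the start of the ERA5T period could be read off the returned data itself. This is a minimal sketch, assuming the mixed response has dims `("expver", "time", ...)` and uses the expver labels 1 and 5 listed above; the helper name `era5t_start`, the variable name `t2m`, and the synthetic values are hypothetical, not part of atlite.

```python
import numpy as np
import pandas as pd
import xarray as xr


def era5t_start(ds, var, era5t_expver=5):
    """Return the first time step at which the ERA5T slice of `var` has data.

    Hypothetical helper: returns None if `ds` has no expver dimension or the
    ERA5T slice is entirely NaN.
    """
    if "expver" not in ds.dims:
        return None
    present = ds[var].sel(expver=era5t_expver).notnull()
    # Collapse all non-time dimensions: True where any grid cell has data.
    other_dims = [d for d in present.dims if d != "time"]
    if other_dims:
        present = present.any(dim=other_dims)
    if not bool(present.any()):
        return None
    first = int(present.values.argmax())  # index of the first True value
    return pd.Timestamp(present["time"].values[first])


# Synthetic stand-in for a mixed CDS response (hypothetical values):
# expver=1 covers the first 4 days, expver=5 the last 2.
time = pd.date_range("2021-08-01", periods=6, freq="D")
data = np.full((2, 6, 2, 2), np.nan)
data[0, :4] = 1.0
data[1, 4:] = 2.0
ds = xr.Dataset(
    {"t2m": (("expver", "time", "y", "x"), data)},
    coords={"expver": [1, 5], "time": time, "y": [0, 1], "x": [0, 1]},
)
print(era5t_start(ds, "t2m"))  # 2021-08-05 00:00:00
```

The returned timestamp could then be logged or stored as a cutout attribute so users know which part of the data may still be revised.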

Fix

An easy fix that has worked in all my use cases:
cutout.data = cutout.data.reduce(np.nansum, 'expver')

I guess this could be done earlier in the data preparation, i.e. before any derived variables are calculated. The extra expver dimension also doubles the memory footprint, so it is worth keeping this in mind when (or if) CDS requests are split along the time dimension.
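The memory point can be checked on a small synthetic dataset. This is a minimal sketch, assuming a float64 variable named `t2m` on a hypothetical 3 x 3 grid in which every time step is non-NaN in exactly one expver slice; it collapses `expver` right after loading with the `reduce(np.nansum, 'expver')` call from the fix and compares `nbytes` before and after.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical mixed response: float64 t2m on a 3 x 3 grid over 6 days,
# with ERA5 (expver=1) covering days 1-4 and ERA5T (expver=5) days 5-6.
time = pd.date_range("2021-08-01", periods=6, freq="D")
data = np.full((2, 6, 3, 3), np.nan)
data[0, :4] = 280.0
data[1, 4:] = 281.0
ds = xr.Dataset(
    {"t2m": (("expver", "time", "y", "x"), data)},
    coords={"expver": [1, 5], "time": time, "y": np.arange(3), "x": np.arange(3)},
)

# Collapse before any derived variables are computed. np.nansum is safe
# here only because no cell is non-NaN in both expver slices.
collapsed = ds.reduce(np.nansum, "expver")

print(ds["t2m"].nbytes, collapsed["t2m"].nbytes)  # 864 432
```

Note that `np.nansum` returns 0 rather than NaN where both slices are missing, and it adds up cells that are present in both slices.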

lukas-rokka changed the title from "ERA5 CDS requests which return a mixture of ERA5 and ERA5T data Skip to end of metadata" to "ERA5 CDS requests which return a mixture of ERA5 and ERA5T data" on Dec 26, 2021
@zoltanmaric
Contributor

zoltanmaric commented Sep 30, 2022

I also ran into this issue today. The reduce(np.nansum, 'expver') solution is not completely robust: if your data spans 2 consecutive years, for example, the values for the previous year will be non-NaN for both expver=1 and expver=5, so np.nansum doubles all values for the previous year.
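The double counting can be reproduced on a toy array. This is a minimal sketch, assuming (hypothetically) that the overlapping days carry identical values in both expver slices; the dates and values are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical response spanning a year boundary in which the first two
# days are non-NaN in *both* expver slices (the overlapping case).
time = pd.date_range("2021-12-30", periods=4, freq="D")
era5 = np.array([1.0, 2.0, np.nan, np.nan])  # expver=1
era5t = np.array([1.0, 2.0, 3.0, 4.0])       # expver=5, overlaps on days 1-2
da = xr.DataArray(
    np.stack([era5, era5t]),
    dims=("expver", "time"),
    coords={"expver": [1, 5], "time": time},
)

summed = da.reduce(np.nansum, "expver")
print(summed.values)  # [2. 4. 3. 4.]: the overlapping days are doubled
```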

What worked for me (and is much faster than the reduce method) is the solution proposed here:

cutout.data = cutout.data.sel(expver=1).combine_first(cutout.data.sel(expver=5))
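The `combine_first` approach can be wrapped in a small helper that is a no-op when no expver dimension is present. This is a sketch under stated assumptions: the name `collapse_expver` is hypothetical, the labels 1 and 5 are the ones reported in this issue, and the scalar `expver` coordinate left behind by `sel` is dropped explicitly with `drop_vars`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def collapse_expver(ds, era5=1, era5t=5):
    """Merge the ERA5 and ERA5T slices into one, preferring ERA5 values.

    Hypothetical helper wrapping sel(expver=1).combine_first(sel(expver=5)).
    """
    if "expver" not in ds.dims:
        return ds  # nothing to collapse
    primary = ds.sel(expver=era5).drop_vars("expver")
    fallback = ds.sel(expver=era5t).drop_vars("expver")
    # combine_first keeps `primary` wherever it is non-NaN and fills the
    # remaining gaps from `fallback`, so overlapping values are not summed.
    return primary.combine_first(fallback)


# Same overlapping case as above, but with different ERA5T values on the
# first two days to show which slice wins (hypothetical values).
time = pd.date_range("2021-12-30", periods=4, freq="D")
ds = xr.Dataset(
    {"t2m": (("expver", "time"), [[1.0, 2.0, np.nan, np.nan],
                                  [1.5, 2.5, 3.0, 4.0]])},
    coords={"expver": [1, 5], "time": time},
)
print(collapse_expver(ds)["t2m"].values)  # [1. 2. 3. 4.]
```

Preferring expver=1 matches the ordering in the one-liner: the quality-checked ERA5 values are kept wherever both slices have data.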

zoltanmaric added a commit to zoltanmaric/atlite that referenced this issue Sep 30, 2022
Requesting cutout data spanning recent (ERA5T)
and data older than ~3 months (ERA5) results in an
additional dimension in `cutout.data`, called `expver`,
which `atlite` currently cannot handle gracefully.

This change collapses the two dimensions into a single dimension.

See discussion in PyPSA#190
@zoltanmaric
Contributor

@fneum you seem to be the most active contributor on atlite. I could prepare a pull request that does something like master...zoltanmaric:atlite:patch-1. Would you be willing to review and merge it if I make it nice? :)

@fneum
Member

fneum commented Sep 30, 2022

You probably mean @FabianHofmann :) but sure, a PR is welcome!

@zoltanmaric
Contributor

Whoopsie, yeah, I did mean Fabian Hofmann :) Sorry about that. Alright, I'll prepare something.

zoltanmaric added a commit to zoltanmaric/atlite that referenced this issue Oct 3, 2022
zoltanmaric added a commit to zoltanmaric/atlite that referenced this issue Nov 16, 2022
@zoltanmaric
Contributor

Draft pull request: #261

zoltanmaric added a commit to zoltanmaric/atlite that referenced this issue Nov 17, 2022

3 participants