Fix geometry issues #532

Closed
wants to merge 54 commits into from

ekatef
Member

@ekatef ekatef commented Dec 3, 2022

Workflow testing on data from around the world has shown that the underlying geographical shapes can sometimes be problematic, namely:

  1. it looks like the GID_0 feature in the gadm file may sometimes have values which do not exactly correspond to the country name. As a result, the GADM_ID in gadm_shapes.geojson may have some unexpected values like "Z02.28_1" or "Z03.28_1" (taken from data for China), which leads to zeros for the population value and "name": null in the country_shapes.geojson file. This problem can be reproduced by running build_shapes for China or for India and causes numerous problems along the workflow.

  2. sometimes the country area is a multiply connected region (an area with holes), which is treated by shapely as invalid geometry. In particular, that is the case for the Dahagram–Angarpota Bangladeshi enclave in India.

  3. the Voronoi partition may produce empty polygons which are not filtered out by dropna(), are written as None geometries into "regions_onshore.geojson" and cause problems that propagate along the workflow. In particular, that is the case for Kazakhstan shapes (OSM data on 4.12.2022, 10 clusters).
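
The third point can be illustrated without any GIS library. The sketch below is only an assumption about how such records might look: it treats a GeoJSON feature as problematic when its geometry is null or carries no coordinates, which a dropna() on other columns would not catch. The helper name `drop_empty_features` and the sample data are hypothetical and not part of the workflow.

```python
import json


def drop_empty_features(collection):
    """Return a copy of a GeoJSON FeatureCollection without empty geometries.

    A feature is dropped when its geometry is null or when it has neither
    coordinates nor member geometries (an "empty" shape).
    """
    kept = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry")
        if geometry is None:
            continue  # written as "geometry": null in the file
        # GeometryCollection stores its members under "geometries"
        content = geometry.get("coordinates") or geometry.get("geometries")
        if not content:
            continue  # e.g. {"type": "Polygon", "coordinates": []}
        kept.append(feature)
    return {**collection, "features": kept}


# Hypothetical sample with one valid, one null and one empty geometry
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"name": "bus_0"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
    {"type": "Feature", "properties": {"name": "bus_1"}, "geometry": null},
    {"type": "Feature", "properties": {"name": "bus_2"},
     "geometry": {"type": "Polygon", "coordinates": []}}
  ]
}
""")

cleaned = drop_empty_features(sample)
remaining = [f["properties"]["name"] for f in cleaned["features"]]
```

With geopandas, a comparable filter would likely combine the `is_empty` and `isna()` checks of the geometry column; that mapping is a suggestion, not what this PR necessarily does.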

Changes proposed in this Pull Request

This pull request is intended to fix such regional-specific issues.

Checklist

  • Clean-up the code
  • I tested my contribution locally and it seems to work fine
    • for India (the GADM issue & multiply connected regions)
    • for China (the GADM issue)
    • for Kazakhstan (the empty geometry issue)
    • for Malaysia (no issues, a complex shape)
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and envs/environment.docs.yaml.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

@ekatef
Member Author

ekatef commented Dec 4, 2022

The GADM issue means that some regions may have a non-standard GADM_ID, e.g. for India (part of gadm_shapes):

"GADM_ID": "Z07.3_1", "country": "not found", "pop": 0.0, 
"GADM_ID": "Z04.13_1", "country": "not found", "pop": 0.0,
"GADM_ID": "Z09.13_1", "country": "not found", "pop": 0.0,
"GADM_ID": "Z01.14_1", "country": "not found", "pop": 0.0,
"GADM_ID": "Z05.35_1", "country": "not found", "pop": 0.0,
"GADM_ID": "Z09.35_1", "country": "not found", "pop": 0.0,
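
A quick way to spot such records is to compare the prefix of each GADM_ID with the expected country code. The snippet is a hedged sketch: the helper `find_nonstandard_ids` is hypothetical, and both the assumed ID layout and the standard prefix `IND` are assumptions which the thread implies but does not state.

```python
import re

# Assumed layout: "<prefix>.<region number>_<version>", e.g. "IND.3_1"
GADM_ID_PATTERN = re.compile(
    r"^(?P<prefix>[^.]+)\.(?P<region>\d+)_(?P<version>\d+)$"
)


def find_nonstandard_ids(gadm_ids, expected_prefix):
    """Return the IDs whose prefix differs from the expected country code."""
    flagged = []
    for gadm_id in gadm_ids:
        match = GADM_ID_PATTERN.match(gadm_id)
        # Flag IDs that do not parse at all or carry another prefix
        if match is None or match.group("prefix") != expected_prefix:
            flagged.append(gadm_id)
    return flagged


# "IND.3_1" is an assumed example of a standard ID;
# the other values are taken from the listing above
ids = ["IND.3_1", "Z07.3_1", "Z04.13_1", "Z09.13_1"]
nonstandard = find_nonstandard_ids(ids, expected_prefix="IND")
```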

A visual check gives the following picture (which actually doesn't clarify the issue):

[Image: india_gadm_id]

Regions with non-standard GADM_IDs are highlighted in violet.

@ekatef
Member Author

ekatef commented Dec 5, 2022

It appears that non-standard GADM_IDs cause trouble when they are interpreted as country names (e.g. 'not found') and propagate into the cleaned OSM datasets. That is currently the case for India, while for China non-standard GADM_IDs seem to remain in country_shape and gadm_shape only, without causing any harm downstream. However, such a difference may be a random effect of a particular workflow realisation.

@ekatef
Member Author

ekatef commented Dec 5, 2022

The picture of non-standard GADM_IDs for China looks as follows:

[Image]

"GADM_ID": "Z02.28_1", "country": null, "pop": 0.0, "gdp": 13450274.0
"GADM_ID": "Z03.28_1", "country": null, "pop": 0.0, "gdp": 45227044.0
"GADM_ID": "Z03.29_1", "country": null, "pop": 0.0, "gdp": 5931376.0 
"GADM_ID": "Z08.29_1", "country": null, "pop": 0.0, "gdp": 3446499.0

@ekatef
Member Author

ekatef commented Dec 5, 2022

It looks like a fix is needed in build_shapes to get rid of GADM_IDs that are not valid as country names. The current workaround with len(no_data_countries) > 0 works for build_osm_network, but it's likely that irrelevant values of the country name will lead to problems downstream. E.g. if non-standard names are kept in the shapes files, cluster_networks gives AssertionError: The following countries have no load: ['not found'] for distribute_cluster: ['load'].

The most straightforward solution would be to filter by GID_0 (aka country name) values with something like df_countries = df_countries.query("name in @countries"). However, it looks like these areas represent meaningful geographical regions, so an approach which preserves the data may be preferable.
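
The trade-off between the two options can be sketched on plain records. This is an illustration only: the field names mirror the listings above, while `drop_unknown` and `relabel_unknown` are hypothetical helpers, and the relabelling rule (assigning the requested country code when exactly one country is modelled) is an assumption rather than the approach adopted in this PR.

```python
def drop_unknown(records, countries):
    """Keep only records whose country is in the requested list."""
    return [r for r in records if r["country"] in countries]


def relabel_unknown(records, countries):
    """Keep every record and assign the requested country to unknown ones.

    This is only unambiguous for a single-country run, so raise otherwise.
    """
    if len(countries) != 1:
        raise ValueError("relabelling needs exactly one requested country")
    (country,) = countries
    return [
        r if r["country"] in countries else {**r, "country": country}
        for r in records
    ]


# Hypothetical records; the second one mimics a non-standard GADM region
records = [
    {"GADM_ID": "IND.3_1", "country": "IN", "pop": 100.0},
    {"GADM_ID": "Z07.3_1", "country": "not found", "pop": 0.0},
]

dropped = drop_unknown(records, ["IN"])        # loses the special region
relabelled = relabel_unknown(records, ["IN"])  # keeps it under "IN"
```

Dropping is simpler but discards the geometry of the special regions; relabelling keeps the area at the cost of an explicit rule for which country it belongs to.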

@ekatef
Member Author

ekatef commented Dec 8, 2022

In the current implementation the onshore shape consists of several shapes, each corresponding to a special administrative area (the ones denoted by the non-standard GADM_IDs):

[Image]

"geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ 69.
"geometry": { "type": "Polygon", "coordinates": [ [ [ 75.0716094
"geometry": { "type": "Polygon", "coordinates": [ [ [ 78.6513452
"geometry": { "type": "Polygon", "coordinates": [ [ [ 80.0879363
"geometry": { "type": "Polygon", "coordinates": [ [ [ 94.1912460
"geometry": { "type": "Polygon", "coordinates": [ [ [ 78.9089126

That causes problems along the workflow. In particular, build_bus_regions terminates with an error:

rule build_bus_regions:
    input: resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, networks/base.nc, resources/shapes/gadm_shapes.geojson
    output: resources/bus_regions/regions_onshore.geojson, resources/bus_regions/regions_offshore.geojson
    log: logs/build_bus_regions.log
    jobid: 0
    reason: Input files updated by another job: networks/base.nc
    resources: tmpdir=/var/folders/qn/vpndfm21795ckkq89np1ckp40000gn/T, mem_mb=1000

INFO:snakemake.logging:
INFO:pypsa.io:Imported network base.nc has buses, lines, links, transformers
Traceback (most recent call last):
  File "~pypsa-earth/.snakemake/scripts/tmp0qppxgve.build_bus_regions.py", line 221, in <module>
    onshore_geometry = custom_voronoi_partition_pts(
  File "~pypsa-earth/.snakemake/scripts/tmp0qppxgve.build_bus_regions.py", line 105, in custom_voronoi_partition_pts
    xmin = min(xmin, minx_o)
TypeError: '>' not supported between instances of 'numpy.ndarray' and 'str'
[Thu Dec  8 20:21:24 2022]
Error in rule build_bus_regions:
    jobid: 0
    output: resources/bus_regions/regions_onshore.geojson, resources/bus_regions/regions_offshore.geojson
    log: logs/build_bus_regions.log (check log file(s) for error message)

RuleException:
CalledProcessError in line 279 of ~pypsa-earth/Snakefile:
Command 'set -euo pipefail;  /Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/bin/python3.10 ~pypsa-earth/.snakemake/scripts/tmp0qppxgve.build_bus_regions.py' returned non-zero exit status 1.
  File "~pypsa-earth/Snakefile", line 279, in __rule_build_bus_regions
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.

That issue is caused by the fact that the bounds method returns a dataframe (one row per shape) instead of a single boundary vector. The problem may be easily resolved by unifying the outline shapes before applying the Voronoi partition.
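
The mismatch can be pictured without geopandas: one bounding box per shape has to be reduced to a single (minx, miny, maxx, maxy) vector before it is compared with scalar limits. The sketch below uses made-up boxes and a hypothetical helper `merge_bounds`; in the workflow itself the equivalent would presumably be taking the bounds of the unified shape.

```python
def merge_bounds(boxes):
    """Reduce several (minx, miny, maxx, maxy) boxes to one enclosing box."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one bounding box is required")
    minx = min(b[0] for b in boxes)
    miny = min(b[1] for b in boxes)
    maxx = max(b[2] for b in boxes)
    maxy = max(b[3] for b in boxes)
    return (minx, miny, maxx, maxy)


# Made-up boxes standing in for the per-shape bounds of the onshore shape
per_shape_bounds = [
    (68.0, 8.0, 97.0, 35.0),
    (75.0, 32.0, 76.0, 33.0),
    (92.0, 6.0, 94.0, 14.0),
]

# A single vector that can be safely compared with scalars such as xmin
total = merge_bounds(per_shape_bounds)
```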

@ekatef
Member Author

ekatef commented Dec 8, 2022

It could be a good idea to unify all the polygons when generating onshore_shape. The difficulty is to merge all the geometries while keeping the other features (like GID_0).

It's worth investigating whether a geometry consisting of different polygons would lead to any problems along the workflow apart from the one mentioned in build_bus_regions.

Apart from that, it may be worth testing the workflow on countries which consist of different areas (e.g. Malaysia).

@ekatef
Member Author

ekatef commented Dec 10, 2022

It looks like an onshore_shape composed of different geometries doesn't lead to other problems along the workflow apart from the known one in build_bus_regions, which can be fixed by a direct application of unary_union() just before the bounds calculation.

However, it appears that for some reason geometries were duplicated in offshore_shape. Currently offshore_shape consists of five identical geometries, which needs to be investigated.

@ekatef
Member Author

ekatef commented Dec 14, 2022

The issue with duplicated offshore geometries was fixed for India.

However, during tests with alternative_clustering: true another error appeared (an error listing is below). This error persisted after changing the region, e.g. to ["CD"] or ["NA"], but doesn't appear for ["KZ"]. So it looks like it is connected to some specific topology of the power grid (HVDC lines??) rather than to the geometry issues addressed by this PR. The issue #537 has been created to track this error.

@ekatef
Member Author

ekatef commented Dec 15, 2022

There are some weird issues during renewable profiles generation:

  • one error appears during generation of a hydro profile: AttributeError: 'numpy.int64' object has no attribute 'intersects' when calculating indicator_matrix in inflow = correction_factor * func(capacity_factor=True, **resource) (it can be reproduced, e.g. for ["IN"] using add_to_snakefile: true for augmented_line_connection)
  • another error appears during generation of a solar profile: TypeError: ufunc 'linestrings' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' when forming a linestring line = LineString([p, regions.loc[bus, ["x", "y"]]]) (that is the case for "KZ" or "NG" even for runs from the main branch which previously worked perfectly)

That is quite weird and most likely linked to some environment issues. #538 attempts to fix it.

@ekatef
Member Author

ekatef commented Dec 15, 2022

Update: in the CI run there is currently one of the errors listed above, namely AttributeError: 'numpy.int64' object has no attribute 'intersects'. The full listing:

INFO:snakemake.logging:
[Thu Dec 15 17:07:34 2022]
rule build_renewable_profiles:
    input: networks/base.nc, resources/natura.tiff, data/copernicus/PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326.tif, data/gebco/GEBCO_2021_TID.nc, resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, data/hydro_capacities.csv, data/eia_hydro_annual_generation.csv, resources/powerplants.csv, resources/bus_regions/regions_onshore.geojson, cutouts/africa-2013-era5-tutorial.nc
    output: resources/renewable_profiles/profile_hydro.nc
    log: logs/build_renewable_profile_hydro.log
    jobid: 20
    benchmark: benchmarks/build_renewable_profiles_hydro
    reason: Missing output files: resources/renewable_profiles/profile_hydro.nc; Input files updated by another job: networks/base.nc, resources/shapes/offshore_shapes.geojson, data/copernicus/PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326.tif, resources/powerplants.csv, data/hydro_capacities.csv, cutouts/africa-2013-era5-tutorial.nc, resources/shapes/country_shapes.geojson, data/gebco/GEBCO_2021_TID.nc, resources/bus_regions/regions_onshore.geojson, resources/natura.tiff
    wildcards: technology=hydro
    threads: 2
    resources: tmpdir=/tmp, mem_mb=20000, mem_mib=19074

INFO:snakemake.logging:
INFO:__main__:Hydro normalization mode hydro_capacities

Determine upstream basins per plant: 0it [00:00, ?it/s]
Determine upstream basins per plant: 3it [00:00, 201.72it/s]
Traceback (most recent call last):
  File "/home/runner/work/pypsa-earth/pypsa-earth/.snakemake/scripts/tmpndxdiqdd.build_renewable_profiles.py", line 455, in <module>
    inflow = correction_factor * func(capacity_factor=True, **resource)
  File "/usr/share/miniconda3/envs/pypsa-earth/lib/python3.10/site-packages/atlite/convert.py", line 805, in hydro
    matrix = cutout.indicatormatrix(basins.shapes)
  File "/usr/share/miniconda3/envs/pypsa-earth/lib/python3.10/site-packages/atlite/cutout.py", line 536, in indicatormatrix
    return compute_indicatormatrix(self.grid, shapes, self.crs, shapes_crs)
  File "/usr/share/miniconda3/envs/pypsa-earth/lib/python3.10/site-packages/atlite/gis.py", line 148, in compute_indicatormatrix
    if o.intersects(d):
AttributeError: 'numpy.int64' object has no attribute 'intersects'
[Thu Dec 15 17:07:40 2022]
Error in rule build_renewable_profiles:
    jobid: 20
    input: networks/base.nc, resources/natura.tiff, data/copernicus/PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326.tif, data/gebco/GEBCO_2021_TID.nc, resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, data/hydro_capacities.csv, data/eia_hydro_annual_generation.csv, resources/powerplants.csv, resources/bus_regions/regions_onshore.geojson, cutouts/africa-2013-era5-tutorial.nc
    output: resources/renewable_profiles/profile_hydro.nc
    log: logs/build_renewable_profile_hydro.log (check log file(s) for error details)

RuleException:
CalledProcessError in file /home/runner/work/pypsa-earth/pypsa-earth/Snakefile, line 325:
Command 'set -euo pipefail;  /usr/share/miniconda3/envs/pypsa-earth/bin/python3.10 /home/runner/work/pypsa-earth/pypsa-earth/.snakemake/scripts/tmpndxdiqdd.build_renewable_profiles.py' returned non-zero exit status 1.
  File "/home/runner/work/pypsa-earth/pypsa-earth/Snakefile", line 325, in __rule_build_renewable_profiles
  File "/usr/share/miniconda3/envs/pypsa-earth/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-12-15T165417.221370.snakemake.log
Error: Process completed with exit code 1.

@ekatef ekatef marked this pull request as ready for review January 9, 2023 21:55
@ekatef
Member Author

ekatef commented Jan 9, 2023

@davide-f, thank you so much for the review and the discussion! My feeling is that we are close to finalising the first bug-fixing step, keeping the advanced features for the second step.

The only point left is replacing an if-else block with pandas.where(). I have to look into it in more detail: I'm not sure if the replacing function has a vectorised version in Python.

If you don't have major objections regarding the introduced changes, I'll move the changes into a clean PR.

@ekatef ekatef mentioned this pull request Jan 10, 2023
7 tasks
@davide-f
Member

@ekatef May we close this PR since we worked in the others?

@ekatef
Member Author

ekatef commented Jan 22, 2023

Closed to be finalised in a cleaned version

@davide-f davide-f closed this Jan 22, 2023
@ekatef ekatef deleted the country_geom_fixes branch November 14, 2023 22:18