Commit

test data moved
PennyHow committed Nov 5, 2024
1 parent dd691e8 commit 63545fb
Showing 49 changed files with 899 additions and 64 deletions.
61 changes: 61 additions & 0 deletions docs/tutorial-data.md
@@ -0,0 +1,61 @@
# Dataset tutorials

The GrIML package is used to produce the Greenland ice marginal lake inventory series, which is freely available through the [GEUS Dataverse](https://doi.org/10.22008/FK2/MBKW9N). The dataset is a series of annual inventories mapping the extent and presence of lakes across Greenland that share a margin with the Greenland Ice Sheet and/or the surrounding ice caps and peripheral glaciers.

Here, we will look at how to load and handle the dataset, and provide details on its contents.

## Dataset contents

The dataset comprises annual inventories that provide a comprehensive record of all identified ice marginal lakes across Greenland. Each lake has been detected using three independent remote sensing techniques:

- DEM sink detection using the ArcticDEM (mosaic version 3)
- SAR backscatter classification from Sentinel-1 imagery
- Multi-spectral indices classification from Sentinel-2 imagery

All data were compiled and filtered in a semi-automated approach, using a modified version of the [MEaSUREs GIMP ice mask](https://nsidc.org/data/NSIDC-0714/versions/1) to clip the dataset to within 1 km of the ice margin. Each detected lake was then verified manually. The methodology is open source and provided in the associated [GitHub repository](https://github.com/GEUS-Glaciology-and-Climate/GrIML) for full reproducibility.
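The margin clipping step can be illustrated in outline with `geopandas`. This is a simplified sketch with hypothetical file names, not the actual GrIML implementation (see the linked repository for the full workflow):

```python
import geopandas as gpd

# Hypothetical inputs: raw lake detections and the modified GIMP ice mask
lakes = gpd.read_file("detected_lakes.shp")
ice_mask = gpd.read_file("gimp_ice_mask_modified.shp").to_crs(lakes.crs)

# Buffer the ice margin outwards by 1 km (EPSG:3413 units are metres)
margin_zone = ice_mask.buffer(1000).unary_union

# Retain only lake polygons within 1 km of the ice margin
lakes_clipped = lakes[lakes.intersects(margin_zone)]
```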

The inventory series was created to better understand the impact of ice marginal lake change on the future sea level budget, and on the terrestrial and marine landscapes of Greenland, such as its ecosystems and human activities. The dataset provides complete coverage of Greenland, with no missing data.

### Data format

The detected lakes are presented as polygon vector features in shapefile format (.shp), with coordinates provided in the WGS 84 / NSIDC Sea Ice Polar Stereographic North (EPSG:3413) projected coordinate system.
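If geographic coordinates are needed, the shapefile can be reprojected after loading. A minimal sketch using `geopandas` (the file name is illustrative):

```python
import geopandas as gpd

gdf = gpd.read_file("2023-ESA-GRIML-IML-fv1.shp")  # illustrative file name
gdf_wgs84 = gdf.to_crs("EPSG:4326")  # reproject to WGS 84 latitude/longitude
```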

### Metadata

Each inventory in the inventory series contains the following metadata information:

| Variable name | Description | Format |
|---------------------|---------------------|---------|
| `row_id` | Index number identifying each polygon | Integer |
| `lake_id` | Identifying number for each unique lake | Integer |
| `lake_name`| Lake placename, as defined by the [Oqaasileriffik (Language Secretariat of Greenland)](https://oqaasileriffik.gl) placename database which is distributed with [QGreenland](https://qgreenland.org/) | String |
| `margin` | Type of margin that the lake is adjacent to (`ICE_SHEET`, `ICE_CAP`) | String |
| `region` | Region in which the lake is located, as defined by Mouginot and Rignot (2019) (`NW`, `NO`, `NE`, `CE`, `SE`, `SW`, `CW`) | String |
| `area_sqkm` | Areal extent of polygon(s) in square kilometres | Float |
| `length_km` | Perimeter length of polygon(s) in kilometres | Float |
| `temp_aver` | Average lake surface temperature estimate (in degrees Celsius), derived from the Landsat 8/9 OLI/TIRS Collection 2 Level 2 surface temperature data product | Float |
| `temp_min` | Minimum pixel lake surface temperature estimate (in degrees Celsius), derived from the Landsat 8/9 OLI/TIRS Collection 2 Level 2 surface temperature data product | Float |
| `temp_max` | Maximum pixel lake surface temperature estimate (in degrees Celsius), derived from the Landsat 8/9 OLI/TIRS Collection 2 Level 2 surface temperature data product | Float |
| `temp_stdev` | Standard deviation of the lake surface temperature estimate (in degrees Celsius), derived from the Landsat 8/9 OLI/TIRS Collection 2 Level 2 surface temperature data product | Float |
| `method` | Method of classification (`DEM`, `SAR`, `VIS`) | String |
| `source` | Image source of classification (`ARCTICDEM`, `S1`, `S2`) | String |
| `all_src` | List of all sources that successfully classified the lake (i.e. all classifications with the same `lake_id` value) | String |
| `num_src` | Number of sources that successfully classified the lake (`1`, `2`, `3`) | Integer |
| `certainty` | Certainty of classification, calculated from `all_src` as a score between `0` and `1` | Float |
| `start_date` | Start date for classification image filtering | String |
| `end_date` | End date for classification image filtering | String |
| `verified` | Flag to denote if the lake has been manually verified (`Yes`, `No`) | String |
| `verif_by` | Author of verification | String |
| `edited` | Flag to denote if polygon has been manually edited (`Yes`, `No`) | String |
| `edited_by` | Author of manual editing | String |
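As an illustration of how these fields can be used, the hypothetical queries below filter an inventory by margin type and detection agreement, assuming it has already been loaded into a GeoDataFrame `gdf` (see Getting started below):

```python
# Lakes sharing a margin with the ice sheet
ice_sheet_lakes = gdf[gdf["margin"] == "ICE_SHEET"]

# Lakes classified by all three detection methods (highest certainty)
high_certainty = gdf[gdf["num_src"].astype(int) == 3]

# Number of unique lakes per region
print(gdf.groupby("region")["lake_id"].nunique())
```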

## Getting started

The dataset is available for download from the [GEUS Dataverse](https://doi.org/10.22008/FK2/MBKW9N).
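Each annual inventory can then be loaded with `geopandas`. A minimal sketch, assuming the shapefile has been downloaded locally (the file name is illustrative):

```python
import geopandas as gpd

# Illustrative file name; use the inventory shapefile downloaded from the Dataverse
gdf = gpd.read_file("2023-ESA-GRIML-IML-fv1.shp")

print(len(gdf))    # number of lake classifications
print(gdf.crs)     # EPSG:3413
print(gdf.head())  # first rows, including the metadata fields described above
```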

Once loaded, a quicklook plot of the inventory can be generated directly from the GeoDataFrame.
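For example, a sketch assuming `matplotlib` is installed, colouring lake polygons by region:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 10))
gdf.plot(ax=ax, column="region", legend=True)
ax.set_title("Greenland ice marginal lake classifications")
plt.show()
```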


## Generating statistics

Summary statistics, such as lake abundance and areal extent, can be extracted from the inventory attributes.
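A sketch of the kind of statistics that can be extracted is given below. It dissolves repeat classifications into unique lakes by `lake_id`, mirroring the approach in `iml_abundancy_error_estimate.py`, and recomputes areas from the dissolved geometries; exact numbers depend on the inventory year chosen:

```python
# Dissolve repeat classifications of the same lake into single features
lakes = gdf.dissolve(by="lake_id")

# Recompute areas from the dissolved geometries (EPSG:3413 units are metres)
lakes["area_sqkm"] = lakes.geometry.area / 10**6

print("Number of unique lakes:", len(lakes))
print("Total lake area (sq km):", lakes["area_sqkm"].sum())
print("Mean lake area (sq km):", lakes["area_sqkm"].mean())

# Lake abundance by region
print(lakes.groupby("region").size())
```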
2 changes: 1 addition & 1 deletion src/griml/metadata/assign_id.py
@@ -28,7 +28,7 @@ def assign_id(gdf, col_name='unique_id'):
     n, ids = connected_components(overlap_matrix)
     ids=ids+1
 
-    # Assign ids and realign geoedataframe index
+    # Assign ids and realign geodataframe index
     gdf[col_name]=ids
     gdf = gdf.sort_values(col_name)
     gdf.reset_index(inplace=True, drop=True)
33 changes: 22 additions & 11 deletions src/griml/metadata/assign_names.py
@@ -12,7 +12,7 @@
 from shapely.geometry import Point, LineString, Polygon
 from griml.load import load
 
-def assign_names(gdf, gdf_names):
+def assign_names(gdf, gdf_names, distance=1000.0):
     '''Assign placenames to geodataframe geometries based on names in another
     geodataframe point geometries
@@ -39,13 +39,17 @@ def assign_names(gdf, gdf_names):
     names = _compile_names(gdf2)
     placenames = gpd.GeoDataFrame({"geometry": list(gdf2['geometry']),
                                    "placename": names})
 
+    # Remove invalid geometries
+    gdf1 = _check_geometries(gdf1)
+
     # Assign names based on proximity
-    a = _get_nearest_point(gdf1, placenames)
+    a = _get_nearest_point(gdf1, placenames, distance)
 
     return a

-def _get_nearest_point(gdA, gdB, distance=500.0):
+def _get_nearest_point(gdA, gdB, distance=1000.0):
     '''Return properties of nearest point in Y to geometry in X'''
     nA = np.array(list(gdA.geometry.centroid.apply(lambda x: (x.x, x.y))))
     nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
@@ -70,18 +74,25 @@ def _get_indices(mylist, value):
     return [i for i, x in enumerate(mylist) if x==value]
 
 
+def _check_geometries(gdf):
+    '''Check that all geometries within a geodataframe are valid'''
+    return gdf.drop(gdf[gdf.geometry==None].index)
+
 def _compile_names(gdf):
     '''Get preferred placenames from placename geodatabase'''
     placenames=[]
     for i,v in gdf.iterrows():
-        if v['Ny_grønla'] != None:
-            placenames.append(v['Ny_grønla'])
+        if v['New Greenl'] != None:
+            placenames.append(v['New Greenl'])
         else:
-            if v['Dansk'] != None:
-                placenames.append(v['Dansk'])
+            if v['Old Greenl'] != None:
+                placenames.append(v['Old Greenl'])
             else:
-                if v['Alternativ'] != None:
-                    placenames.append(v['Alternativ'])
-                else:
-                    placenames.append(None)
+                if v['Danish'] != None:
+                    placenames.append(v['Danish'])
+                else:
+                    if v['Alternativ'] != None:
+                        placenames.append(v['Alternativ'])
+                    else:
+                        placenames.append(None)
     return placenames
58 changes: 27 additions & 31 deletions src/griml/metadata/assign_sources.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 
-def assign_sources(gdf, col_names=['unique_id', 'source']):
+def assign_sources(gdf, col_names=['lake_id', 'source']):
     '''Assign source metadata to geodataframe, based on unique lake id and
     individual source information
@@ -17,38 +17,34 @@ def assign_sources(gdf, col_names=['unique_id', 'source']):
     gdf : geopandas.GeoDataFrame
         Vectors with assigned sources
     '''
-    ids = gdf[col_names[0]].tolist()
-    source = gdf[col_names[1]].tolist()
-    satellites=[]
-
-    # Construct source list
-    for x in range(len(ids)):
-        indx = _get_indices(ids, x)
-        if len(indx) != 0:
-            res = []
-            if len(indx) == 1:
-                res.append(source[indx[0]].split('/')[-1])
-            else:
-                unid=[]
-                for dx in indx:
-                    unid.append(source[dx].split('/')[-1])
-                res.append(list(set(unid)))
-            for z in range(len(indx)):
-                if len(indx) == 1:
-                    satellites.append(res)
-                else:
-                    satellites.append(res[0])
-
-    # Compile lists for appending
-    satellites_names = [', '.join(i) for i in satellites]
-    number = [len(i) for i in satellites]
-
-    # Return updated geodataframe
-    gdf['all_src']=satellites_names
-    gdf['num_src']=number
+    all_src=[]
+    num_src=[]
+    for idx, i in gdf.iterrows():
+        # Subset all classifications sharing this lake id
+        idl = i[col_names[0]]
+        g = gdf[gdf[col_names[0]] == idl]
+        source = list(set(list(g[col_names[1]])))
+        if len(source) in (1, 2, 3):
+            satellites = ', '.join(source)
+            num = len(source)
+        else:
+            print('Unknown number of sources detected')
+            print(source)
+            satellites = None
+            num = None
+        all_src.append(satellites)
+        num_src.append(num)
+    gdf['all_src'] = all_src
+    gdf['num_src'] = num_src
     return gdf
 
 
 def _get_indices(mylist, value):
     '''Get indices for value in list'''
     return [i for i, x in enumerate(mylist) if x==value]
41 changes: 41 additions & 0 deletions src/griml/metadata/iml_abundancy_error_estimate.py
@@ -0,0 +1,41 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Sep 26 16:09:20 2024

@author: pho
"""
import geopandas as gpd
import glob
import pandas as pd

# Map inventory file locations
gdf_files = '/home/pho/Desktop/python_workspace/GrIML/other/iml_2016-2023/final/checked/*IML-fv1.shp'

# Load inventory point file with lake_id, region, basin-type and placename info
gdf2 = gpd.read_file('/home/pho/Desktop/python_workspace/GrIML/other/iml_2016-2023/manual_validation/iml_manual_validation_with_names.shp')
gdf2_corr = gdf2.drop(gdf2[gdf2.geometry==None].index)


# Iterate across inventory series files
gdfs=[]
for g in list(sorted(glob.glob(gdf_files))):
    print(g)
    gdf = gpd.read_file(g)
    gdf = gdf.dissolve(by='lake_id')
    print(len(gdf['geometry']))
    gdfs.append(gdf)

dfs = pd.concat(gdfs)
dfs = dfs.dissolve(by='lake_id')
dfs['area_sqkm']=[g.area/10**6 for g in list(dfs['geometry'])]
dfs['length_km']=[g.length/1000 for g in list(dfs['geometry'])]


print('Average lake size: ' + str(dfs.area_sqkm.mean()))

dfs.to_file('/home/pho/Desktop/python_workspace/GrIML/other/iml_2016-2023/final/checked/'+'ALL-ESA-GRIML-IML-MERGED-fv1.shp')