# WeeForest Lens - Data Analysis

## Overview

Most data analysis in this project is done with Python and Jupyter Notebooks. GeoPandas is the main tool of choice, and I have been rather liberal with load order for convenience, so some notebooks can easily consume 30-40 GB of RAM at any given moment. You may want to modify them to run sequentially, year by year, or to load files directly from Parquet on disk.
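
If memory is a concern, one option is to process a single year at a time straight from Parquet rather than keeping every year's GeoDataFrame loaded. A minimal sketch, assuming a hypothetical per-year file layout (the real paths and years are defined in the notebooks themselves):

```python
# Minimal sketch: load one year at a time from Parquet instead of keeping
# every year's GeoDataFrame in memory at once.
import geopandas as gpd

YEARS = [2012, 2022]  # hypothetical dataset years

for year in YEARS:
    # Assumed path layout, not necessarily the repo's actual one.
    gdf = gpd.read_parquet(f"../data/parquet/uk_gb_awi_{year}.parquet")
    # ... run the per-year analysis here ...
    del gdf  # free memory before loading the next year
```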

Main workflows are:

- AWI Dataset preparation and analysis, consisting of three AWI datasets and the NWSS dataset.
- NFI Dataset preparation and analysis for aggregation and overlay with AWI.
- NFI x AWI Overlay calculation and dataset generation, as well as overlap error identification and handling.
- Area Calculation dataset generation for each year and dataset, with geometries reduced to points to step away from geospatial bounding-box calculations (see the sketch after this list).
- MBTiles generation from temporary GeoJSON files for the Lens map.
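
As a rough illustration of the area-calculation idea (the actual implementation lives in the notebooks and may differ): areas are computed in a metric CRS, then each polygon is replaced by a single interior point so downstream tooling works on cheap point lookups. File paths and column names below are assumptions:

```python
# Sketch of the "reduce to a point" idea; paths and column names are assumptions.
import geopandas as gpd

gdf = gpd.read_parquet("../data/parquet/uk_gb_nfi_awi_overlay_2022.parquet")  # hypothetical file

# Compute areas in a metric CRS (EPSG:27700, British National Grid, covers GB).
gdf = gdf.to_crs(epsg=27700)
gdf["area_ha"] = gdf.geometry.area / 10_000  # m^2 -> hectares

# Replace polygons with a guaranteed-interior point so later aggregation
# avoids polygon bounding-box calculations entirely.
points = gdf.assign(geometry=gdf.geometry.representative_point())
points.to_parquet("../data/area/uk_gb_area_2022.parquet")  # hypothetical output path
```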

There's also the sandbox folder, which contains various experiments, e.g. an in-notebook Flask server for area calculation, DuckDB vs PostGIS benchmarks, statistical data and much more.

If you're interested in the detailed rationale and justification for particular data source choices, aggregation methods and assumptions, please refer to the Research section as well as the relevant notebooks. Notebook-specific research and methodology is usually documented in that notebook's own Markdown cells.

## Running the Notebooks

The main notebooks contain Markdown cells with detailed instructions on running them, as well as justifications for certain approaches and expected runtimes.

To run Lens you must first run the five main notebooks in order: uk_gb_awi.ipynb, uk_gb_nfi.ipynb, uk_gb_nfi_awi_overlay.ipynb, uk_gb_area.ipynb and finally uk_gb_tiles.ipynb.
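
The notebooks are written to be run interactively, but if you prefer to execute the whole chain headlessly, something along these lines should work (assuming `jupyter` is on your PATH and you run it from the notebooks folder):

```python
# Sketch: execute the five notebooks in order via `jupyter nbconvert`.
import subprocess

NOTEBOOKS = [
    "uk_gb_awi.ipynb",
    "uk_gb_nfi.ipynb",
    "uk_gb_nfi_awi_overlay.ipynb",
    "uk_gb_area.ipynb",
    "uk_gb_tiles.ipynb",
]

for nb in NOTEBOOKS:
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb],
        check=True,  # stop the chain if any notebook fails
    )
```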

Once run to completion, you'll end up with 23 .mbtiles files in the ../data/tiles folder, 23 Parquet files in the ../data/area folder totalling 5 GB, and 23 more point-based Parquet files for area calculation totalling 400 MB.

## Main Sources

This work presently utilises two main resources and their derivatives:

## Contributing

Contributions are encouraged and welcome. The project roadmap, ideas, bugs and issues are tracked in the Project.

Areas where help would be most appreciated:

1. MBTiles generation. Presently there are a few issues with zoom levels and feature simplification due to the rather simplistic tippecanoe parameters used (a parameter sketch follows this list). See the issue for details.
2. Generating satellite basemaps for each of the dataset years, allowing contextual satellite imagery to be added to the map.
3. Processing the NFI dataset to add country designation, potentially using spatial joins to mimic the 2022 dataset structure, enabling grouping and more versatile comparison.
4. Improving the Overlay dataset generation, especially around the small fringe areas that are left over when large areas are overlaid, leaving negligible slivers of area wrapping around them.
5. Improving the IO and overall data-handling practices, reducing RAM usage and improving the runtime of the notebooks.
6. Ensuring correct licensing and attribution have been provided for all the datasets used in the project, as I'm still not entirely sure I've covered all the bases.
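
For point 1, the kind of tippecanoe invocation worth experimenting with might look like the sketch below. The flags shown are standard tippecanoe options, but the layer name, zoom strategy and paths are assumptions rather than what uk_gb_tiles.ipynb currently does:

```python
# Sketch of a richer tippecanoe invocation; paths, layer name and zoom choices are assumptions.
import subprocess

subprocess.run(
    [
        "tippecanoe",
        "-o", "../data/tiles/uk_gb_awi_2022.mbtiles",   # hypothetical output
        "-l", "awi",                         # layer name
        "-zg",                               # let tippecanoe guess an appropriate max zoom
        "--drop-densest-as-needed",          # thin dense areas instead of failing on tile size
        "--extend-zooms-if-still-dropping",  # keep zooming until features stop being dropped
        "-f",                                # overwrite an existing .mbtiles
        "../data/geojson/uk_gb_awi_2022.geojson",  # hypothetical temporary GeoJSON input
    ],
    check=True,
)
```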