Suggestions for the downloader module #23

Open
cheginit opened this issue Jan 19, 2024 · 3 comments

Comments

@cheginit
Collaborator

For downloading DEM and precipitation data, I highly recommend using Microsoft's Planetary Computer via STAC. It can be much faster than the alternatives, it's straightforward to use, and each dataset comes with an example notebook that you can easily repurpose for your application. You can access the catalog here. It offers both a global 30 m DEM and ERA5. I have experience with these datasets and have used them in HyRiver, so I can help out with this.
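A minimal sketch of what a STAC search against the Planetary Computer could look like, assuming the `pystac-client` and `planetary-computer` packages are installed. `search_planetary_computer` is a hypothetical helper name, not part of either library; the collection IDs mentioned after the block are assumptions to check against the catalog.

```python
from __future__ import annotations

# Public STAC endpoint of Microsoft's Planetary Computer.
PC_STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def search_planetary_computer(
    collection: str,
    bbox: tuple[float, float, float, float],
    datetime: str | None = None,
) -> list:
    """Return the STAC items of ``collection`` that intersect ``bbox`` (EPSG:4326).

    Hypothetical helper; requires the optional ``pystac-client`` and
    ``planetary-computer`` packages.
    """
    # Imported inside the function so this module can be imported without them.
    import planetary_computer
    import pystac_client

    # sign_inplace signs asset URLs with short-lived tokens so they can be read.
    catalog = pystac_client.Client.open(
        PC_STAC_URL, modifier=planetary_computer.sign_inplace
    )
    # Restrict the search to one collection, the bbox, and an optional time range.
    search = catalog.search(collections=[collection], bbox=bbox, datetime=datetime)
    return list(search.items())
```

For example, `search_planetary_computer("cop-dem-glo-30", (-0.2, 51.45, 0.0, 51.55))` would be expected to return the Copernicus GLO-30 DEM tiles covering that box, and ERA5 appears to be published as the `era5-pds` collection; both IDs are assumed here rather than taken from this thread.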

For accessing building data, I'd recommend using Overture's buildings dataset. It's similar to VIDA's, but Overture appears to have more rigorous quality control in place. You can check out the Buildings theme section here regarding their approach. I have a function that you can use:

from __future__ import annotations

from pathlib import Path

import duckdb


def overture_buildings(bbox: tuple[float, float, float, float], dst_parquet: str | Path) -> None:
    """Query a subset of Overture's buildings data and save it as a GeoParquet file.

    Parameters
    ----------
    bbox : tuple
        A tuple of floats representing the bounding box of the area of interest
        in the format (minX, minY, maxX, maxY) and the EPSG:4326 coordinate reference system.
    dst_parquet : str or Path
        The path to the output GeoParquet file.
    """
    # Cast to float so only numbers end up in the SQL string.
    minx, miny, maxx, maxy = (float(v) for v in bbox)
    conn = duckdb.connect()
    try:
        # httpfs enables reading directly from S3; spatial provides the ST_* functions.
        conn.execute("INSTALL httpfs;")
        conn.execute("INSTALL spatial;")
        conn.execute("LOAD httpfs;")
        conn.execute("LOAD spatial;")
        # The Overture bucket is hosted in us-west-2.
        conn.execute("SET s3_region='us-west-2';")

        remote_path = "s3://overturemaps-us-west-2/release/2024-01-17-alpha.0/theme=buildings/type=*/*"
        conn.execute(
            "CREATE VIEW buildings_view AS SELECT * FROM "
            f"read_parquet('{remote_path}', filename=true, hive_partitioning=1);"
        )

        # Keep only buildings whose bounding box overlaps the requested bbox.
        query = f"""
        SELECT
            buildings.id,
            ST_GeomFromWKB(buildings.geometry) AS geometry
        FROM buildings_view AS buildings
        WHERE buildings.bbox.minX <= {maxx} AND buildings.bbox.maxX >= {minx}
          AND buildings.bbox.minY <= {maxy} AND buildings.bbox.maxY >= {miny}
        """

        file = str(Path(dst_parquet).resolve())
        conn.execute(f"COPY ({query}) TO '{file}' WITH (FORMAT PARQUET);")
    finally:
        # Close the connection even if the remote query fails.
        conn.close()

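The `WHERE` clause in the function above is the standard axis-aligned rectangle overlap test: a building is kept unless its box lies entirely to the left of, right of, above, or below the query box. A minimal pure-Python sketch of the same predicate, which could be used to sanity-check a bbox before running the remote query; `validate_bbox` and `bboxes_intersect` are hypothetical helper names, not part of Overture or DuckDB.

```python
from __future__ import annotations

BBox = tuple[float, float, float, float]  # (minX, minY, maxX, maxY) in EPSG:4326


def validate_bbox(bbox: BBox) -> BBox:
    """Return bbox as floats, raising ValueError if it is not a valid lon/lat box."""
    if len(bbox) != 4:
        raise ValueError("bbox must have four values: (minX, minY, maxX, maxY)")
    minx, miny, maxx, maxy = (float(v) for v in bbox)
    # Boxes crossing the antimeridian (minX > maxX) are rejected on purpose.
    if not (-180.0 <= minx <= maxx <= 180.0):
        raise ValueError("expected -180 <= minX <= maxX <= 180")
    if not (-90.0 <= miny <= maxy <= 90.0):
        raise ValueError("expected -90 <= minY <= maxY <= 90")
    return minx, miny, maxx, maxy


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """Same overlap test as the SQL WHERE clause; touching edges count as overlap."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]
```

Because the comparisons use `<=` and `>=`, a building that only touches the edge of the query box is included, matching the SQL. This simple test does not handle boxes that cross the antimeridian, which is why `validate_bbox` rejects them.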
@barneydobson
Collaborator

Thanks, this seems like a good idea. The ERA5 and buildings downloads in particular can both be super slow.

@cheginit
Collaborator Author

I updated the Overture code; here's the link to the updated post.

@barneydobson
Collaborator

Ohh nice, no need to download the whole country now. I'll make an issue for this.
