Many pieces of additional documentation, plus changes to documentation structure to make core pieces more visible
nickrsan committed Feb 29, 2024
1 parent 340a689 commit b80089d
Showing 7 changed files with 178 additions and 10 deletions.
File renamed without changes.
7 changes: 6 additions & 1 deletion docs/source/index.rst
@@ -9,14 +9,19 @@ large area images from Earth Engine possible.
`Issue Tracker <https://github.com/water3d/eedl/issues>`_ |
`Q&A Support <https://github.com/water3d/eedl/discussions>`_ |

.. note::
   Wondering where to start? See our :ref:`GettingStarted` guide.

.. toctree::
   :maxdepth: 2
   :caption: Table of Contents

   user_guide/about
   user_guide/getting_started
   user_guide/how_eedl_works
   user_guide/working_with_eedl
   examples/index
   eedl
   eedl_api_reference

Indices and tables
==================
5 changes: 2 additions & 3 deletions docs/source/user_guide/about.rst
@@ -1,9 +1,8 @@
-About EEDL
-============
+About EEDL and Installation
+==================================

What is EEDL?
---------------

EEDL combines functionality that would normally take several packages into one. This makes working with EEDL easier and more streamlined than working with other modules.

How to install EEDL
6 changes: 4 additions & 2 deletions docs/source/user_guide/getting_started.rst
@@ -1,3 +1,5 @@
.. _GettingStarted:

Getting Started with EEDL
============================

@@ -76,5 +78,5 @@ Note the main points of configuration in this example:

* Creation of the image object: As stated above, here you can customize your image to any valid Earth Engine image - no need to save it out as an asset or into a collection first
* When creating the EEDLImage object - The example above shows no arguments, but you can pass in information, such as CRS information, tiling parameters, and configuration options for running zonal statistics, here. You may want to adjust tiling parameters for multi-band exports, for example, since those can run into Earth Engine's per-tile limits more easily. See documentation for :ref:`EEDLImage` for more information on arguments to the class.
-* When triggering the export: When running the export, you can specify where
-* And when waiting for images
+* When triggering the export: When running the export, you can specify image name information and provide overrides for some class defaults. Additionally, you can pass any valid keyword argument for Earth Engine's export methods directly to the EEDLImage :code:`export` method, and it will be passed through to Earth Engine unchanged.
+* And when waiting for images: The main points of configuration when waiting are where to save images on your device, how to handle errors encountered while processing images, how long to wait between polls of the Earth Engine image status endpoints, and whether to run a callback, which is a method on the EEDLImage class that does postprocessing. Callbacks are helpful because they let EEDL work on images it has already downloaded while it waits for Earth Engine to export any remaining images. As a result, your total processing time shouldn't be much longer than Earth Engine's total export time in most cases, even for hundreds of exports. The primary callback is the :code:`mosaic` method, but there is also a :code:`mosaic_and_zonal` method that mosaics the image and then runs zonal statistics. Note that :code:`mosaic_and_zonal` requires additional configuration parameters when initializing :code:`EEDLImage`, while :code:`mosaic` runs without additional configuration.
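Assuming the callback is passed as the name of an EEDLImage method, dispatching it can be as simple as looking the method up by name on the image. This is a minimal sketch of that idea, not EEDL's implementation: :code:`HypotheticalImage` and :code:`run_callback` are illustrative names, and the stand-in :code:`mosaic` method only records an output file name.

.. code-block:: python

    from typing import Optional


    class HypotheticalImage:
        """Stand-in for an EEDLImage, used only to show dispatch by method name."""

        def __init__(self, name: str) -> None:
            self.name = name
            self.mosaic_image: Optional[str] = None

        def mosaic(self) -> None:
            # A real mosaic step would merge downloaded tiles; this only records a name.
            self.mosaic_image = f"{self.name}_mosaic.tif"


    def run_callback(image: object, callback: Optional[str]) -> None:
        """Call the method named `callback` on `image`, if a callback was requested."""
        if callback is None:
            return
        method = getattr(image, callback, None)
        if not callable(method):
            raise AttributeError(f"{type(image).__name__} has no method {callback!r}")
        method()

Looking methods up by name keeps the waiting code generic: any method on the image class can serve as a callback without the waiting loop knowing about it in advance.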
118 changes: 118 additions & 0 deletions docs/source/user_guide/how_eedl_works.rst
@@ -0,0 +1,118 @@
How EEDL Works, In Detail
===============================
EEDL doesn't do anything particularly fancy or hard, but it handles things that are a pain to manage,
whether you're doing them by hand or writing code to manage the process. Here's
what EEDL handles for you and how it goes about it. This document tells the story of what's happening
behind the scenes so you can understand the pieces, especially if something goes wrong; it is not a direct guide to using EEDL.

Summary
-------------
EEDL allows you to get data out of Earth Engine, in bulk, for further analysis or postprocessing. If you want
to export one image, EEDL can do that, but you may be better served by handling it yourself. Once you want many images,
though, this task can get tedious or impractical.

EEDL manages Earth Engine exports from just before you ask Earth Engine to export an image until
after the image is downloaded to your device and ready to use, whether it's one image or 10,000.
It configures Earth Engine to slice large images into parts, exports them through cloud storage,
retrieves them from cloud storage onto your device, reassembles the parts, and optionally runs zonal statistics before
returning control to your code for any further work you want to do with the data.

Exports
----------
EEDL starts when you want to export any image from Earth Engine. It can be an existing asset that you've loaded
into an :code:`ee.Image` object, or a computed image that hasn't been saved out in any way yet, such as a collection
that you've summed/reduced into a single image.

Exports work like any other export from Earth Engine and accept the same parameters, but because you start them through EEDL,
EEDL tracks each export so it can download it for you automatically once it's ready.

Slicing
-----------
As part of managing your export, EEDL automatically passes a parameter to Earth Engine's export code that splits your image
into tiles (by default, 12,800 pixels to a side). This keeps each tile within Earth Engine's memory budget for exports;
larger tiles may fail to export on Earth Engine's servers. In some cases, such as with multi-band images, you
may need to decrease the tile size to stay within Earth Engine's memory limits, at the cost of more files being output
for a single image export. More files aren't a problem on their own, but they can cause problems in some export scenarios if you end
up with more than 1,000 tiles for one image.
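To see how quickly the file count grows as tiles shrink, here is a small worked calculation. It assumes square tiles laid out on a regular grid, so each axis needs the image dimension divided by the tile size, rounded up; the function name and the 100,000 x 80,000 pixel image are hypothetical.

.. code-block:: python

    import math


    def tile_count(width_px: int, height_px: int, tile_size_px: int = 12_800) -> int:
        """Estimate how many files an export is split into, assuming square tiles on a grid.

        Each axis needs ceil(dimension / tile size) tiles; partial edge tiles still count.
        """
        if tile_size_px <= 0:
            raise ValueError("tile_size_px must be positive")
        return math.ceil(width_px / tile_size_px) * math.ceil(height_px / tile_size_px)


    # A hypothetical 100,000 x 80,000 pixel image:
    print(tile_count(100_000, 80_000))         # 8 x 7 = 56 tiles at the default size
    print(tile_count(100_000, 80_000, 2_560))  # 40 x 32 = 1280 tiles, past the 1,000 mark

Shrinking the tile side by a factor of five here multiplies the file count by more than 20, because the reduction applies along both axes.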

Tracking images and their statuses
-----------------------------------
A key piece of EEDL's functionality is that, once you tell it to download your images, it starts tracking the status of
all exports you've initiated on :code:`EEDLImage` objects within a session.

.. note::
   Each :code:`EEDLImage` object instance should only be used for a single export because of how EEDL uses the object
   to track information. Create additional :code:`EEDLImage` objects if you need to manage multiple exports.

EEDL only begins updating the status of images once you call :code:`wait_for_images` on your :ref:`Task Registry <EEDLImage>`.
After that, it updates the status of all images exported so far in your script and blocks execution until
every exported image has either downloaded or failed to export.
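A blocking wait of this kind can be sketched as a simple polling loop. This is a minimal sketch, not EEDL's code: it assumes a one-minute interval and Earth Engine's task state names (such as READY, RUNNING, COMPLETED, FAILED, and CANCELLED), and the :code:`get_state` and :code:`download` callables are hypothetical hooks standing in for the Earth Engine status query and the download step.

.. code-block:: python

    import time
    from typing import Callable, Dict, Iterable

    # Earth Engine task states after which a task will not change again.
    TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


    def wait_for_all(
        names: Iterable[str],
        get_state: Callable[[str], str],
        download: Callable[[str], None],
        interval_s: float = 60.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> Dict[str, str]:
        """Poll until every named export has completed (and been downloaded) or failed.

        get_state and download are hypothetical hooks, not EEDL's API.
        Returns each export's final state.
        """
        pending = set(names)
        final: Dict[str, str] = {}
        while pending:
            for name in sorted(pending):
                state = get_state(name)
                if state == "COMPLETED":
                    download(name)  # fetch only once Earth Engine reports completion
                if state in TERMINAL_STATES:
                    final[name] = state
            pending.difference_update(final)  # stop polling finished exports
            if pending:
                sleep(interval_s)  # wait before the next round of status checks
        return final

Taking :code:`sleep` as a parameter lets the loop be exercised without waiting a real minute between polls.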

Two important considerations are involved in tracking:

1. About once a minute, EEDL polls Earth Engine's task status endpoint to find out where each image is in the export
   process. If the image is still waiting or exporting, EEDL does nothing more. Once Earth Engine reports
   that the image has completed its export, EEDL begins downloading it.

2. EEDL tracks images by their name. It constructs the name from a few pieces of information, but the most important
   is the information *you* provide when you initiate each image's export. For now, you must give
   each image a unique name, or EEDL will mix up the pieces of images when downloading and reassembling them.
   We'd like to change this to automatically assign a unique ID during the export process and apply your name
   only at the end to avoid this issue, but that work has not been done yet.
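The following sketch shows why unique names matter when tiles are matched by name. It assumes tiled exports produce files named like :code:`<name>-<offset>-<offset>.tif` (or :code:`<name>.tif` for a single tile); the :code:`tiles_for` matcher and the :code:`NameRegistry` duplicate guard are hypothetical illustrations, not EEDL's implementation.

.. code-block:: python

    import re
    from typing import Iterable, List, Set

    # Assumption: tiles are "<name>-<offset>-<offset>.tif", or "<name>.tif" if untiled.
    TILE_PATTERN = r"^{name}(-\d+-\d+)?\.tif$"


    def tiles_for(name: str, filenames: Iterable[str]) -> List[str]:
        """Return the files that belong to the export called `name`."""
        pattern = re.compile(TILE_PATTERN.format(name=re.escape(name)))
        return sorted(f for f in filenames if pattern.match(f))


    class NameRegistry:
        """Hypothetical guard that refuses to track two exports with the same name."""

        def __init__(self) -> None:
            self._names: Set[str] = set()

        def register(self, name: str) -> None:
            if name in self._names:
                # Two exports with one name would yield tiles no rule can tell apart.
                raise ValueError(f"an export named {name!r} is already being tracked")
            self._names.add(name)

Matching on the exact name plus the expected suffix keeps :code:`ndvi` from picking up :code:`ndvi_2020` tiles, but no matching rule can separate two exports that share a name, which is why the guard rejects duplicates up front.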

Exporting through cloud storage
---------------------------------
Earth Engine supports three export targets: Earth Engine assets, Google Drive, and Google Cloud Storage. EEDL
supports exports to Drive and Cloud Storage (Earth Engine assets aren't as accessible outside Earth Engine).

Which one you export to will depend on the workflows available to you, and each has unique requirements and
implications for your download. See :ref:`ExportLocations` for more information on this topic; it's important
to understand before you begin using EEDL. The two most important factors are:

1. If you wish to use Google Drive exports, you need the Google Drive client installed on your computer, because EEDL doesn't access files in Drive via the API. Get in touch or file an issue if you'd like to work on supporting API access instead (which would streamline EEDL for many workloads).
2. If you use Google Cloud Storage exports, your Cloud Storage bucket *must* be public. We don't currently support private buckets, but would like to in the future.

Accessing and downloading data in cloud storage
-------------------------------------------------
Once the image status indicates it's ready for download, EEDL retrieves all the image parts
that Earth Engine exported to your export location.

For Google Drive exports, EEDL accesses the mounted Google Drive folder on your computer, lists
the contents of the folder you exported to, and finds every file whose name matches the
name you provided at export time (plus other name parts Earth Engine adds). It then downloads
those parts by moving all the matching files to the local folder you provided as an argument
to :code:`wait_for_images`. Note that this means EEDL deletes images from Google Drive for you,
though they continue to take up space (see :ref:`ExportLocations` for more information).

For Google Cloud Storage exports, EEDL asks the bucket's public API endpoint for a listing of all files
that match the name string you provided, then makes an HTTP request to download
each individual file. With this method, we cannot currently delete images from Cloud Storage buckets,
so we recommend, if possible, setting a lifecycle policy on the bucket that automatically deletes files after 24 hours.
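Both retrieval paths can be sketched with only the Python standard library. This is not EEDL's code and the function names are hypothetical. The Drive side assumes the Drive client has mounted the folder locally. The Cloud Storage side assumes a public bucket, listed through the Cloud Storage JSON API objects endpoint and downloaded from :code:`storage.googleapis.com` URLs. The lifecycle rule at the end illustrates the 24-hour recommendation; lifecycle ages are counted in days.

.. code-block:: python

    import json
    import shutil
    from pathlib import Path
    from typing import List
    from urllib.parse import quote, urlencode
    from urllib.request import urlopen


    def move_from_drive(drive_folder: str, name: str, dest: str) -> List[Path]:
        """Move files whose names start with `name` out of a locally synced Drive folder."""
        dest_dir = Path(dest)
        dest_dir.mkdir(parents=True, exist_ok=True)
        moved = []
        # Prefix match: one reason each export needs a unique name.
        for src in sorted(Path(drive_folder).glob(f"{name}*.tif")):
            target = dest_dir / src.name
            shutil.move(str(src), str(target))  # moving removes it from the synced folder
            moved.append(target)
        return moved


    def public_object_url(bucket: str, object_name: str) -> str:
        """Direct download URL for an object in a public Cloud Storage bucket."""
        return f"https://storage.googleapis.com/{quote(bucket)}/{quote(object_name)}"


    def list_public_objects(bucket: str, prefix: str) -> List[str]:
        """List object names starting with `prefix` using the Cloud Storage JSON API."""
        names: List[str] = []
        page_token = None
        while True:
            params = {"prefix": prefix}
            if page_token:
                params["pageToken"] = page_token
            url = f"https://storage.googleapis.com/storage/v1/b/{quote(bucket)}/o?{urlencode(params)}"
            with urlopen(url) as resp:
                listing = json.load(resp)
            names.extend(item["name"] for item in listing.get("items", []))
            page_token = listing.get("nextPageToken")  # present only if more pages remain
            if not page_token:
                return names


    def download_public(bucket: str, prefix: str, dest: str) -> List[Path]:
        """Download every matching object, one HTTP request per file."""
        dest_dir = Path(dest)
        dest_dir.mkdir(parents=True, exist_ok=True)
        saved = []
        for object_name in list_public_objects(bucket, prefix):
            target = dest_dir / Path(object_name).name
            with urlopen(public_object_url(bucket, object_name)) as resp, open(target, "wb") as fh:
                shutil.copyfileobj(resp, fh)
            saved.append(target)
        return saved


    # Lifecycle rule deleting objects older than one day; write it out with json.dump
    # and apply it with `gsutil lifecycle set lifecycle.json gs://your-bucket`.
    LIFECYCLE_DELETE_AFTER_ONE_DAY = {
        "rule": [{"action": {"type": "Delete"}, "condition": {"age": 1}}]
    }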

Reassembling the pieces
-------------------------
Once EEDL has downloaded all pieces of an image, it executes any configured callbacks (provided
as the string name of an EEDLImage method to :code:`wait_for_images`). The most common callback
is :code:`mosaic`, which takes all downloaded tiles that match the image's name
and mosaics them back together with GDAL. Currently, it also builds overviews/pyramids and sets lossless
compression parameters on the images. The final result is a single image on your device,
in the folder you specified for downloads, with roughly the name you provided and :code:`_mosaic`
appended to the end. Because you can't reliably predict the name of the final image, EEDL stores it
on the EEDLImage object as the :code:`mosaic_image` attribute once the export is complete.
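The mosaicking step can be sketched with GDAL's Python bindings. This sketch assumes :code:`osgeo.gdal` is installed and that the tiles are GeoTIFFs whose names start with the export name. It builds a VRT, writes a DEFLATE-compressed (lossless) GeoTIFF, and adds internal overviews. The helper names, the :code:`_mosaic.tif` naming, and the creation options and overview levels are illustrative choices, not necessarily what EEDL uses.

.. code-block:: python

    from pathlib import Path
    from typing import List


    def find_tiles(folder: str, name: str) -> List[str]:
        """Downloaded tiles for `name`, excluding any mosaic already written."""
        return sorted(
            str(p) for p in Path(folder).glob(f"{name}*.tif")
            if not p.stem.endswith("_mosaic")
        )


    def mosaic_tiles(folder: str, name: str) -> str:
        """Mosaic tiles with GDAL into one compressed GeoTIFF with overviews."""
        from osgeo import gdal  # imported here so find_tiles works without GDAL

        gdal.UseExceptions()
        tiles = find_tiles(folder, name)
        if not tiles:
            raise FileNotFoundError(f"no tiles for {name!r} in {folder}")
        vrt_path = str(Path(folder) / f"{name}.vrt")
        out_path = str(Path(folder) / f"{name}_mosaic.tif")

        vrt = gdal.BuildVRT(vrt_path, tiles)  # virtual mosaic that references the tiles
        out = gdal.Translate(
            out_path, vrt, format="GTiff",
            creationOptions=["COMPRESS=DEFLATE", "TILED=YES", "BIGTIFF=IF_SAFER"],
        )
        out = None  # close the output so it is fully written
        vrt = None  # close the VRT

        ds = gdal.Open(out_path, gdal.GA_Update)
        ds.BuildOverviews("NEAREST", [2, 4, 8, 16])  # pyramids for faster display
        ds = None  # flush the overviews and close
        return out_path

Going through a VRT avoids holding every tile in memory at once: GDAL reads from the tiles only as it writes the compressed output.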

Running zonal statistics
------------------------------
EEDL can also run zonal statistics after mosaicking. You can call the methods manually
after the download loop finishes, but more likely you'll use the :code:`mosaic_and_zonal`
callback instead of the :code:`mosaic` callback. :code:`mosaic_and_zonal` requires preconfiguring
the EEDLImage object with the path to your polygon dataset (OGR-compatible), the unique
identifier field, and the statistics you'd like to run. You can pass this information as keyword arguments
when creating an EEDLImage object or set it as attributes later, as long as you do so before downloads begin.

Zonal statistics are produced as CSVs in the same folder as the image. Statistics are computed
by the :code:`rasterstats` package and are subject to its capabilities and limitations. We'd also like to offer
the option to run zonal stats within Earth Engine (and then initiate a separate export and download),
but have not developed that functionality yet.
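Producing such a CSV with :code:`rasterstats` can be sketched as follows. The sketch assumes the polygon file has a unique ID field. It relies on :code:`rasterstats.zonal_stats` with :code:`geojson_out=True`, which returns features whose properties include the computed statistics. The helper names and the CSV layout (ID first, then one column per statistic) are hypothetical, not EEDL's output format.

.. code-block:: python

    import csv
    from typing import Dict, Iterable, List


    def rows_from_features(features: Iterable[dict], id_field: str, stats: List[str]) -> List[Dict]:
        """Flatten GeoJSON-like features (stats stored in their properties) into CSV rows."""
        rows = []
        for feature in features:
            props = feature["properties"]
            row = {id_field: props[id_field]}
            row.update({stat: props.get(stat) for stat in stats})  # None if a stat is missing
            rows.append(row)
        return rows


    def write_zonal_csv(rows: List[Dict], id_field: str, stats: List[str], csv_path: str) -> None:
        """Write one row per polygon: the unique ID followed by each statistic."""
        with open(csv_path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=[id_field, *stats])
            writer.writeheader()
            writer.writerows(rows)


    def zonal_to_csv(polygons: str, raster: str, id_field: str, stats: List[str], csv_path: str) -> None:
        """Run rasterstats over an OGR-readable polygon file and save the results as CSV."""
        from rasterstats import zonal_stats  # third-party; imported here so the helpers work without it

        features = zonal_stats(polygons, raster, stats=stats, geojson_out=True)
        write_zonal_csv(rows_from_features(features, id_field, stats), id_field, stats, csv_path)

Polygons that don't overlap any valid pixels get :code:`None` statistics from :code:`rasterstats`, which the CSV writer emits as empty cells.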

The advantage of running zonal statistics via the :code:`mosaic_and_zonal` callback is that zonal statistics
are the most time-consuming local operation EEDL provides. Running them within the callback means they mostly
happen while EEDL is waiting for Earth Engine to export other images. For very large polygon datasets they can
take longer, but in typical usage, more polygons go with larger images, which themselves take longer
to export from Earth Engine, so the two execution times roughly scale together.
51 changes: 48 additions & 3 deletions docs/source/user_guide/workflows_and_concepts.rst
@@ -3,15 +3,57 @@ Workflows and Concepts

The EEDLImage Class
----------------------

The EEDLImage class is your main point of entry for all exports not initiated by a helper class.
EEDLImage manages all information related to the export and aids in tracking the status of the export
on Earth Engine, as well as tracking the location of all intermediate products. Most core work in
EEDL is done via methods on the EEDLImage class, but your main points of entry and configuration will
be at instance initialization (when calling :code:`your_image = EEDLImage()`) or when calling the
:code:`EEDLImage.export()` method.

Task Registries
--------------------
Task registries manage groups of EEDLImages you're working with and continually update the status of all
images by polling Earth Engine for status updates at configurable intervals.

Task registries do a fair amount of work, but in most cases, you'll only use them in one line of
code to tell EEDL to wait for all available images, where to save them, and what to do once they're downloaded.
See documentation on the TaskRegistry object under :ref:`EEDLImage` for more information on parameters.

By default, EEDL has a single
task registry at :code:`eedl.image.main_task_registry` that all images are added to. If you'd like to export
a batch of images but segment how you wait for them, you can create as many additional task registries as you
like: just pass a task registry you've created to an image as its :code:`task_registry` keyword argument.

.. code-block:: python

    # This example is incomplete: export parameters are omitted because
    # they aren't what the example is about.
    from eedl.image import EEDLImage, TaskRegistry, main_task_registry

    image_main_registry = EEDLImage()  # this image goes into the main task registry automatically
    image_main_registry.export()  # export parameters omitted for this example

    custom_task_registry = TaskRegistry()  # no other arguments needed
    image_custom_registry = EEDLImage(task_registry=custom_task_registry)
    image_custom_registry.export()  # export parameters omitted for this example

    # Only downloads image_main_registry, once it's available
    main_task_registry.wait_for_images()  # wait_for_images parameters omitted too

    # ... some additional work you want to do ...

    # Only downloads image_custom_registry, once it's ready
    custom_task_registry.wait_for_images()  # wait_for_images parameters omitted too

In that example, once :code:`image_main_registry` finishes exporting, :code:`main_task_registry.wait_for_images()`
downloads it and returns control to your script so you can do any additional work. :code:`image_custom_registry`
is only downloaded once you call :code:`custom_task_registry.wait_for_images()`.

.. note::
   In the long run, we'd like to remove the concept of the Task Registry in favor of truly
-   asynchronous code that runs in the background. In the meantime, they remain an important concept
-   in EEDL that is mostly transparent for your use.
+   asynchronous code that runs in the background, where each image manages its own status updates, etc.
+   In the meantime, they remain an important concept in EEDL that is mostly transparent for your use.


Tuning Default Values for Exports
@@ -24,3 +66,6 @@ Exporting a Single ee.Image

Exporting a Filtered ee.ImageCollection
------------------------------------------

Helper Classes
----------------------
1 change: 0 additions & 1 deletion docs/source/user_guide/working_with_eedl.rst
@@ -11,7 +11,6 @@ via Conda Forge may come in the future.
   :maxdepth: 2
   :caption: Section Contents

-   getting_started
   workflows_and_concepts
   export_locations
   general_tips