A common set of functions used by the Cirrus geospatial pipeline. This library provides convenience functions for interacting with the various Cirrus AWS resources (queues, Lambdas, Batch, DynamoDB, etc.) and should be preferred over writing separate code where possible.
Cirrus-lib requires boto3-utils, a library of convenience functions built upon boto3. Cirrus-lib is published to PyPI, which is the preferred way to install it:
$ pip install cirrus-lib
# install specific version
$ pip install cirrus-lib==0.1.0
# in a requirements.txt file - the tilde (~) will install the highest compatible version
cirrus-lib~=0.1.0
Module | Description |
---|---|
process_payload | ProcessPayload and ProcessPayloads classes, representing a Cirrus input payload and a set of ProcessPayload instances |
errors | Custom Error classes |
statedb | The StateDB class for interacting with the Cirrus state database (DynamoDB) |
transfer | Convenience functions for uploading and downloading assets in STAC Items |
utils | Miscellaneous utility functions |
The errors module contains a single class, InvalidInput, which can be raised from a task if there is a problem with the input to the task. The Item will be marked as INVALID in the state database rather than FAILED, so that it is not run again.
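A minimal sketch of how a task might signal bad input. The InvalidInput class defined here is a local stand-in so the example runs without cirrus-lib installed; in a real task it would be imported from the library's errors module (the import path is assumed to be cirruslib.errors). The validate_payload helper and the "features" payload key are hypothetical, used only for illustration.

```python
class InvalidInput(Exception):
    """Local stand-in for the library's InvalidInput class (assumption)."""


def validate_payload(payload: dict) -> dict:
    """Hypothetical check: reject payloads that carry no input items."""
    if not payload.get("features"):
        # Retrying cannot fix a malformed payload, so report it as invalid
        # input instead of letting the task fail with a generic error.
        raise InvalidInput("payload contains no input items")
    return payload


try:
    validate_payload({"features": []})
except InvalidInput as err:
    print(f"invalid input: {err}")
```

Raising a dedicated error type lets the surrounding workflow tell "this input can never succeed" apart from a transient failure that is worth retrying.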
The StateDB class from the statedb module is used to interact with the Cirrus state database (DynamoDB): creating new items in the database, querying, and updating statuses. Each item in the state database represents one input payload into Cirrus. The StateDB class abstracts away what is actually stored in DynamoDB, which is optimized for queries. The actual entry in the database can be retrieved with the StateDB.get_dbitem() and StateDB.get_dbitems() functions, which are used internally, but clients should generally opt to use StateDB.get_items(), which translates each dbitem into a record with the following fields:
Field Name | Type | Description |
---|---|---|
payload_id | string | The ID of the Cirrus input payload |
collections | string | A list of all the collections used by the input items |
workflow | string | The name of the workflow used to process the input data |
items | string | A list of all the input Item IDs |
payload | string | The s3 URL of the original input payload JSON |
state | string | One of: 'PROCESSING', 'COMPLETED', 'FAILED', 'INVALID' |
created | string | The datetime that this item was created in the database |
updated | string | The datetime stamp that this item's state was last updated |
executions | [string] | A list of http URLs to the State Machine executions, in chronological order |
outputs | [string] | If state is COMPLETED, a list of s3 (if private payload) or http (if public) URLs of all STAC Items generated by this process. |
last_error | string | This is the error message from the last FAILED or INVALID execution, if any. It is not cleared if state is PROCESSING or COMPLETED |
payload_id: The payload ID is formed by a combination of all the input collections, output collections, and the name of the workflow.
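The fields above can be consumed as plain dictionaries. The sketch below is an illustration only: it does not call StateDB, the sample records are hand-written with hypothetical values, and failed_items is a hypothetical helper name. It shows why the state field, not last_error, should be used to find items that are currently in an error state.

```python
from typing import Dict, List

# Error states listed in the field table
ERROR_STATES = {"FAILED", "INVALID"}


def failed_items(items: List[dict]) -> Dict[str, str]:
    """Map payload_id to last_error for items currently in an error state.

    The state field is checked rather than last_error, because last_error
    is not cleared when an item later moves to PROCESSING or COMPLETED.
    """
    return {
        item["payload_id"]: item.get("last_error", "")
        for item in items
        if item["state"] in ERROR_STATES
    }


# Hand-written sample records (hypothetical values, not real database output)
sample = [
    {"payload_id": "payload-a", "state": "COMPLETED", "last_error": "timeout"},
    {"payload_id": "payload-b", "state": "FAILED", "last_error": "out of memory"},
    {"payload_id": "payload-c", "state": "INVALID", "last_error": "missing asset"},
]

print(failed_items(sample))
```

Here payload-a is skipped even though it still carries an old error message, since its current state is COMPLETED.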
The transfer module contains convenience functions for transferring data.
Function | Description |
---|---|
get_s3_session | Get a boto3-utils s3 object for transfers to/from s3 (used by the other functions) |
download_from_http | Download a file from an HTTP URL |
download_item_assets | For a given STAC Item download 1 or more assets to a local path |
upload_item_assets | For a given STAC Item upload 1 or more assets to s3 |
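To make "1 or more assets" concrete, the sketch below works out which local file each selected asset of a STAC Item would be saved to. It performs no transfer and is not the library's implementation: plan_asset_downloads, its parameters, and the sample Item are hypothetical. The only thing taken as given is the standard STAC layout, where each entry under assets has an href.

```python
import os
from typing import Dict, List, Optional


def plan_asset_downloads(
    item: dict, path: str, assets: Optional[List[str]] = None
) -> Dict[str, str]:
    """Map each selected asset key to the local filename it would be saved as.

    If assets is None, every asset in the Item is selected.
    """
    keys = assets if assets is not None else list(item["assets"])
    plan = {}
    for key in keys:
        href = item["assets"][key]["href"]
        # Keep the remote file's basename under the target directory
        plan[key] = os.path.join(path, os.path.basename(href))
    return plan


# Hypothetical STAC Item with two assets
item = {
    "id": "example-item",
    "assets": {
        "red": {"href": "https://example.com/data/B04.tif"},
        "thumbnail": {"href": "https://example.com/data/thumb.png"},
    },
}

print(plan_asset_downloads(item, "work", assets=["red"]))
```

Selecting assets by key keeps a task from fetching large files it does not need.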
The utils module contains miscellaneous utility functions.
Function | Description |
---|---|
submit_batch_job | Start a Cirrus Batch job with a given payload |
Cirrus is an open-source pipeline for processing geospatial data in AWS. Cirrus was developed by Element 84, originally under a NASA Access project.