diff --git a/dataset-spec/README.md b/dataset-spec/README.md new file mode 100644 index 000000000..58969076c --- /dev/null +++ b/dataset-spec/README.md @@ -0,0 +1,103 @@ +# STAC Dataset Spec + +[STAC Items](https://github.com/radiantearth/stac-spec/json-spec/) are focused on search within a dataset\*. Another topic of interest is the search for datasets rather than within a dataset. The Dataset Spec is an independent spec, and STAC Items are *strongly recommended* to provide a link to a dataset definition. Other parties can also use this spec independently to describe datasets in a lightweight way. + +The Dataset Spec extends the [Catalog Spec](../static-catalog/) with additional fields to describe the set of items in the catalog. It shares all Catalog fields, so every Dataset is also a valid Catalog. Datasets can have parent Catalogs and Datasets as well as child Items, Catalogs and Datasets. + +A Dataset can be represented in JSON format. Any JSON object that contains all the required fields is a valid STAC Dataset and Catalog. + +* [Example (Sentinel 2)](example-s2.json) +* [JSON Schema](json-schema/dataset.json) + +*\* There is no standardized name for the concept we are describing here. Others call it: dataset series (ISO 19115), collection (CNES, NASA), dataset (JAXA), dataset series (ESA), product (JAXA).* + +## WARNING + +**This is still an early version of the STAC spec; expect some changes before everything is finalized.** + +Implementations are encouraged, however, as a good effort will be made not to change anything too drastically. Using the specification now will ensure that needed changes can be made before everything is locked in. So now is an ideal time to implement, as your feedback will be directly incorporated.
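Since the README defines a valid Dataset as any JSON object that contains all required fields, a presence-only check can be sketched in Python. This is a minimal sketch, not part of the spec: the helper names `missing_dataset_fields` and `has_self_link` are hypothetical, the field list is taken from the **REQUIRED** entries of the Dataset fields table (`name`, `description`, `license`, `extent`, `links`) plus the required `self` link relation, and types and values are not checked at all (full validation is the job of `json-schema/dataset.json`).

```python
import json
from typing import List

# Top-level fields marked REQUIRED in the "Dataset fields" table.
REQUIRED_DATASET_FIELDS = ("name", "description", "license", "extent", "links")


def missing_dataset_fields(dataset: dict) -> List[str]:
    """Return the REQUIRED top-level fields that are absent (presence check only)."""
    return [field for field in REQUIRED_DATASET_FIELDS if field not in dataset]


def has_self_link(dataset: dict) -> bool:
    """True if some Link Object uses rel="self", which the relation types mark REQUIRED."""
    return any(link.get("rel") == "self" for link in dataset.get("links", []))


doc = json.loads('{"name": "COPERNICUS/S2", "description": "Sentinel-2 MSI", "license": "proprietary"}')
print(missing_dataset_fields(doc))  # ['extent', 'links']
print(has_self_link(doc))           # False
```

Keeping the check to field presence mirrors the README's wording; anything stricter (value types, extent shape) is better delegated to a JSON Schema validator.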
+ +## Dataset fields + +| Element | Type | Description | +| ----------- | ----------------- | ------------------------------------------------------------ | +| name | string | **REQUIRED.** Identifier for the dataset that is unique across the provider. | +| title | string | A short descriptive one-line title for the dataset. | +| description | string | **REQUIRED.** Detailed multi-line description to fully explain the dataset. [CommonMark 0.28](http://commonmark.org/) syntax MAY be used for rich text representation. | +| keywords | [string] | List of keywords describing the dataset. | +| version | string | Version of the dataset. [Semantic Versioning (SemVer)](https://semver.org/) SHOULD be followed. | +| license | string | **REQUIRED.** Dataset's license(s) as an SPDX [License identifier](https://spdx.org/licenses/) or [expression](https://spdx.org/spdx-specification-21-web-version#h.jxpfx0ykyb60), or `proprietary` if the license is not on the SPDX license list. Proprietary-licensed data SHOULD add a link to the license text (see the `license` relation type). | +| provider | [Provider Object] | A list of data providers, the organizations which influenced the content of the dataset. Providers should be listed in chronological order with the most recent provider being the last element of the list. | +| host | Host Object | Storage provider, the organization that hosts the dataset. | +| extent | Extent Object | **REQUIRED.** Spatial and temporal extents. | +| links | [Link Object] | **REQUIRED.** A list of references to other documents. | + +### Extent Object + +The object describes the spatio-temporal extents of the dataset. Both spatial and temporal extents are required to be specified. + +**Note:** STAC Datasets try to be compliant with [WFS 3.0](https://github.com/opengeospatial/WFS_FES), but some issues remain to be solved.
The WFS specification is in draft state and may change, especially regarding [3D support](https://github.com/opengeospatial/WFS_FES/issues/143) for spatial extents or the handling of [open date ranges](https://github.com/opengeospatial/WFS_FES/issues/155) for temporal extents. Therefore, it is also likely that the following fields will change over time. + +| Element | Type | Description | +| -------- | -------- | ------------------------------------------------------------ | +| spatial | [number] | **REQUIRED.** Potential *spatial extent* covered by the dataset. West, south, east, north edges of the spatial extent, in that order. Only WGS84 longitude/latitude is supported. The list of four numbers can be extended to six numbers to support a 3D spatial extent. | +| temporal | [string\|null] | **REQUIRED.** Potential *temporal extent* covered by the dataset. A list of two timestamps, which MUST be formatted according to [RFC 3339, section 5.6](https://tools.ietf.org/html/rfc3339#section-5.6). Open date ranges are supported by setting either the start or the end time to `null`. Example for data from the beginning of 2009 until now: `["2009-01-01T00:00:00Z", null]`. | + +### Provider Object + +The object provides information about a provider. A provider is any of the organizations that created or processed the content of the dataset and therefore influenced the data offered by this dataset. + +| Field Name | Type | Description | +| ---------- | ------ | ------------------------------------------------------------ | +| name | string | **REQUIRED.** The name of the organization or the individual. | +| url | string | Homepage of the provider. | + +### Host Object + +This object provides information about the storage provider hosting the data. + +**Note:** The idea of storage profiles is currently being [discussed](https://github.com/radiantearth/stac-spec/issues/148). Therefore, `scheme`, `id` and `region` may be removed from the final spec once this concept is introduced to STAC.
+ +| Field Name | Type | Description | +| -------------- | ------- | ------------------------------------------------------------ | +| name | string | **REQUIRED.** The name of the organization or the individual hosting the data. | +| description | string | Detailed description to explain the hosting details. [CommonMark 0.28](http://commonmark.org/) syntax MAY be used for rich text representation. | +| scheme | string | **REQUIRED.** The protocol/scheme used to access the data. One of: `S3`, `GCS`, `URL`, `OTHER`. | +| id | string | **REQUIRED.** Host-specific identifier such as a URL or asset ID. | +| region | string | Provider-specific region where the data is stored. | +| requester_pays | boolean | `true` if the requester pays, `false` if the host pays. Defaults to `false`. | + +### Link Object + +This object describes a relationship with another entity. Data providers are advised to be liberal with links. + +| Field Name | Type | Description | +| ---------- | ------ | ------------------------------------------------------------ | +| href | string | **REQUIRED.** The actual link in the format of a URL. Relative and absolute links are both allowed. | +| rel | string | **REQUIRED.** Relationship between the current document and the linked document. See the section "Relation types" for more information. | +| type | string | MIME type of the referenced entity. | + +#### Relation types + +The following types are commonly used as `rel` types in the Link Object of a Dataset: + +| Type | Description | +| ------- | ------------------------------------------------------------ | +| self | **REQUIRED.** *Absolute* URL to the dataset file itself. This is required to represent the location where the file can be found online. This is particularly useful in a download package that includes metadata, so that the downstream user can know where the data has come from. | +| root | URL to the root [STAC Catalog](../static-catalog/) or Dataset.
| +| parent | URL to the parent [STAC Catalog](../static-catalog/) or Dataset. | +| child | URL to a child [STAC Catalog](../static-catalog/) or Dataset. | +| item | URL to a [STAC Item](../json-spec/). | +| license | The license URL for the dataset SHOULD be specified if the `license` field is set to `proprietary`. If there is no public license URL available, it is RECOMMENDED to supplement the STAC catalog with the license text in a separate file and link to this file. | + +## Extensions + +Important related extensions for the Dataset Spec: + +* [EO extension](../extensions/stac-eo-spec.md) + Please note that some fields such as `eo:sun_elevation` or `eo:sun_azimuth` are only meaningful on the item level and MUST NOT be used in datasets. +* Dimensions extension (proposed, see [PR #227](https://github.com/radiantearth/stac-spec/pull/227)) +* [Scientific extension](../extensions/scientific) +* Provenance extension (planned, see [issue #179](https://github.com/radiantearth/stac-spec/issues/179)) + +The [extensions page](../extensions/) gives a full overview of relevant extensions for STAC Datasets. \ No newline at end of file diff --git a/dataset-spec/example-s2.json b/dataset-spec/example-s2.json new file mode 100644 index 000000000..4f73c9c37 --- /dev/null +++ b/dataset-spec/example-s2.json @@ -0,0 +1,50 @@ +{ + "name": "COPERNICUS/S2", + "title": "Sentinel-2 MSI: MultiSpectral Instrument, Level-1C", + "description": "Sentinel-2 is a wide-swath, high-resolution, multi-spectral\nimaging mission supporting Copernicus Land Monitoring studies,\nincluding the monitoring of vegetation, soil and water cover,\nas well as observation of inland waterways and coastal areas.\n\nThe Sentinel-2 data contain 13 UINT16 spectral bands representing\nTOA reflectance scaled by 10000. See the [Sentinel-2 User Handbook](https://sentinel.esa.int/documents/247904/685211/Sentinel-2_User_Handbook)\nfor details.
In addition, three QA bands are present where one\n(QA60) is a bitmask band with cloud mask information. For more\ndetails, [see the full explanation of how cloud masks are computed.](https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi/level-1c/cloud-masks)\n\nEach Sentinel-2 product (zip archive) may contain multiple\ngranules. Each granule becomes a separate Earth Engine asset.\nEE asset ids for Sentinel-2 assets have the following format:\nCOPERNICUS/S2/20151128T002653_20151128T102149_T56MNN. Here the\nfirst numeric part represents the sensing date and time, the\nsecond numeric part represents the product generation date and\ntime, and the final 6-character string is a unique granule identifier\nindicating its UTM grid reference (see [MGRS](https://en.wikipedia.org/wiki/Military_Grid_Reference_System)).\n\nFor more details on Sentinel-2 radiometric resolution, [see this page](https://earth.esa.int/web/sentinel/user-guides/sentinel-2-msi/resolutions/radiometric).\n", + "license": "proprietary", + "keywords": [ + "copernicus", + "esa", + "eu", + "msi", + "radiance", + "sentinel" + ], + "provider": [ + { + "name": "European Union/ESA/Copernicus", + "url": "https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi" + } + ], + "extent": { + "spatial": [ + -180.0, + -56.0, + 180.0, + 83.0 + ], + "temporal": [ + "2015-06-23T00:00:00Z", + null + ] + }, + "links": [ + { + "rel": "self", + "href": "https://storage.cloud.google.com/earthengine-test/catalog/COPERNICUS_S2.json" + }, + { + "rel": "parent", + "href": "https://storage.cloud.google.com/earthengine-test/catalog/catalog.json" + }, + { + "rel": "root", + "href": "https://storage.cloud.google.com/earthengine-test/catalog/catalog.json" + }, + { + "rel": "license", + "href": "https://scihub.copernicus.eu/twiki/pub/SciHubWebPortal/TermsConditions/Sentinel_Data_Terms_and_Conditions.pdf" + } + ] +} \ No newline at end of file diff --git a/dataset-spec/json-schema/dataset.json
b/dataset-spec/json-schema/dataset.json new file mode 100644 index 000000000..289323798 --- /dev/null +++ b/dataset-spec/json-schema/dataset.json @@ -0,0 +1,157 @@ +{ + "$schema": "http://json-schema.org/draft-06/schema#", + "$id": "dataset.json#", + "title": "STAC Dataset", + "description": "This object represents the dataset in a SpatioTemporal Asset Catalog.", + "type": "object", + "required": [ + "name", + "description", + "license", + "extent", + "links" + ], + "additionalProperties": true, + "properties": { + "name": { + "title": "Identifier", + "type": "string" + }, + "title": { + "title": "Title", + "type": "string" + }, + "description": { + "title": "Description", + "type": "string" + }, + "keywords": { + "title": "Keywords", + "type": "array", + "items": { + "type": "string" + } + }, + "version": { + "title": "Dataset Version", + "type": "string" + }, + "license": { + "title": "Dataset License Name", + "type": "string" + }, + "provider": { + "type": "array", + "items": { + "properties": { + "name": { + "title": "Organization Name", + "type": "string" + }, + "url": { + "title": "Organization homepage", + "type": "string", + "format": "uri" + } + } + } + }, + "host": { + "required": [ + "name", + "scheme", + "id" + ], + "properties": { + "name": { + "title": "Organization name", + "type": "string" + }, + "description": { + "title": "Description", + "type": "string" + }, + "scheme": { + "title": "Scheme", + "type": "string", + "enum": [ + "S3", + "GCS", + "URL", + "OTHER" + ] + }, + "id": { + "title": "Identifier", + "type": "string" + }, + "region": { + "title": "Region", + "type": "string" + }, + "requester_pays": { + "title": "Requester Pays", + "type": "boolean", + "default": false + } + }, + "additionalProperties": true + }, + "extent": { + "title": "Extents", + "type": "object", + "required": [ + "spatial", + "temporal" + ], + "properties": { + "spatial": { + "title": "Spatial extent", + "type": "array", + "items": { + "type": "number" + } + }, +
"temporal": { + "title": "Temporal extent", + "type": "array", + "minItems": 2, + "maxItems": 2, + "items": { + "type": [ + "string", + "null" + ], + "format": "date-time" + } + } + }, + "additionalProperties": true + }, + "links": { + "type": "array", + "items": { + "type": "object", + "required": [ + "href", + "rel" + ], + "properties": { + "href": { + "title": "Link", + "type": "string" + }, + "rel": { + "title": "Relation", + "type": "string" + }, + "type": { + "title": "type", + "type": "string" + } + }, + "additionalProperties": true + } + } + } +} \ No newline at end of file diff --git a/extensions/README.md b/extensions/README.md index 579ee779f..62130645a 100644 --- a/extensions/README.md +++ b/extensions/README.md @@ -11,13 +11,13 @@ them they can create a shared extension and include it in the STAC repository. ## List of official extensions -| Extension Name (Prefix) | Scope | Description | -| ------------------------------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [Collection](stac-collection-spec.md) (`c`) | Item | Provides a way to specify data fields that are common across a collection of STAC Items, so that each does not need to repeat all the same information. | -| [EO](stac-eo-spec.md) (`eo`) | Item | Covers data that represents a snapshot of the earth for a single date and time. It could consist of multiple spectral bands in any part of the electromagnetic spectrum. Examples of EO data include sensors with visible bands, IR bands as well as SAR instruments. The extension provides common fields like bands, cloud cover, off nadir, sun angle + elevation, gsd and more. 
| -| [Scientific](scientific/) (`sci`) | Catalog | Scientific metadata is considered to be data that indicate from which publication a dataset originates and how the dataset itself should be cited or referenced. | -| [Start end datetime](stac-start-end-datetime-spec.md) (`set`) | Item | An extension to provide start and end datetime stamps in a consistent way. | -| [Transaction](transaction/) | API | Provides an API extension to support the creation, editing, and deleting of items on a specific WFS3 collection. | +| Extension Name (Prefix) | Scope | Description | +| ------------------------------------------------------------ | ---------------- | ------------------------------------------------------------ | +| [Collection](stac-collection-spec.md) (`c`) | Item | Provides a way to specify data fields that are common across a collection of STAC Items, so that each does not need to repeat all the same information. | +| [EO](stac-eo-spec.md) (`eo`) | Item | Covers data that represents a snapshot of the earth for a single date and time. It could consist of multiple spectral bands in any part of the electromagnetic spectrum. Examples of EO data include sensors with visible bands, IR bands as well as SAR instruments. The extension provides common fields like bands, cloud cover, off nadir, sun angle + elevation, gsd and more. | +| [Scientific](scientific/) (`sci`) | Catalog, Dataset | Scientific metadata is considered to be data that indicate from which publication a dataset originates and how the dataset itself should be cited or referenced. | +| [Start end datetime](stac-start-end-datetime-spec.md) (`set`) | Item | An extension to provide start and end datetime stamps in a consistent way. | +| [Transaction](transaction/) | API | Provides an API extension to support the creation, editing, and deleting of items on a specific WFS3 collection. | ## Third-party / vendor extensions
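The Extent Object rules in `dataset-spec/README.md` (a WGS84 bounding box plus a two-element RFC 3339 interval whose ends may be `null`) can be checked with a small Python sketch. It assumes the four spatial numbers are ordered west, south, east, north, as for WFS 3.0 and GeoJSON bounding boxes, and it does not handle the six-number 3D form. The function names `parse_rfc3339` and `check_extent` are hypothetical. Python's `datetime.fromisoformat` stands in for a real RFC 3339 parser; it accepts some ISO 8601 forms that RFC 3339 does not, so treat this as an approximation.

```python
from datetime import datetime
from typing import Optional


def parse_rfc3339(value: str) -> datetime:
    """Approximate RFC 3339 parsing; 'Z' is rewritten so older fromisoformat accepts it."""
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # RFC 3339 section 5.6 requires a UTC offset, so a naive timestamp is rejected.
        raise ValueError(f"timestamp without UTC offset: {value!r}")
    return parsed


def check_extent(extent: dict) -> None:
    """Raise ValueError if a 2D Extent Object looks malformed (a sketch, not the schema)."""
    spatial = extent["spatial"]
    if len(spatial) != 4:
        raise ValueError("this sketch only handles the 4-number [west, south, east, north] form")
    west, south, east, north = spatial
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    # west > east is deliberately not rejected: as in GeoJSON (RFC 7946),
    # such a box may cross the antimeridian.

    temporal = extent["temporal"]
    if len(temporal) != 2:
        raise ValueError("temporal extent must be [start, end]")
    # A null end (None after JSON parsing) marks an open date range.
    start: Optional[datetime] = parse_rfc3339(temporal[0]) if temporal[0] is not None else None
    end: Optional[datetime] = parse_rfc3339(temporal[1]) if temporal[1] is not None else None
    if start is not None and end is not None and start > end:
        raise ValueError("temporal start is after end")


# Global coverage between 56 degrees S and 83 degrees N, open-ended in time.
check_extent({"spatial": [-180.0, -56.0, 180.0, 83.0],
              "temporal": ["2015-06-23T00:00:00Z", None]})
```

A timestamp such as `2015-06-23T00:00:00`, which carries no offset, fails this check; whether a validator should be that strict is a design choice, but it follows the RFC 3339 requirement stated in the README.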