Commit

Merge pull request #1091 from radiantearth/clarify-root
updates to make clear root can be catalog or collection
cholmes authored Apr 23, 2021
2 parents 091af68 + e257928 commit dfb2300
Showing 5 changed files with 44 additions and 35 deletions.
32 changes: 18 additions & 14 deletions best-practices.md
@@ -71,8 +71,8 @@ So to enable all the great web tools (like [stacindex.org](http://stacindex.org)
[Google Cloud Storage](https://cloud.google.com/storage/docs/cross-origin), or [Apache Server](https://enable-cors.org/server_apache.html).
Many more are listed on [enable-cors.org](https://enable-cors.org/server.html). We recommend enabling CORS for all requests ('\*'),
so that diverse online tools can access your data. If you aren't sure if your server has CORS enabled you can use
[test-cors.org](https://www.test-cors.org/). Enter the URL of your STAC root [Catalog](catalog-spec/catalog-spec.md) JSON
and make sure it gets a response.
[test-cors.org](https://www.test-cors.org/). Enter the URL of your STAC root [Catalog](catalog-spec/catalog-spec.md) or
[Collection](collection-spec/collection-spec.md) JSON and make sure it gets a response.

### STAC on the Web

@@ -81,7 +81,7 @@ surprised that there is nothing about HTML in the entire specification. This is
should be on web pages without ending up with very bad looking pages. But the importance of having web-accessible versions
of every STAC Item is paramount.

The main recommendation is to have an HTML page for every single STAC Item and Catalog. They should be visually pleasing,
The main recommendation is to have an HTML page for every single STAC Item, Catalog and Collection. They should be visually pleasing,
crawlable by search engines and ideally interactive. The current best practice is to use a tool in the STAC ecosystem called
[STAC Browser](https://github.com/radiantearth/stac-browser/). It can crawl most any valid STAC implementation and generate unique web
pages for each Item and Catalog (or Collection). While it has a default look and feel, the design can easily be
@@ -393,6 +393,10 @@ file that just has the bands needed for display

## Catalog & Collection Practices

*Note: This section uses the term 'Catalog' (with an uppercase C) to refer to the JSON entity specified in the
[Catalog spec](catalog-spec/catalog-spec.md), and 'catalog' (with a lowercase c) to refer to any full STAC implementation,
which can be any mix of Catalogs Collections and Items.*

### Static and Dynamic Catalogs

As mentioned in the main [overview](overview.md), there are two main types of catalogs - static
@@ -446,7 +450,7 @@ providers, and users could browse down to both. The leaf Items should just be li

### Catalog Layout

Creating a catalog involves a number of decisions as to what folder structure to use to represent sub-catalogs, items
Creating a catalog involves a number of decisions as to what folder structure to use to represent sub-catalogs, Items
and assets, and how to name them. The specification leaves this totally open, and you can link things as you want. But
it is recommended to be thoughtful about the organization of sub-catalogs, putting them into a structure that a person
might reasonably browse (since they likely will with [STAC on the Web](#stac-on-the-web) recommendations). For example
@@ -463,14 +467,14 @@ if you follow these recommendations.
1. Root documents (Catalogs / Collections) should be at the root of a directory tree containing the static catalog.
2. Catalogs should be named `catalog.json` and Collections should be named `collection.json`.
3. Items should be named `<id>.json`.
4. Sub-Catalogs should be stored in subdirectories of their parent
4. Sub-Catalogs or sub-Collections should be stored in subdirectories of their parent
(and only 1 subdirectory deeper than a document's parent, e.g. `.../sample/sub1/catalog.json`).
5. Items should be stored in subdirectories of their parent Catalog.
This means that each Item and its assets are contained in a unique subdirectory.
6. Limit the number of Items in a Catalog or sub-Catalog, grouping / partitioning as relevant to the dataset.
5. Items should be stored in subdirectories of their parent Catalog or Collection.
This means that each Item and its assets are contained in a unique subdirectory.
6. Limit the number of Items in a Catalog or Collection, grouping / partitioning as relevant to the dataset.
7. Use structural elements (Catalog and Collection) consistently across each 'level' of your hierarchy.
For example, if levels 2 and 4 of the hierarchy only contain Collections,
don't add a Catalog at levels 2 and 4.
don't add a Catalog at levels 2 and 4.
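The naming and nesting rules in the list can be expressed as a small path helper. This is a minimal sketch, assuming POSIX-style relative paths; `root_path`, `child_path` and `FILE_NAMES` are hypothetical names, not part of STAC or any STAC library.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from entity kind to file name (recommendation 2 in the list).
FILE_NAMES = {"Catalog": "catalog.json", "Collection": "collection.json"}


def root_path(kind: str) -> PurePosixPath:
    """The root Catalog or Collection sits at the top of the directory tree."""
    return PurePosixPath(FILE_NAMES[kind])


def child_path(parent_dir: str, kind: str, doc_id: str) -> PurePosixPath:
    """Path of a sub-Catalog, sub-Collection or Item, one directory below its parent."""
    folder = PurePosixPath(parent_dir) / doc_id  # exactly one level deeper
    if kind in FILE_NAMES:
        return folder / FILE_NAMES[kind]
    if kind == "Item":
        # Each Item (and its assets) gets a unique subdirectory; the file is <id>.json
        return folder / f"{doc_id}.json"
    raise ValueError(f"unknown STAC entity kind: {kind!r}")


print(child_path("sample", "Collection", "sub1"))  # sample/sub1/collection.json
print(child_path("sample/sub1", "Item", "item-1"))  # sample/sub1/item-1/item-1.json
```

Using the same helper at every level also makes it easy to keep the structural elements consistent per level (recommendation 7).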

#### Dynamic Catalog Layout

@@ -483,7 +487,7 @@ different sub-catalog organization structures. For example one catalog could div
by providers, and users could browse down to both. The leaf Items should just be linked to in a single canonical location
(or at least use a rel link that indicates the location of the canonical one). It is recommended that dynamic catalogs
provide multiple 'views' to allow users to navigate in a way that makes sense to them, providing multiple 'sub-catalogs'
from the root Catalog that enable different paths to browse (country/state, date/time, constellation/satellite, etc). But the
from the root that enable different paths to browse (country/state, date/time, constellation/satellite, etc). But the
canonical 'rel' link should be used to designate the primary location of the Item to search engine crawlers.

#### Mixing STAC Versions
@@ -608,9 +612,9 @@ implement it.
#### Relative Published Catalog

This is a self-contained catalog as described above, except it includes an absolute `self` link at
the root catalog, to identify its online location. This is designed so that a self-contained catalog (of either type, with its
the root to identify its online location. This is designed so that a self-contained catalog (of either type, with its
assets or just metadata) can be 'published' online
by just adding one field (the self link) to its root catalog. All the other links should remain the same. The resulting catalog
by just adding one field (the self link) to its root (Catalog or Collection). All the other links should remain the same. The resulting catalog
is no longer compliant with the self-contained catalog recommendations, but instead transforms into a 'relative published catalog'.
With this, a client may resolve Item and sub-catalog self links by traversing parent and root links, but requires reading
multiple sources to achieve this.
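To make the "relative published catalog" idea concrete, the sketch below adds a single absolute `self` link to the root and resolves the remaining relative hrefs against the document that contains them. It assumes documents are plain dicts parsed from JSON; `publish_root` and `resolve` are hypothetical helpers and `https://example.com/stac/` is a placeholder location.

```python
import copy
from urllib.parse import urljoin


def publish_root(root: dict, root_url: str) -> dict:
    """Return a copy of a self-contained root with only an absolute self link added."""
    published = copy.deepcopy(root)
    # Replace any existing self link; every other (relative) link is left untouched
    published["links"] = [l for l in published.get("links", []) if l.get("rel") != "self"]
    published["links"].append({"rel": "self", "href": root_url, "type": "application/json"})
    return published


def resolve(base_url: str, href: str) -> str:
    """Resolve a relative href against the URL of the document that contains it."""
    return urljoin(base_url, href)


root = {
    "type": "Catalog",
    "id": "sample",
    "links": [
        {"rel": "root", "href": "./catalog.json"},
        {"rel": "child", "href": "./sub1/catalog.json"},
    ],
}
published = publish_root(root, "https://example.com/stac/catalog.json")
print(resolve("https://example.com/stac/catalog.json", "./sub1/catalog.json"))
# https://example.com/stac/sub1/catalog.json
```

A client can only compute the child's absolute URL after reading the root's `self` link, which is why resolving links in such a catalog requires reading multiple sources.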
@@ -632,8 +636,8 @@ a number of the common official relations that are used in production STAC imple
| alternate | It is recommended that STAC Items are also available as HTML, and should use this rel with `"type" : "text/html"` to tell clients where they can get a version of the Item or Collection to view in a browser. See [STAC on the Web in Best Practices](#stac-on-the-web) for more information. |
| canonical | The URL of the [canonical](https://en.wikipedia.org/wiki/Canonical_link_element) version of the Item or Collection. API responses and copies of catalogs should use this to inform users that they are direct copy of another STAC Item, using the canonical rel to refer back to the primary location. |
| via | The URL of the source metadata that this STAC Item or Collection is created from. Used similarly to canonical, but refers back to a non-STAC record (Landsat MTL, Sentinel tileInfo.json, etc) |
| prev | Indicates that the link's context is a part of a series, and that the previous in the series is the link target. Typically used in STAC by API's, to return smaller groups of Items or Catalogs. |
| next | Indicates that the link's context is a part of a series, and that the next in the series is the link target. Typically used in STAC by API's, to return smaller groups of Items or Catalogs. |
| prev | Indicates that the link's context is a part of a series, and that the previous in the series is the link target. Typically used in STAC by API's, to return smaller groups of Items or Catalogs/Collections. |
| next | Indicates that the link's context is a part of a series, and that the next in the series is the link target. Typically used in STAC by API's, to return smaller groups of Items or Catalogs/Collections. |
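A client consuming `prev`/`next` series typically walks the `next` links until a page has none. This is a minimal sketch, assuming the caller supplies a `fetch(url) -> dict` function (hypothetical; an HTTP client would normally sit behind it) and that each page carries its links in a `links` array.

```python
from typing import Callable, Iterator


def iter_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield each page of a series by following rel="next" links."""
    url = first_url
    seen = set()
    # Stop when there is no next link, or if a server ever returns a cycle
    while url is not None and url not in seen:
        seen.add(url)
        page = fetch(url)
        yield page
        url = next(
            (l["href"] for l in page.get("links", []) if l.get("rel") == "next"),
            None,
        )


# In-memory stand-in for a paginated API response (placeholder data)
pages = {
    "page1": {"features": ["a", "b"], "links": [{"rel": "next", "href": "page2"}]},
    "page2": {"features": ["c"], "links": [{"rel": "prev", "href": "page1"}]},
}
ids = [f for page in iter_pages("page1", pages.__getitem__) for f in page["features"]]
print(ids)  # ['a', 'b', 'c']
```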

### Versioning for Catalogs

11 changes: 6 additions & 5 deletions catalog-spec/catalog-spec.md
@@ -36,8 +36,9 @@ and fields to be compliant.
This Catalog specification primarily defines a structure for information to be discoverable. Any use
that is publishing a set of related spatiotemporal assets is strongly recommended to also use the
STAC Collection specification to provide additional information about the set of Items
contained in a Catalog, in order to give contextual information to aid in discovery. Every STAC Collection is
also a valid STAC Catalog.
contained in a Catalog, in order to give contextual information to aid in discovery.
STAC Collections all have the same fields as STAC Catalogs, but with different allowed
values for `type` and `stac_extensions`.
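Because a Collection has all the fields of a Catalog, code that walks the structure can treat both the same way and consult `type` only when it needs to know which one it holds. A minimal sketch over plain dicts; `entity_kind` and `structural_links` are hypothetical names.

```python
from typing import List

STRUCTURAL_TYPES = ("Catalog", "Collection")


def entity_kind(doc: dict) -> str:
    """Tell a Catalog from a Collection by its `type` value."""
    kind = doc.get("type")
    if kind not in STRUCTURAL_TYPES:
        raise ValueError(f"not a STAC Catalog or Collection: type={kind!r}")
    return kind


def structural_links(doc: dict) -> List[dict]:
    """Return child/item links; handled identically for Catalogs and Collections."""
    entity_kind(doc)  # reject anything that is neither
    return [l for l in doc.get("links", []) if l.get("rel") in ("child", "item")]


collection = {
    "type": "Collection",
    "id": "c1",
    "links": [{"rel": "root", "href": "../catalog.json"}, {"rel": "item", "href": "./a/a.json"}],
}
print(entity_kind(collection), [l["href"] for l in structural_links(collection)])
# Collection ['./a/a.json']
```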

## Catalog fields

@@ -89,11 +90,11 @@ The following types are commonly used as `rel` types in the Link Object of a STA
| ------- | ----------- |
| self | STRONGLY RECOMMENDED. *Absolute* URL to the location that the Catalog file can be found online, if available. This is particularly useful when in a download package that includes metadata, so that the downstream user can know where the data has come from. |
| root | STRONGLY RECOMMENDED. URL to the root STAC Catalog or [Collection](../collection-spec/README.md). Catalogs should include a link to their root, even if it's the root and points to itself. |
| parent | URL to the parent STAC Catalog or Collection. Non-root Catalogs should include a link to their parent. |
| child | URL to a child STAC Catalog or Collection. |
| parent | URL to the parent STAC entity (Catalog or Collection). Non-root Catalogs should include a link to their parent. |
| child | URL to a child STAC entity (Catalog or Collection). |
| item | URL to a STAC Item. |

**Note:** A link to at least one `item` or `child` Catalog is **REQUIRED**.
**Note:** A link to at least one `item` or `child` (Catalog or Collection) is **REQUIRED**.
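The relation table and the note translate into a few simple link checks. This is a minimal sketch over a dict parsed from JSON, not a full STAC validator; `check_catalog_links` is a hypothetical name, and whether a document is the root is passed in by the caller as an assumption.

```python
from typing import List
from urllib.parse import urlparse


def check_catalog_links(doc: dict, is_root: bool) -> List[str]:
    """Report link problems for a Catalog or Collection."""
    links = doc.get("links", [])
    rels = [l.get("rel") for l in links]
    problems = []
    if "item" not in rels and "child" not in rels:
        problems.append("REQUIRED: at least one 'item' or 'child' link")
    if "root" not in rels:
        problems.append("STRONGLY RECOMMENDED: a 'root' link (the root links to itself)")
    if not is_root and "parent" not in rels:
        problems.append("non-root documents should link to their 'parent'")
    if "self" not in rels:
        problems.append("STRONGLY RECOMMENDED: an absolute 'self' link")
    for l in links:
        if l.get("rel") == "self":
            parsed = urlparse(l.get("href", ""))
            if not (parsed.scheme and parsed.netloc):
                problems.append("'self' link should be an absolute URL")
    return problems


doc = {
    "type": "Catalog",
    "id": "sample",
    "links": [
        {"rel": "root", "href": "./catalog.json"},
        {"rel": "child", "href": "./sub1/collection.json"},
    ],
}
print(check_catalog_links(doc, is_root=True))
# ["STRONGLY RECOMMENDED: an absolute 'self' link"]
```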

There are additional `rel` types in the [Using Relation Types](../best-practices.md#using-relation-types) best practice, but as
they are more typically used in Collections, as Catalogs tend to just be used to structure STAC organization, so tend to just use
6 changes: 3 additions & 3 deletions collection-spec/collection-spec.md
@@ -252,9 +252,9 @@ This is done where there is not a clear official option, or where STAC uses an o
| Type | Description |
| ------- | ------------------------------------------------------------ |
| self | STRONGLY RECOMMENDED. *Absolute* URL to the location that the Collection file can be found online, if available. This is particularly useful when in a download package that includes metadata, so that the downstream user can know where the data has come from. |
| root | URL to the root STAC Catalog or Collection. Collections should include a link to their root, even if it's the root and points to itself. |
| parent | URL to the parent STAC Catalog or Collection. Non-root Collections should include a link to their parent. |
| child | URL to a child STAC Catalog or Collection. |
| root | URL to the root STAC entity (Catalog or Collection). Collections should include a link to their root, even if it's the root and points to itself. |
| parent | URL to the parent STAC entity (Catalog or Collection). Non-root Collections should include a link to their parent. |
| child | URL to a child STAC entity (Catalog or Collection). |
| item | URL to a STAC Item. All Items linked from a Collection MUST refer back to its Collection with the [`collection` relation type](../item-spec/item-spec.md#relation-types). |
| license | The license URL(s) for the Collection SHOULD be specified if the `license` field is set to `proprietary` or `various`. If there is no public license URL available, it is RECOMMENDED to put the license text in a separate file and link to this file. |
| derived_from | URL to a STAC Collection that was used as input data in the creation of this Collection. See the note in [STAC Item](../item-spec/item-spec.md#derived_from) for more info. |
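The `item` row requires every linked Item to point back to its Collection with a `collection` link. A minimal sketch of that check, assuming a caller-supplied `load(url) -> dict` (hypothetical) and resolving relative hrefs with `urljoin`; the `example.com` URLs are placeholders.

```python
from typing import Callable, List
from urllib.parse import urljoin


def items_missing_backlink(collection: dict, collection_url: str,
                           load: Callable[[str], dict]) -> List[str]:
    """Return URLs of linked Items that do not link back with rel="collection"."""
    missing = []
    for link in collection.get("links", []):
        if link.get("rel") != "item":
            continue
        item_url = urljoin(collection_url, link["href"])
        item = load(item_url)
        # Resolve each collection href relative to the Item's own URL before comparing
        backlinks = [urljoin(item_url, l["href"]) for l in item.get("links", [])
                     if l.get("rel") == "collection"]
        if collection_url not in backlinks:
            missing.append(item_url)
    return missing


base = "https://example.com/c/collection.json"
docs = {
    "https://example.com/c/a/a.json": {"links": [{"rel": "collection", "href": "../collection.json"}]},
    "https://example.com/c/b/b.json": {"links": []},
}
coll = {"type": "Collection", "links": [{"rel": "item", "href": "./a/a.json"},
                                        {"rel": "item", "href": "./b/b.json"}]}
print(items_missing_backlink(coll, base, docs.__getitem__))
# ['https://example.com/c/b/b.json']
```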
6 changes: 3 additions & 3 deletions item-spec/item-spec.md
@@ -84,7 +84,7 @@ It is important that an Item identifier is unique within a Collection, and that
[Collection identifier](../collection-spec/collection-spec.md#id) in turn is unique globally. Then the two can be combined to
give a globally unique identifier. Items are *[strongly recommended](#collections)* to have Collections, and not having one makes
it more difficult to be used in the wider STAC ecosystem.
If an Item does not have a Collection, then the Item identifier should be unique within its root Catalog.
If an Item does not have a Collection, then the Item identifier should be unique within its root Catalog or root Collection.
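The uniqueness rule suggests a composite key: the Collection id plus the Item id when a Collection exists, otherwise the root id plus the Item id. This is a minimal sketch, assuming the Item's Collection id is read from a top-level `collection` key; `item_key` and `duplicate_keys` are hypothetical names, and a root id is only a local scope, not a guaranteed global one.

```python
from typing import Iterable, List, Tuple


def item_key(item: dict, root_id: str) -> Tuple[str, str, str]:
    """Composite identifier for an Item, scoped by Collection or by root."""
    collection_id = item.get("collection")
    if collection_id is not None:
        # Collection ids are globally unique, Item ids unique within the Collection
        return ("collection", collection_id, item["id"])
    # Without a Collection the id only needs to be unique within its root
    return ("root", root_id, item["id"])


def duplicate_keys(items: Iterable[dict], root_id: str) -> List[Tuple[str, str, str]]:
    """Return keys that occur more than once."""
    seen, dups = set(), []
    for item in items:
        key = item_key(item, root_id)
        if key in seen:
            dups.append(key)
        seen.add(key)
    return dups


items = [
    {"id": "a", "collection": "c1"},
    {"id": "a", "collection": "c2"},  # same id, different Collection: fine
    {"id": "b"},
    {"id": "b"},  # no Collection and same root: duplicate
]
print(duplicate_keys(items, root_id="sample"))  # [('root', 'sample', 'b')]
```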

As most geospatial assets are already uniquely defined by some
identification scheme from the data provider it is recommended to simply use that ID.
@@ -192,8 +192,8 @@ This happens where there is not a clear official option, or where STAC uses an o
| Type | Description |
| ------------ | ------------------------------------------------------------ |
| self | STRONGLY RECOMMENDED. *Absolute* URL to the Item if it is available at a public URL. This is particularly useful when in a download package that includes metadata, so that the downstream user can know where the data has come from. |
| root | URL to the root STAC Catalog or Collection. |
| parent | URL to the parent STAC Catalog or Collection. |
| root | URL to the root STAC entity (Catalog or Collection). |
| parent | URL to the parent STAC entity (Catalog or Collection). |
| collection | STRONGLY RECOMMENDED. URL to a Collection. *Absolute* URLs should be used whenever possible. The referenced Collection is STRONGLY RECOMMENDED to implement the same STAC version as the Item. A link with this `rel` type is *required* if the `collection` field in properties is present. |
| derived_from | URL to a STAC Item that was used as input data in the creation of this Item. |
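The `collection` row combines a requirement (a link when the `collection` field is present) with recommendations (having the link at all, and using an absolute URL). A minimal sketch, assuming the field is read as a top-level `collection` key; `check_item_collection_link` is a hypothetical name.

```python
from typing import List
from urllib.parse import urlparse


def check_item_collection_link(item: dict) -> List[str]:
    """Check an Item's rel="collection" link against the relation table."""
    links = [l for l in item.get("links", []) if l.get("rel") == "collection"]
    problems = []
    if not links:
        if item.get("collection") is not None:
            problems.append("REQUIRED: 'collection' is set but there is no rel='collection' link")
        else:
            problems.append("STRONGLY RECOMMENDED: a rel='collection' link")
    for l in links:
        parsed = urlparse(l.get("href", ""))
        if not (parsed.scheme and parsed.netloc):
            problems.append("collection link should use an absolute URL when possible")
    return problems


print(check_item_collection_link(
    {"id": "a", "collection": "c1",
     "links": [{"rel": "collection", "href": "https://example.com/c1/collection.json"}]}
))  # []
```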

24 changes: 14 additions & 10 deletions overview.md
@@ -53,10 +53,11 @@ A Catalog is a very simple construct - it just provides links to Items or to oth
The closest analog is a folder in a file structure, it is the container for Items, but it can
also hold other containers (folders / catalogs).

The Collection specification shares some fields with the catalog spec but has a number of additional fields:
The Collection entity shares most fields with the Catalog entity but has a number of additional fields:
license, extent (spatial and temporal), providers, keywords and summaries. Every Item in a Collection links
back to their Collection, so clients can easily find fields like the license. Thus every Item implicitly
shares the fields described in their parent Collection.
shares the fields described in their parent Collection. Collection entities can be used just like Catalog
entities to provide structure, as they provide all the same options for linking and organizing.

But what *should* go in a Collection, versus just in a Catalog? A Collection will generally consist of
a set of assets that are defined with the same properties and share higher level metadata. In the
@@ -78,27 +79,30 @@ provide multiple grouping paths, serving as a sort of faceted search.

The second case is used when one wants to represent diverse data in a single place. If an organization
has an internal catalog with Landsat 8, Sentinel 2, NAIP data and several commercial imagery providers
then they'd have a root catalog that would link to a number of different Collections.
then they'd have a root Catalog that would link to a number of different Collections.

So in conclusion it's best to use Collections for what you want a user to find as a starting point, and then
catalogs are just for structuring and grouping the data. Future work includes a mechanism to actually
So in conclusion it's best to use Collections for what you want user to find as starting point, and then
Catalogs are just for structuring and grouping the data. Future work includes a mechanism to actually
search Collection-level data, hopefully in concert with other specifications.

## Catalog Overview

*NOTE: The below examples all say Catalog, but those can all be Collections as well, as it has all the fields necessary to
serve as a Catalog*

There are two required element types of a Catalog: Catalog and Item. A STAC Catalog
points to [STAC Items](item-spec/README.md), or to other STAC catalogs. It provides a simple
linking structure that can be used recursively so that many Items can be included in
a single Catalog, organized however the implementor desires.

STAC makes no formal distinction between a "root" catalog and the "child" catalogs. A root catalog
is simply the top-most catalog -- it has no parent. A nested catalog structure is useful (and
STAC makes no formal distinction between a "root" Catalog and the "child" Catalogs. A root Catalog
is simply the top-most Catalog or Collection -- it has no parent. A nested catalog structure is useful (and
recommended) for breaking up massive numbers of catalog Items into logical groupings. For example,
it might make sense to organize a catalog by date (year, month, day), or geography (continent,
country, state/prov). See the [Catalog Layout](best-practices.md#catalog-layout) best practices
section for more.
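Since the root is simply the top-most document with no parent, a client can find it by following `parent` links upward, whether the root turns out to be a Catalog or a Collection. A minimal sketch, assuming a caller-supplied `load(url) -> dict` (hypothetical) and placeholder `example.com` URLs.

```python
from typing import Callable
from urllib.parse import urljoin


def find_root(url: str, load: Callable[[str], dict]) -> str:
    """Follow rel="parent" links until reaching a document without a parent."""
    seen = set()
    while True:
        if url in seen:
            raise ValueError(f"parent links form a cycle at {url}")
        seen.add(url)
        doc = load(url)
        parent = next((l["href"] for l in doc.get("links", []) if l.get("rel") == "parent"), None)
        if parent is None:
            return url  # top-most document: the root
        url = urljoin(url, parent)


docs = {
    # The root here is a Collection, which is allowed
    "https://example.com/stac/collection.json": {"type": "Collection", "links": []},
    "https://example.com/stac/2021/catalog.json": {
        "type": "Catalog", "links": [{"rel": "parent", "href": "../collection.json"}]},
    "https://example.com/stac/2021/a/a.json": {
        "type": "Feature", "links": [{"rel": "parent", "href": "../catalog.json"}]},
}
print(find_root("https://example.com/stac/2021/a/a.json", docs.__getitem__))
# https://example.com/stac/collection.json
```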

A simple Catalog structure might look like this:
A simple STAC structure might look like this:

- catalog (root)
- catalog
@@ -164,8 +168,8 @@ each Item and Catalog, as well as ways to achieve that.

## Collection Overview

A STAC Collection extends the core fields of the Catalog construct to provide additional metadata to describe the set of Items it
contains. The required fields are fairly
A STAC Collection includes the core fields of the Catalog entity and also provides additional metadata to describe
the set of Items it contains. The required fields are fairly
minimal - it includes the 4 required Catalog fields (id, description, stac_version and links), and adds license
and extents. But there are a number of other common fields defined in the spec, and more common fields are also
defined in [STAC extensions](extensions/). These serve as basic metadata, and ideally Collections also link to
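The required-field rule (the four Catalog fields plus `license` and `extent`) can be checked with a few lines. A minimal sketch; the constant and function names are hypothetical, and the `extent` layout shown (`spatial.bbox`, `temporal.interval`) follows the STAC 1.0 Collection JSON as an assumption.

```python
from typing import List

CATALOG_REQUIRED = ("id", "description", "stac_version", "links")
COLLECTION_EXTRA_REQUIRED = ("license", "extent")


def missing_collection_fields(doc: dict) -> List[str]:
    """List required Collection fields absent from a parsed document."""
    missing = [f for f in CATALOG_REQUIRED + COLLECTION_EXTRA_REQUIRED if f not in doc]
    extent = doc.get("extent", {})
    # The extent needs both a spatial and a temporal part
    for part in ("spatial", "temporal"):
        if "extent" in doc and part not in extent:
            missing.append(f"extent.{part}")
    return missing


minimal = {
    "type": "Collection",
    "id": "landsat-8",
    "description": "Example Collection (placeholder values)",
    "stac_version": "1.0.0",
    "license": "proprietary",
    "extent": {
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        "temporal": {"interval": [["2013-04-11T00:00:00Z", None]]},
    },
    "links": [],
}
print(missing_collection_fields(minimal))  # []
```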