diff --git a/best-practices.md b/best-practices.md index 77e01e6b..9a33eade 100644 --- a/best-practices.md +++ b/best-practices.md @@ -71,8 +71,8 @@ So to enable all the great web tools (like [stacindex.org](http://stacindex.org) [Google Cloud Storage](https://cloud.google.com/storage/docs/cross-origin), or [Apache Server](https://enable-cors.org/server_apache.html). Many more are listed on [enable-cors.org](https://enable-cors.org/server.html). We recommend enabling CORS for all requests ('\*'), so that diverse online tools can access your data. If you aren't sure if your server has CORS enabled you can use -[test-cors.org](https://www.test-cors.org/). Enter the URL of your STAC root [Catalog](catalog-spec/catalog-spec.md) JSON -and make sure it gets a response. +[test-cors.org](https://www.test-cors.org/). Enter the URL of your STAC root [Catalog](catalog-spec/catalog-spec.md) or +[Collection](collection-spec/collection-spec.md) JSON and make sure it gets a response. ### STAC on the Web @@ -81,7 +81,7 @@ surprised that there is nothing about HTML in the entire specification. This is should be on web pages without ending up with very bad looking pages. But the importance of having web-accessible versions of every STAC Item is paramount. -The main recommendation is to have an HTML page for every single STAC Item and Catalog. They should be visually pleasing, +The main recommendation is to have an HTML page for every single STAC Item, Catalog and Collection. They should be visually pleasing, crawlable by search engines and ideally interactive. The current best practice is to use a tool in the STAC ecosystem called [STAC Browser](https://github.com/radiantearth/stac-browser/). It can crawl most any valid STAC implementation and generate unique web pages for each Item and Catalog (or Collection). While it has a default look and feel, the design can easily be @@ -393,6 +393,10 @@ file that just has the bands needed for display ## Catalog & Collection Practices +*Note: This section uses the term 'Catalog' (with an uppercase C) to refer to the JSON entity specified in the +[Catalog spec](catalog-spec/catalog-spec.md), and 'catalog' (with a lowercase c) to refer to any full STAC implementation, +which can be any mix of Catalogs Collections and Items.* + ### Static and Dynamic Catalogs As mentioned in the main [overview](overview.md), there are two main types of catalogs - static @@ -446,7 +450,7 @@ providers, and users could browse down to both. The leaf Items should just be li ### Catalog Layout -Creating a catalog involves a number of decisions as to what folder structure to use to represent sub-catalogs, items +Creating a catalog involves a number of decisions as to what folder structure to use to represent sub-catalogs, Items and assets, and how to name them. The specification leaves this totally open, and you can link things as you want. But it is recommended to be thoughtful about the organization of sub-catalogs, putting them into a structure that a person might reasonably browse (since they likely will with [STAC on the Web](#stac-on-the-web) recommendations). For example @@ -463,14 +467,14 @@ if you follow these recommendations. 1. Root documents (Catalogs / Collections) should be at the root of a directory tree containing the static catalog. 2. Catalogs should be named `catalog.json` and Collections should be named `collection.json`. 3. Items should be named `.json`. -4. Sub-Catalogs should be stored in subdirectories of their parent +4. Sub-Catalogs or sub-Collections should be stored in subdirectories of their parent (and only 1 subdirectory deeper than a document's parent, e.g. `.../sample/sub1/catalog.json`). -5. Items should be stored in subdirectories of their parent Catalog. -This means that each Item and its assets are contained in a unique subdirectory. -6. Limit the number of Items in a Catalog or sub-Catalog, grouping / partitioning as relevant to the dataset. +5. Items should be stored in subdirectories of their parent Catalog or Collection. + This means that each Item and its assets are contained in a unique subdirectory. +6. Limit the number of Items in a Catalog or Collection, grouping / partitioning as relevant to the dataset. 7. Use structural elements (Catalog and Collection) consistently across each 'level' of your hierarchy. For example, if levels 2 and 4 of the hierarchy only contain Collections, -don't add a Catalog at levels 2 and 4. + don't add a Catalog at levels 2 and 4. #### Dynamic Catalog Layout @@ -483,7 +487,7 @@ different sub-catalog organization structures. For example one catalog could div by providers, and users could browse down to both. The leaf Items should just be linked to in a single canonical location (or at least use a rel link that indicates the location of the canonical one). It is recommended that dynamic catalogs provide multiple 'views' to allow users to navigate in a way that makes sense to them, providing multiple 'sub-catalogs' -from the root Catalog that enable different paths to browse (country/state, date/time, constellation/satellite, etc). But the +from the root that enable different paths to browse (country/state, date/time, constellation/satellite, etc). But the canonical 'rel' link should be used to designate the primary location of the Item to search engine crawlers. #### Mixing STAC Versions @@ -608,9 +612,9 @@ implement it. #### Relative Published Catalog This is a self-contained catalog as described above, except it includes an absolute `self` link at -the root catalog, to identify its online location. This is designed so that a self-contained catalog (of either type, with its +the root to identify its online location. This is designed so that a self-contained catalog (of either type, with its assets or just metadata) can be 'published' online -by just adding one field (the self link) to its root catalog. All the other links should remain the same. The resulting catalog +by just adding one field (the self link) to its root (Catalog or Collection). All the other links should remain the same. The resulting catalog is no longer compliant with the self-contained catalog recommendations, but instead transforms into a 'relative published catalog'. With this, a client may resolve Item and sub-catalog self links by traversing parent and root links, but requires reading multiple sources to achieve this. @@ -632,8 +636,8 @@ a number of the common official relations that are used in production STAC imple | alternate | It is recommended that STAC Items are also available as HTML, and should use this rel with `"type" : "text/html"` to tell clients where they can get a version of the Item or Collection to view in a browser. See [STAC on the Web in Best Practices](#stac-on-the-web) for more information. | | canonical | The URL of the [canonical](https://en.wikipedia.org/wiki/Canonical_link_element) version of the Item or Collection. API responses and copies of catalogs should use this to inform users that they are direct copy of another STAC Item, using the canonical rel to refer back to the primary location. | | via | The URL of the source metadata that this STAC Item or Collection is created from. Used similarly to canonical, but refers back to a non-STAC record (Landsat MTL, Sentinel tileInfo.json, etc) | -| prev | Indicates that the link's context is a part of a series, and that the previous in the series is the link target. Typically used in STAC by API's, to return smaller groups of Items or Catalogs. | -| next | Indicates that the link's context is a part of a series, and that the next in the series is the link target. Typically used in STAC by API's, to return smaller groups of Items or Catalogs. | +| prev | Indicates that the link's context is a part of a series, and that the previous in the series is the link target. Typically used in STAC by API's, to return smaller groups of Items or Catalogs/Collections. | +| next | Indicates that the link's context is a part of a series, and that the next in the series is the link target. Typically used in STAC by API's, to return smaller groups of Items or Catalogs/Collections. | ### Versioning for Catalogs diff --git a/catalog-spec/catalog-spec.md b/catalog-spec/catalog-spec.md index 573e848a..325ea58f 100644 --- a/catalog-spec/catalog-spec.md +++ b/catalog-spec/catalog-spec.md @@ -36,8 +36,9 @@ and fields to be compliant. This Catalog specification primarily defines a structure for information to be discoverable. Any use that is publishing a set of related spatiotemporal assets is strongly recommended to also use the STAC Collection specification to provide additional information about the set of Items -contained in a Catalog, in order to give contextual information to aid in discovery. Every STAC Collection is -also a valid STAC Catalog. +contained in a Catalog, in order to give contextual information to aid in discovery. +STAC Collections all have the same fields as STAC Catalogs, but with different allowed +values for `type` and `stac_extensions`. ## Catalog fields @@ -89,11 +90,11 @@ The following types are commonly used as `rel` types in the Link Object of a STA | ------- | ----------- | | self | STRONGLY RECOMMENDED. *Absolute* URL to the location that the Catalog file can be found online, if available. This is particularly useful when in a download package that includes metadata, so that the downstream user can know where the data has come from. | | root | STRONGLY RECOMMENDED. URL to the root STAC Catalog or [Collection](../collection-spec/README.md). Catalogs should include a link to their root, even if it's the root and points to itself. | -| parent | URL to the parent STAC Catalog or Collection. Non-root Catalogs should include a link to their parent. | -| child | URL to a child STAC Catalog or Collection. | +| parent | URL to the parent STAC entity (Catalog or Collection). Non-root Catalogs should include a link to their parent. | +| child | URL to a child STAC entity (Catalog or Collection). | | item | URL to a STAC Item. | -**Note:** A link to at least one `item` or `child` Catalog is **REQUIRED**. +**Note:** A link to at least one `item` or `child` (Catalog or Collection) is **REQUIRED**. There are additional `rel` types in the [Using Relation Types](../best-practices.md#using-relation-types) best practice, but as they are more typically used in Collections, as Catalogs tend to just be used to structure STAC organization, so tend to just use diff --git a/collection-spec/collection-spec.md b/collection-spec/collection-spec.md index f4e331c7..c1614b99 100644 --- a/collection-spec/collection-spec.md +++ b/collection-spec/collection-spec.md @@ -252,9 +252,9 @@ This is done where there is not a clear official option, or where STAC uses an o | Type | Description | | ------- | ------------------------------------------------------------ | | self | STRONGLY RECOMMENDED. *Absolute* URL to the location that the Collection file can be found online, if available. This is particularly useful when in a download package that includes metadata, so that the downstream user can know where the data has come from. | -| root | URL to the root STAC Catalog or Collection. Collections should include a link to their root, even if it's the root and points to itself. | -| parent | URL to the parent STAC Catalog or Collection. Non-root Collections should include a link to their parent. | -| child | URL to a child STAC Catalog or Collection. | +| root | URL to the root STAC entity (Catalog or Collection). Collections should include a link to their root, even if it's the root and points to itself. | +| parent | URL to the parent STAC entity (Catalog or Collection). Non-root Collections should include a link to their parent. | +| child | URL to a child STAC entity (Catalog or Collection). | | item | URL to a STAC Item. All Items linked from a Collection MUST refer back to its Collection with the [`collection` relation type](../item-spec/item-spec.md#relation-types). | | license | The license URL(s) for the Collection SHOULD be specified if the `license` field is set to `proprietary` or `various`. If there is no public license URL available, it is RECOMMENDED to put the license text in a separate file and link to this file. | | derived_from | URL to a STAC Collection that was used as input data in the creation of this Collection. See the note in [STAC Item](../item-spec/item-spec.md#derived_from) for more info. | diff --git a/item-spec/item-spec.md b/item-spec/item-spec.md index 5f609c52..641fd3da 100644 --- a/item-spec/item-spec.md +++ b/item-spec/item-spec.md @@ -84,7 +84,7 @@ It is important that an Item identifier is unique within a Collection, and that [Collection identifier](../collection-spec/collection-spec.md#id) in turn is unique globally. Then the two can be combined to give a globally unique identifier. Items are *[strongly recommended](#collections)* to have Collections, and not having one makes it more difficult to be used in the wider STAC ecosystem. -If an Item does not have a Collection, then the Item identifier should be unique within its root Catalog. +If an Item does not have a Collection, then the Item identifier should be unique within its root Catalog or root Collection. As most geospatial assets are already uniquely defined by some identification scheme from the data provider it is recommended to simply use that ID. @@ -192,8 +192,8 @@ This happens where there is not a clear official option, or where STAC uses an o | Type | Description | | ------------ | ------------------------------------------------------------ | | self | STRONGLY RECOMMENDED. *Absolute* URL to the Item if it is available at a public URL. This is particularly useful when in a download package that includes metadata, so that the downstream user can know where the data has come from. | -| root | URL to the root STAC Catalog or Collection. | -| parent | URL to the parent STAC Catalog or Collection. | +| root | URL to the root STAC entity (Catalog or Collection). | +| parent | URL to the parent STAC entity (Catalog or Collection). | | collection | STRONGLY RECOMMENDED. URL to a Collection. *Absolute* URLs should be used whenever possible. The referenced Collection is STRONGLY RECOMMENDED to implement the same STAC version as the Item. A link with this `rel` type is *required* if the `collection` field in properties is present. | | derived_from | URL to a STAC Item that was used as input data in the creation of this Item. | diff --git a/overview.md b/overview.md index f8dbebe0..306daf5f 100644 --- a/overview.md +++ b/overview.md @@ -53,10 +53,11 @@ A Catalog is a very simple construct - it just provides links to Items or to oth The closest analog is a folder in a file structure, it is the container for Items, but it can also hold other containers (folders / catalogs). -The Collection specification shares some fields with the catalog spec but has a number of additional fields: +The Collection entity shares most fields with the Catalog entity but has a number of additional fields: license, extent (spatial and temporal), providers, keywords and summaries. Every Item in a Collection links back to their Collection, so clients can easily find fields like the license. Thus every Item implicitly -shares the fields described in their parent Collection. +shares the fields described in their parent Collection. Collection entities can be used just like Catalog +entities to provide structure, as they provide all the same options for linking and organizing. But what *should* go in a Collection, versus just in a Catalog? A Collection will generally consist of a set of assets that are defined with the same properties and share higher level metadata. In the @@ -78,27 +79,30 @@ provide multiple grouping paths, serving as a sort of faceted search. The second case is used when one wants to represent diverse data in a single place. If an organization has an internal catalog with Landsat 8, Sentinel 2, NAIP data and several commercial imagery providers -then they'd have a root catalog that would link to a number of different Collections. +then they'd have a root Catalog that would link to a number of different Collections. -So in conclusion it's best to use Collections for what you want a user to find as a starting point, and then -catalogs are just for structuring and grouping the data. Future work includes a mechanism to actually +So in conclusion it's best to use Collections for what you want user to find as starting point, and then +Catalogs are just for structuring and grouping the data. Future work includes a mechanism to actually search Collection-level data, hopefully in concert with other specifications. ## Catalog Overview +*NOTE: The below examples all say Catalog, but those can all be Collections as well, as it has all the fields necessary to +serve as a Catalog* + There are two required element types of a Catalog: Catalog and Item. A STAC Catalog points to [STAC Items](item-spec/README.md), or to other STAC catalogs. It provides a simple linking structure that can be used recursively so that many Items can be included in a single Catalog, organized however the implementor desires. -STAC makes no formal distinction between a "root" catalog and the "child" catalogs. A root catalog -is simply the top-most catalog -- it has no parent. A nested catalog structure is useful (and +STAC makes no formal distinction between a "root" Catalog and the "child" Catalogs. A root Catalog +is simply the top-most Catalog or Collection -- it has no parent. A nested catalog structure is useful (and recommended) for breaking up massive numbers of catalog Items into logical groupings. For example, it might make sense to organize a catalog by date (year, month, day), or geography (continent, country, state/prov). See the [Catalog Layout](best-practices.md#catalog-layout) best practices section for more. -A simple Catalog structure might look like this: +A simple STAC structure might look like this: - catalog (root) - catalog @@ -164,8 +168,8 @@ each Item and Catalog, as well as ways to achieve that. ## Collection Overview -A STAC Collection extends the core fields of the Catalog construct to provide additional metadata to describe the set of Items it -contains. The required fields are fairly +A STAC Collection includes the core fields of the Catalog entity and also provides additional metadata to describe +the set of Items it contains. The required fields are fairly minimal - it includes the 4 required Catalog fields (id, description, stac_version and links), and adds license and extents. But there are a number of other common fields defined in the spec, and more common fields are also defined in [STAC extensions](extensions/). These serve as basic metadata, and ideally Collections also link to