Merge pull request #1313 from w3c/dcat-dataseries-issue1272-rev
Revise section on dataset series as discussed in issue 1272
riccardoAlbertoni authored Mar 8, 2021
2 parents fe2da02 + 07df66e commit 203442e
Showing 1 changed file with 57 additions and 46 deletions.
103 changes: 57 additions & 46 deletions dcat/index.html
@@ -1876,7 +1876,7 @@ <h4>Property: was generated by</h4>
</section>

</section> <!-- end class Dataset -->
<section id="Class:DatasetSeries">
<section id="Class:Dataset_Series">

<h3>Class: Dataset Series</h3>

@@ -1888,7 +1888,7 @@ <h3>Class: Dataset Series</h3>
</aside>

<p>The following property is specific to this class:
<a href="#Property:datasetseries_has_part">hasPart</a>.
<a href="#Property:dataset_series_has_part">hasPart</a>.

</p>
<p>The following properties of the super-class <a href="#Class:Resource"><code>dcat:Resource</code></a> and <a href="#Class:Dataset"><code>dcat:Dataset</code></a> are also available for use:
@@ -1932,10 +1932,11 @@ <h3>Class: Dataset Series</h3>
</tbody>
</table>

<section id="Property:datasetseries_has_part">
<section id="Property:dataset_series_has_part">
<h4>Property: has part</h4>
<!--
<div class="issue" data-number="1307"> </div>

-->
<table class="definition">
<thead><tr><th>RDF Property:</th><th><a href="http://purl.org/dc/terms/hasPart">dct:hasPart</a></th></tr></thead>
<tbody>
@@ -3457,48 +3458,25 @@ <h2>Resource life-cycle</h2>

<h2>Dataset series</h2>

<p>With "dataset series" we refer to data, somehow interrelated, that are published separately, although they could be merged into a single dataset. An example is budget data split by year and/or country, instead of being made available in a single dataset.</p>
<p>With "dataset series" we refer to data, somehow interrelated, that are published separately. An example is budget data split by year and/or country, instead of being made available in a single dataset.</p>

<p>Dataset series are defined in [[ISO-19115]] as a <q>collection of datasets [&hellip;] sharing common characteristics</q>. However, their use is not limited to geospatial data, although in other domains they can be named differently (e.g., time series, data slices) and defined more or less strictly (see, e.g., the notion of "dataset slice" in [[VOCAB-DATA-CUBE]]).</p>

<p>The reasons and criteria for splitting data into series are manyfold, and they may be related to, e.g., data characteristics, publishing process, and how they are typically used. For instance, data huge in size (as geospatial ones) are more easily handled (by data providers as well as data consumers) by splitting them into smaller ones. Another example is data released on a yearly basis, which are typically published as separate datasets, instead of appending the new data to the first in the series.</p>

<p>There are no common rules and criteria across domains to decide when dataset series should be created and how they should be organized. The situation is similar to the one concerning versioning (see <a href="#dataset-versions"></a>), and, likewise, DCAT does not adopt any specific definition of dataset series, and when and how they should be created and organized. The purpose of this section is limited to providing guidance on how dataset series can be specified in DCAT.</p>
<p>The reasons and criteria for grouping datasets into series are manifold, and they may be related to, e.g., data characteristics, the publishing process, and how the data are typically used. For instance, very large data (such as geospatial data) are more easily handled, by data providers as well as data consumers, when split into smaller datasets. Another example is data released on a yearly basis, which are typically published as separate datasets, instead of appending the new data to the first in the series.</p>

<p>As there are no common rules and criteria across domains to decide when dataset series should be created and how they should be organized, DCAT does not prescribe any specific approach, and refers for guidance to domain and community practices. The purpose of this section is limited to providing guidance on how dataset series can be specified in DCAT.</p>

<section id="dataset-series-specification">

<h2>How to specify dataset series</h2>

<p>Existing DCAT implementations adopt two main alternative approaches to specifying dataset series:</p>

<ol>
<li>The dataset series is typed as a <code>dcat:Dataset</code>, whereas its child datasets are typed as <code>dcat:Distribution</code>'s.</li>
<li>Both the dataset series and its child datasets are typed as a <code>dcat:Dataset</code>'s, and the two are usually linked by using the [[DCTERMS]] properties <code>dct:hasPart</code> / <code>dct:isPartOf</code>.</li>
</ol>

<p>In both cases, the dataset series is sometimes soft-typed by using the [[DCTERMS]] property <code>dct:type</code> (e.g., this is the approach used in [[GeoDCAT-AP]], and adopted in [[DCAT-AP-IT]] and [[GeoDCAT-AP-IT]]).</p>

<p>Compared with the second option, the first one may have the advantage of simplifying metadata management, and avoiding the creation of datasets having the same values for almost all their metadata elements. On the other hand, this approach reduces the ability of being discovered, as distribution metadata are not rich as datasets' ones. Moreover, using distributions may result cumbersome or unfeasible when the number of child datasets is too high.</p>

<p>DCAT makes dataset series first class citizens of data catalogs by minting a new class <code>dcat:DatasetSeries</code>. <code>dcat:DatasetSeries</code> is defined as subclass of <code>dcat:Dataset</code>.</p>

<p><!--As stated in the note to <a href="#Class:Distribution"></a>,-->DCAT recommends, as the default approach, typing dataset series as <code>dcat:DatasetSeries</code> and child datasets as <code>dcat:Dataset</code>, and linking them by using properties <code>dct:hasPart</code> and/or <code>dct:isPartOf</code>, and possibly soft-typing the dataset series via property <code>dct:type</code>.</p>

<!--aside class="ednote">
<p>The creation of a specific class for dataset series is under discussion.</p>
<div class="issue" data-number="1272"></div>
</aside-->

<p>The approach based on the use of <code>dcat:Distribution</code> for typing child datasets is however recognized as a possible alternative, whenever it addresses more effectively the requirements of a given application scenario.</p>

<p>Here and in the following sections, guidance will focus on the default approach.</p>

<aside class="example" id="ex-dataset-series-containment" title="">
<p>In the following example, yearly budget data are grouped into a series. The series is typed as <code>dcat:DatasetSeries</code> and soft-typed with property <code>dct:type</code>, the child datasets are typed as <code>dcat:Dataset</code>. The series and the datasets are linked by using properties <code>dct:hasPart</code> and <code>dct:isPartOf</code>.</p>
<p>DCAT makes dataset series first-class citizens of data catalogs by minting a new class <a href="#Class:Dataset_Series"><code>dcat:DatasetSeries</code></a>, defined as a subclass of <a href="#Class:Dataset"><code>dcat:Dataset</code></a>. The dataset series and the child datasets are linked by using the [[DCTERMS]] property <a href="#Property:dataset_series_has_part"><code>dct:hasPart</code></a> and/or its inverse <a href="#Property:dataset_series_is_part_of"><code>dct:isPartOf</code></a>. Note that a dataset series can also be the child of another dataset series.</p>
<div class="issue" data-number="1307"> </div>

<aside class="example" id="ex-dataset-series-containment" title="Yearly budget datasets grouped into a series">
<p>In the following example, yearly budget data are grouped into a series. The series is typed as <code>dcat:DatasetSeries</code>, whereas the child datasets are typed as <code>dcat:Dataset</code>. The series and the datasets are linked by using both <code>dct:hasPart</code> and <code>dct:isPartOf</code>.</p>
<pre>
ex:budget a dcat:DatasetSeries ;
dct:type &lt;http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series&gt: ;
dct:title "Budget data"@en ;
dct:hasPart ex:budget-2018 ,
ex:budget-2019 , ex:budget-2020 ;
@@ -3521,13 +3499,12 @@ <h2>How to specify dataset series</h2>
</pre>
</aside>

<p>It is worth noting that a dataset series may evolve over time, by acquiring new datasets. E.g., a dataset series about yearly budget data will acquire a new child dataset every year. In such cases, it might be important to link the yearly releases with relationship specifying the previous, next, and latest ones. In such scenario, DCAT recommends following the approach described in <a href="#version-types"></a>, using the [[VOCAB-ADMS]] properties <code>adms:prev</code>, <code>adms:next</code>, and <code>adms:last</code>, respectively.</p>
<p>Dataset series may evolve over time, by acquiring new datasets. For example, a dataset series about yearly budget data will acquire a new child dataset every year. In such cases, it might be important to link the yearly releases with relationships specifying the previous, next, and latest ones. In such a scenario, DCAT makes use of the [[VOCAB-ADMS]] properties <a href="#Property:resource_prev"><code>adms:prev</code></a>, <a href="#Property:resource_next"><code>adms:next</code></a>, and <a href="#Property:resource_last"><code>adms:last</code></a>, respectively.</p>
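<p>As a rough illustration of the linking just described, the following Python sketch derives previous / next / latest links from publication dates. It is not part of DCAT: the function name <code>link_releases</code>, the plain-dictionary representation, and the dates are hypothetical, and attaching <code>adms:last</code> to every child dataset is an assumption made for the sake of the example.</p>

```python
from datetime import date


def link_releases(issued_by_dataset):
    """Derive adms:prev / adms:next / adms:last links for child datasets.

    `issued_by_dataset` maps a dataset identifier to its dct:issued date.
    Datasets are ordered by publication date; attaching adms:last to every
    child is an assumption of this sketch, not a DCAT requirement.
    """
    ordered = sorted(issued_by_dataset, key=issued_by_dataset.get)
    links = {}
    for i, dataset in enumerate(ordered):
        entry = {"adms:last": ordered[-1]}
        if i > 0:
            entry["adms:prev"] = ordered[i - 1]
        if i < len(ordered) - 1:
            entry["adms:next"] = ordered[i + 1]
        links[dataset] = entry
    return links


# Hypothetical publication dates for the yearly budget datasets.
links = link_releases({
    "ex:budget-2019": date(2020, 1, 1),
    "ex:budget-2018": date(2019, 1, 1),
    "ex:budget-2020": date(2021, 1, 1),
})
```

<p>In this sketch, the first release gets no <code>adms:prev</code> link and the latest release gets no <code>adms:next</code> link.</p>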

<aside class="example" id="ex-dataset-series-releases" title="">
<aside class="example" id="ex-dataset-series-releases" title="Linking datasets in a series">
<p>The following example extends <a href="#ex-dataset-series-containment"></a> by specifying the publication date (<code>dct:issued</code>) of each child dataset, and the previous (<code>adms:prev</code>) and next release (<code>adms:next</code>).</p>
<pre>
ex:budget a dcat:DatasetSeries ;
dct:type &lt;http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series&gt: ;
dct:title "Budget data"@en ;
dct:hasPart ex:budget-2018 ,
ex:budget-2019 , ex:budget-2020 ;
@@ -3562,16 +3539,25 @@ <h2>How to specify dataset series</h2>

<section id="dataset-series-properties">

<h2>Property values inheritance in dataset series</h2>
<h2>Dataset series metadata</h2>

<div class="issue" data-number="1273"></div>

<!--
<p>A dataset series can be seen as the result of subsetting (or slicing) a single dataset based on the values of one or more metadata element. E.g., a statistical dataset about employment may be split into smaller datasets about the same age-group and/or gender.</p>
<p>Although any metadata element can be used for subsetting, the most frequent cases concern the spatial and temporal dimensions. Re-using the example of budget data, the dataset can be split not only by year, but also by country / region. Other examples concerns child datasets using different temporal / spatial resolution, unit of measurement, or reference system.</p>
<p>Because of its role of "container", the dimensions described in child dataset metadata should be reflected in the dataset series, via upstream inheritance - i.e., properties of child datasets are inherited by their parent (the dataset series).</p>
<p>Typically, this means that, for each of the relevant properties, the dataset series takes as value the union of those specified in child datasets. For instance:</p>-->

<p>Properties about dataset series can be classified into two groups.</p>

<p>The first group is about properties describing the dataset series itself. For instance, this is the case of property <a href="#Property:dataset_series_update_frequency"><code>dct:accrualPeriodicity</code></a>, whose value should correspond to the frequency with which a new child dataset is added.</p>

<p>The second group is about properties reflecting the dimensions described in child dataset metadata, via upstream inheritance - i.e., property values of child datasets are inherited by their parent (the dataset series).</p>

<p>Typically, this means that, for each of the relevant properties, the dataset series takes as value the union of those specified in child datasets. For instance:</p>

<ul>
@@ -3580,23 +3566,21 @@ <h2>Property values inheritance in dataset series</h2>
<li>If each child dataset uses a different spatial reference system, the dataset series will have multiple spatial reference systems.</li>
</ul>
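<p>The union rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <code>inherit_union</code>, the dictionary-based representation of dataset metadata, and the values <code>ex:BE</code> / <code>ex:FR</code> are hypothetical, and DCAT does not mandate any particular implementation.</p>

```python
def inherit_union(children, properties):
    """Collect, per property, the union of the values found in child datasets.

    `children` is a list of dictionaries mapping a property name to a set of
    values; properties with no value in any child are left out of the result.
    """
    series = {}
    for prop in properties:
        values = set()
        for child in children:
            values.update(child.get(prop, ()))
        if values:
            series[prop] = values
    return series


# Hypothetical child datasets: same temporal resolution, different coverage.
children = [
    {"dct:spatial": {"ex:BE"}, "dcat:temporalResolution": {"P1Y"}},
    {"dct:spatial": {"ex:FR"}, "dcat:temporalResolution": {"P1Y"}},
]
series = inherit_union(children, ["dct:spatial", "dcat:temporalResolution"])
```

<p>Using sets means that a value shared by all child datasets appears only once in the series, while differing values accumulate.</p>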

<p>Finally, some annotation properties of child datasets may need to be taken into account as well at the level of dataset series. In particular, properties concerning the creation / publication / update dates of child datasets, as well as their update frequency, may affect the corresponding ones in the series. For these properties, DCAT recommends the following approach:</p>
<p>Finally, some annotation properties of child datasets may need to be taken into account as well at the level of dataset series. In particular, properties concerning the creation / publication / update dates of child datasets may affect the corresponding ones in the series. For these properties, DCAT recommends the following approach:</p>
<ul>
<li>The creation date (<code>dct:created</code>) of the dataset series should correspond to the earliest creation date of the child datasets.</li>
<li>The publication date (<code>dct:issued</code>) of the dataset series should correspond to the earliest publication date of the child datasets.</li>
<li>The update date (<code>dct:modified</code>) of the dataset series should correspond to the latest publication or update date of the child datasets.</li>
<li>The update frequency (<code>dct:accrualPeriodicity</code>) of the dataset series should correspond to the one of the child dataset most frequently updated.</li>
<li>The publication date (<a href="#Property:dataset_series_release_date"><code>dct:issued</code></a>) of the dataset series should correspond to the earliest publication date of the child datasets.</li>
<li>The update date (<a href="#Property:dataset_series_update_date"><code>dct:modified</code></a>) of the dataset series should correspond to the latest publication or update date of the child datasets.</li>
</ul>
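<p>The date rules listed above can be expressed as a small Python sketch. The function name <code>series_dates</code>, the dictionary-based metadata, and the sample dates are assumptions introduced for illustration only; they are not defined by DCAT.</p>

```python
from datetime import date


def series_dates(children):
    """Derive dataset series dates from child dataset dates.

    - dct:created  -> earliest creation date of the children
    - dct:issued   -> earliest publication date of the children
    - dct:modified -> latest publication or update date of the children
    """
    created = [c["dct:created"] for c in children if "dct:created" in c]
    issued = [c["dct:issued"] for c in children if "dct:issued" in c]
    modified = [c["dct:modified"] for c in children if "dct:modified" in c]

    dates = {}
    if created:
        dates["dct:created"] = min(created)
    if issued:
        dates["dct:issued"] = min(issued)
    if issued or modified:
        dates["dct:modified"] = max(issued + modified)
    return dates


# Hypothetical child datasets, published yearly.
dates = series_dates([
    {"dct:issued": date(2019, 1, 1)},
    {"dct:issued": date(2020, 1, 1)},
    {"dct:issued": date(2021, 1, 1)},
])
```

<p>With these sample values, the series is issued on the date of the first child dataset and modified on the date of the last one.</p>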

<aside class="note">
<p>To ensure that dataset series metadata are correct and up to date, mechanisms can be put in place to implement upstream inheritance automatically. However, DCAT does not recommend any specific strategy to be adopted.</p>
</aside>

<aside class="example" id="ex-dataset-series-properties" title="">
<p>The following example is a variant <a href="#ex-dataset-series-releases"></a>, with child datasets corresponding to yearly budget data for specific countries. The temporal resolution (<code>dcat:temporalResolution</code>), temporal coverage (<code>dcat:temporal</code>), and spatial coverage (<code>dcat:spatial</code>) of the dataset series correspond to the union of those of the child datasets. Moreover, the dataset series specifies as publication date the one of the first published child dataset, whereas the date of publication of the last child dataset is specified as update date (<code>dct:modified</code>). Finally, the update frequency (<code>dct:accrualPeriodicity</code>) of the dataset series is annual, as the child datasets are published on a yearly basis, and not updated after their publication.</p>
<aside class="example" id="ex-dataset-series-properties" title="Dataset series metadata">
<p>The following example is a variant of <a href="#ex-dataset-series-releases"></a>, with child datasets corresponding to yearly budget data for specific countries. The temporal resolution (<code>dcat:temporalResolution</code>), temporal coverage (<code>dct:temporal</code>), and spatial coverage (<code>dct:spatial</code>) of the dataset series correspond to the union of those of the child datasets. Moreover, the dataset series specifies as publication date that of the first published child dataset, whereas the date of publication of the last child dataset is specified as update date (<code>dct:modified</code>). Finally, the update frequency (<code>dct:accrualPeriodicity</code>) of the dataset series is annual, as the child datasets are published on a yearly basis.</p>
<pre>
ex:budget a dcat:DatasetSeries ;
dct:type &lt;http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series&gt; ;
dct:title "Budget data"@en ;
dct:hasPart ex:budget-2018-be , ex:budget-2019-be , ex:budget-2020-be ,
ex:budget-2018-fr , ex:budget-2019-fr , ex:budget-2020-fr ,
@@ -3668,6 +3652,33 @@ <h2>Property values inheritance in dataset series</h2>

</section>

<section id="dataset-series-before-dcat3">

<h2>Dataset series in existing DCAT implementations</h2>

<aside class="ednote">
<p>To be decided whether or not to keep this section.</p>
</aside>

<p>Existing DCAT implementations adopt two main alternative approaches to specifying dataset series:</p>

<ol>
<li>The dataset series is typed as a <code>dcat:Dataset</code>, whereas its child datasets are typed as <code>dcat:Distribution</code>.</li>
<li>Both the dataset series and its child datasets are typed as <code>dcat:Dataset</code>, and the two are usually linked by using the [[DCTERMS]] properties <code>dct:hasPart</code> / <code>dct:isPartOf</code>.</li>
</ol>

<p>In both cases, the dataset series is sometimes soft-typed by using the [[DCTERMS]] property <code>dct:type</code> (e.g., this is the approach used in [[GeoDCAT-AP]], and adopted in [[DCAT-AP-IT]] and [[GeoDCAT-AP-IT]]).</p>

<p>These options are not formally incompatible with DCAT, so they can coexist with <code>dcat:DatasetSeries</code> during the upgrade to DCAT 3.</p>

<!--
<p>Compared with the second option, the first one may have the advantage of simplifying metadata management, and avoiding the creation of datasets having the same values for almost all their metadata elements. On the other hand, this approach reduces the ability of being discovered, as distribution metadata are not rich as datasets' ones. Moreover, using distributions may result cumbersome or unfeasible when the number of child datasets is too high.</p>
<p>The approach based on the use of <code>dcat:Distribution</code> for typing child datasets is however recognized as a possible alternative, whenever it addresses more effectively the requirements of a given application scenario.</p>
-->

</section>

</section>

