<pre class='metadata'>
Title: Linked Data Event Streams
Shortname: LDES
Level: 1
Status: LS
Group: TREE community group
URL: https://w3id.org/ldes/specification
Markup Shorthands: markdown yes
Editor: Pieter Colpaert, https://pietercolpaert.be
Repository: https://github.com/SEMICeu/LinkedDataEventStreams
Abstract: A Linked Data Event Stream is a collection of immutable objects (such as version objects, sensor observations or archived representations). Each object is described in RDF.
</pre>
# Introduction # {#introduction}
A Linked Data Event Stream (LDES) (`ldes:EventStream`) is a collection (`rdfs:subClassOf tree:Collection`) of immutable objects, each object being described using a set of RDF triples ([[!rdf-primer]]).
This specification uses the [TREE specification](https://treecg.github.io/specification) for its collection and fragmentation (or pagination) features, which in turn is compatible with other specifications such as [[!activitystreams-core]], [[!VOCAB-DCAT-2]], [[!LDP]] or [Shape Trees](https://shapetrees.org/TR/specification/). For the specific compatibility rules, read the [TREE specification](https://treecg.github.io/specification).
Note: Once a client has processed a member, it should never have to process it again. A Linked Data Event Stream client can thus keep a list (or cache) of already processed member IRIs. A reference implementation of a client is available as part of the Comunica framework on [NPM and GitHub](https://github.com/treecg/event-stream-client).
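As a non-normative illustration of the note above, the following minimal Python sketch shows one way a client could skip members it has already processed. The class name `ProcessedMemberCache` and the method `process_once` are hypothetical and are not part of this specification or of the reference implementation.

```python
class ProcessedMemberCache:
    """Remembers the IRIs of members that were already handled."""

    def __init__(self):
        self._seen = set()

    def process_once(self, member_iri, handler):
        """Call handler(member_iri) only the first time an IRI is seen.

        Returns True if the handler ran, False if the member was skipped.
        """
        if member_iri in self._seen:
            return False
        handler(member_iri)
        self._seen.add(member_iri)
        return True


cache = ProcessedMemberCache()
handled = []
# The third IRI repeats the first one and is therefore skipped.
for iri in ["ex:Observation1", "ex:Observation2", "ex:Observation1"]:
    cache.process_once(iri, handled.append)
print(handled)
```

Because members are immutable, the IRI alone is enough as a cache key in this sketch; a long-running client would probably persist the set between runs.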
The base URI for LDES is `https://w3id.org/ldes#`, and the preferred prefix is `ldes:`. Other prefixes are used following [prefix.cc](https://prefix.cc/sosa).
<div class="example" highlight="turtle">
```turtle
ex:C1 a ldes:EventStream ;
ldes:timestampPath sosa:resultTime ;
tree:shape ex:shape1.shacl ;
tree:member ex:Observation1 .
ex:Observation1 a sosa:Observation ;
sosa:resultTime "2021-01-01T00:00:00Z"^^xsd:dateTime ;
sosa:hasSimpleResult "..." .
```
</div>
The `ldes:EventStream` instance SHOULD have these properties:
* `tree:shape`: the shape of the collection defines its members. It tells clients that all old and new members of the stream have been and will be validated by that shape. Because the members are immutable, this shape MAY evolve, but it MUST always remain backwards compatible with earlier versions.
* `tree:member`: indicates the members of the collection.
The `ldes:EventStream` instance MAY have these properties:
* `ldes:timestampPath`: indicates the property whose timestamp (`xsd:dateTime`) determines whether a member precedes another member in the LDES.
* `ldes:versionOfPath`: indicates the property that links a member to the non-version object it is a version of (see the example below).
<div class="example">
```turtle
ex:C2 a ldes:EventStream ;
ldes:timestampPath dcterms:created ;
ldes:versionOfPath dcterms:isVersionOf ;
tree:shape ex:shape2.shacl ;
tree:member ex:AddressRecord1-version1 .
ex:AddressRecord1-version1 dcterms:created "2021-01-01T00:00:00Z"^^xsd:dateTime ;
adms:versionNotes "First version of this address" ;
dcterms:isVersionOf ex:AddressRecord1 ;
dcterms:title "Streetname X, ZIP Municipality, Country" .
```
</div>
Note: When you need to change an earlier version of an `ldes:EventStream`, there are two options: either create a new version of the object with a new shape that is backward compatible and add that new version as a member to the stream, or replicate and transform the entire collection into a new `ldes:EventStream`. You can indicate that the new `ldes:EventStream` is derived from another `ldes:EventStream`.
Note: In Example 1, we consider the Observation object to be an immutable object, so we can use the existing identifiers. In Example 2, however, we still had to create version IRIs in order to be able to link to immutable objects.
<!--What’s an authoritative source and what’s a third party indexer?-->
# Fragmenting and pagination # {#tree}
The focus of an LDES is to allow clients to replicate the history of a dataset and efficiently synchronize with its latest changes.
Linked Data Event Streams MAY be fragmented when their size becomes too big for a single HTTP response.
Fragmentations MUST be described using the features in the [TREE specification](https://treecg.github.io/specification).
All relation types from the TREE specification MAY be used.
<div class="example">
```turtle
ex:C1 a ldes:EventStream ;
ldes:timestampPath sosa:resultTime ;
tree:shape ex:shape1.shacl ;
tree:member ex:Observation1, ... ;
tree:view <?page=1> .
<?page=1> a tree:Node ;
tree:relation [
a tree:GreaterThanOrEqualToRelation ;
tree:path sosa:resultTime ;
tree:node <?page=2> ;
tree:value "2020-12-24T12:00:00Z"^^xsd:dateTime
] .
```
</div>
A `tree:importStream` MAY be used to describe a publish-subscribe interface to subscribe to new members in the LDES.
Note: A 1-dimensional fragmentation based on creation time of the immutable objects is probably going to be the most interesting and highest priority fragmentation for an LDES, as only the latest page, once replicated, should be subscribed to for updates.
However, it may happen that a time-based fragmentation cannot be applied. For example, the backend system on which the LDES has been built may not receive the events at the time they were created, due to human error (forgetting to indicate that a change was made),
external systems or simply latency. Applying a time-based fragmentation in that situation makes caching ineffective, because the pages keep changing. Instead, in the spirit of an LDES’s goal, the publisher should publish the events in the order in which they were received
by the backend system (that order never changes), trying to give as many pages as possible an HTTP `Cache-Control: public, max-age=604800, immutable` header.
Note: Cf. [the example in the TREE specification on “searching through a list of objects ordered in time”](https://treecg.github.io/specification/#timesearch): a search form can optionally make a one-dimensional feed of immutable objects more searchable.
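The arrival-order approach described above can be sketched as follows. This is a non-normative illustration: the function names, the fixed page size and the short `max-age=60` lifetime for the last page are assumptions made for this example only. The sketch treats every page except the last as immutable, because the last page can still receive new members or a relation to a following page.

```python
IMMUTABLE = "public, max-age=604800, immutable"
MUTABLE = "public, max-age=60"  # assumption: short lifetime for the page that can still change


def paginate(members_in_arrival_order, page_size):
    """Split members into fixed-size pages, preserving arrival order."""
    return [
        members_in_arrival_order[i:i + page_size]
        for i in range(0, len(members_in_arrival_order), page_size)
    ]


def cache_control(page_index, page_count):
    """Every page except the last one can no longer change."""
    return IMMUTABLE if page_index < page_count - 1 else MUTABLE


pages = paginate(["m1", "m2", "m3", "m4", "m5"], page_size=2)
headers = [cache_control(i, len(pages)) for i in range(len(pages))]
for page, header in zip(pages, headers):
    print(page, "->", header)
```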
# Retention policies # {#retention}
By default, an LDES MUST keep all data that has been added to the `tree:Collection` (or `ldes:EventStream`) as defined by the TREE specification.
It MAY add a retention policy in which the server indicates data will be removed from the server.
Third parties SHOULD read retention policies to understand what subset of the data is available in this `tree:View`, and MAY archive these members.
The LDES specification defines three types of retention policies, which can be attached using `ldes:retentionPolicy` with an instance of `tree:View` as its subject:
1. `ldes:DurationAgoPolicy`: a time-based retention policy in which data generated longer ago than a specified duration is removed
2. `ldes:LatestVersionSubset`: a version subset based on the latest versions of an entity in the stream
3. `ldes:PointInTimePolicy`: a point-in-time retention policy in which data generated before a specific time is removed
Different retention policies MAY be combined.
When policies are used together, a server MUST store the members as long as they are not all matched.
## Time-based retention policies ## {#time-based-retention}
A time-based retention policy can be introduced as follows:
<div class="example">
```turtle
ex:C3 a ldes:EventStream ;
ldes:timestampPath prov:generatedAtTime ;
tree:view <> .
<> ldes:retentionPolicy ex:P1 .
ex:P1 a ldes:DurationAgoPolicy ;
tree:value "P1Y"^^xsd:duration . # Keep 1 year of data
```
</div>
A `ldes:DurationAgoPolicy` uses a `tree:value` with an `xsd:duration`-typed literal to indicate how far in the past a member's timestamp may lie for the member to be kept. The timestamp is found using the `ldes:timestampPath`, which MAY be redefined in the policy itself.
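As a non-normative sketch, the check that a duration-based policy implies could look like the Python below. The function name `kept_by_duration_ago` is hypothetical, `timedelta(days=365)` only approximates the `"P1Y"` duration from the example (Python's standard library has no `xsd:duration` type), and treating a timestamp exactly on the boundary as kept is an assumption.

```python
from datetime import datetime, timedelta, timezone


def kept_by_duration_ago(member_timestamp, duration, now):
    """Return True while the member's timestamp lies within `duration` before `now`."""
    return member_timestamp >= now - duration


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
one_year = timedelta(days=365)  # assumption: approximates the xsd:duration "P1Y"

recent = datetime(2023, 6, 1, tzinfo=timezone.utc)
old = datetime(2022, 6, 1, tzinfo=timezone.utc)

print(kept_by_duration_ago(recent, one_year, now))
print(kept_by_duration_ago(old, one_year, now))
```

Because the comparison is made against the current time, the set of retained members shrinks as time passes even when no new members arrive.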
## Version-based retention policies ## {#version-subsets}
<div class="example">
In order to indicate you only keep 2 versions of an object referred to using `dcterms:isVersionOf`:
```turtle
ex:C2 a ldes:EventStream ;
ldes:timestampPath dcterms:created ;
ldes:versionOfPath dcterms:isVersionOf ;
tree:view <> .
<> ldes:retentionPolicy ex:P2 .
ex:P2 a ldes:LatestVersionSubset;
ldes:amount 2 ;
#If different from the Event Stream, this can optionally be overwritten here
ldes:timestampPath dcterms:created ;
ldes:versionOfPath dcterms:isVersionOf .
```
</div>
A `ldes:LatestVersionSubset` MUST define the predicate `ldes:amount` and MAY redefine the `ldes:timestampPath` and/or `ldes:versionOfPath`. It MAY also define a compound version key using `ldes:versionKey` (see example below) instead of `ldes:versionOfPath`.
The `ldes:amount` has an `xsd:nonNegativeInteger` datatype and indicates how many versions to keep; it defaults to 1.
The `ldes:versionKey` is an `rdf:List` of SHACL property paths indicating objects that MUST be concatenated together to find the key on which versions are matched.
When the `ldes:versionKey` is set to an empty path `()`, all members MUST be seen as a version of the same thing.
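The selection that a version-based policy describes can be sketched in Python as follows. This is a non-normative illustration in which members are plain dictionaries rather than RDF, and the function name, parameter names and dictionary keys are all assumptions. The version key is built by concatenating the values found for each key, and an empty version key groups all members together.

```python
from collections import defaultdict


def latest_version_subset(members, version_key, timestamp_key, amount=1):
    """Keep the `amount` most recent members for each version key.

    members: list of dicts, each describing one member.
    version_key: tuple of dict keys whose values are concatenated into the
    grouping key; an empty tuple puts all members in one group.
    Timestamps are ISO 8601 UTC strings in one format, so they sort as text.
    """
    groups = defaultdict(list)
    for member in members:
        key = "".join(str(member[k]) for k in version_key)
        groups[key].append(member)

    kept = []
    for group in groups.values():
        group.sort(key=lambda m: m[timestamp_key], reverse=True)
        kept.extend(group[:amount])
    return kept


members = [
    {"id": "v1", "versionOf": "ex:A", "created": "2021-01-01T00:00:00Z"},
    {"id": "v2", "versionOf": "ex:A", "created": "2021-02-01T00:00:00Z"},
    {"id": "v3", "versionOf": "ex:A", "created": "2021-03-01T00:00:00Z"},
    {"id": "w1", "versionOf": "ex:B", "created": "2021-01-15T00:00:00Z"},
]

kept = latest_version_subset(members, ("versionOf",), "created", amount=2)
kept_ids = sorted(m["id"] for m in kept)
print(kept_ids)

# An empty version key treats every member as a version of the same thing.
single = latest_version_subset(members, (), "created", amount=1)
print([m["id"] for m in single])
```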
<div class="example">
For sensor datasets, the version key may get more complex, grouping observations by both the observed property and the sensor that made the observation.
```turtle
ex:C1 a ldes:EventStream ;
tree:view <> .
<> ldes:retentionPolicy ex:P3 .
ex:P3 a ldes:LatestVersionSubset;
ldes:amount 2 ;
ldes:versionKey ( ( sosa:observedProperty ) ( sosa:madeBySensor ) ) .
```
</div>
## Point-in-time retention policies ## {#point-in-time}
A point-in-time retention policy can be introduced as follows:
<div class="example">
```turtle
ex:C4 a ldes:EventStream ;
ldes:timestampPath prov:generatedAtTime ;
tree:view <> .
<> ldes:retentionPolicy ex:P4 .
ex:P4 a ldes:PointInTimePolicy ;
ldes:pointInTime "2023-04-12T00:00:00"^^xsd:dateTime . # Keep data after April 12th, 2023
```
</div>
A `ldes:PointInTimePolicy` uses a `ldes:pointInTime` with an `xsd:dateTime`-typed literal to indicate the point in time after which data is kept.
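A non-normative Python sketch of this filter is shown below. The function name is hypothetical, the timestamps are given in UTC as an assumption (the literal in the example above carries no time zone), and a member stamped exactly at the point in time is treated as kept because only data generated before that time is removed.

```python
from datetime import datetime, timezone


def kept_by_point_in_time(member_timestamp, point_in_time):
    """Return True for members generated at or after the point in time."""
    return member_timestamp >= point_in_time


point_in_time = datetime(2023, 4, 12, tzinfo=timezone.utc)
timestamps = [
    datetime(2023, 4, 11, 23, 59, tzinfo=timezone.utc),
    datetime(2023, 4, 12, tzinfo=timezone.utc),
    datetime(2023, 5, 1, tzinfo=timezone.utc),
]

kept = [ts for ts in timestamps if kept_by_point_in_time(ts, point_in_time)]
print(len(kept))
```

Unlike the duration-based policy, the cut-off here is fixed, so the retained set does not shrink as time passes.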
# Derived collections # {#derived}
We will extend the spec with multiple best practices on how to annotate that your newly published collection is derived from an LDES.
First we talk about a versioned LDES. Versioned LDESes allow for changing an object in an `ldes:EventStream`, while maintaining the history of events.
This is achieved by expressing each change as a new `tree:member` in the `ldes:EventStream`, supported by added metadata on both the `ldes:EventStream` and each `tree:member`.
Secondly, version materializations are defined that use a versioned LDES as a basis.
This technique allows creating **snapshots** in time of a versioned LDES.
Here we define a **snapshot** as a `tree:Collection` of the most recent versions of all objects in the versioned LDES.
## Versioning ## {#versioning}
A versioned LDES is defined with two properties: `ldes:versionOfPath` and `ldes:timestampPath`.
* `ldes:versionOfPath`: declares the **property** that is used to define that a `tree:member` of an `ldes:EventStream` is a version.
* `ldes:timestampPath`: declares the property that is used to define the DateTime of a `tree:member`.
<div class="example" highlight="turtle" id="ldes-versioning-1">
A <b>versioned</b> LDES with one member.<br>
`dct:isVersionOf` is used as property for `ldes:versionOfPath`, which indicates that `ex:resource1v0` is a version of `ex:resource1`.<br>
`dct:issued` is used as property for `ldes:timestampPath`, which indicates that `ex:resource1v0` was issued in the LDES at "2021-12-15T10:00:00.000Z".
```turtle
ex:ES a ldes:EventStream;
ldes:versionOfPath dct:isVersionOf;
ldes:timestampPath dct:issued;
tree:member ex:resource1v0.
ex:resource1v0
dct:isVersionOf ex:resource1;
dct:issued "2021-12-15T10:00:00.000Z"^^xsd:dateTime;
dct:title "First version of the title".
```
</div>
<div class="example" highlight="turtle" id="ldes-versioning-2">
A <b>versioned</b> LDES with two members which are both versions of the same object.
```turtle
ex:ES a ldes:EventStream;
ldes:versionOfPath dct:isVersionOf;
ldes:timestampPath dct:issued;
tree:member ex:resource1v0, ex:resource1v1.
ex:resource1v0
dct:isVersionOf ex:resource1;
dct:issued "2021-12-15T10:00:00.000Z"^^xsd:dateTime;
dct:title "First version of the title".
ex:resource1v1
dct:isVersionOf ex:resource1;
dct:issued "2021-12-15T12:00:00.000Z"^^xsd:dateTime;
dct:title "Title has been updated once".
```
Two hours after `ex:resource1v0` was created, the title of `ex:resource1` was changed.
This change can be seen by the creation of `ex:resource1v1`, which is the newest version of `ex:resource1`.
</div>
## Version Materializations ## {#version-materializations}
A version materialization can be defined only if the original LDES defines both `ldes:versionOfPath` and `ldes:timestampPath`.
A version materialization replaces the subject of a member with its `ldes:versionOfPath` IRI, and selects a certain version of this object.
It also translates `created`-style timestamp predicates to `modified`-style predicates.
<div class="example">
In this example, an event stream with 2 versions of the same object is materialized until `2020-10-05T12:00:00Z`:
```turtle
ex:ES1 a ldes:EventStream ;
ldes:versionOfPath dcterms:isVersionOf ;
ldes:timestampPath dcterms:created ;
tree:member [
dcterms:isVersionOf <A> ;
dcterms:created "2020-10-05T11:00:00Z"^^xsd:dateTime ;
owl:versionInfo "v0.0.1" ;
rdfs:label "A v0.0.1"
], [
dcterms:isVersionOf <A> ;
dcterms:created "2020-10-06T13:00:00Z"^^xsd:dateTime ;
owl:versionInfo "v0.0.2" ;
rdfs:label "A v0.0.2"
] .
```
towards the snapshot below
```turtle
ex:ES1v1 a tree:Collection ; # the members are no longer immutable
ldes:versionMaterializationOf ex:ES1;
ldes:versionMaterializationUntil "2020-10-05T12:00:00Z"^^xsd:dateTime;
tree:member <A>.
<A> rdfs:label "A v0.0.1";
dcterms:modified "2020-10-05T11:00:00Z"^^xsd:dateTime .
```
</div>
A version materialization is thus a `tree:Collection` instance that has two predicates set:
* `ldes:versionMaterializationOf`: points to the original LDES
* `ldes:versionMaterializationUntil`: optionally gives a timestamp (`xsd:dateTime`) until when the materialization was made.
Note: We consider `versionMaterializationUntil` mainly useful for historical and static datasets that deliberately will not be updated to the latest state of the LDES.
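The materialization described in this section can be sketched in Python as follows. This is a non-normative illustration in which members are plain dictionaries rather than RDF, and the function name, parameter names and dictionary keys are assumptions. The sketch also assumes that a version stamped exactly at the `until` time is included and that every property other than the version and timestamp properties is copied to the snapshot.

```python
def materialize(members, version_of_key, timestamp_key, until=None):
    """Build a snapshot holding the latest version of each object.

    members: list of dicts, each describing one version.
    until: optional ISO 8601 UTC timestamp string; later versions are ignored.
    Returns {object IRI: properties}, with the timestamp stored as "modified".
    Timestamps share one string format, so they can be compared as text.
    """
    snapshot = {}
    for member in members:
        ts = member[timestamp_key]
        if until is not None and ts > until:
            continue  # this version was created after the requested point in time
        subject = member[version_of_key]
        current = snapshot.get(subject)
        if current is None or ts > current["modified"]:
            props = {
                k: v for k, v in member.items()
                if k not in (version_of_key, timestamp_key)
            }
            props["modified"] = ts  # created-style becomes modified-style
            snapshot[subject] = props
    return snapshot


versions = [
    {"isVersionOf": "A", "created": "2020-10-05T11:00:00Z", "label": "A v0.0.1"},
    {"isVersionOf": "A", "created": "2020-10-06T13:00:00Z", "label": "A v0.0.2"},
]

snapshot = materialize(versions, "isVersionOf", "created", until="2020-10-05T12:00:00Z")
print(snapshot)
```

Calling `materialize` without `until` yields the latest state of every object, which corresponds to a snapshot of the most recent versions.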