A streaming API for bulk data sharing or update #257

gra-moore · 2021-11-03T11:08:15Z

Problem statement

As discussed, the current API is chatty and location centric. This style of API does not fit the situation where data about many locations needs to be exchanged internally between systems and externally between partners.

The following strawman proposal defines a collection-based streaming API for fetching data changes from, or pushing them to, a compliant API endpoint.

Key Qualities of the API

The API is independent of the payloads being exchanged.
There are no, or only minimal, changes to the current set of data resource types.
The API works for all existing resource types and future ones.
Full resource representations are exchanged, not object deltas.

Background and Influences

The following APIs are examples of generic asynchronous, streaming, pull (and symmetrical push) based APIs.

SDShare W3C Note. http://sdshare.org
ODATA - http://odata.org
Universal Data API (UDA) - https://open.mimiro.io/specifications/uda/latest.html
NorFor datasets API

API

Conceptually we think about an endpoint that exposes a number of datasets. Each dataset represents or manages a collection of resources. A client can ask a dataset for all resources and also ask for all resources that have changed since it last asked.

When a client receives the resources, it is responsible for updating its local storage. Each resource is a full representation, so the client should logically delete its current representation of that resource and replace it with what it has received.

The API has a single entry point and is consistently structured to allow introspection and dynamic navigation.

Datasets

/datasets =>

[
{
"name" : "animals",
"url" : "/datasets/animals",
"changes" : "/datasets/animals/changes",
"containsTypes" : [ "http://data.adewg.io/Animal"],
"count" : 5000,
"lastModified" : "date time"
}
,
{

}
]

/datasets/{datasetid} =>

{
"name" : "animals",
"url" : "/datasets/animals",
"changes" : "/datasets/animals/changes",
"containsTypes" : [ "http://data.adewg.io/Animal"],
"count" : 5000,
"lastModified" : "date time"
}
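As a rough illustration of how a client might navigate from the entry point, the sketch below parses a /datasets response and looks up the changes URL for a named dataset. The JSON field names come from the example above; the `Dataset` class and the `parse_datasets` and `find_changes_url` helpers are hypothetical names used only for illustration, and skipping entries without a name is an assumption.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    """One entry of the /datasets listing (fields as in the example above)."""
    name: str
    url: str
    changes: str
    contains_types: List[str]
    count: int
    last_modified: str


def parse_datasets(body: str) -> List[Dataset]:
    """Parse a /datasets response body, skipping entries without a name."""
    datasets = []
    for entry in json.loads(body):
        if "name" not in entry:
            continue  # assumption: ignore empty placeholder objects
        datasets.append(Dataset(
            name=entry["name"],
            url=entry["url"],
            changes=entry["changes"],
            contains_types=entry.get("containsTypes", []),
            count=entry.get("count", 0),
            last_modified=entry.get("lastModified", ""),
        ))
    return datasets


def find_changes_url(datasets: List[Dataset], name: str) -> Optional[str]:
    """Return the changes URL of the dataset with the given name, if any."""
    for ds in datasets:
        if ds.name == name:
            return ds.changes
    return None


EXAMPLE_BODY = """
[
  {
    "name": "animals",
    "url": "/datasets/animals",
    "changes": "/datasets/animals/changes",
    "containsTypes": ["http://data.adewg.io/Animal"],
    "count": 5000,
    "lastModified": "date time"
  },
  {}
]
"""
```

A client would call `find_changes_url(parse_datasets(body), "animals")` and follow the returned URL rather than constructing paths itself.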

Changes

/datasets/{datasetid}/changes?since={sinceToken}

The changes endpoint returns the following:

[
{
context object - application specific / expansion point for RDF, JSON-LD and Entity Graph Data Model
},

{
resource
},

{
continuation object
}
]

The continuation object contains a property 'token'. The token is an opaque base64-encoded string, so the client should NOT try to interpret its contents.
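A minimal sketch of how a client might take a changes response apart, assuming the array always starts with the context object and ends with the continuation object, with every element in between being a resource. The function name `split_changes` is hypothetical; only the 'token' property is taken from the proposal.

```python
from typing import Any, Dict, List, Tuple


def split_changes(
    payload: List[Dict[str, Any]],
) -> Tuple[Dict[str, Any], List[Dict[str, Any]], str]:
    """Split a changes response into (context, resources, token).

    Assumes: first element is the context object, last element is the
    continuation object, everything in between is a resource.
    """
    if len(payload) < 2:
        raise ValueError("expected at least a context and a continuation object")
    context, *resources, continuation = payload
    # The token is opaque: store it and send it back, never decode it.
    token = continuation["token"]
    return context, resources, token
```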

If a server has reloaded all data and needs the client to resync from the beginning, the server can return an HTTP header:

full-sync: true

If this header is returned, the client should delete all local data, discard any since tokens, and replace the local data with the resources returned from a call to /changes.

A client is responsible for storing the continuation token and using it the next time it asks. Note that the server can also use the continuation token to provide paging. A client typically calls the changes endpoint repeatedly until no entities are returned. It then stores the token and waits for some period before asking again.
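The client loop described above could look roughly like the sketch below. It is not a definitive implementation: the `fetch` callable stands in for the HTTP call to the changes endpoint and is assumed to return lower-cased response headers plus the parsed JSON array; resources are assumed to carry an `id` property used as the local storage key; and a response carrying `full-sync: true` is assumed to already start from the beginning of the dataset. None of these details are fixed by the proposal.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# fetch(since_token) -> (lower-cased headers, parsed response array)
Fetch = Callable[
    [Optional[str]], Tuple[Dict[str, str], List[Dict[str, Any]]]
]


def sync_once(
    fetch: Fetch,
    store: Dict[str, Dict[str, Any]],
    since: Optional[str],
) -> Optional[str]:
    """Page through changes until no resources are returned.

    Returns the continuation token the client should persist and send
    as ?since= on its next polling round.
    """
    while True:
        headers, payload = fetch(since)
        if headers.get("full-sync", "").lower() == "true":
            # Server asked for a resync: drop everything held locally.
            store.clear()
        _context, *resources, continuation = payload
        for resource in resources:
            # Full representation: replace whatever was stored before.
            store[resource["id"]] = resource
        since = continuation["token"]
        if not resources:
            return since
```

After `sync_once` returns, the client would persist the token, wait for some period, and call it again with that token.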

Required data model extensions

Ideally this API should work with little or no changes to the existing message structures that have been defined. This will make implementations more compatible with both API approaches.

All resources must have a resourceType property that indicates what kind of thing is being delivered. A corresponding list of resourceTypes MUST be defined. Ideally these types would be URIs that resolve to web pages describing the subject of the identifier.
The location to which the resource is connected must be conveyed in the data. Because a stream could encompass data from many locations, the location must be communicated in each resource. This should be added to the icarResource type.
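To make the two requirements concrete, here is a small sketch of a check a receiver could run on each streamed resource. The property name `location` is an assumption (the proposal does not name it), the set of known types contains only the Animal URI that appears in the datasets example, and `stream_field_problems` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumed registry of defined type URIs; only this one appears in the proposal.
KNOWN_RESOURCE_TYPES = {"http://data.adewg.io/Animal"}


def stream_field_problems(resource: Dict[str, Any]) -> List[str]:
    """Return a list of problems that make a resource unusable in a stream."""
    problems = []
    rtype = resource.get("resourceType")
    if rtype is None:
        problems.append("missing resourceType")
    elif rtype not in KNOWN_RESOURCE_TYPES:
        problems.append(f"unknown resourceType: {rtype}")
    # "location" is a placeholder name for the location reference.
    if not resource.get("location"):
        problems.append("missing location")
    return problems
```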

Dataset Push

The datasets concept can also be used in a push based scenario. In this model a client streams a sequence of resources to the datasets entities endpoint.

/datasets/{datasetid}/entities

How full sync versus incremental update is distinguished in this situation still needs to be considered.

Web Socket Variant

It is also possible to deliver the same service over a WebSocket. In this model the client connects with a since token and then receives changes over the socket. The socket remains open, and the server can push any further changes to the client as they become available.

Security

It is recommended to use JWTs to identify a client.

The server is then responsible for mapping that client to a set of claims, and from those claims to the locations the client may access. The set of resources exposed should only cover the locations that a given client can access.
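A minimal sketch of the server-side filtering this implies, assuming the JWT has already been verified elsewhere and its claims are available as a dictionary. Using the standard `sub` claim as the client identity, the `CLIENT_LOCATIONS` table, and the `location` property on resources are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical server-side mapping from client identity to accessible locations.
CLIENT_LOCATIONS: Dict[str, Set[str]] = {
    "client-a": {"farm-1", "farm-2"},
    "client-b": {"farm-3"},
}


def allowed_locations(claims: Dict[str, Any]) -> Set[str]:
    """Resolve the locations for the claims of an already-verified JWT."""
    return CLIENT_LOCATIONS.get(claims.get("sub", ""), set())


def visible_resources(
    resources: Iterable[Dict[str, Any]], claims: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Keep only resources whose location the client is allowed to see."""
    allowed = allowed_locations(claims)
    return [r for r in resources if r.get("location") in allowed]
```

An unknown client resolves to an empty set of locations, so it receives no resources by default.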

cookeac · 2022-05-20T06:08:57Z

Resolved by #283

stale · 2022-08-30T23:10:10Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

cookeac added agenda-next-meeting and removed agenda-next-meeting labels Nov 4, 2021

cookeac added the agenda-next-meeting label Nov 18, 2021

cookeac removed the agenda-next-meeting label Dec 16, 2021

cookeac added the agenda-next-meeting label Feb 23, 2022

cookeac removed the agenda-next-meeting label Apr 7, 2022

alamers mentioned this issue May 19, 2022

Write up docs for deciding between location centric vs streaming api #293

Closed

cookeac linked a pull request May 20, 2022 that will close this issue

updated draft #283

Merged

stale bot added the stale-issue label Aug 30, 2022

cookeac removed the stale-issue label Aug 30, 2022

cookeac closed this as completed Sep 8, 2022
