A streaming API for bulk data sharing or update #257

Closed
gra-moore opened this issue Nov 3, 2021 · 2 comments · Fixed by #283

Comments

@gra-moore
Collaborator

Problem statement

As discussed, the current API is chatty and location-centric. This style of API does not fit the situation where data about many locations needs to be exchanged internally between systems and externally between partners.

The following strawman proposal defines a collection-based streaming API for fetching or pushing data changes from or to a compliant API endpoint.

Key Qualities of the API

  • The API is independent of the payloads being exchanged.
  • There are no, or only minimal, changes to the current set of data resource types.
  • The API works for all existing resource types and future ones.
  • Full resource representations are exchanged, not object deltas.

Background and Influences

The following APIs are examples of generic asynchronous, streaming, pull (and symmetrical push) based APIs.

API

Conceptually we think about an endpoint that exposes a number of datasets. Each dataset represents or manages a collection of resources. A client can ask a dataset for all resources and also ask for all resources that have changed since it last asked.

When a client receives the resources it is responsible for updating its local storage. Each resource is a full representation and the client should logically delete the current resource representation and replace it with what it has received.
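The replace-not-merge rule can be sketched as a tiny in-memory store. This is a minimal sketch, assuming each resource can be keyed by its `resourceType` plus a hypothetical `id` field; the proposal itself does not say how resources are identified.

```python
from typing import Any, Dict, Tuple

Resource = Dict[str, Any]
# Hypothetical identity: the proposal does not say how a resource is keyed,
# so this sketch assumes a (resourceType, id) pair.
ResourceKey = Tuple[str, str]


class LocalStore:
    """In-memory stand-in for a client's local storage."""

    def __init__(self) -> None:
        self.resources: Dict[ResourceKey, Resource] = {}

    def apply(self, resource: Resource) -> None:
        """Replace the stored representation wholesale; never merge fields."""
        key = (resource["resourceType"], resource["id"])
        # Assigning a copy discards every field of the old representation,
        # which is the "logically delete, then replace" behaviour.
        self.resources[key] = dict(resource)
```

If a later representation omits a field that an earlier one had, the field disappears from the store. A delta-merging client would wrongly keep it.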

The API has a single entry point and is consistently structured to allow introspection and dynamic navigation.

Datasets

/datasets =>

[
  {
    "name" : "animals",
    "url" : "/datasets/animals",
    "changes" : "/datasets/animals/changes",
    "containsTypes" : [ "http://data.adewg.io/Animal" ],
    "count" : 5000,
    "lastModified" : "date time"
  },
  {
    ...
  }
]
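Because each dataset entry advertises its own `url` and `changes` links, a client can discover where to fetch changes instead of hard-coding paths. The sketch below parses a `/datasets` response and picks the changes link for a given resource type. The `Dataset` class and `find_changes_url` helper are hypothetical names used for illustration only.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str
    url: str
    changes: str
    contains_types: List[str]
    count: int
    last_modified: str


def parse_datasets(body: str) -> List[Dataset]:
    """Parse the JSON array returned by /datasets into Dataset records."""
    return [
        Dataset(
            name=d["name"],
            url=d["url"],
            changes=d["changes"],
            contains_types=d.get("containsTypes", []),
            count=d.get("count", 0),
            last_modified=d.get("lastModified", ""),
        )
        for d in json.loads(body)
    ]


def find_changes_url(datasets: List[Dataset], resource_type: str) -> str:
    """Navigate dynamically: follow the changes link of the first dataset
    that declares the requested resource type."""
    for ds in datasets:
        if resource_type in ds.contains_types:
            return ds.changes
    raise LookupError(f"no dataset contains {resource_type}")
```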

/datasets/{datasetid} =>

{
  "name" : "animals",
  "url" : "/datasets/animals",
  "changes" : "/datasets/animals/changes",
  "containsTypes" : [ "http://data.adewg.io/Animal" ],
  "count" : 5000,
  "lastModified" : "date time"
}

Changes

/datasets/{datasetid}/changes?since={sinceToken}
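Since tokens are base64 strings, they can contain `+`, `/` and `=`, which must be percent-encoded before they are placed in the changes URL's query string. A minimal sketch follows. It assumes the very first request simply omits `since`, which the proposal does not spell out, and `https://example.org` is a placeholder base URL.

```python
from typing import Optional
from urllib.parse import urlencode


def changes_url(base: str, dataset_id: str, since: Optional[str] = None) -> str:
    """Build /datasets/{datasetid}/changes?since={sinceToken}."""
    url = f"{base}/datasets/{dataset_id}/changes"
    if since is None:
        # Assumption: the first call carries no token at all.
        return url
    # urlencode percent-encodes '+', '/' and '=' so the opaque token
    # arrives at the server byte-for-byte unchanged.
    return f"{url}?{urlencode({'since': since})}"
```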

The changes endpoint returns the following:

[
  {
    context object - application specific / expansion point for RDF, JSON-LD and Entity Graph Data Model
  },
  {
    resource
  },
  {
    resource
  },
  {
    continuation object
  }
]
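A client has to split each changes response into its three parts. This is a minimal sketch, assuming the layout shown above always holds: the first element is the context object, the last is the continuation object, and everything between is a resource. The `@context` key in the usage data is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Json = Dict[str, Any]


def split_changes(body: List[Json]) -> Tuple[Json, List[Json], Optional[str]]:
    """Split a changes array into (context, resources, continuation token)."""
    if len(body) < 2:
        raise ValueError("expected at least a context and a continuation object")
    # Star-unpacking leaves an empty resource list when a page has no changes.
    context, *resources, continuation = body
    return context, resources, continuation.get("token")
```

For example, `split_changes([{"@context": "x"}, {}, {"token": "dG9rZW4="}])` returns the context, a single resource and the token `"dG9rZW4="`.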

The continuation object contains a property 'token'. The token is an opaque, base64-encoded string, so the client should NOT try to interpret what is inside it.
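To illustrate why the token is opaque, here is one way a server *might* build it. This is a hypothetical server-side sketch: the `lastSeq` and `page` fields are invented for illustration, and the proposal does not define any token contents.

```python
import base64
import json
from typing import Any, Dict


def encode_token(state: Dict[str, Any]) -> str:
    """Server side: pack internal sync/paging state into an opaque string."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_token(token: str) -> Dict[str, Any]:
    """Server side only: clients must store and echo the token, never decode it."""
    return json.loads(base64.b64decode(token))
```

Because only the server reads the token, it can later change what it stores there (for example, switching from timestamps to sequence numbers or adding a page cursor) without breaking any client.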

If a server has reloaded all data and needs the client to resync from the beginning, it can return the HTTP header:

full-sync: true

If this header is returned, the client should delete all local data, discard any stored since tokens, and replace the data with the resources returned by calls to /changes.
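The full-sync rule can be sketched as follows. This is a minimal sketch, assuming the client then restarts by calling /changes without a since token. The header name is matched case-insensitively, as HTTP requires. `handle_full_sync` is a hypothetical helper, and a plain dict stands in for local storage.

```python
from typing import Any, Dict, Mapping, Optional


def handle_full_sync(headers: Mapping[str, str],
                     store: Dict[Any, Any],
                     saved_token: Optional[str]) -> Optional[str]:
    """Return the since token to use next, wiping local state on full-sync."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {k.lower(): v for k, v in headers.items()}
    if normalised.get("full-sync", "").strip().lower() == "true":
        store.clear()   # delete all local data
        return None     # discard the since token; resync from the beginning
    return saved_token
```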

A client is responsible for storing the continuation token and using it the next time it asks. The server can also use the continuation token to provide paging. A client typically calls the changes endpoint repeatedly until no entities are returned, then stores the latest token and waits for some period before asking again.
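The paging loop can be sketched with the transport abstracted away. This is a minimal sketch: `fetch` is a hypothetical callable that returns one page of resources plus its continuation token, and `apply` is whatever updates local storage.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Resource = Dict[str, Any]
# Hypothetical transport: given a since token (None on the first call),
# return the resources on this page and the continuation token.
Fetch = Callable[[Optional[str]], Tuple[List[Resource], str]]


def sync_once(fetch: Fetch, since: Optional[str],
              apply: Callable[[Resource], None]) -> Optional[str]:
    """Page through /changes until a page returns no resources.

    Returns the token the client should store and reuse next time.
    """
    while True:
        resources, token = fetch(since)
        for resource in resources:
            apply(resource)
        since = token
        if not resources:
            return since
```

A polling client would persist the returned token, sleep for its chosen interval, and call `sync_once` again with that token.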

Required data model extensions

Ideally this API should work with little or no changes to the existing message structures that have been defined. This will make implementations more compatible with both API approaches.

  • That all resources have a resourceType property that indicates what kind of thing is being delivered. A corresponding list of resourceTypes MUST be defined. Ideally these types would be URIs that resolve to web pages describing the type they identify.

  • That the location to which the resource is connected is conveyed in the data. Because a stream of data can span many locations, each resource must carry its location. This should be added to the icarResource type.
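A receiver could check both proposed extensions on every incoming resource. This is a minimal sketch. `KNOWN_RESOURCE_TYPES` is a hypothetical registry standing in for the list of resourceTypes that MUST be defined. `location` is an assumed field name for the proposed icarResource extension.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical registry; the proposal only requires that such a list exist.
KNOWN_RESOURCE_TYPES = {"http://data.adewg.io/Animal"}


def validate_resource(resource: Dict[str, Any],
                      known_types: Iterable[str] = KNOWN_RESOURCE_TYPES) -> List[str]:
    """Return a list of problems; an empty list means the resource passes."""
    problems: List[str] = []
    rtype = resource.get("resourceType")
    if rtype is None:
        problems.append("missing resourceType")
    elif rtype not in set(known_types):
        problems.append(f"unknown resourceType {rtype!r}")
    # "location" is an assumed field name for the icarResource extension.
    if not resource.get("location"):
        problems.append("missing location")
    return problems
```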

Dataset Push

The datasets concept can also be used in a push-based scenario. In this model a client streams a sequence of resources to the dataset's entities endpoint.

/datasets/{datasetid}/entities

Consider how full sync versus incremental sync should work in this situation.
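One way a pushing client could stream a large collection is to send it in bounded batches. This is a hypothetical sketch: the proposal does not say whether a push is one request or many. The one-JSON-array-per-request scheme and the default `batch_size` of 500 are assumptions.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator

Resource = Dict[str, Any]


def batch_bodies(resources: Iterable[Resource], batch_size: int = 500) -> Iterator[str]:
    """Yield JSON array bodies, each holding at most batch_size resources.

    Each body would be sent to the dataset's entities endpoint. The input
    iterable is consumed lazily, so large collections are never fully
    held in memory.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(resources)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield json.dumps(batch)
```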

Web Socket Variant

It is also possible to deliver the same service over a WebSocket. In this model the client connects with a since token and then receives changes over the socket. The socket remains open and the server can then push any further changes to the client as they become available.
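The WebSocket flow can be simulated without a network. First the client receives the backlog after its since position, then live pushes on the same open connection. This sketch does not use a real WebSocket library. An `asyncio.Queue` stands in for the socket, a plain integer index stands in for the opaque since token, and a `None` message is a hypothetical close signal.

```python
import asyncio
from typing import Any, Dict, List, Optional

Resource = Dict[str, Any]


async def serve(log: List[Resource], since: int,
                socket: asyncio.Queue, live: asyncio.Queue) -> None:
    """Send the backlog after `since`, then forward live changes."""
    # Backlog: every change the client has not yet seen.
    for change in log[since:]:
        await socket.put(change)
    # Live phase: push new changes as they arrive; None ends this demo feed.
    while True:
        change = await live.get()
        if change is None:
            break
        await socket.put(change)
    await socket.put(None)  # hypothetical "socket closed" signal


async def consume(socket: asyncio.Queue) -> List[Resource]:
    received: List[Resource] = []
    while True:
        change: Optional[Resource] = await socket.get()
        if change is None:
            return received
        received.append(change)


async def demo(log: List[Resource], since: int,
               new_changes: List[Resource]) -> List[Resource]:
    socket: asyncio.Queue = asyncio.Queue()
    live: asyncio.Queue = asyncio.Queue()

    async def produce() -> None:
        for change in new_changes:
            await live.put(change)
        await live.put(None)

    _, _, received = await asyncio.gather(
        produce(), serve(log, since, socket, live), consume(socket))
    return received
```

Connecting with position 1 against a log of three changes delivers the last two, followed by any change pushed while the socket is open.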

Security

JWT tokens are recommended for identifying a client.

The server is then responsible for mapping that client to a set of claims, and from those claims to the locations the client may access. Only resources for locations that a given client can access should be exposed.
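The claims-to-locations filter can be sketched as below, assuming the JWT has already been verified and decoded elsewhere. Signature checking is out of scope here. `sub` is the standard JWT subject claim. `CLIENT_LOCATIONS` and the location values are hypothetical. Unknown clients see nothing (deny by default).

```python
from typing import Any, Dict, Iterable, Iterator, Mapping, Set

Resource = Dict[str, Any]

# Hypothetical server-side mapping from a client (the JWT "sub" claim)
# to the locations it may access.
CLIENT_LOCATIONS: Mapping[str, Set[str]] = {
    "client-a": {"farm-1", "farm-2"},
}


def allowed_locations(claims: Mapping[str, Any]) -> Set[str]:
    """Map already-verified JWT claims to the locations the client may see."""
    return set(CLIENT_LOCATIONS.get(claims.get("sub", ""), set()))


def visible_resources(resources: Iterable[Resource],
                      claims: Mapping[str, Any]) -> Iterator[Resource]:
    """Yield only resources whose (assumed) location field is permitted."""
    allowed = allowed_locations(claims)
    for resource in resources:
        if resource.get("location") in allowed:
            yield resource
```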

@cookeac
Collaborator

cookeac commented May 20, 2022

Resolved by #283

@cookeac cookeac linked a pull request May 20, 2022 that will close this issue
@stale

stale bot commented Aug 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale-issue Identifies that an issue is stale and will be closed unless reactivated. label Aug 30, 2022
@cookeac cookeac removed the stale-issue Identifies that an issue is stale and will be closed unless reactivated. label Aug 30, 2022
@cookeac cookeac closed this as completed Sep 8, 2022