A streaming API for bulk data sharing or update #257
Resolved by #283 (merged)
Problem statement
As discussed, the current API is chatty and location-centric. This style of API does not fit situations where data about many locations needs to be exchanged, whether internally between systems or externally between partners.
The following strawman proposal defines a collection-based streaming API for fetching data changes from, or pushing them to, a compliant API endpoint.
Key Qualities of the API
Background and Influences
The following APIs are examples of generic asynchronous, streaming, pull (and symmetrical push) based APIs.
API
Conceptually, we think of an endpoint that exposes a number of datasets. Each dataset represents or manages a collection of resources. A client can ask a dataset for all of its resources, and also for only those resources that have changed since it last asked.
When a client receives resources, it is responsible for updating its local storage. Each resource is a full representation: the client should logically delete its current representation of that resource and replace it with the one it has received.
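A minimal sketch of this replace-not-merge rule in Python, assuming each resource carries a hypothetical `id` key (the proposal does not name an identifier property) and using an in-memory dict in place of real client storage:

```python
from __future__ import annotations

from typing import Any, Dict, Optional

Resource = Dict[str, Any]


class LocalStore:
    """Hypothetical in-memory client store keyed by an assumed "id" field."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def apply(self, resource: Resource) -> None:
        # Each received resource is a full representation, so the stored copy
        # is replaced wholesale; no fields from the old version are merged in.
        self._resources[resource["id"]] = dict(resource)

    def get(self, key: str) -> Optional[Resource]:
        return self._resources.get(key)


store = LocalStore()
store.apply({"id": "a1", "resourceType": "http://data.adewg.io/Animal", "name": "Daisy"})
store.apply({"id": "a1", "resourceType": "http://data.adewg.io/Animal"})
print(store.get("a1"))  # the old "name" field is gone
```

Replacing rather than patching keeps the client simple: it never has to reason about which fields a partial update left untouched.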
The API has a single entry point and is consistently structured to allow introspection and dynamic navigation.
Datasets
/datasets =>
[
  {
    "name" : "animals",
    "url" : "/datasets/animals",
    "changes" : "/datasets/animals/changes",
    "containsTypes" : [ "http://data.adewg.io/Animal" ],
    "count" : 5000,
    "lastModified" : "date time"
  },
  {
    ...
  }
]
/datasets/{datasetid} =>
{
  "name" : "animals",
  "url" : "/datasets/animals",
  "changes" : "/datasets/animals/changes",
  "containsTypes" : [ "http://data.adewg.io/Animal" ],
  "count" : 5000,
  "lastModified" : "date time"
}
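Because the entry point advertises each dataset's links, a client can discover where to fetch changes instead of hard-coding paths. The sketch below assumes the field names shown above, an ISO 8601 value for `lastModified` (the proposal leaves the format open), and a hypothetical helper `changes_url_for` that selects a dataset by a type URI listed in `containsTypes`:

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str
    url: str
    changes: str
    contains_types: List[str]
    count: int
    last_modified: str


def parse_datasets(body: str) -> List[Dataset]:
    """Turn a /datasets response body into Dataset records."""
    return [
        Dataset(
            name=d["name"],
            url=d["url"],
            changes=d["changes"],
            contains_types=list(d.get("containsTypes", [])),
            count=int(d.get("count", 0)),
            last_modified=str(d.get("lastModified", "")),
        )
        for d in json.loads(body)
    ]


def changes_url_for(datasets: List[Dataset], type_uri: str) -> str:
    """Find the changes link advertised for a resource type, rather than
    hard-coding /datasets/{datasetid}/changes in the client."""
    for ds in datasets:
        if type_uri in ds.contains_types:
            return ds.changes
    raise LookupError(f"no dataset advertises {type_uri}")


sample = """[{"name": "animals", "url": "/datasets/animals",
              "changes": "/datasets/animals/changes",
              "containsTypes": ["http://data.adewg.io/Animal"],
              "count": 5000, "lastModified": "2022-01-01T00:00:00Z"}]"""
datasets = parse_datasets(sample)
print(changes_url_for(datasets, "http://data.adewg.io/Animal"))  # /datasets/animals/changes
```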
Changes
/datasets/{datasetid}/changes?since={sinceToken}
The changes endpoint returns the following:
[
  { context object: application specific; expansion point for RDF, JSON-LD and the Entity Graph Data Model },
  { resource },
  { resource },
  { continuation object }
]
The continuation object contains a 'token' property. The token is an opaque, base64-encoded string, which means the client should NOT try to interpret what is inside it.
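A sketch of how a client might split one page of the changes response, assuming the layout shown above (context object first, continuation object last, resources in between). The token is kept as an uninterpreted string; the sample token is an arbitrary base64 value used only for illustration:

```python
import json
from typing import Any, Dict, List, NamedTuple, Optional


class ChangesPage(NamedTuple):
    context: Dict[str, Any]
    resources: List[Dict[str, Any]]
    token: Optional[str]


def parse_changes(body: str) -> ChangesPage:
    """Split one /changes array into context, resources and continuation token."""
    items = json.loads(body)
    if not isinstance(items, list) or len(items) < 2:
        raise ValueError("expected [context, resource..., continuation]")
    context, *resources, continuation = items
    # Keep the token exactly as received: it is opaque to the client.
    return ChangesPage(context, resources, continuation.get("token"))


page = parse_changes(json.dumps([
    {},  # context object (application specific)
    {"id": "a1", "resourceType": "http://data.adewg.io/Animal"},
    {"id": "a2", "resourceType": "http://data.adewg.io/Animal"},
    {"token": "b3BhcXVlLWN1cnNvcg=="},
]))
print(len(page.resources), page.token)  # 2 b3BhcXVlLWN1cnNvcg==
```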
If a server has reloaded all of its data and needs the client to resync from the beginning, the server can return the HTTP header:
full-sync: true
If this header is returned, the client should delete all local data, discard any stored since tokens, and replace its data with the resources returned from a call to /changes.
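The full-sync rule could look roughly like this on the client. It assumes the response carrying `full-sync: true` already contains the first page of the reloaded data, which the proposal does not state explicitly; `SyncState`, `handle_page` and the `id` key are hypothetical:

```python
from typing import Any, Dict, List, Mapping, Optional


class SyncState:
    """Hypothetical client state: stored resources plus the last since token."""

    def __init__(self) -> None:
        self.resources: Dict[str, Dict[str, Any]] = {}
        self.since_token: Optional[str] = None


def is_full_sync(headers: Mapping[str, str]) -> bool:
    # HTTP header names are case-insensitive, so compare them lower-cased.
    return any(
        name.lower() == "full-sync" and value.strip().lower() == "true"
        for name, value in headers.items()
    )


def handle_page(state: SyncState, headers: Mapping[str, str],
                resources: List[Dict[str, Any]], token: Optional[str]) -> None:
    if is_full_sync(headers):
        # Server reloaded its data: forget everything held locally, including
        # the old since token, before applying the resources just received.
        state.resources.clear()
        state.since_token = None
    for r in resources:
        state.resources[r["id"]] = r  # "id" is an assumed identifier field
    if token is not None:
        state.since_token = token


state = SyncState()
handle_page(state, {}, [{"id": "a1"}, {"id": "a2"}], "t1")
handle_page(state, {"Full-Sync": "true"}, [{"id": "b1"}], "t2")
print(sorted(state.resources), state.since_token)  # ['b1'] t2
```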
A client is responsible for storing the continuation token and using it the next time it asks. The server can also use the continuation token to provide paging. A client typically calls the changes endpoint repeatedly until no resources are returned, then stores the latest token and waits for some period before asking again.
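The paging and polling behaviour can be sketched as below. `fetch` is a hypothetical stand-in for the HTTP call to the changes endpoint, returning the resources and continuation token already extracted from one page; a dict of canned pages plays the server:

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

Resource = Dict[str, Any]
# A page is (resources, continuation token). fetch is a stand-in for
# GET /datasets/{datasetid}/changes?since={sinceToken}.
Fetch = Callable[[Optional[str]], Tuple[List[Resource], Optional[str]]]


def drain_changes(fetch: Fetch, since: Optional[str],
                  apply: Callable[[Resource], None]) -> Optional[str]:
    """Follow continuation tokens until a page returns no resources.

    Returns the token the client should store for its next poll.
    """
    while True:
        resources, token = fetch(since)
        for r in resources:
            apply(r)
        if token is not None:
            since = token
        if not resources:
            return since


def poll(fetch: Fetch, apply: Callable[[Resource], None],
         interval_s: float, since: Optional[str] = None) -> None:
    """Drain, store the token, wait, repeat (runs until interrupted)."""
    while True:
        since = drain_changes(fetch, since, apply)
        # A real client would persist `since` here so it survives restarts.
        time.sleep(interval_s)


# Simulated server: each token maps to the next page.
pages = {
    None: ([{"id": "a1"}, {"id": "a2"}], "t1"),
    "t1": ([{"id": "a3"}], "t2"),
    "t2": ([], "t2"),
}
seen: List[str] = []
final = drain_changes(lambda s: pages[s], None, lambda r: seen.append(r["id"]))
print(seen, final)  # ['a1', 'a2', 'a3'] t2
```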
Required data model extensions
Ideally, this API should work with few or no changes to the message structures that have already been defined. This will make implementations more compatible with both API approaches.
All resources must have a resourceType property that indicates what kind of thing is being delivered. A corresponding list of resourceTypes MUST be defined. Ideally, these types would be URIs that resolve to web pages describing the subject of the identifier.
The location to which each resource is connected must be conveyed in the data. Because a stream can encompass data from many locations, the location cannot be inferred from the request and must be carried in each resource. This should be added to the icarResource type.
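To show why these two properties matter, here is a sketch of a receiver that routes a mixed stream by the location and type carried in each resource. The property name `location` and the plain-string location identifiers are assumptions; the proposal only says the location should be added to the icarResource type:

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterable, List, Tuple

Resource = Dict[str, Any]
Key = Tuple[str, str]  # (location, resourceType)


def route(resources: Iterable[Resource]) -> DefaultDict[Key, List[Resource]]:
    """Bucket a mixed, multi-location stream by the location and type that
    each resource carries itself."""
    buckets: DefaultDict[Key, List[Resource]] = defaultdict(list)
    for r in resources:
        missing = [k for k in ("resourceType", "location") if k not in r]
        if missing:
            # Without these the receiver cannot tell what or where this is.
            raise ValueError(f"resource is missing {missing}")
        buckets[(r["location"], r["resourceType"])].append(r)
    return buckets


ANIMAL = "http://data.adewg.io/Animal"
stream = [
    {"id": "a1", "resourceType": ANIMAL, "location": "loc-1"},
    {"id": "a2", "resourceType": ANIMAL, "location": "loc-2"},
    {"id": "a3", "resourceType": ANIMAL, "location": "loc-1"},
]
for (location, rtype), items in route(stream).items():
    print(location, rtype, [r["id"] for r in items])
```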
Dataset Push
The datasets concept can also be used in a push-based scenario. In this model, a client streams a sequence of resources to the dataset's entities endpoint:
/datasets/{datasetid}/entities
Still to be considered: how full sync versus incremental update applies in this situation.
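The proposal does not fix a wire format for pushed resources. One option, assumed here purely for illustration, is newline-delimited JSON (one resource per line), which lets the client stream a request body and the server decode resources as they arrive:

```python
import json
from typing import Any, Dict, Iterable, Iterator

Resource = Dict[str, Any]


def ndjson_chunks(resources: Iterable[Resource]) -> Iterator[bytes]:
    """Client side: encode one resource per line so the request body can be
    streamed to /datasets/{datasetid}/entities without buffering it all."""
    for r in resources:
        yield (json.dumps(r, separators=(",", ":")) + "\n").encode("utf-8")


def read_ndjson(lines: Iterable[bytes]) -> Iterator[Resource]:
    """Server side: decode resources one at a time as lines arrive."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


batch = [{"id": "a1", "resourceType": "http://data.adewg.io/Animal"},
         {"id": "a2", "resourceType": "http://data.adewg.io/Animal"}]
body = list(ndjson_chunks(batch))
print(body[0])
```

A real server would first split the incoming byte stream on newlines (for example by iterating over a file-like request body line by line); here each chunk already holds exactly one line.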
Web Socket Variant
It is also possible to deliver the same service over a websocket. In this model the client connects with a since token and then receives changes over the socket. The socket remains open and the server can then push any further changes to the client as they become available.
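A self-contained sketch of the socket semantics, with an `asyncio.Queue` standing in for the WebSocket connection so no networking library is needed. `FakeChangeFeed`, its counter-style tokens and `consume` are all hypothetical; a real implementation would use opaque tokens and an actual WebSocket server:

```python
import asyncio
from typing import Any, Callable, Dict, List, Optional, Tuple

Resource = Dict[str, Any]
Message = Tuple[Resource, str]  # (resource, token to resume from after it)


class FakeChangeFeed:
    """Hypothetical server side; an asyncio.Queue stands in for each socket.
    Tokens here are plain counters for readability, not opaque base64."""

    def __init__(self) -> None:
        self.log: List[Resource] = []
        self.sockets: List["asyncio.Queue[Message]"] = []

    def publish(self, resource: Resource) -> None:
        self.log.append(resource)
        token = str(len(self.log))
        for q in self.sockets:  # push to every connected client
            q.put_nowait((resource, token))

    def connect(self, since: Optional[str]) -> "asyncio.Queue[Message]":
        q: "asyncio.Queue[Message]" = asyncio.Queue()
        start = int(since) if since else 0
        # Replay the backlog after `since`, then leave the socket open.
        for i, r in enumerate(self.log[start:], start + 1):
            q.put_nowait((r, str(i)))
        self.sockets.append(q)
        return q


async def consume(q: "asyncio.Queue[Message]",
                  apply: Callable[[Resource], None], n: int) -> Optional[str]:
    token: Optional[str] = None
    for _ in range(n):
        resource, token = await q.get()
        apply(resource)
    return token  # the client would store this to reconnect later


async def demo() -> Tuple[List[str], Optional[str]]:
    feed = FakeChangeFeed()
    feed.publish({"id": "a1"})
    feed.publish({"id": "a2"})
    seen: List[str] = []
    q = feed.connect(since="1")  # client already holds a1
    task = asyncio.create_task(consume(q, lambda r: seen.append(r["id"]), 2))
    feed.publish({"id": "a3"})  # live change pushed while connected
    token = await task
    return seen, token


seen, token = asyncio.run(demo())
print(seen, token)  # ['a2', 'a3'] 3
```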
Security
It is recommended to use JWTs to identify a client.
The server is then responsible for mapping that client to a set of claims, and those claims to the locations the client may access. The set of resources exposed should include only those for locations that the client can access.
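A sketch of the location filter, assuming the JWT's signature and expiry have already been verified by a JWT library and that the standard `sub` claim identifies the client. The `GRANTS` table, the `location` property and the location identifiers are hypothetical:

```python
from typing import Any, Dict, Iterable, Iterator, Mapping, Set

Resource = Dict[str, Any]

# Hypothetical grant table from client id (the JWT "sub" claim) to the
# locations that client may access; a real server would keep this in storage.
GRANTS: Mapping[str, Set[str]] = {
    "client-a": {"loc-1"},
    "client-b": {"loc-1", "loc-2"},
}


def allowed_locations(claims: Mapping[str, Any]) -> Set[str]:
    """Map already-verified JWT claims to accessible locations.
    Unknown clients get an empty set, i.e. no access."""
    return set(GRANTS.get(str(claims.get("sub", "")), set()))


def visible(resources: Iterable[Resource],
            claims: Mapping[str, Any]) -> Iterator[Resource]:
    """Expose only resources whose location the client may access."""
    locations = allowed_locations(claims)
    for r in resources:
        if r.get("location") in locations:
            yield r


stream = [
    {"id": "a1", "location": "loc-1"},
    {"id": "a2", "location": "loc-2"},
]
claims = {"sub": "client-a"}  # assume signature and expiry already checked
print([r["id"] for r in visible(stream, claims)])  # ['a1']
```

Defaulting unknown clients to an empty set means a misconfigured grant fails closed rather than exposing every location.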