Handle huge set of Thing Descriptions (pagination, streaming, etc.) #16
Comments
Yet another possibility would be that a discovery call simply returns a list of links to the actual TDs. |
Edit: Sorry, your 3rd point was added later on and I didn't see it in my email. My comment was talking about this exactly. Another idea would be treating it like search engines where the most relevant results are placed first, like 10 TDs, which hopefully does not make a huge document. The client then goes to other "pages" and looks again. Based on the client's processing capabilities, it can ask for more TDs in the first request, like saying that a shopping website should display 100 items in a page based on user preference. |
IMHO pagination is the best option. We just need to represent a set of TDs; it could be JSON or JSON-LD or more efficient formats as presented in point 1.
A more sophisticated approach could be a Level Of Detail method. So that only
For databases that support SPARQL (i.e. SPARQL endpoints) pagination is handled with a combination of |
|
Separation of concerns. A client could pass options to the directory service: whether it only wants URLs, or URL and intro, or full TDs, together with parameters of the response (size, format etc). When a Thing Directory has lots of data, it might want to expose a different service/API (with subscription, pagination, etc) vs when it has a relatively simple set of TDs. They scale differently, so it's also in the servers' interest. On the client side, generally with HTTP we should be able to use the Fetch standard, which allows handling Responses via a stream reader, or as blob, arraybuffer, json, string etc. It allows options/URI variables, cross-origin policies etc. We might just want a convenient wrapper on top (in Scripting). Also, with WSS we could have a sub-protocol for handling this. For other (eventually supported) protocols, the Thing Directory implementation should handle flow control, following the specifics of that protocol. |
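To make the "URLs only / URL and intro / full TDs" option above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the level names, the `project_td` helper, the `href` key, and the choice of `id`, `title` and `description` as the "intro" fields are hypothetical and not taken from any draft.

```python
from typing import Any

# Hypothetical detail levels; the names are illustrative only.
DETAIL_LEVELS = ("url", "intro", "full")


def project_td(td: dict[str, Any], url: str, detail: str = "full") -> dict[str, Any]:
    """Reduce a TD to the level of detail the client asked for."""
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"unknown detail level: {detail!r}")
    if detail == "url":
        return {"href": url}
    if detail == "intro":
        # Assumption: "intro" means the link plus human-readable metadata.
        intro: dict[str, Any] = {"href": url}
        for key in ("id", "title", "description"):
            if key in td:
                intro[key] = td[key]
        return intro
    return td  # "full": the TD is returned unchanged


td = {
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:example:simple-td",
    "title": "Simple TD",
    "security": "basic_sc",
    "securityDefinitions": {"basic_sc": {"scheme": "basic"}},
}
links_only = project_td(td, "/td/urn:example:simple-td", detail="url")
intro = project_td(td, "/td/urn:example:simple-td", detail="intro")
```

With something like this, a directory holding many TDs could answer with links or intros by default and leave full TDs to explicit requests.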
We need to add an Editor's note about this to the FPWD. Farshid will create a PR. This will not close this issue, it will just point it out in the draft. |
pagination was discussed in https://w3c.github.io/w3c-api/ |
Discussion in discovery call Feb 15:
Use cases:
Requirements:
|
Next steps:
|
Yet another link worth looking at (from a former W3C community group): |
For convenience here's the link to Github pagination documentation: |
The W3C API Spec, shared by @ashimura. We are also collecting some pagination practices at linksmart/thing-directory#6. I really like Github's API but returning the links in headers will not be possible with CoAP. I don't know if we should limit ourselves in the HTTP API design with regards to what is possible with CoAP. |
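For reference, a client consuming GitHub-style pagination links from an HTTP `Link` header might do something like the sketch below. It is a simplified illustration only: the helper name `parse_link_header` and the `directory.example` URLs are made up, and the regular expression handles just the plain `<url>; rel="name"` form, not every variant permitted for Link headers.

```python
import re

# Matches the simple form: <target>; rel="relation"
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value: str) -> dict[str, str]:
    """Map each rel value to its target URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value)}


# Hypothetical header value as a directory might send it over HTTP.
header = (
    '<https://directory.example/td?page=2>; rel="next", '
    '<https://directory.example/td?page=5>; rel="last"'
)
links = parse_link_header(header)
```

A CoAP binding would need to carry the same `next`/`last` information elsewhere, for example in the payload, which is the trade-off raised above.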
Proposal for extension of discovery-context and a sample response (please see inline comments): {
"@context":{
"discovery":"https://www.w3.org/2021/wot/discovery#",
"tdd":"https://www.w3.org/2021/wot/discovery#",
"dcterms":"http://purl.org/dc/terms/",
"DirectoryDescription":{
"@id":"discovery:DirectoryDescription"
},
"LinkDescription":{
"@id":"discovery:LinkDescription"
},
"thingGraph":{
"@id":"discovery:ThingGraph",
"dcterms:description":"A graph of things, basically a shorthand for a named json-ld @graph, following: https://w3c.github.io/json-ld-syntax/#named-graph-data-indexing",
"@container":[
"@graph",
"@index"
]
},
"pagination":{
"@id":"discovery:Pagination",
"dcterms:description":"A block of pagination information, inspired by: https://www.w3.org/community/hydra/wiki/Pagination#PartialCollection",
"@type":"@none"
}
}
}
{
"@context":[
"https://www.w3.org/2019/wot/td/v1",
"https://w3c.github.io/wot-discovery/context/discovery-context.jsonld"
],
"@id":"urn:my.tdd.response",
"name":"My TDD response",
"base":"http://server:port", // Could we allow inheritance for TDs listed in "thingGraph"?
"version":{ ... },
"securityDefinitions":{ ... }, // Could we allow inheritance for TDs listed in "thingGraph"?
"thingGraph":{
"thing_000001":{ ... },
"thing_000002":{ ... },
"thing_000010":{ ... },
"thing_000020":{ ... },
"thing_000100":{ ... }
},
"pagination":{
"size":5,
"self":"a relative path in here",
"next":"a relative path and / or query string in here"
}
} |
I think a TD is not really useful for describing a page of a TD collection. The directory will already have another TD describing the APIs at the top level. I prefer a simple response containing only what is necessary. For the collection object, an array is better than a dictionary because of size (no duplicate key/id) and order (sorting by attributes other than key). With query parameters such as
If not using HTTP headers, everything in body:
{
"@context": "<discovery or tdd context>",
"@type": "Collection", // or TDCollection
"items": [ {TD}, ... ], // or tds
"page": 1,
"perPage": 100,
"total": 350 // if ?count=true
}
If using HTTP headers, body: [ {TD}, ... ]
Content-Range header:
Optional Link header for self, next links: |
I agree on having a simple response (format). However, what is necessary depends on the individual use cases. So, defining some parts as mandatory and other parts as optional might be the solution here. Wrt "base" and "securityDefinitions" inheritance, I think we could leave this out for the moment as I'm not sure whether this is actually possible in JSON-LD. Concerning the "container" in which TDs are wrapped: I'd like to have an option to name the respective TDs and have the possibility to create shortcuts with the help of the TD names e.g. for describing links between things as (proprietary / optional) part of my response without having the need to dig into the individual TDs (for the name and / or links section). I'd assume that this doesn't complicate the server side implementation too much and removes a lot of burden from the client side. Therefore I propose an object instead of an array for it. I would not name this container as "items", since in the TD "items" is already "Used to define the characteristics of an array". Using the HTTP header for transporting pagination or other additional information should not be considered. |
Use cases (clients) of a directory service could be using various protocols (HTTP, CoAP, MQTT, etc). For HTTP, we have streaming support and libraries to handle transparent streaming. One of the "best" common mechanisms would be a generic streaming API (the reply is a stream of TDs), which is easily implementable on most protocols given the existing libraries, but if someone needs to implement from scratch, it will be a variation of an observe/pagination/indexing mechanism. So IMHO the best common mechanism for IoT discovery would be an observe pattern, something like what is spec'd in the Scripting API (there page size is 1 at the API level, but could be more on the wire). We need to discern between chunks of a TD and pages of TDs, and any given response can be either a TD chunk or a page of TDs (but not a mix), for instance when we have a few huge TDs and a lot of small TDs, all of which match the discovery query. Segmentation/reassembly could be handled seamlessly by the runtime (or Scripting implementation, where there is Scripting), or by the application if it requests so (for instance when buffers are so small that a full TD cannot be processed in-situ). Quite an unlikely scenario, but then solvable with a request option. |
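The split between "page size 1 at the API level" and "more on the wire" can be sketched with Python generators. This is only an illustration of the idea, not the Scripting API: `stream_tds` and `in_pages` are hypothetical names, and the TDs are placeholders.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def stream_tds(source: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield matching TDs one at a time (page size 1 at the API level)."""
    for td in source:
        yield td


def in_pages(tds: Iterable[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Group a TD stream into pages of up to `size` TDs for the wire."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(tds)
    while True:
        page = list(islice(it, size))
        if not page:
            return
        yield page


# Seven placeholder TDs, produced lazily so nothing is materialised up front.
matches = ({"id": f"urn:example:td-{i}"} for i in range(7))
pages = list(in_pages(stream_tds(matches), size=3))
```

The application-facing side only ever sees single TDs, while the transport is free to batch them. Splitting one huge TD into chunks would be a separate concern and is not covered here.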
Could have arrays of objects, where objects have metadata + tds, like:
[
  { "id": <local_id>,
    "td": {<TD>}
  },
  ...
] |
Issues/proposals:
Concerns:
Other comments:
|
Let's look at examples here and follow them: linksmart/thing-directory#6
|
The draft spec for paginated listing has been added with the following response model:
{
"@context": "<discovery context>",
"id": "/td?offset=0&limit=10",
"type": "Collection",
"items": [
{
"@context": "https://www.w3.org/2019/wot/td/v1",
"id": "urn:example:simple-td",
"title": "Simple TD",
"security": "basic_sc",
"securityDefinitions": {
"basic_sc": {
"scheme": "basic"
}
}
},
... nine more TDs
],
"total": 350,
"nextLink": "/td?offset=10&limit=10"
}
Streaming support is left as optional and possible via server-driven content negotiation, without any specification. The listing section of the draft spec: anchored link - API spec is not yet available. |
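As a rough illustration of the response model above, a server could assemble one page like this. It is a sketch under assumptions, not the draft's normative behaviour: the `list_tds` helper is hypothetical, the TDs are held in an in-memory list, and omitting `nextLink` on the last page is my assumption.

```python
from typing import Any


def list_tds(
    tds: list[dict[str, Any]],
    offset: int = 0,
    limit: int = 10,
    context: str = "<discovery context>",
) -> dict[str, Any]:
    """Build one page of the Collection response from an in-memory TD list."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    total = len(tds)
    page: dict[str, Any] = {
        "@context": context,
        "id": f"/td?offset={offset}&limit={limit}",
        "type": "Collection",
        "items": tds[offset:offset + limit],
        "total": total,
    }
    next_offset = offset + limit
    if next_offset < total:
        # Assumption: nextLink is only present while more TDs remain.
        page["nextLink"] = f"/td?offset={next_offset}&limit={limit}"
    return page


# 350 placeholder TDs, matching the "total" in the example above.
all_tds = [{"id": f"urn:example:td-{i}"} for i in range(350)]
first = list_tds(all_tds)
last = list_tds(all_tds, offset=340)
```

A client would keep requesting `nextLink` until the field is absent, which keeps the paging logic on the server side.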
Regarding the header-based solution: as I commented on issue #54, the current Linked Data Platform Paging 1.0 would solve some problems related to nested contexts or namespace collisions. |
I've taken a deeper look at LDP Paging 1.0. It does not describe pagination of a single TD in JSON form (other listing mechanisms may allow that). Otherwise it is the same as the header-based proposal above with a few additions. Following that and adding our requirements, the operation can be as follows: Request Request
|
This relates to w3c#16 (comment)
While technically solved, the API may still change (there are some ongoing discussions and PRs) so will keep it open for now. |
Add alternative payload format (2). Minor cleanups should be in followup issues and PRs. Should close issue #16 also.
Use case
A TD Directory manages a huge set of TDs, maybe around 1000-10000 TDs. A client queries the TD directory, and about 1000 TDs match.
Problem statement
How are the 1000 TDs returned to the client in a resource-efficient way? Will this be one huge file in which all TDs are encapsulated? Or will the TDs be fragmented into blocks and returned block by block? Or will there be a stream?
First brainstormings