Consider using "JSON Lines" for large TDs #93
I am not sure if a line-by-line approach works out for JSON-LD processing (without any further requirements). There are some strong requirements, e.g., that
From the home page, it seems that it is not applicable to our use case (i.e. Big TDs). I quote:
This means that the format is meant to be used with a list of JSON values, like a list of Objects, Arrays, or strings. It wouldn't work with a big JSON object.
I also don't see the use case very well. Even outside of the JSON-LD related features, a TD has interdependencies like the
So I would say that processing a JSON Lines document does not make a lot of sense, but transmitting it chunk by chunk before processing does. However, wouldn't it make more sense to rely on the transport mechanism for that?
The point is that I think we had a misunderstanding. JSON Lines does not seem to split big JSON objects; it sends each one as a whole. For example:

```
{
  /* super big TD */
} // send the whole object
```

While here:

```
{/* super big TD */} // send this first
{/* super big TD */} // then this one
```
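The distinction above can be sketched in Python (the sample TDs here are hypothetical stand-ins; real TDs would carry `@context`, `title`, forms, etc.):

```python
import json

# Hypothetical sample TDs standing in for real Thing Descriptions.
tds = [{"title": "Lamp"}, {"title": "Sensor"}]

# A single JSON document: the whole array must be received and parsed at once.
single_doc = json.dumps(tds)

# JSON Lines: one complete JSON value per line, each parseable independently.
jsonl_doc = "\n".join(json.dumps(td) for td in tds)

# A consumer can decode each line on its own as it arrives.
decoded = [json.loads(line) for line in jsonl_doc.splitlines()]
```

Note that JSON Lines helps only when the payload is a *list* of values; any single value (one big TD) still occupies exactly one line and is sent whole.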
Generally speaking, yes. HTTP can handle big files easily. However, originally we thought that big TDs could occupy TDD resources and cause DoS problems. Moreover, I am not sure that every protocol binding can handle big files. Does CoAP have such a capability? Finally, I think this might be an optimization, but we could leave it out of the spec. I mean, it does not have the highest priority in my mind.
I see. Just to answer the small question :)
Yes -> https://tools.ietf.org/html/rfc7959 and w3c/wot-binding-templates#49
I agree, this was suggested in the wrong context. It does not solve the "super big TD" problem. It can be used to deliver TDs one-by-one, as mentioned by @relu91:
allowing the clients to consume them one at a time and interrupt at any time, instead of:
This is similar to paginating with a page size of one, except that the client doesn't need to make a new request for each consecutive TD. JSON Lines responses can be requested through content negotiation. A use case is, e.g., querying several TDs and stopping after you receive an expected TD, or before you run out of memory.
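The early-termination use case can be sketched as follows (the response lines and titles are hypothetical; a real client would read lines from a streamed HTTP response body):

```python
import json

def iter_tds(lines):
    """Yield TDs one at a time from an iterable of JSON Lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

# Hypothetical streamed response body, one TD per line.
response_lines = [
    '{"title": "Lamp"}',
    '{"title": "Thermostat"}',
    '{"title": "Camera"}',
]

consumed = []
for td in iter_tds(response_lines):
    consumed.append(td)
    if td["title"] == "Thermostat":  # found the TD we were looking for
        break  # stop early: remaining lines are never parsed
```

Because each line is decoded lazily, the client's memory use is bounded by one TD at a time rather than the full result set.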
Ah I see, makes a lot of sense like this :) |
Well, if I were designing a system to send TDs incrementally, I would do something like a recursive approach, e.g. send the JSON with elements down to some maximum depth, with detailed sub-elements replaced with references that would then be sent later. The problem with this is that it's still hard to limit the maximum size of each chunk.

It might be easier to just encode the TD as a string or binary blob, and then send that in chunks (which should be easy to define). The query would still return a JSON outer wrapper, but the TD itself would be encoded as a string value that would have to be unpacked. Note that for signed and/or encrypted TDs we may have to deal with this use case anyway.

Returning chunked string-encoded TDs could be an option on the filter. If the consumer is not concerned about incoming size, it could be dropped. On the server side, though, if someone tried to read a really large TD, they might get an error if it exceeds some maximum size, but the error could indicate that that particular TD can only be read in "chunked" mode. If a query returns multiple TDs, then if any TD exceeds the maximum size, the entire query would have to return that error.
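A minimal sketch of the string-encoding idea (the chunk size, function names, and sample TD below are all hypothetical, not part of any spec):

```python
import json

CHUNK_SIZE = 64  # hypothetical maximum chunk size, in characters

def chunk_td(td, size=CHUNK_SIZE):
    """Encode a TD as a compact JSON string and split it into fixed-size chunks."""
    blob = json.dumps(td, separators=(",", ":"))
    return [blob[i:i + size] for i in range(0, len(blob), size)]

def reassemble(chunks):
    """Concatenate the chunks and decode the TD once it is complete."""
    return json.loads("".join(chunks))

# Hypothetical oversized TD standing in for a "super big TD".
big_td = {"title": "BigThing",
          "properties": {f"p{i}": {"type": "number"} for i in range(50)}}
chunks = chunk_td(big_td)
restored = reassemble(chunks)
```

Unlike depth-based splitting, every chunk here has a hard size bound by construction, which is what makes the scheme easy to define; the trade-off is that the consumer cannot interpret anything until the full string is reassembled.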
I propose closing this issue and continuing the discussion on #117.
From Discovery call:
A protocol for returning JSON line-by-line (or rather chunk-by-chunk) which may be useful for returning large TDs. Suggested by @farshidtz
See https://jsonlines.org/