Support JSON values that aren’t mapped #4

Closed
gkellogg opened this issue Jun 30, 2018 · 38 comments

Comments

@gkellogg
Member

gkellogg commented Jun 30, 2018

  • Consider using `"@type": "@JSON"` to describe native values in the compact form.
  • Native values should include all JSON types: strings, booleans, numbers, and null as well as objects and arrays.
  • Expanded form can record these as values of @value.
    • This does interfere with some uses of [] and {} in framing.

Original issue is Support JSON values that aren't mapped #333

@azaroth42
Contributor

👍 This is a better (IMO) solution for native JSON values such as GeoJSON than requiring every community to map all of their constructs into -LD.

To quote (with slight edits) the example from the original issue:

{
  "@context": {
    "@vocab": "http://example/",
    "@base": "http://example/",
    "json-value": {"@type": "@json"}
  },
  "@id": "foo",
  "json-value": {"native": "json"}
}

This seems very sensible, and fits with our charter. We can later make @json an alias for whatever literal type a future RDF WG might assign for JSON.

@akuckartz

I would prefer a more LD friendly solution for GeoJSON. #7 ?

@azaroth42
Contributor

@akuckartz I didn't mean to imply that GeoJSON-LD was a bad thing to do, just that if the requirement is "support native JSON data structures in the JSON-LD context", then GeoJSON could be managed that way without then layering on GeoJSON-LD. GeoJSON-LD is great ... but if you don't need to interact with the -LD part of it, just record the JSON structure, there's overhead that could be minimized.

There's a separate issue for the list of lists feature beyond #7 that was already accepted to be part of 1.1. #7 would additionally let the semantics of the list of lists be expressed.

@gkellogg
Member Author

gkellogg commented Aug 28, 2018

The key is the expanded form; my thought was that the previous example might expand to something like the following:

[{
  "@id": "http://example/foo",
  "http://example/json-value": [{
    "@value": {"native": "json"},
    "@type": "@json"
  }]
}]
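
To make the proposed expansion concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not a JSON-LD processor: the function name `expand_with_json_terms` is hypothetical, it only handles `@vocab`, `@base`, and term definitions with `"@type": "@json"`, and it mirrors the expanded output proposed above rather than any normative algorithm.

```python
from urllib.parse import urljoin

def expand_with_json_terms(doc):
    """Hypothetical, tiny subset of JSON-LD expansion (not a real processor)."""
    ctx = doc.get("@context", {})
    vocab = ctx.get("@vocab", "")
    base = ctx.get("@base", "")
    node = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@id":
            # Resolve the relative node identifier against @base.
            node["@id"] = urljoin(base, value)
            continue
        # Terms become IRIs by prefixing @vocab (compact IRIs are not handled).
        iri = vocab + key
        definition = ctx.get(key, {})
        if isinstance(definition, dict) and definition.get("@type") == "@json":
            # The JSON value is carried through untouched as an opaque literal.
            node[iri] = [{"@value": value, "@type": "@json"}]
        else:
            # Other terms are assumed to hold plain scalar values in this sketch.
            node[iri] = [{"@value": value}]
    return [node]

example = {
    "@context": {
        "@vocab": "http://example/",
        "@base": "http://example/",
        "json-value": {"@type": "@json"},
    },
    "@id": "foo",
    "json-value": {"native": "json"},
}
expanded = expand_with_json_terms(example)
print(expanded)
```

The key design point is that nothing inside the `@json` value is expanded: `{"native": "json"}` is not treated as a node object.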

Regarding #7, this is not in conflict with a potentially more semweb-y mapping for GeoJSON, but there are other reasons why you might want to preserve raw JSON within JSON-LD.

When turned into RDF, we would need a datatype to describe the value, so that you would get something like the following:

@base <http://example/foo> .
@prefix jsonld: <https://www.w3.org/ns/json-ld#> .

<foo> <json-value> '{"native": "json"}'^^jsonld:json .

Where the JSON is normalized to use minimal whitespace.
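
A sketch of that RDF step in Python. It assumes that "minimal whitespace" means the compact separators of Python's `json.dumps`. The helper names are hypothetical, and the datatype IRI is assumed to be the expansion of the `jsonld:json` prefix above (later in the thread rdf:JSON is suggested instead).

```python
import json

# Assumed datatype IRI, expanded from the jsonld:json prefix used above.
JSON_DATATYPE = "https://www.w3.org/ns/json-ld#json"

def json_lexical_form(value):
    """Serialize a parsed JSON value with no insignificant whitespace."""
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)

def ntriples_literal(value, datatype=JSON_DATATYPE):
    """Wrap the compact JSON text as an N-Triples typed literal."""
    text = json_lexical_form(value)
    # N-Triples string escaping: backslash first, then quotes and line breaks.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^<{datatype}>'

triple = ("<http://example/foo> <http://example/json-value> "
          + ntriples_literal({"native": "json"}) + " .")
print(triple)
```

This only removes whitespace. Key order and number spelling still depend on how the value was parsed and re-serialized.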

@iherman
Member

iherman commented Aug 29, 2018

I think defining a jsonld:json datatype would make a lot of sense in this day and age... and would offer a clean solution.

@davidlehn
Contributor

Will need to note that the whole feature is somewhat implementation dependent. Native JSON serialization/deserialization issues may have some effect on key ordering, float representation, etc.
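
To illustrate the key-ordering point, here is a small Python check using made-up, GeoJSON-like values. Two objects with the same members compare equal after parsing. Even so, an order-preserving serializer produces different text for each of them, and changing the whitespace settings changes the text again.

```python
import json

# Hypothetical values with the same members in different orders.
a = {"type": "Point", "coordinates": [1.5, 2]}
b = {"coordinates": [1.5, 2], "type": "Point"}

same_value = a == b  # dict equality ignores member order

text_a = json.dumps(a)  # insertion order, default ", " and ": " separators
text_b = json.dumps(b)
text_a_compact = json.dumps(a, separators=(",", ":"))

print(same_value)       # True
print(text_a)           # {"type": "Point", "coordinates": [1.5, 2]}
print(text_b)           # {"coordinates": [1.5, 2], "type": "Point"}
print(text_a_compact)   # {"type":"Point","coordinates":[1.5,2]}
```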

@davidlehn
Contributor

davidlehn commented Aug 31, 2018

Should perhaps be jsonld:JSON to better align with https://www.w3.org/TR/rdf11-concepts/#section-html

@azaroth42
Contributor

WG resolved to add a @JSON keyword, mapped to jsonld:JSON to identify the JSON data type.

@BigBlueHat
Member

I'm concerned this opens a Pandora's box...or maybe several. Sadly, I wasn't here for the call and had overlooked this issue earlier, so I fear I'm only just now raising these concerns...

We're (rather passively) introducing a namespace specific to JSON-LD: https://www.w3.org/ns/json-ld#

We're inviting developers to avoid/ignore the graph model JSON-LD encodes:

{
  "@context": {
    "data": {"@type": "@json"}
  },
  "data": {"everything": "imaginable"}
}

I fear providing this as a "solution for native JSON values such as GeoJSON" sends the wrong message...and it begins to invalidate the reason to have JSON-LD at all (see the example above).

Are we also planning to do this for YAML? Because the use cases would be identical...

@cwebber

cwebber commented Sep 4, 2018

Having implemented the RDF canonicalization spec with a minor headache, this sounds like a full on migraine.

yo dawg I heard you like canonicalization so I put a tree data serialization canonicalization algorithm in your graph data serialization canonicalization algorithm so you can normalize while you normalize

@ajs6f
Member

ajs6f commented Sep 4, 2018

@BigBlueHat I appreciate (and to some extent share) that concern, but I wonder if there's a historical analogy: I've not seen the kind of problem you are describing using XML literals within RDF/XML. That may not be a valid analogy, but it's a bit suggestive...

@azaroth42
Contributor

Re YAML, I don't think we would do that, because (a) no one has asked for it and (b) YAML is a non-normative deliverable of how the patterns of JSON-LD could be used in YAML to accomplish the same ends. The charter says: "JSON-LD 1.1 examples specified in YAML" not a normative YAML-LD Rec.

We would be introducing a namespace, yes. We could also (as discussed on the call) add the data type to the RDF namespace, but we at least would need to document it. The consensus was that the creation of a new namespace was less work than putting it into an existing one, and a future RDF WG could take it over down the line.

I agree with @ajs6f about the use of XML literals in RDF/XML. Yes, you can create pointless RDF that simply wraps a single literal in XML or JSON ... but why would you bother to do that? It seems like an enormous waste of your time other than to meet some badly worded RFP.

@gkellogg
Member Author

gkellogg commented Sep 4, 2018

As @ajs6f points out, other RDF syntaxes built on a host language have a similar mechanism for including raw XML or HTML; this is really no different.

For RDF canonicalization, such values would be treated just as other datatyped literals. Part of the RDF serialization aspects should include whitespace normalization, which is fairly standard in JSON, so I don't really appreciate why things such as RDF Dataset Normalization and signatures would be at any disadvantage.

@gkellogg
Member Author

gkellogg commented Sep 4, 2018

@BigBlueHat worries about introducing a new namespace:

We're (rather passively) introducing a namespace specific to JSON-LD: https://www.w3.org/ns/json-ld#.

In fact, this namespace already exists for URIs such as http://www.w3.org/ns/json-ld#expanded used in HTTP headers.

However, we don't need to use this namespace, and @iherman suggested that we could probably use the RDF namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# and use http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON as the datatype, making it first-class with XMLLiteral and HTML datatypes. Updating the RDF namespace document is something we can do, apparently.

I agree that this no longer serves for GeoJSON, and we should consider some other example, but such examples doubtless exist, which is why this is a compelling feature.

@iherman
Member

iherman commented Sep 5, 2018

I guess we can all agree that this is (a) technically doable (b) it may require normalization of the literal (at least optionally) and (c) it is not fundamentally different from the XML and HTML datatypes. (E.g., if we do have a standard for RDF canonicalization at some point, that standard must address the issue of literals and their normalization (or not), and the issues raised by @cwebber are also genuine problems for HTML literals.)

However. I guess we are back to our design principles set out at the beginning of the WG's life. We should not do this just because we can; we should have proper use cases, see relevant section. I cannot judge whether GeoJSON is a use case or not.

@Fak3
Contributor

Fak3 commented Sep 5, 2018

There's a separate issue for the list of lists feature beyond #7 that was already accepted to be part of 1.1.

@azaroth42 is there a github issue for the list of lists support? If not, may I create one?

@gkellogg
Member Author

gkellogg commented Sep 5, 2018

@Fak3 The lists of lists issue is #36, and it was closed as support was added for recursive lists.

@cwebber

cwebber commented Sep 5, 2018

For RDF canonicalization, such values would be treated just as other datatyped literals. Part of the RDF serialization aspects should include whitespace normalization, which is fairly standard in JSON, so I don't really appreciate why things such as RDF Dataset Normalization and signatures would be at any disadvantage.

It isn't that simple. Whitespace is not the only issue. We will probably have to support something like this json canonicalization spec or something. That's a lot of extra work.

There's also a huge risk that people will open this loophole much, much wider than is anticipated, marking giant swaths of content as json-only. Yeah, I guess that's true for XML too, but to be honest no sane person could operate on XML-RDF as if it were real XML and have things survive... it was an RDF serialization format and little more. Here people are actually working with json-ld as if it were normal json and getting reasonable RDF interop. There are pain points occasionally, and we should try to remedy those, but I think this is opening an escape hatch that a good number of people will jump straight through.

Careful about rubbing this lamp... I think fulfilling this wish will have more side effects than anticipated and may undo a lot of the goals of json-ld. -1 from me.

@dlongley
Contributor

dlongley commented Sep 5, 2018

I share the same concerns as @cwebber and @BigBlueHat.

@azaroth42
Contributor

Re canonicalization (or even just whitespace normalization) ... can someone describe the issue and the risk here? If one implementation serializes to a string "{\"foo\": 1}" and another serializes to a string "{ \"foo\" : 1 }" ... what's the problem? They're not identity providing such that they need to be compared, they're just values.

@cwebber

cwebber commented Sep 5, 2018

@azaroth42 Those would end up being two different signatures with linked data signatures. Without canonicalizing the json exactly the same way every time, LDS will break.
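
A rough Python illustration of why this matters. It uses a plain SHA-256 digest as a stand-in for the data a signature covers. This is an assumption: Linked Data Signatures hash a canonicalized N-Quads form, but the literal's text would appear in that form verbatim. The helper names are hypothetical.

```python
import hashlib
import json

# The two serializations from the question above.
s1 = '{"foo": 1}'
s2 = '{ "foo" : 1 }'

def digest(text):
    """SHA-256 of the UTF-8 bytes; a stand-in for what a signature covers."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def reserialize(text):
    """Parse and re-emit with sorted keys and no whitespace. This removes the
    difference here, but it is not a full canonicalization scheme."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))

values_equal = json.loads(s1) == json.loads(s2)
digests_equal = digest(s1) == digest(s2)
reserialized_equal = digest(reserialize(s1)) == digest(reserialize(s2))

print(values_equal, digests_equal, reserialized_equal)  # True False True
```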

@gkellogg
Member Author

gkellogg commented Sep 5, 2018

@cwebber said:

It isn't that simple. Whitespace is not the only issue. We will probably have to support something like this json canonicalization spec or something. That's a lot of extra work.

Good point: by the time we see the data, it's in a parsed form, and we can't depend on a specific representation of numbers, for example.
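
The loss of lexical form is easy to see in Python. This is an illustrative sketch only; other JSON libraries make different choices about number types.

```python
import json

lexical_forms = ["42", "42.0", "42.0E0", "4.2E+1"]
parsed = [json.loads(s) for s in lexical_forms]

# All four spellings denote the same number...
all_equal = all(p == 42 for p in parsed)

# ...but the original spelling is gone after parsing. Python keeps an
# int/float distinction, so re-serializing yields two different texts,
# whereas a JavaScript engine (one number type) would print "42" for all.
reserialized = [json.dumps(p) for p in parsed]
print(all_equal)     # True
print(reserialized)  # ['42', '42.0', '42.0', '42.0']
```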

At this point, I'd say that the work should be put on hold, certainly pending an important use case.

@azaroth42
Contributor

We can defer, but I would like to note that canonicalization and LDS are explicitly out of scope of the WG, per the charter: https://www.w3.org/2018/03/jsonld-wg-charter.html

@gkellogg
Member Author

@cyberphone, can you comment on the status of
https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-05? If we are to support JSON literals, it would be best to canonicalize them. When is this expected to become an RFC? How stable is the document? Are there other specs which are normatively referencing the spec?

@iherman part of testing requires an RDF transform and using dataset isomorphism. At that point, the precise lexical representation of JSON literals becomes important. Certainly, this could be left out of the spec and used in test-suite instructions, but for many reasons, settling on a canonical form for JSON literals is going to be important, if we can overcome the normative citation issues.
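
The following is a simplified, independently written Python sketch that shows what a JCS-style scheme involves. It follows the rules as described in the draft:

  • no insignificant whitespace;
  • object members sorted by UTF-16 code units;
  • strings escaped as in ECMAScript `JSON.stringify`;
  • numbers printed as ECMAScript prints the corresponding IEEE 754 double.

It is neither the draft's reference code nor the Ruby gem mentioned below, and the function names are hypothetical. It assumes that Python's shortest round-trip `repr` picks the same digits as ECMAScript, and it ignores edge cases such as lone surrogates.

```python
import json
import math
from decimal import Decimal

def es_number(x):
    """Format a finite float the way ECMAScript's Number-to-String does
    (simplified sketch)."""
    if not math.isfinite(x):
        raise ValueError("NaN and Infinity are not valid JSON numbers")
    if x == 0:
        return "0"  # also covers -0.0
    sign = "-" if x < 0 else ""
    # repr() yields the shortest digits that round-trip; normalize() drops
    # trailing zeros, so abs(x) == int(digits) * 10**exp.
    _, digit_tuple, exp = Decimal(repr(abs(x))).normalize().as_tuple()
    digits = "".join(str(d) for d in digit_tuple)
    k = len(digits)
    n = exp + k  # decimal point position relative to the first digit
    if k <= n <= 21:
        return sign + digits + "0" * (n - k)
    if 0 < n <= 21:
        return sign + digits[:n] + "." + digits[n:]
    if -6 < n <= 0:
        return sign + "0." + "0" * (-n) + digits
    e = n - 1
    mantissa = digits[0] + ("." + digits[1:] if k > 1 else "")
    return sign + mantissa + ("e+" if e >= 0 else "e-") + str(abs(e))

def canonicalize(value):
    """JCS-style serialization of an already parsed JSON value."""
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, (int, float)):
        # Numbers are treated as IEEE 754 doubles, so 42 and 42.0 agree.
        return es_number(float(value))
    if isinstance(value, str):
        # ensure_ascii=False: only quotes, backslashes and control characters
        # are escaped, as JSON.stringify does for well-formed strings.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ",".join(canonicalize(v) for v in value) + "]"
    if isinstance(value, dict):
        # Big-endian UTF-16 bytes compare in UTF-16 code-unit order.
        keys = sorted(value, key=lambda s: s.encode("utf-16-be"))
        return "{" + ",".join(
            json.dumps(k, ensure_ascii=False) + ":" + canonicalize(value[k])
            for k in keys
        ) + "}"
    raise TypeError(f"not a JSON value: {type(value).__name__}")

canonical = canonicalize(json.loads('{"b": 4.2E+1, "a": [1.0, "x"], "\\u20ac": null}'))
print(canonical)  # {"a":[1,"x"],"b":42,"€":null}
```

Because every number goes through a double, `{"foo": 1}` and `{ "foo" : 1.0 }` both canonicalize to `{"foo":1}`. This is the property that signatures and isomorphism-based tests would rely on.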

@gkellogg
Member Author

My Ruby version for JSON canonicalization: https://github.com/dryruby/json-canonicalization.

gkellogg added a commit to w3c/json-ld-api that referenced this issue Mar 20, 2019
gkellogg added a commit to w3c/json-ld-wg that referenced this issue Mar 20, 2019
@cyberphone

@gkellogg It is great to see a sixth incarnation of the proposal!

Regarding progress the technical issues have (AFAICT...) been properly identified; the problem is rather that a bunch of people still consider canonicalization as pure stupidity. OTOH, it seems that none of the current Open Banking APIs has bought into the Base64Url-concept either.

FWIW, I will do a short presentation
https://cyberphone.github.io/ietf-signed-http-requests/hotrfc-shreq.pdf
at IETF-104 in Prague which shows how you can apply JCS on a mainstream application.

@iherman
Member

iherman commented Mar 22, 2019

This issue was discussed in a meeting.

  • RESOLVED: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized
Transcript: JSON datatype
Rob Sanderson: link: #4
Rob Sanderson: PR: w3c/json-ld-api#72
Rob Sanderson: we also have discussed the JSON datatype on github
… Gregg, you’ve been the most involved (as always)
… could you summarize?
Gregg Kellogg: the issue comes down to representation
… if you are going to describe both the lexical and value space
… somewhat like HTML
… the lexical space cannot be guaranteed
… the JSON literal quality is lost when it’s turned into a native representation
… you lose the original key ordering, key escaping, and lexical numerical representations
… so it seems we will need to canonicalize
… which has been referenced in the issue
… it’s sadly not as close to done as I’d hoped
… and we can’t count on it being final in time
… so, do we care if two implementations use the same canonicalization
… so we have already dealt with whether to use Integer or Double for numbers
… so when you’d turn the JSON literal into RDF (in the toRDF space), we do need to say something about that at least
… and the elimination of whitespace
… and the ordering of keys
… I think that can be done
… there’s a lot of detail in that, but we should be able to reference ECMAScript for this
… or we could do it ourselves
Rob Sanderson: last time we talked about the canonicalization issue
… we also talked about HTML being not easily canonicalizable
Gregg Kellogg: HTML is a little different
… they will preserve order, and whitespace
… so you do have the opportunity to return to that result
Ivan Herman: well, attribute order and things are not covered
… this would be a problem if you were to attempt to sign an HTML document
Gregg Kellogg: if signatures weren’t as important as they are now, then maybe we wouldn’t need to care about this so much
Rob Sanderson: so, is there a JSON-LD document that could include a JSON “native” data type that also needs to be signed
… so if the only use case is to import GeoJSON
… do we need to worry
Ivan Herman: I have spent time on this issue with others
… aside from the canonicalization problem
… if we do make a native JSON type, we will have to put it into some namespace–rdf: or jsonld:
Rob Sanderson: +1 to RDF namespace
Ivan Herman: if we do that, we’ll have to write the SWIG mailing list, to announce the new datatype, etc.
… we can do this as part of our document
… the other problem is
… I did put a reference in the issue for the rules we have to follow when we point to something normatively
… my first reading is that unfortunately, this JSON canonicalization specification cannot be referred to normatively
… the second problem is bringing our own canonicalization into our document
… if we do that, I can safely say the Director would say no to that
… so, we can’t just take an IETF spec and put it into a W3C spec
… all of these are admin problems
… But I am still not convinced that we need the canonicalization as a normative part of our spec
… we could say that someone else may do this and reference forthcoming work
… but when the issue is that we have a JSON portion we want to store in RDF
… we can state that the only expectation is that [the same processor will produce the same output]
… none of the arguments I have heard show that canonicalization needs to be normative
Pierre-Antoine Champin: http://tinyurl.com/y2gmzxf8
Pierre-Antoine Champin: I was wondering about this example
… there’s an Integer in the non-canonical form
… would that be canonicalized or not?
Gregg Kellogg: yes, that would be canonicalized
… I don’t know any processors that would properly serialize that with a leading zero
… if you’re going to the internal representation
… it is the number 42
… some might do 42.0
… or 42E+0
… that would be fine, but I don’t think most JSON serializers would do that
Pierre-Antoine Champin: for the moment, we know how to sign this thing
Dave Longley: I think this falls into the same category as HTML
… it’s a string in the JSON; it’s not native HTML
… or a native number in the example’s case
… if we’re storing stuff in a string, then store it as a string
… but people want a native JSON object in their JSON
Pierre-Antoine Champin: but if you remove the leading 0 you don’t get the same signature
… so I’m assuming that the signature is dealing with the order or absence of order in the object when signed
… so if the object was a native JSON object, then it would already benefit
… and regardless we already have this problem with other string-expressed literals
Rob Sanderson: if you instead make it value 42.0
… since no one really serializes as 042
… whatever you change here will change the signature
… even though it will canonicalize as something different
Dave Longley: I disagree
Rob Sanderson: what do you disagree with?
Ivan Herman: I think in these examples, the current JSON-LD specification doesn’t say anything about what you put in strings
… we don’t suggest any sort of mini-canonicalization for things like this
… having built-in canonicalization for the native JSON representation
… would be a departure from what we’ve done previously
Dave Longley: my response to all that is that we have very consistent rules about moving non-string data into strings
… so we do have those sorts of specifications
… from a native JSON value into a string
… this same thing would exist for native JSON objects
… for things that come in via a string, those will stay as whatever that string is
… so strings have no issue
… so if you take pchampin’s example, and change it to a real number: 42
Gregg Kellogg: 42, 42.0, 42.0E0, 4.2E+1 are all the same number
Dave Longley: and if you put that in the playground, check the nquads tab, you’ll find the same number
Ivan Herman: yep I acknowledge that
Rob Sanderson: maybe then it’s the playground which is at fault
… I put in several examples, and the signature changes for all of these different 42’s as an integer
Dave Longley: you’re looking at the RSA signature, so you’ll see it change constantly
… because that injects random data
… what you need to look at is the N-Quads or normalized tabs
… the data there stays the same
Gregg Kellogg: this is in the data round tripping section
Gregg Kellogg: so, imo, if we create a datatype for JSON
… before there is a canonicalization for it
… then we’re in danger of doing things too early
… ultimately we need to deal with a canonicalized JSON
Pierre-Antoine Champin: +1
Gregg Kellogg: so the best thing we can do right now is nothing
… and defer this until there is a canonicalized form
… otherwise whitespace, object ordering, etc are all variable
… and the literals won’t really be worth doing if lexical representation is important
… better not to do anything until a canonicalization spec exists
Ivan Herman: my take would be milder
… the GeoJSON example doesn’t care about canonicalization
Rob Sanderson: +1 to ivan
Ivan Herman: with the canonicalization issue deferred
… and state that this feature is not recommended
… so we defer it, and if/when the canonicalization becomes a standard or whatever, at that point we suggest that that spec gets used
Rob Sanderson: it would be better to have a JSON datatype and state that later we’ll do canonicalization
Dave Longley: let’s provide rules for how to produce the JSON string that match the draft – but that you can do something else and be very clear it’s preferred that everyone do the same thing
Rob Sanderson: so we should start with JSON datatypes, and just suggest that you can’t sign these
Jeff Mixter: +1 to ivan and azaroth
Gregg Kellogg: if we don’t do canonicalization now, we don’t seem to be prevented from doing it later
… if we end up as a living spec, then we could do it that way
… and we could also suggest that for testing purposes it is always canonicalized
Rob Sanderson: a warning or a note?
Proposed resolution: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized (Rob Sanderson)
Rob Sanderson: I’d suggest a warning
Gregg Kellogg: +1
Jeff Mixter: +1
Ivan Herman: +1
Rob Sanderson: +1
Simon Steyskal: +1
Pierre-Antoine Champin: +1
Tim Cole: +1
Dave Longley: +0
Benjamin Young: +0 still have concerns about eager misuse
David I. Lehn: +0.5
Jeff Mixter: I echo bigbluehat concerns but I also have very valid reasons to add JSON to RDF data.
Dave Longley: +1 to everything Benjamin is saying … but that we should really also have JSON literals … but they should also all be converted to the same strings in processors :)
David Newbury: +1
Resolution #3: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized
Dave Longley: JSON literals can be an escape hatch but ONLY an escape hatch.

gkellogg added a commit to w3c/json-ld-api that referenced this issue Mar 24, 2019
gkellogg added a commit to w3c/json-ld-api that referenced this issue Mar 25, 2019
gkellogg added a commit that referenced this issue Mar 25, 2019
gkellogg added a commit to w3c/json-ld-api that referenced this issue Mar 26, 2019
gkellogg added a commit that referenced this issue Mar 26, 2019
gkellogg added the propose closing label and removed the defer-future-version label (Defer this issue until a future version of JSON-LD) Mar 28, 2019
@azaroth42
Contributor

Agree done, closing :)
