Possible oversights in processing when @type is not an IRI? #446

Open
trwnh opened this issue Nov 18, 2024 · 1 comment

trwnh commented Nov 18, 2024

JSON-LD 1.1 requires @type to be an IRI...

From 3.5 Specifying the Type: https://www.w3.org/TR/json-ld11/#specifying-the-type

In Linked Data, types are uniquely identified with an IRI.

From 9.16 Keywords: https://www.w3.org/TR/json-ld11/#keywords

The @type keyword MAY be aliased and MAY be used as a key in a node object or a value object, where its value MUST be a term, IRI reference, or a compact IRI (including blank node identifiers).

So it seems pretty clear to me that the intent and also normative requirement is for every entry in @type to be an IRI, ultimately.

...but there are documents in the wild that don't follow this...

HOWEVER: There are documents, producers, and other specs in the wild that put plain string literals in type, which a @context declaration aliases to @type. Notably, ATProto has this to say: https://atproto.com/specs/did#did-documents

The PDS service network location for the account is found under the service array, with id ending #atproto_pds, and type matching AtprotoPersonalDataServer

At the same time, there is no ATProto-specific context document or declaration: https://web.plc.directory/did/did:plc:ewvi7nxzyoun6zhxrhs64oiz

{
  "@context": [  // note there is no atproto context
    "https://www.w3.org/ns/did/v1",
    "https://w3id.org/security/multikey/v1",
    "https://w3id.org/security/suites/secp256k1-2019/v1"
  ],
  "alsoKnownAs": [
    "at://atproto.com"
  ],
  "id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz",
  "service": [
    {
      "id": "#atproto_pds",
      "serviceEndpoint": "https://enoki.us-east.host.bsky.network",
      "type": "AtprotoPersonalDataServer"  // this is a string literal, not an IRI
    }
  ],
  "verificationMethod": [
    {
      "controller": "did:plc:ewvi7nxzyoun6zhxrhs64oiz",
      "id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto",
      "publicKeyMultibase": "zQ3shunBKsXixLxKtC5qeSG9E4J5RkGN57im31pcTzbNQnm5w",
      "type": "Multikey"
    }
  ]
}

...but it doesn't seem to cause any errors?

I would expect this to be disallowed by the JSON-LD spec, but the JSON-LD Playground seems to be okay with it?

The following non-IRI value for @type shows up in expanded form:

        "@type": [
          "AtprotoPersonalDataServer"
        ]
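
To make the observation above concrete, here is a minimal sketch of how one might scan expanded JSON-LD for @type entries that are not absolute IRIs. It uses only the Python standard library; the function names are hypothetical, and the scheme regex is a simplification rather than a full IRI validator. It is not how any particular JSON-LD processor implements its checks.

```python
import re

# A scheme followed by ":" (per RFC 3986). Simplified on purpose:
# this only distinguishes "has a scheme" from "has no scheme".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_acceptable_type(value):
    """True for absolute IRIs, blank node identifiers, and keywords like @json."""
    if not isinstance(value, str):
        return False
    return value.startswith(("_:", "@")) or bool(_SCHEME.match(value))


def find_relative_types(node, path="$"):
    """Recursively collect (path, value) pairs for suspicious @type entries."""
    found = []
    if isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_relative_types(item, f"{path}[{i}]"))
    elif isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        for t in types:
            if not is_acceptable_type(t):
                found.append((path, t))
        for key, value in node.items():
            if key != "@type":
                found.extend(find_relative_types(value, f"{path}.{key}"))
    return found


# Trimmed-down expanded form of the DID document discussed here.
expanded = [{
    "@id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz",
    "https://www.w3.org/ns/did#service": [{
        "@id": "#atproto_pds",
        "@type": ["AtprotoPersonalDataServer"],
    }],
    "https://w3id.org/security#verificationMethod": [{
        "@type": ["https://w3id.org/security#Multikey"],
    }],
}]

for where, value in find_relative_types(expanded):
    print(f"{where}: {value!r} has no scheme")
```

Only the AtprotoPersonalDataServer entry is reported; the Multikey type expands to a full IRI and passes.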

When compacting against an empty context, it also seemingly works:

{  // note there is no context
  "@id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz",
  "https://w3id.org/security#verificationMethod": {
    "@id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto",
    "@type": "https://w3id.org/security#Multikey",
    "https://w3id.org/security#controller": {
      "@id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz"
    },
    "https://w3id.org/security#publicKeyMultibase": {
      "@type": "https://w3id.org/security#multibase",
      "@value": "zQ3shunBKsXixLxKtC5qeSG9E4J5RkGN57im31pcTzbNQnm5w"
    }
  },
  "https://www.w3.org/ns/activitystreams#alsoKnownAs": {
    "@id": "at://atproto.com"
  },
  "https://www.w3.org/ns/did#service": {
    "@id": "#atproto_pds",
    "@type": "AtprotoPersonalDataServer",  // this is a string literal, not an IRI
    "https://www.w3.org/ns/did#serviceEndpoint": {
      "@id": "https://enoki.us-east.host.bsky.network"
    }
  }
}

So, what gives?

Obviously the most correct thing here is to ask the ATProto people to provide a namespace and/or context document for their extension terms, but I'm wondering if the JSON-LD processing algorithms should detect these non-IRI types and possibly give a warning or error or otherwise elide them... or is it OK for these "non-IRI types" to exist?

Note that the RDF conversion to N-Quads (in the playground example above, for instance) will not output the @type of the single item in did:service.

This is partially due to a different problem that I'm not sure whether I should file an issue about or not -- relative IRI references for @id. In short, it seems like #atproto_pds is not automatically picking up on the "id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz" of the top-level object.

No matter; we can change the service node's identifier from #atproto_pds to did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds and two additional quads get output:

<did:plc:ewvi7nxzyoun6zhxrhs64oiz> <https://www.w3.org/ns/did#service> <did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds> .
<did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds> <https://www.w3.org/ns/did#serviceEndpoint> <https://enoki.us-east.host.bsky.network> .
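
The manual rewrite described above can be sketched as a small preprocessing step that prefixes fragment-only identifiers with the top-level id before the document is handed to a JSON-LD processor. This is a workaround under stated assumptions, not JSON-LD behavior: as far as I understand the spec, relative IRI references are resolved against the document's base IRI, not against the @id of an enclosing node. The function name and the id_key parameter are hypothetical.

```python
import copy


def absolutize_fragment_ids(doc, id_key="id"):
    """Return a copy of doc where ids like "#frag" become "<top-level id>#frag"."""
    root_id = doc[id_key]

    def walk(node):
        if isinstance(node, dict):
            value = node.get(id_key)
            if isinstance(value, str) and value.startswith("#"):
                node[id_key] = root_id + value
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    result = copy.deepcopy(doc)  # leave the caller's document untouched
    walk(result)
    return result


did_doc = {
    "id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz",
    "service": [{
        "id": "#atproto_pds",
        "serviceEndpoint": "https://enoki.us-east.host.bsky.network",
        "type": "AtprotoPersonalDataServer",
    }],
}

fixed = absolutize_fragment_ids(did_doc)
print(fixed["service"][0]["id"])
```

Note that this only repairs the identifier; the type value is left as it was.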

But we don't get a quad for rdf:type. So this backs up the notion that the "non-IRI types" are at least invalid RDF.

The question that remains: are they invalid JSON-LD as well? If not, should they be? If yes, what is to be done when processing them?

davidlehn (Contributor) commented

I think you've described the correct behavior of everything. It's perhaps not as intuitive as it should be. The relative IRIs are not errors normally, and they will get through expansion, but when going to N-Quads, they will be dropped. If you want to see those types of issues on the JSON-LD Playground, go to the options tab and enable "safe" mode. Or use the canonized tab. In this case, it will report relative @id reference safe mode errors for the #atproto_pds id and the AtprotoPersonalDataServer type.

I'm not sure how that id is used, but as you note, it should probably be prefixed with the top-level id as did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds. That's how DID documents do things and how the verificationMethod in the same data above does it. It does look like that type should be defined in a context somewhere with a full URL.

jsonld.js (used on the playground) added the "safe mode" to be more strict about acceptable JSON-LD. It raises errors in many places where the algorithms would otherwise drop data or suggest warnings. This is important when the data is canonized and digitally signed. In such cases, dropped data (and other issues) cause serious problems. It's not the default on the playground because the official processing does not have that behavior. One day hopefully the playground can make it clearer how to use that feature. "safe mode" itself needs to be written up (probably by me) and the community can refine it so it's more generally available in tooling.
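
As a rough illustration of the difference between the default behavior and a stricter mode, here is a minimal sketch of a type-to-quad step that either silently drops or rejects relative type references. It is not the jsonld.js implementation; the function name and the safe parameter are hypothetical, the scheme check is simplified, and blank node types are ignored for brevity.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def type_quads(subject, types, safe=False):
    """Emit one rdf:type N-Quads line per absolute type IRI.

    With safe=False a type without a scheme is dropped silently;
    with safe=True it raises ValueError instead.
    """
    quads = []
    for t in types:
        if not _SCHEME.match(t):
            if safe:
                raise ValueError(f"relative @type reference: {t!r}")
            continue  # lenient mode: the data is lost without notice
        quads.append(f"<{subject}> <{RDF_TYPE}> <{t}> .")
    return quads


subject = "did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds"

# Lenient: nothing is emitted and nothing is reported.
print(type_quads(subject, ["AtprotoPersonalDataServer"]))

# Strict: the same input is rejected.
try:
    type_quads(subject, ["AtprotoPersonalDataServer"], safe=True)
except ValueError as err:
    print("rejected:", err)
```

The strict path matters most when the output is going to be canonized and signed, since a silently dropped statement would not be covered by the signature.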
