Indirect anchor through a schema reference or Relative JSON Pointer #381

Closed
handrews opened this issue Aug 30, 2017 · 6 comments

@handrews
Contributor

handrews commented Aug 30, 2017

[EDIT: I now think that "anchorPointer" is a simpler and better option. You may want to start with the "anchorPointer" proposal instead of the more complex and incomplete "indirectAnchor" described in the first few comments.]

This is split off from #140, which is now just tracking the simple "anchor" case of setting the context URI directly (resolving references against the current context base).

I'm going to call this keyword "indirectAnchor", although I'm not attached to the name. Quotes from #140 are edited to use this keyword name.

@dlax asked:

Can't we say that the "indirectAnchor"'s value must be a URI reference to a (sub)schema and has the effect to override the context with the instance pointed by referenced (sub)schema?

...and in response to my concern that this would be too convoluted for users, he brought up the following good point:

I find this quite symmetrical with how Hyper-Schema's LDO are currently defining the context which is already deviating from RFC5988 because the link is not on the instance. Accordingly, I don't think it'd be more convoluted to assume that anything related to link's context (here, the anchor) should be considered through the instance's schema indirection.

The question at this point is how to define such an approach where the point in the instance to which the referenced schema refers is unambiguous, given that a single subschema may validate against numerous parts of a JSON document.

I'll take a pass at this in a separate comment.

@handrews
Contributor Author

It may be difficult or impossible to solve this completely, but I think the most obvious use cases may be manageable, and perhaps that's enough to gather feedback in the next draft. The canonical use case is collections and items.

Here we have a collection, with its own self link, and an array of ids of collection members. Each collection member defines its own self link (to its complete representation), item link (connecting the collection to the complete item representation), and collection link (connecting the item to the containing collection).

In practice, I would probably only use the item link, although I'm still thinking this pattern through. But having the three links illustrates several cases nicely, and they are all reasonable links to have.

{
  "$id": "https://schemas.example.com/stuff",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id": {"type": "integer", "minimum": 1}
    },
    "links": [
      {
        "rel": "self",
        "href": "/stuff/element/{id}"
      },
      {
        "rel": "item",
        "href": "/stuff/element/{id}"
      },
      {
        "rel": "collection",
        "href": "/stuff/element"
      }
    ]
  },
  "links": [
    {
      "rel": "self",
      "href": "/stuff"
    }
  ]
}

The collection's "self" link (the one in the root schema's "links" array) is straightforward. The context is the collection document that validates against the root schema, and the target is the same resource. No need for any sort of anchor change.

The id-only item's "self" link is also straightforward. This LDO produces an actual link for each id-only object in the instance array. That instance id-only object is the link context, and the single item resource is the target. Again, no need for any sort of anchor.
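As a rough illustration of how that LDO yields one link per array element, here is a minimal Python sketch. The `expand` helper is hypothetical and handles only simple `{name}` variables rather than the full URI Template syntax, and the sample instance is made up for the example.

```python
import re

def expand(template, instance):
    """Fill simple {name} variables from a JSON object (simple templates only)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(instance[m.group(1)]), template)

collection = [{"id": 1}, {"id": 2}]  # assumed sample instance
self_ldo = {"rel": "self", "href": "/stuff/element/{id}"}

# One link per id-only object; the default context is that object's own location.
links = [
    {
        "context": f"/{index}",
        "rel": self_ldo["rel"],
        "target": expand(self_ldo["href"], item),
    }
    for index, item in enumerate(collection)
]
```

Each entry pairs a context (written here as a JSON Pointer into the collection) with the expanded target, which is the default behavior described above.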

The id-only item's "collection" link has the same context (a specific id-only object in the instance array), and its target is the entire collection resource.

But the id-only item's "item" link is not possible to express. Its target resource is the same as that of the id-only item's "self" link, but its context is the entire collection: the link says that the target resource is an item within the context, which is a collection. It's the inverse of the "collection" link, which works because the default context is correct and the target resource does not require a fragment. So what we really need for that link is something like:

{
  "indirectAnchor": "#",
  "rel": "item",
  "href": "/stuff/{id}"
}

The context resource is pretty easy to figure out. It's whatever instance validates against the root schema (which was indicated by a JSON Pointer fragment), which in this example is the entire collection. It's unambiguous.

However, it does introduce another question: where does the value for the {id} template variable come from? Does it still come from the immediate instance, or is it resolved from the context instance?

My first thought was that it resolves from the context, but there's no obvious way to do that. Even if we accept using dot-separated notation to access object members and array indices (which is a controversial point on its own), since the context instance is an array, we have no way to reconstruct which array index we started from. We could invent some preprocessing symbol to indicate "whatever index gets you back to the original link context", but that's starting to get complicated. Let's set it aside for the moment.


What about nested arrays? (The examples get increasingly contrived, please bear with me)

{
  "$id": "https://schemas.example.com/stuff",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id": {"type": "integer", "minimum": 1},
      "inner": {
        "$id": "#innerCollection",
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "id": {"type": "integer", "minimum": 1}
          },
          "links": [
            {
              "indirectAnchor": "#innerCollection",
              "rel": "item",
              "href": "/innerstuff/{id}"
            }
          ]
        }
      }
    }
  }
}

In this case, our indirect anchor refers to a plain name fragment, because the target is not as obvious as the root pointer. The way I've written it, we could still use a pointer, but if you were defining the contents of "inner": {...} in a shared schema document's "definitions", and then using "$ref" to pull it in, the author of the referenced schema couldn't know the path from which it would be referenced. So we declare a plain name fragment, which is position-independent.

That's all well and good, but now we have a similar problem with figuring out the right id to use if we are resolving from the adjusted context above. In this example, #innerCollection validates against all items inside of the outer collection. How do we determine which part of the instance is intended to be the new context? We could say that an indirectly identified context MUST either be unambiguous, or in the path from the root of the document to the default context.
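To make that restriction concrete, here is one possible reading of it as a Python sketch. Nothing here is specified anywhere: `pick_context` is a hypothetical helper, locations are written as lists of JSON Pointer reference tokens, and the candidate list is assumed to come from whatever validation step found the instances matching the referenced schema.

```python
def pick_context(candidates, default_context):
    """Choose the adjusted context under the proposed restriction.

    candidates: token lists for every instance location that validated
        against the referenced schema (assumed to be supplied by the caller).
    default_context: token list for the link's default context.
    """
    if len(candidates) == 1:
        return candidates[0]  # unambiguous, so no further check is needed
    # Otherwise keep only candidates on the path from the root to the default context.
    ancestors = [c for c in candidates if default_context[:len(c)] == c]
    if len(ancestors) != 1:
        raise ValueError("indirect context is ambiguous")
    return ancestors[0]

# "#innerCollection" matches the "inner" array of every outer item.
inner_arrays = [["0", "inner"], ["1", "inner"]]
chosen = pick_context(inner_arrays, ["1", "inner", "0"])
```

Here `chosen` is the "inner" array that encloses the link's default context; a default context outside both arrays would raise instead.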

That is, you cannot reference an item that is lateral to your own location in an array.

While that is a significant limitation, I don't have any real use cases that are affected by it. My main use case is moving the context to an enclosing entity (typically the collection), which would work with this approach.

And of course we still have the same template variable resolution problem as the single-level-collection case.


There are other possible use cases that put the new context in other relationships with the default one: a sub-instance, or something in a not obviously related part of the instance (under a lateral object property). But I think even just solving the "item" problem would be useful. I don't see a good solution yet.

@dlax
Member

dlax commented Aug 30, 2017

However, it does introduce another question: where does the value for the {id} template variable come from? Does it still come from the immediate instance, or is it resolved from the context instance?

My first thought was that it resolves from the context, but there's no obvious way to do that.

My first thought was to use the immediate instance :)
At least for the rel="item" link inserted in a collection (array) schema, it makes sense.

@handrews I need to further think about the whole thing; thanks for putting this up!

@handrews
Contributor Author

@dlax for rel="item", the immediate instance definitely makes more sense.

But what if there was a collection-level attribute that was used in the URI? This is actually a problem even without the nesting. Here's the instance (so "items" is a property name, not a schema keyword):

{
  "meta": {
    "owner": "handrews"
  },
  "items": [
    {"id": 1},
    {"id": 2}
  ]
}

If the per-item rel="item" link is:

{
  "indirectAnchor": "#",
  "rel": "item",
  "href": "/{meta.owner}/things/{items.*.id}"
}

How should that work? I stuffed a "*" in the variable name to indicate "whichever item we started under", but... is that reasonable? Will people understand it? Is it a reasonable thing to ask implementations to handle? Keep in mind that with nested arrays, you might have multiple "*" at multiple levels (although I suppose each one is no harder to handle because of that; you just have to resolve them in the right order).
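For what it's worth, here is a small Python sketch of how an implementation might handle that wildcard. It is only a guess at the semantics floated above: the dotted variable syntax and the "*" marker are not defined anywhere, and `resolve_variable`, `expand`, and `start_path` are hypothetical names. The sketch assumes each "*" sits at the same depth as the matching step on the path to the instance we started under.

```python
import re

def resolve_variable(name, root, start_path):
    """Resolve a dotted variable from the root, replacing each "*" with the
    step at the same depth on the path to the starting instance."""
    value = root
    for depth, part in enumerate(name.split(".")):
        if part == "*":
            part = start_path[depth]
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value

def expand(template, root, start_path):
    """Expand every {variable} in the template using resolve_variable."""
    return re.sub(
        r"\{([^}]+)\}",
        lambda m: str(resolve_variable(m.group(1), root, start_path)),
        template,
    )

instance = {"meta": {"owner": "handrews"}, "items": [{"id": 1}, {"id": 2}]}
# We started under the second element of "items".
target = expand("/{meta.owner}/things/{items.*.id}", instance, ["items", 1])
```

With the second item as the starting point, `target` comes out as "/handrews/things/2". Nested arrays would simply consume more entries of `start_path`.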

This is a little outside of "indirectAnchor" but closely related so let's keep thinking about it here. We can split off more issues if we figure out that part of this should be solved separately.

@handrews
Contributor Author

This is a little outside of "indirectAnchor" but closely related so let's keep thinking about it here. We can split off more issues if we figure out that part of this should be solved separately.

Of course, as soon as I wrote that and went to work on something else, I figured out a good proposal for it. Filed as #382, so let's address it there. I may edit down the comments above to make this issue more focused and accessible.

@handrews changed the title from "Indirect anchor through a schema reference." to "Indirect anchor through a schema reference or Relative JSON Pointer" on Aug 30, 2017
@handrews
Contributor Author

handrews commented Aug 30, 2017

Having just written up "hrefPointers" (#382), and looking back over how hard it is to explain "indirectAnchor" and how many cases are still unclear, I think it may be better to revisit the "anchorPointer" idea.

I feel like "hrefPointers" makes a pretty compelling case for using Relative JSON Pointers (RJP) in Hyper-Schema by solving a significant problem without resorting to either "$data" or complex pre-processing. And if we accept "hrefPointers" then "anchorPointer" will work in the same way, building a Hyper-Schema usage pattern rather than a random collection of keyword behaviors.

RJPs are not encoded in URI fragments, and are not directly used to construct a new URI. This is fine, as the default context resource in an application/json document does not itself have a URI unless it is the root of the instance. In both the existing default context rules and "anchorPointer", we simply provide the means to locate the context in the instance. The instance's media type determines whether there is any fragment that can be used to identify it in a URI.

Of course, in a +json media type that supports JSON Pointer fragments, constructing either the default or adjusted context URI is straightforward.
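As a sketch of what "straightforward" means here, the following Python builds a context URI from the instance's URI plus the context's location, using the URI fragment form of JSON Pointer from RFC 6901. The function names and the example URI are assumptions made for illustration.

```python
from urllib.parse import quote

def pointer_to_fragment(tokens):
    """Encode JSON Pointer reference tokens as a URI fragment (RFC 6901, section 6)."""
    # Escape "~" before "/" so the two escapes cannot interfere with each other.
    escaped = (str(t).replace("~", "~0").replace("/", "~1") for t in tokens)
    return "#" + "".join("/" + quote(t, safe="") for t in escaped)

def context_uri(instance_uri, tokens):
    """Append the fragment for the context location to the instance's URI."""
    return instance_uri + pointer_to_fragment(tokens)

# Default context: the second id-only object in the collection.
default_context = context_uri("https://api.example.com/stuff", [1])
# Adjusted context: the root of the instance (an empty pointer).
adjusted_context = context_uri("https://api.example.com/stuff", [])
```

This only makes sense when the instance's media type defines JSON Pointer fragments; for plain application/json the location is known but has no URI of its own.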

"anchorPointer" would NOT affect the "href" URI Template resolution starting point, which would be managed separately in "hrefPointers"

So, how does it work? "anchorPointer" would solve the above use cases like this (trimmed down to show only the links that need it):

{
  "$id": "https://schemas.example.com/stuff",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id": {"type": "integer", "minimum": 1}
    },
    "links": [
      {
        "anchorPointer": "",
        "rel": "item",
        "href": "/stuff/element/{id}"
      }
    ]
  }
}

In this case, we know that the desired context is the root document, so we use a (string form, not URI form!) root JSON Pointer, which is the empty string.
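For anyone less familiar with the string form, here is a minimal Python sketch of JSON Pointer evaluation per RFC 6901, showing why the empty string selects the whole document. `resolve_pointer` is a hypothetical helper, and it skips the "-" array token and other edge cases.

```python
def resolve_pointer(document, pointer):
    """Evaluate a string-form JSON Pointer (RFC 6901) against a document."""
    if pointer == "":
        return document  # the empty string refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    value = document
    for token in pointer[1:].split("/"):
        # Undo "~1" before "~0", as the RFC requires.
        token = token.replace("~1", "/").replace("~0", "~")
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value

collection = [{"id": 1}, {"id": 2}]  # assumed sample instance
context = resolve_pointer(collection, "")
```

With "anchorPointer": "" the context is the collection itself, regardless of which id-only object the link was attached to.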

{
  "$id": "https://schemas.example.com/stuff",
  "items": {
    "properties": {
      "id": {"type": "integer", "minimum": 1},
      "inner": {
        "items": {
          "properties": {
            "id": {"type": "integer", "minimum": 1}
          },
          "links": [
            {
              "anchorPointer": "1",
              "rel": "item",
              "href": "/innerstuff/{id}"
            }
          ]
        }
      }
    }
  }
}

In this case, the "collection" is just the array immediately containing the id-only objects. We can't use an absolute JSON Pointer because there is no wildcarding or other way to point to the correct inner array.

Instead we go up one level, which is an RJP of "1", to indicate that the array is the context. As with the root absolute JSON Pointer, there is no trailing slash (a trailing slash actually indicates the object member whose name is the empty string).

In the first example, if we'd wanted to use an RJP, it would also have been "1", since the root array directly contains the id-only objects. It would be "2" only if the collection were an object wrapping that array, as in the "meta"/"items" instance above, because the context would then be two levels up instead of one.
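Here is a rough Python sketch of evaluating a Relative JSON Pointer from a known starting location, to show the "go up N levels" step. It is not a complete implementation: `resolve_relative` is a hypothetical helper, the "#" form is left out, and the sample instance is invented to match the nested schema.

```python
import re

def resolve_relative(document, start_tokens, relative_pointer):
    """Evaluate a Relative JSON Pointer from the location given by start_tokens.

    Returns the token list of the resulting location and the value found there.
    """
    match = re.fullmatch(r"(0|[1-9][0-9]*)((?:/.*)?)", relative_pointer, re.DOTALL)
    if match is None:
        raise ValueError("unsupported Relative JSON Pointer")
    levels_up, suffix = int(match.group(1)), match.group(2)
    if levels_up > len(start_tokens):
        raise ValueError("pointer climbs above the document root")
    # Drop one trailing token per level, then append any JSON Pointer suffix.
    tokens = list(start_tokens[:len(start_tokens) - levels_up])
    if suffix:
        tokens += [t.replace("~1", "/").replace("~0", "~") for t in suffix[1:].split("/")]
    value = document
    for token in tokens:
        value = value[int(token)] if isinstance(value, list) else value[token]
    return tokens, value

instance = [{"id": 1, "inner": [{"id": 10}, {"id": 11}]}]  # assumed sample instance
# The link is attached to the second id-only object in the inner array.
location, context = resolve_relative(instance, ["0", "inner", "1"], "1")
```

For that starting point, `location` is the inner array's own position and `context` is the inner array, which is the collection the "item" link needs.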

As noted in #382, we do not need to revive RJP as its own draft to use this: we can put the syntax in an appendix, and if we get a lot of feedback that it should be its own RFC, we can worry about that then.


I really feel like "anchorPointer" and "hrefPointers" are by far the most elegant and consistent solutions to these related problems. The only point that has been really controversial in the past is adding RJP syntax, but doing it as an appendix under our control seems quite reasonable.

I am open to an alternative syntax with the same functionality, but with our heavy use of JSON Pointer, RJP seems like the best approach. We used it when I was at Riverbed with many engineering teams, as can be seen in their public schemas. Engineers did not have trouble with the concept.

@handrews handrews added this to the draft-07 (wright-*-02) milestone Aug 30, 2017
@handrews
Contributor Author

Merged #385

@ghost ghost removed the Priority: Critical label Sep 14, 2017