
Add restrictions on how to pull schemas over HTTP #29

Closed
awwright opened this issue May 13, 2016 · 14 comments

Comments

@awwright
Member

awwright commented May 13, 2016

Misbehaved clients might pose a problem if they pull a schema over the network every time a document is validated against it, when it could instead be cached for a long period. Server owners won't like JSON Schema very much if this becomes a problem.

JSON Schema does not rely on or need HTTP, even if schemas are referenced with an http or https URI. However, in some hypermedia cases, it is still useful to download schemas over the network.

For these cases, add a section about behavior of clients when they make HTTP requests:

  • Clients SHOULD set or prepend a User-Agent header specific to the JSON Schema implementation, not merely the HTTP library being used (if any). E.g. instead of User-Agent: curl/7.43.0, use User-Agent: so-cool-json-schema/1.0.2 curl/7.43.0. Since product tokens are listed in decreasing order of significance, the JSON Schema library name/version goes first, followed by the more generic HTTP library name (if any).
  • Clients SHOULD set a From header so that server operators can contact the owner of a potentially misbehaving script.
  • Clients SHOULD observe caching headers and not re-request documents within their freshness period.
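
A rough sketch of what following these suggestions might look like in Python (the `so-cool-json-schema/1.0.2` token is the hypothetical example from above; the cache only honors `Cache-Control: max-age` and is purely illustrative, not a complete HTTP caching implementation):

```python
import re
import time

# Hypothetical product tokens: validator first, then the HTTP library.
JSON_SCHEMA_UA = "so-cool-json-schema/1.0.2"
HTTP_LIB_UA = "python-urllib/3.12"  # assumed underlying HTTP library token


def build_headers(contact_email):
    """Headers for a schema fetch: validator token first, From for contact."""
    return {
        "User-Agent": f"{JSON_SCHEMA_UA} {HTTP_LIB_UA}",
        "From": contact_email,
    }


def freshness_lifetime(cache_control):
    """Return max-age in seconds from a Cache-Control header value, or 0."""
    m = re.search(r"max-age=(\d+)", cache_control or "")
    return int(m.group(1)) if m else 0


class SchemaCache:
    """Tiny in-memory cache that refuses to re-fetch fresh documents."""

    def __init__(self):
        self._entries = {}  # uri -> (schema, fetched_at, max_age)

    def get(self, uri, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(uri)
        if entry and now - entry[1] < entry[2]:
            return entry[0]  # still fresh: do not hit the network
        return None

    def put(self, uri, schema, cache_control, now=None):
        now = time.time() if now is None else now
        self._entries[uri] = (schema, now, freshness_lifetime(cache_control))
```

A real client would consult `SchemaCache.get` before making any request and call `put` with the response's `Cache-Control` header after a fetch.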
@Relequestual
Member

I'm not sure the first one should be MUST. I would suggest SHOULD.

@awwright
Member Author

Yeah, also added cache requirement

@awwright
Member Author

Added such a section in a59f5c9

@Relequestual
Member

I agree, although I'm concerned you committed this directly to master... it's a change to the specification...

@awwright
Member Author

@Relequestual It's almost the exact same language written here, but I'll do a PR next time too I guess

@Relequestual
Member

I explained in more detail why I'm unhappy about commits directly to master at #4 (comment), which you have seen, and I'm now reading your reply =]

@handrews
Contributor

handrews commented Sep 23, 2016

@Relequestual , @awwright : I confess that I do not understand the user-agent requirement. A per-application user agent makes no sense to me, and would be very confusing to implement in a client system that is intended for re-use. I may just not be thinking about this the right way, could you elaborate on this a bit?

To me, having applications specify their own user-agent string is like having links in an HTML page add user-agent headings. The browser is the user-agent, not the "application" that is using the browser to manage its HTTP interactions.

@awwright
Member Author

awwright commented Sep 23, 2016

@handrews Most implementations should not be downloading schemas over the network.

Of those that are, the User-Agent header is still specific to the validator.

It also suggests that the person running the program set an email in a From line.

This is just re-iterating all the stuff that already exists in HTTP, really.

And it's just a suggestion, there's no requirement-level language.

@handrews
Contributor

Thanks- I missed that it was specific to the validator rather than the application. That makes waaaay more sense. And yes, totally in favor of SHOULDs that clarify how to apply HTTP headers and the like.

I am a little puzzled by the expectation that clients shouldn't download schemas. In an open-ended system, how else will they make use of representations that use previously unseen schemas? (probably a discussion for somewhere other than a closed issue). I do expect them to use describedBy rather than profile to avoid overwhelming the canonical schema host.

@awwright
Member Author

This thread is fine for discussion...

@handrews Most uses of JSON Schema I've seen so far are just for declaratively stating things about JSON values so you don't have to do it in code... This use would be, for instance, an HTTP server ensuring that an incoming PUT request is valid. The JSON Schema is already a part of the program, and loaded into memory.

The other side of the request can also make use of JSON Schema too, however, if the HTTP server publishes the schema and links to it. Frequently, this will be a script or bot or something that will also have the schema pre-loaded into memory.

But not all of them will. I do imagine programs that are generic, for a wide variety of servers and APIs, that need to download a schema before you interact with them. These would function similar to HTML user-agents. I haven't seen very many of these, though.

There's probably a bunch of other use-cases that I'm unaware of, but as far as downloading/storage of JSON Schemas is concerned, they still probably can fit into one of these three classes.

@handrews
Contributor

@awwright : Ah, OK this helps me understand your perspective on this. While I agree with your three use cases, my perspective comes from designing and writing (with other collaborators) just such a user-agent. It probably won't take over the world :-) but it is used extensively within the product suite for which it was designed, and could be used independently.

sleepwalker and reschema are based entirely on the approach of both client and server using JSON schema (I'm just linking to them to prove they exist- no need to dig into them unless you want to).

ReSchema is (as you might guess) the schema-validating library, used on both the client and server. Sleepwalker is the client (the server was not open sourced). We used JSON Schema Draft 04, plus '$merge' support and a custom alternative to Hyper-Schema that we (mostly the lead architect) designed after I was unable to get buy-in on Hyper-Schema as it stands in Draft 04. This is where many of my yet-to-be-filed proposals around HyperSchema will come from (although as seen in email on the google group, I'm no longer in favor of $merge).

Sleepwalker won't do anything without a schema. It runs all requests and responses through validation (although you can turn that off to trade reliability for performance). It fulfills all URI templates from instance data (although you can also supply your own values if you want to, for instance, filter a collection with user-supplied search terms). No code in this system other than Sleepwalker ever constructs URIs. Given a relationship between resources A and B, a link defined on B can be executed from A by (through the schema) automatically mapping instance data from A into B's template variables. All interactions through the APIs are hypermedia-driven.

As you work with instance data, it wraps the python dict/list-based data structure and as you index into object properties or array elements, it uses ReSchema to track the equivalent position in the schema. This allows proper handling of links from within instance data (e.g. links out of every element of an array, such as "full" links for entries in a collection representation). This is where the notion of "indexing into a schema" that I elaborated on in some other issue came from. It's also why I view Relative JSON Pointer as absolutely essential- the entire project would have been impossible without it.
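
The "indexing into a schema" idea can be sketched roughly like this (a hypothetical, heavily simplified helper: it only handles `properties` and single-schema `items`, ignoring `$ref`, `patternProperties`, and everything else a real library like ReSchema must resolve):

```python
def subschema_at(schema, path):
    """Walk a subschema alongside an instance path (keys and array indices).

    Illustrative sketch only: integer steps descend through "items",
    string steps through "properties". Unknown steps yield {}.
    """
    current = schema
    for step in path:
        if isinstance(step, int):
            current = current.get("items", {})
        else:
            current = current.get("properties", {}).get(step, {})
    return current


# Example schema: a collection whose entries each have a "name" string.
COLLECTION_SCHEMA = {
    "type": "object",
    "properties": {
        "entries": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
            },
        }
    },
}
```

So indexing `instance["entries"][0]["name"]` corresponds to the schema position `subschema_at(COLLECTION_SCHEMA, ["entries", 0, "name"])`, which is how links declared on array elements can be resolved from within instance data.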

So when I talk about schemas driving every aspect of an open-ended set of hypermedia-driven APIs, I am not speculating about future projects or possibilities. This is a system that has been running for a few years now in an industry-leading product suite (although it was only released about a year ago).

The one major piece that we did not get done before shipping this out (after which the company was bought and I left, so I have no idea what they're doing with it now) was connecting the schemas up to the instances with profile and describedBy. Developers either had the schemas provided to them with the libraries, or grabbed them from the canonical location themselves. But the intention was to allow for/encourage downloading as the ecosystem grew because this was (at least at one point) intended to work across the company's entire product line, which (as it consists of physical/virtual appliances) also needed to support many versions in the field. So provisioning a client with every possible schema was pretty much doomed to failure in the long run- it had to support downloading.

I can't say much about my next project yet as everything is in a very early stage, but my personal intention is to take a similar approach but close the gaps (like the profile/describedBy links) and be fully standards-based (hopefully by nudging JSON Hyper-Schema into a usable state). The target API ecosystem will (hopefully) be even more open-ended than the appliance management environment at Riverbed.

For the most part, if I'm pushing for something with JSON [Hyper-]Schema, it's because I have a clear, concrete need for it to achieve those goals. Basically, I'm writing exactly the kind of user-agent you mention (fortunately, it will be a long-range project, not something for next quarter). We got very close to it with Sleepwalker and ReSchema. The biggest challenge I see for finishing the job is having a standard that fully supports the necessary functionality. I could make something up myself, or expand the Hyper-Schema replacement implemented by ReSchema, but I'd really rather not.

I don't say this out of self-promotion, and I can delete this comment if it's obnoxious :-)
I just keep seeing variations of "I don't think that actually happens" and thinking "of course it happens, I did it two years ago and you can see it on github."

@Relequestual
Member

From my limited understanding, it looks like this issue doesn't cause any harm to the way you described using JSON Schema over the network. For frequent use, caching headers should be observed. In instances where this is an issue, exactly how that works will be worked out, as needed, if needed.

I don't understand how the resolution of this issue being in agreement with the suggestion, would have an impact on the use case you explained. I'm probably missing something, in which case can you please spell it out for me =]

I THINK I understand your point about hyper-schema not being quite how you'd like it. Could you file a new issue and explain the current situation, the use case, and the functionality you'd like in order to solve that use case? Prose before or after is fine, but it does, at least for me, make these issues a little trickier to comprehend.

@handrews
Contributor

@Relequestual I don't think there's any change that needs to be made to the resolution here. I was misunderstanding part of the guidance, and @awwright cleared that up.

I will be filing a series of hyper-schema-related issues, I was hoping to get the old repo's issues ported and closed first (although it now seems like there is no one who is both willing and able to close old repo issues, which is immensely frustrating and demotivating).

@Relequestual
Member

@handrews Right. Great!

Mmmm demotivation runs through this project. I bring you item A: json-schema-org/json-schema-org.github.io#1

Agreed. It's really gutting that we can't find anyone who can perform the actions required on the old repo. GitHub is unwilling to help without us having some sort of trademark or some such. They can't rule out that the "owner" might come back.
