Defining the path forward #282

gregsdennis · 2022-12-06T23:31:03Z

gregsdennis
Dec 6, 2022
Maintainer

DECISION: We likely won't be releasing an "interim" spec. Discussions elsewhere and the decisions from them point us toward just progressing on the next version.

In leaving IETF and their publication processes, we find ourselves needing to define our own. Along with this is a concern from the community regarding the stability of the specification.

Stability

On the surface, users wanting "stability" means that they really just want a spec that's not marked as a "draft" so that they can be confident referencing it from their own specifications and other work. The "draft" label is really just part of the IETF process that indicates a specification is still undergoing development, so that will definitely be going.

However, we (the JSON Schema core team) recognize that stability goes much deeper than a mere label on the specification document. So, what does it mean for the specification to be "stable?"

JSON Schema at its various versions is already used in production systems globally. The problem for those who maintain those systems is that one version may contain features that are incompatible with other versions, either through deprecation or updated processing rules. Migrating schemas to newer versions comes with different sets of challenges depending on the current and target versions. As such it's difficult to know what needs to be updated during such migrations, and often users simply opt to leave things as they are rather than risk breaking something that's already working. The migration challenge usually presents itself when the tooling they're using falls out of maintenance or drops support for the JSON Schema version they're using, and they're forced to update. Recently, the alterschema tool has been created to aid with this migration, but the necessity of such a tool is merely an indicator of a larger problem: JSON Schema contains instabilities.

Besides any publication process changes, the looming problem of this instability needs to be addressed.

Options

So far, two primary proposals have been put forward:

@jdesrosiers' Specification Development Life Cycle (SDLC), which, among other things, aims to incorporate backward- and forward-compatibility guarantees into a new feature management system based on stability stages.
@awwright's JSON Schema Extensions proposal, which, while not completely incompatible with the SDLC proposal, does present a few changes to how we think about schemas while also discussing how we can use those new ideas to guarantee backward- and forward-compatibility.

Most feedback regarding these proposals has been generally positive, but what has been lacking is a path forward. If these proposals shed light on some end state, how do we get there?

The Path Forward

Before we can know that, we need to know where we stand currently. If we were to break down the specification into a list of features, what can we declare as stable? Do we even agree what the features are?

I've started a Google spreadsheet to identify and categorize features defined by the Core and Validation specifications as they currently stand in the main branch of the spec repo (which is considered draft-next). This sheet is available for anyone to view, but editing will be restricted to the core team.

I would also like to propose the idea of publishing a stable-ish interim spec which moves us closer to our goals of full stability guarantees. This will show the community that this project is still active and that we're beginning to address their concerns about stability. To achieve this, we will need:

Some sort of publication mechanism - As bare bones as possible, probably something as simple as converting the spec to Markdown and rendering it as HTML and PDF then hosting it on the website.
Analysis of current state - This will come from the spreadsheet linked above as well as work that others have done. We need to determine what we agree is stable and declare that in the next publication.

Completing these two tasks should get us to a point where we can put out something that pushes us toward our goals, communicates that we're still active, and invites feedback from the community.

Edit - Survey results

Thanks to the core team for adding their stability opinions to my spreadsheet (and for patience while I've been on leave). Here are some results:

Considered stable by everyone

$comment, additionalProperties, allOf, anyOf, const, deprecated, description, enum, examples, exclusiveMaximum,
exclusiveMinimum, maximum, maxItems, maxLength, maxProperties, minimum, minItems, minLength, minProperties,
not, oneOf, patternProperties, properties, propertyNames, readOnly, required, title, type, uniqueItems, writeOnly

Also, pattern was flagged by @ether as "semi-stable" as support of ECMA-style regex varies across languages, but otherwise it's regarded as stable.
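One way to make the unanimous list actionable is to check which keywords in a given schema fall outside it. The following is a minimal sketch, not an official tool: the function name `keywords_outside_consensus` and the example schema are hypothetical, and the walker assumes it only needs to descend into subschema-bearing keywords that are themselves on the list.

```python
# Keywords every core team member marked stable in the survey above.
STABLE_BY_CONSENSUS = {
    "$comment", "additionalProperties", "allOf", "anyOf", "const", "deprecated",
    "description", "enum", "examples", "exclusiveMaximum", "exclusiveMinimum",
    "maximum", "maxItems", "maxLength", "maxProperties", "minimum", "minItems",
    "minLength", "minProperties", "not", "oneOf", "patternProperties",
    "properties", "propertyNames", "readOnly", "required", "title", "type",
    "uniqueItems", "writeOnly",
}

# Keywords from that list whose values contain subschemas, grouped by shape.
SINGLE_SUBSCHEMA = {"additionalProperties", "not", "propertyNames"}
SUBSCHEMA_LIST = {"allOf", "anyOf", "oneOf"}
SUBSCHEMA_MAP = {"properties", "patternProperties"}


def keywords_outside_consensus(schema):
    """Return the keywords used in `schema` that are not on the consensus list."""
    found = set()
    if not isinstance(schema, dict):  # boolean schemas carry no keywords
        return found
    for keyword, value in schema.items():
        if keyword not in STABLE_BY_CONSENSUS:
            found.add(keyword)
            # Simplifying assumption: do not descend into off-list keywords.
            continue
        if keyword in SINGLE_SUBSCHEMA:
            found |= keywords_outside_consensus(value)
        elif keyword in SUBSCHEMA_LIST:
            for sub in value:
                found |= keywords_outside_consensus(sub)
        elif keyword in SUBSCHEMA_MAP:
            for sub in value.values():
                found |= keywords_outside_consensus(sub)
    return found


# Hypothetical schema, for illustration only.
example = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "pattern": "^[a-z]+$"},
        "tags": {"type": "array", "items": {"type": "string"}, "uniqueItems": True},
    },
    "required": ["name"],
}
print(sorted(keywords_outside_consensus(example)))  # ['items', 'pattern']
```

Even this small schema touches two keywords outside the unanimous list, which gives a feel for how conservative that list is.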

The rest of the keywords have various objections to "stable" status from several members. Some of these are in the vein of "it hasn't been around long enough in its current state" (i.e. it's recently added or changed), while others are more practical concerns. I encourage everyone to read through everyone else's notes to get an understanding of where we're all coming from.

There are also a couple viewpoints that everything is stable by virtue of there being published documents that define them. I think this highlights the need for us to further nail down exactly what we mean when we talk about "stability." I tend to think of a feature as stable if it isn't expected to change functionally from one version of the document to the next. I feel that calling something stable because it is defined in an immutable document is something of a tautology. Sure, that document (e.g. the 2020-12 publication) isn't going to change, so what it defines isn't going to change, but I don't think that's what we're talking about. We're looking at changes in how individual features are expressed over multiple publications.

handrews · 2022-12-07T00:08:37Z

handrews
Dec 7, 2022

Thanks, @gregsdennis . Can we have each core team member label their sheet (or sheets - I expect to use two) in the spreadsheet document with our github usernames? That will help us discuss the different state assessments.

To capture a discussion point from yesterday's call, let's emphasize that what is "interim" about the "interim spec" is the process and format of publishing the spec. It will be a full, non-draft specification intended for regular use, and all stability guarantees will be real stability guarantees. Where the "interim" nature of this publication shows up with respect to stability is that it will be very conservative about guaranteeing stability. If the core team is not confident of something's stability, it will not yet be marked stable, even if it is a critically important part of JSON Schema.

I'd like to add that, since publishing a specification is the one thing that reliably provokes feedback from a broad range of implementers, users, and downstream specifications, this interim publication will have an explicit goal of guiding readers to understand how these process changes will impact them, and how to ask questions and provide feedback regarding such impacts. This does not mean adding a lot of editorializing into the specification itself, but might make use of the equivalent of CREFs, which we have used to set expectations regarding likely future changes in the past.

In my view, the unresolved topics that we want to get feedback on by publishing this "interim" spec include but are not limited to:

the definition of pre-stable and deprecation states
the criteria for moving from one state to the next
the frequency with which those state changes can happen
the degree to which the specification is mutable (related to the frequency point above)
how schema authors and users make use of, or prevent the use of, not-yet-stable keywords and features
the role of meta-schemas, and the expected behavior in the absence of an explicit meta-schema
the actual process for getting a new feature or change from proposal through the stages to stable, and the roles involved
the scope of "standard" JSON Schema
- is it the set of vocabularies we published in 2020-12?
- how does it relate to generative use cases, if at all?
the boundaries between this spec and:
- the media type specification (still an IETF draft within the HTTP APIs working group)
- any referencing specification

This is in addition to the usual sort of possible schema changes. For example, we've long talked of a machine-readable vocabulary description file. Some of these might be closely related to process questions.

Perhaps the most important thing is to figure out whether what we publish makes implementers more confident or less confident that implementing the latest spec will allow them to have a more stable and easily-updated architecture. The jump between draft-07 and 2019-09 involved significant architectural changes around annotation and error collection, as well as dynamic behavior at runtime. I believe some of @awwright 's changes could be similarly disruptive (for example, tryError which requires suppressing a runtime error and continuing evaluation). Will implementers feel that this spec sets expectations correctly and gives enough guidance on what can be coded with confidence and what needs to be isolated in case it changes?

36 replies

handrews Jan 18, 2023

@jimmylewis I forgot there's actually an ADR documenting the intended process direction. I'll update the previous comment so folks don't have to go searching for it if they read that first.

jdesrosiers Jan 18, 2023
Maintainer

@jimmylewis Thanks for sharing your data! We don't have the ability to collect data like this, so it's very helpful when people share. Usually, the only thing we have to go off of is anecdotal based on SO, Slack, and other channels. My impression is that people only use draft-04 because they have some constraint forcing that choice, which is usually OpenAPI 3.0 (or older). Of those who have a choice, the majority of people seem to be choosing draft-07 right now, although the main reason to not choose a newer release seems to be concerns about lack of tooling support for the newer versions. Of course those impressions are skewed toward people who ask questions and engage with our community. There could be a quiet majority with different views and we just don't see it.

I agree that schemastore is not a good measure of what people are using or what they want to use. Schemastore recommends that people upload schemas in the lowest version they can, which artificially skews numbers toward the earlier drafts. I don't know if it's possible, but it would be interesting to see your data with schemastore schemas filtered out. Even better would be if you could filter out schemas used with OpenAPI. That would give us a better idea about what people are actually choosing to use.

I agree that draft-07 is a good place to start for a stable release, but simply declaring draft-07 stable isn't feasible. The whole point of stability is that later releases won't include incompatible changes, but 2019-09 and 2020-12 violate that guarantee. I don't think everyone would be comfortable with effectively abandoning those changes. However, it is reasonable to use draft-07 as a guide for what should be stable in the next release.

Maybe having a more formally staged pipeline like CSS would help JSON Schema build more trust

That's exactly what I've been proposing! I hope that's what we end up doing, but things are uncertain right now. It's good to hear support for that approach.

jimmylewis Jan 20, 2023

@handrews thanks for the link, I see this discussion has been going on for quite a while. The additional context helps for the periodic interloper like me. :)

@jdesrosiers

simply declaring draft-07 stable isn't feasible. The whole point of stability is that later releases won't include incompatible changes, but 2019-09 and 2020-12 violate that guarantee.

For sure, and I think the incompatible changes are almost always improvements learned the hard way. Hopefully they stop changing. 😆

My comment was more to reinforce that the scope of draft-7, however it may have been refined by the newer drafts, is well established and tested across a wide range of schemas. Had it been marked Stable at the time, I think we'd be discussing whether JSON Schema is approaching a stable 2.0 version, and how to manage breaking changes across major versions.

I don't know if it's feasible to assume the stable features will never change. If there's a 1.0 Stable marked today, will it never change? Why didn't that consistency happen any time before? At some point the plan needs to accommodate such changes across stable versions.

Of those who have a choice, the majority of people seem to be choosing draft-07 right now, although the main reason to not choose a newer release seems to be concerns about lack of tooling support for the newer versions.

Is there any sharable aggregated data to support that lack of tooling is the decision pivot? Survey results or quotes from folks in the ecosystem would be great to refer to here. (I'll be honest, I have a lack of data on this one, and a lot of conjecture as to why, but as one of those tooling people, I'm interested.)

I think there's also a lot of inertia of rest once a popularly used schema is written. Until that schema changes in some significant way (i.e. more than adding new properties), there's no real need to update to a newer draft. Unless the tooling ecosystem evolves to drop support for older drafts... (and newer tools are not supporting older drafts as much, but that's a glacial pace of evolution).

I agree that schemastore is not a good measure of what people are using or what they want to use.

I'm going to disagree with this. I think that discounting SchemaStore is discounting a huge portion of the developer community who rely on JSON Schema but are already satisfied. It doesn't overlap with use cases driving the recent JSON Schema drafts, but I think it represents a large group of people who just want to edit their package.json file, or their tsconfig.json file, or their CMakeSettings.json file, or their build pipelines, etc... and they don't know or care about which draft of JSON Schema is being used. They want to know if their document is valid, what properties are available and which are required, what format those properties need to be - they just want a schema, and they want it automatically. People in each of those ecosystems (NPM, TypeScript, CMake, etc) found JSON Schema to be a common way to make these experiences work, but I posit that they aren't asking for new features or otherwise getting involved because their needs are already met - mostly with Draft-4 even, since they haven't had a compelling enough reason to update since then. SchemaStore catalogues hundreds of schemas, and Visual Studio has over a million users consuming those schemas; I'm sure VS Code has millions more, and I can only imagine for other editors. I wouldn't dismiss how well SchemaStore measures what people want; it's providing a lot of value today. (Let's ignore that SchemaStore is advising people towards older drafts. At its core, SchemaStore is just a centralized catalog.)

I think this is where the interests in JSON Schema may start to separate, and the word "tooling" starts to become fuzzy as well, such as the differences between code generation tools vs. a JSON editor/IDE vs. a validator ("does this object match the schema?") or a forms UI generated from the schema. I don't do API development, so I'm unfamiliar with how the OAS3.1 development fits into JSON Schema (or vice versa). You've both mentioned OpenAPI as one of the main drivers of the features in the 2019 and 2020 drafts, and obviously, OpenAPI has a large userbase and has been working closely with the JSON Schema efforts. Are there other large projects or ecosystems that are also driving JSON Schema changes in future ~~drafts~~ releases, or is it mainly driven by the needs of OpenAPI?

OpenAPI is standardizing a primarily machine-readable format for describing APIs. I think that SchemaStore is trying to solve a different problem, namely, creating a catalog of schemas intended for guiding human editing inside a text editor, mostly config files or declarative documents, and mostly small or simple schemas at that. The schemas themselves are hand-authored in most cases, sometimes by the community to represent the document schema enforced by another body of code that is not an API endpoint (in these cases, the schema is not authoritative, but it is informative). And for schema authors, the more complex features make it more confusing to write a schema - there's a higher concept count and learning curve. A lot of the new features are not necessary for most of the schemas on SchemaStore, so why should authors not target the lowest/broadest/easiest schema version that meets their needs?

Getting back to @mwadams' point that I quoted before:

One concern I have is that the spec is already (at 2020-12) ahead of most implementers and very, very far ahead of most users.

Why is the spec so far ahead of users, and why isn't the gap closing? If a stable version of JSON Schema is declared, what's the incentive for schema authors or tool owners to overcome inertia and start adopting/supporting it? Or is it another round of the waiting game? (If a stable version were declared tomorrow, do we expect it to take 3-4 years to reach a critical mass like draft-7 did? What's being developed in the meantime? How does a subsequent stable release timeframe take into account a slow adoption curve?)

jdesrosiers Jan 24, 2023
Maintainer

I don't know if it's feasible to assume the stable features will never change.

There are lots of examples where this kind of thing works. You mentioned one example, CSS. It's certainly possible, but I think it will require us to change the way we approach spec development.

If there's a 1.0 Stable marked today, will it never change?

It won't change in a backward incompatible way, but it will evolve.

Why didn't that consistency happen any time before?

It just hasn't been the mindset we've been working with. Compatibility between releases just wasn't a goal.

Is there any sharable aggregated data to support that lack of tooling is the decision pivot? Survey results or quotes from folks in the ecosystem would be great to refer to here.

No, we don't have data. All we have is our impressions from what we hear from working with the community every day. I don't keep or maintain a list of quotes from community conversations, although there have been many times I've found it would have been useful to do so. However, we just had a developer advocate join us and I will suggest that he create a survey to try to get some more concrete data on this subject.

I think that discounting SchemaStore is discounting a huge portion of the developer community who rely on JSON Schema but are already satisfied.

I see what you're saying, but I don't think I would give it as much weight as you do. For example, you can express a conditional in draft-04, but it requires confusing and verbose boolean logic constructs to make it work. The if/then keywords in draft-07 make it much easier for schema authors to write schemas. From a schema consumer perspective, they don't know or care if it was draft-04 or draft-07, the end result is the same. Updating from draft-04 to draft-07 makes no difference to schema consumers, but makes a big difference for schema authors. Schema consumers may be happy with draft-04, but schema authors aren't and it doesn't hurt consumers to use draft-07 as long as tooling supports it. (I know you're not arguing to stay on draft-04, this is just a general example)
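To make the draft-04 versus draft-07 contrast concrete, here is a small sketch of the same conditional ("if `country` is `US`, then `postal_code` is required") written both ways. The property names and the rule are hypothetical, and `is_valid` is a deliberately tiny evaluator covering only the keywords used here, not a real JSON Schema implementation.

```python
def is_valid(schema, instance):
    """Tiny evaluator for the few keywords used below; not a real validator."""
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if "const" in schema and instance != schema["const"]:
        return False
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(sub, instance[name]):
                return False
    if "not" in schema and is_valid(schema["not"], instance):
        return False
    if "anyOf" in schema and not any(is_valid(s, instance) for s in schema["anyOf"]):
        return False
    if "if" in schema and "then" in schema:
        if is_valid(schema["if"], instance) and not is_valid(schema["then"], instance):
            return False
    return True


# draft-04 style: the implication "A implies B" is spelled as "(not A) or B".
draft04_style = {
    "anyOf": [
        {"not": {"properties": {"country": {"enum": ["US"]}}, "required": ["country"]}},
        {"required": ["postal_code"]},
    ]
}

# draft-07 style: the same rule reads the way the author thinks about it.
draft07_style = {
    "if": {"properties": {"country": {"const": "US"}}, "required": ["country"]},
    "then": {"required": ["postal_code"]},
}

instances = [
    {"country": "US", "postal_code": "12345"},
    {"country": "US"},
    {"country": "CA"},
    {},
]
for instance in instances:
    print(instance, is_valid(draft04_style, instance), is_valid(draft07_style, instance))
```

Both schemas accept and reject the same instances, which is the point being made: the consumer sees no difference, while the author of the second schema does not have to encode an implication by hand.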

Are there other large projects or ecosystems that are also driving JSON Schema changes in future ~~drafts~~ releases, or is it mainly driven by the needs of OpenAPI?

We do work closely with OpenAPI, but their needs don't drive the evolution of JSON Schema. In fact, the Vocabulary System was created so that we don't need to add things to JSON Schema that are specific to the needs of any particular project. Open API has some unique needs based on their domain and they have created their own JSON Schema Vocabulary to support those needs. Generally, all JSON Schema is designed for is the data validation use case. Nearly all changes have been to address schema authoring difficulties that we see in the community. For example, we got so many questions about how to do conditional validation that we decided to add the if/then keywords to make it easier.

A lot of the new features are not necessary for most of the schemas on SchemaStore, so why should authors not target the lowest/broadest/easiest schema version that meets their needs?

The problem with using the lowest version is that implementations are dropping support for older versions and newer implementations often don't even support older drafts at all. Newer versions of JSON Schema do have a couple more complex features, but using that version doesn't mean you have to use them. In the case of most simple schemas, a draft-04 schema is identical to a 2020-12 schema. So, I'd say, why not use the latest version? Having some advanced features available that you aren't using doesn't make it any harder to write a simple schema.

Why is the spec so far ahead of users, and why isn't the gap closing?

My impression is that the gap is closing. I just did a survey of the most recently active JSON Schema questions on StackOverflow and what version of JSON Schema they were using.

Unknown - 6 - They didn't say what version and only used keywords available in every version, so I couldn't guess
draft-04 - 3 - These were all cases where the author didn't have a choice (mostly OpenAPI)
draft-07 - 4
2020-12 - 6

The results are consistent with my impression of what I've been seeing recently. 2020-12 isn't lagging with people who are writing schemas today. In fact, it looks like it's the most common choice by a small margin. I expect people aren't upgrading without good reason, but when they start something new, 2020-12 is usually their choice.
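For reference, the shares implied by those counts can be worked out with a few lines of arithmetic. This is only a sketch over the four numbers reported above; the variable names are illustrative.

```python
# Counts as reported in the StackOverflow tally above.
counts = {"unknown": 6, "draft-04": 3, "draft-07": 4, "2020-12": 6}

total = sum(counts.values())
identified = {version: n for version, n in counts.items() if version != "unknown"}
identified_total = sum(identified.values())

print(f"questions surveyed: {total}")                 # 19
print(f"with an identifiable version: {identified_total}")  # 13
for version, n in identified.items():
    print(f"{version}: {n}/{identified_total} = {n / identified_total:.0%}")
# draft-04: 3/13 = 23%
# draft-07: 4/13 = 31%
# 2020-12: 6/13 = 46%
```

Among questions with an identifiable version, 2020-12 leads draft-07 by two questions out of thirteen.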

If a stable version of JSON Schema is declared, what's the incentive for schema authors or tool owners to overcome inertia and start adopting/supporting it? Or is it another round of the waiting game?

I expect to see the same dynamic we've always had with new releases. People want to use the latest version for new work and want their previous work to keep working the way it always has. So, yes, it will be years before the majority of what people are working with is the stable version, but that doesn't mean people won't be benefiting from the new version while that transition is happening.

We don't plan on stuffing a bunch of new features into the next release, so the incentive to use the new version isn't features. The incentive is risk management. It's a good software engineering choice. Any releases after the first stable release will be backward compatible. That means your old schemas will always be up-to-date with the latest version. A 2023 schema is also a 2025 schema. You don't need to worry about implementers removing support for a version and having to pin to an unsupported old version of a library. That library can be kept fully up-to-date and your old schemas will continue to evaluate the same way.

jimmylewis Jan 24, 2023

Great answers all around, thanks for being patient and open about all my questions.

awwright · 2022-12-09T05:33:33Z

awwright
Dec 9, 2022
Maintainer

If these proposals shed light on some end state, how do we get there?

If we were to break down the specification into a list of features, what can we declare as stable? Do we even agree what the features are?

I tend to agree that a “stable” specification should be prioritized. I’m also optimistic that there’s a straightforward path: While there’s many keywords that might be “unstable” or experimental, we can stabilize the media type definition independently of non-core keywords. This is the “stable core” approach: introducing those capitalized BCP 14 keywords so that newer features can be introduced and standardized, in a way that guarantees interoperability between implementations, and between publications of JSON Schema.

You've hit upon a bit of a paradox: We want to stop saying "draft" because that scares people away from using JSON Schema implementations in production (which are themselves stable), but we haven't guaranteed that JSON Schema won't keep changing in incompatible ways in the future, so how can we call it "stable"? I'm not aware of a term halfway in between these two. (I think the "draft" term is appropriate, if misunderstood; most of my work for the past several years has been researching how to stabilize JSON Schema Core and publish a suitable media type definition).

I've only casually mentioned what the "next step" is, so let me explain now: I think I'll file an issue to add in the "indeterminate" state that I discuss in my paper, to find consensus over the details, and then write up the changes and file a PR.

As @handrews points out, the paper contains many different ideas that may or may not be any good, but I'll try to break them apart into different issues that can each go in or not, independently of each other, and I'll file these in series. I’ve been struggling to decide on a good starting point; there are a few different issues that could all go first, but the “indeterminate result” is the most critical aspect of this.

The reason to do this before anything else is once we have rigorous interoperability requirements, we will have a solid yardstick to judge which keywords are stable, and which changes would break implementations.

Regarding the “publication mechanism”, I'm unsure how the current formatting impacts stability. The page on the website looks perfectly fine. And converting the formatting to Markdown is interesting, but hardly necessary. Can you describe how this helps stability, or is it just a nice feature to help editing?

10 replies

awwright Dec 13, 2022
Maintainer

If we release a spec that looks like an I-D and says it's an I-D, people are going to think it's an I-D. We'd be in exactly the same situation we are now. We aren't really associated with IETF or following their process, but people will think that we are.

I'm not sure how someone reads the specification, sees that it looks like an I-D, and writes an implementation incorrectly because of this. Or any other reason this would be meaningful to a reader. Can you please explain?

In any event, you can customize the look to whatever way you want.

jdesrosiers Dec 14, 2022
Maintainer

Austin, we've discussed this many times. There's an ADR and a blog post covering it.

From the ADR ...

Our perceived involvement with IETF causes confusion and misunderstanding within our community in the cases where our practices and the realities of our situation deviate from a typical IETF I-D lifecycle.

It's not about implementing the spec correctly. It's about the message we're sending by presenting as an I-D. People think I-D means "unfinished" and "not production ready", and they ask when JSON Schema is going to be "done". This isn't the way people should be perceiving the spec and it's our fault because we are presenting it in a way that makes that interpretation the most reasonable conclusion. The spec being an I-D sends the wrong messages. People aren't getting the message that they shouldn't make assumptions based on the spec looking like an I-D, so we aren't addressing the problem if we continue to present as an I-D.

you can customize the look to whatever way you want.

Can we hide the header? That would be a minimum requirement. Everything about this says it's an I-D. The only thing there that's not harmful is the publish date.

Workgroup: Internet Engineering Task Force
Internet-Draft: draft-bhutton-json-schema-01
Published: 16 June 2022
Intended Status: Informational
Expires: 18 December 2022

Even if you could remove everything that says the wrong things, I'd lean toward not using it. Even people seeing that we use an IETF toolchain could lead to confusion and incorrect assumptions. It's less likely, but I think it's best just to make a clean break.

awwright Dec 16, 2022
Maintainer

It does seem plausible that casual readers could be confused by the technical meaning of “draft”, and this is concerning. Now, while I haven’t seen any cases specifically, I was led to believe it’s purely about the word “draft”; the idea anyone is looking at the selection of toolchain to decide that they shouldn’t implement or use JSON Schema is a stretch. Nobody should care about the toolchain except the editors. If this is the case, there is something going on that I need to closely examine.

You can hide just about everything. You can see my PR I just made for an example of how you write a document that's not for I-D submission.

jdesrosiers Dec 19, 2022
Maintainer

the idea anyone is looking at the selection of toolchain to decide that they shouldn’t implement or use JSON Schema is a stretch

Fair enough. I agree it's unlikely. As long as what we produce doesn't include "draft", "IETF", or anything else that might lead someone to misunderstand our relationship with IETF, then I'm open to considering it. I suggest creating a new issue proposing what you want to use.

handrews Jan 10, 2023

@awwright

While there’s many keywords that might be “unstable” or experimental, we can stabilize the media type definition independently of non-core keywords. This is the “stable core” approach: introducing those capitalized BCP 14 keywords so that newer features can be introduced and standardized, in a way that guarantees interoperability between implementations, and between publications of JSON Schema.

@awwright Do you feel that what @Relequestual just proposed allows for this approach? While (last time it came up) you and I had slightly different notions of what this entails, we're both in agreement that this level of stability could be accomplished sooner rather than later.

Defining how that works would be one (or a few) of the issues that would spin off from this discussion once we have agreement on the high-level direction, so I will wait to talk about the differences in how we see it until we are in that issue-level discussion. I don't want to add to the volume of comment sprawl here.

Relequestual · 2022-12-09T10:41:25Z

Relequestual
Dec 9, 2022
Maintainer

On the discussion about how do we reach users at all stages of use, I've found this small guide a good way to think about framing the personas and what stories to tell to each.
https://pipdecks.com/pages/innovation-curve

(Edit: Disclaimer: I was for a period of time part of an affiliate scheme for the product the linked site sells. The above link is not an affiliate link nor a product page. I was not paid to link to this information. The card/article credits "E.Rogers, Diffusion of Innovations" for the concept in case anyone would prefer the original source.)

0 replies

Relequestual · 2022-12-09T15:25:48Z

Relequestual
Dec 9, 2022
Maintainer

There's a lot of useful discussion from the replies of the original posting, and that's great! Thanks for making the time.

As I brought up before, "promises" and "guarantees" are nice, and saying we promise that a keyword or feature will ALWAYS be stable seems like a good idea, except we give ourselves a get-out clause with "deprecation", which is a little handwavey. I'm not saying that's a problem, just stating what I see.

If we really want to or have to break something we promised would remain forever stable, suddenly our stability promises lose all meaning.

Is a forever promise really feasible? What if we want to modify $id to support a new form of thing? What if we decide that WHATWG is right and URIs/IRIs are done and everything's a "URL" but some might still not be network addressable?
While this seems unlikely, it's feasible to believe it could happen, or something similar. (After all, we've "done" URIs to IRIs just recently.)

You could argue, we could create a new keyword, $id2. That... would be pretty horrific. And would we really expect and require people to support both? That would be... a mess.

What's an alternative solution?
Well, we could say "stable" means "stable for at least 5 years, and then it may be broken, or not!".

Why? People are making "bets" on using JSON Schema and how long what they are doing can remain useable in production.
Five years would be a long enough time for most development use cases. We can renew that at any point, pushing it a year out with each yearly snapshot, if we so wished.

We could make it shorter... 3 years. Maybe 5 years would be TOO long, allowing people to think "eh, I'll have gone by then." We don't want that.

At that point, you could almost say, why not tick/tock a release every 1.5 years where one is "new stable" and the other is "experimental"?

Well, that wouldn't be ideal either, given we want users to be able to use the latest and greatest keywords as soon as we make them available. (We acknowledged the lag between "hey, this is done" and it being in a published spec, is just too long.)

Where does this leave us?
Maybe we could have multiple layers of stable with a 3 or 5 year promise attached?
While it might not be what we feel is ideal, all things considered, it seems preferable over "forever" stability promises which end up losing their meaning.

Wait, doesn't this sound more related to the SDLC as opposed to making an immediate release without defining everything up front?
A little, but making forever promises feels like a critical thing we need to get right up front. As in, should we be making them, or something that gives users what they want but doesn't leave us in a position of potential compromise?

Huge thanks to @mwadams for helping me think through and explore this and being a sounding board.

Neither I nor @mwadams is advocating for any specific direction here, and I'm merely presenting what I see, and presenting some thoughts and scenarios.

4 replies

gregsdennis Dec 9, 2022
Maintainer Author

I don't believe "forever" guarantees are too hard to have. JS does it. .Net does it. (Anything you could do in .Net 1 from 20 years ago you can still do today with .Net 7, which is actually at least 10-15 versions later depending on how you count. [Their versioning system is all over the place.])

Mostly we'll be expanding capability. $id now accepting IRIs is an additive change. On the other hand, $id no longer accepting fragments is a subtractive change, and therefore breaking. This effort is to really think through each feature and identify potential cases like this so that we can consider what's stable and what's not. $id, I think, will likely not be completely stable for this interim release, but we will definitely want it stable for the following process-led release.

The point here is that we want to be able to say to users, "$id means X," and not deviate from that. If we need something that works similarly but in a breaking way, then we'll need to have a different mechanism (e.g. a new keyword, preferably not $id2) that handles that new functionality.

I would expect implementors to support both for the duration that they're both active, yes. Eventually, the older keyword would be deprecated and the requirement on implementors to support it would be lifted.

Existing implementors who already support it could keep it (for legacy support) or remove it (e.g. it's bloat) as they see fit for their library.
New implementors would not need to support it.

I think this second point is of utmost importance.

Regarding the timeline, I expect the SDLC or whatever process we have in place will require that a feature that is replaced remain "stable" for a specified number of iterations before deprecation. For example $id and $id2 (or whatever) would both be stable requirements for a number of years before $id is deprecated. And since they do roughly the same thing, there would need to be language governing what happens if they're both present.

making forever promises feels like a critical thing we need to get right up front

That's why we want to be very conservative with this first release. Only mark stable that which we are 100% certain will not change, like minLength which hasn't changed since it was introduced with the first JSON Schema draft.

awwright Dec 10, 2022
Maintainer

@gregsdennis Every part of this is exactly on the mark... I hope I'm not being too confusing when I say there's new things that should come in first... they are just requirements that will ensure implementations handle new and unrecognized features in a consistent, unsurprising way, and that we're not painting ourselves into a corner publishing a specification that can never be changed.

What if we want to modify $id to support a new form of thing?

You introduce a new keyword that does whatever thing it is you're looking to do.

What if we decide that WHATWG is right and URIs/IRIs are done and everything's a "URL" but some might still not be network addressable?

You introduce new keywords or format identifiers. Call it $id_URL or whatever.

If all these different keywords get to be too cumbersome for schema authors, then you can always define a new media type. This isn't far-fetched: JSON is kind of unwieldy as is, and there's a small cottage industry of alternate syntaxes, Orderly, etc.

mwadams Jan 10, 2023
Collaborator

FWIW (and I'm not disagreeing with the point that long term guarantees are needed) the .NET example is exactly what I'm thinking about. They make no backwards compatibility guarantees (and have made numerous breaking changes over the years, mostly in corner cases). But they do in practice work incredibly hard to maintain backwards compatibility, to the extent that they drop features because they can't be implemented in a backwards compatible way.

Our trust in their compatibility comes from experience rather than assertion.

handrews Jan 10, 2023

@gregsdennis regarding:

On the other hand, $id no longer accepting fragments is a subtractive change, and therefore breaking.

There's an important subtlety here:

We're not removing fragments in general, but just the empty fragment which is the only remaining syntactically legal fragment since 2019-09
The empty fragment is functionally identical to not having a fragment (but not in a way that RFC 3986 normalization can determine)
Starting with 2019-09 and continuing in 2020-12 (both the original release and the 2022 patch release), we have had a CREF notifying of our likely intent to make the empty fragment syntactically invalid
We already forbade fragments in $schema, which was where $id values were most likely to be used as full URIs with empty fragments (since $schema has always required a full URI)
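To make the empty-fragment case concrete, here is a minimal Python sketch, not anything taken from the spec or from an implementation. The helper names `has_empty_fragment` and `strip_empty_fragment` and the `example.com` identifiers are made up for illustration. It shows why a plain URI parse cannot tell `…/schema` from `…/schema#`, so the check has to look at the raw string.

```python
from urllib.parse import urlsplit


def has_empty_fragment(identifier: str) -> bool:
    """Return True when the identifier carries a bare '#' (an empty fragment)."""
    # urlsplit() reports fragment == "" both with and without a trailing '#',
    # so the presence of '#' has to be checked on the raw string.
    return "#" in identifier and urlsplit(identifier).fragment == ""


def strip_empty_fragment(identifier: str) -> str:
    """Drop a trailing empty fragment; leave every other identifier untouched."""
    # When the fragment is empty, the '#' is necessarily the last character.
    return identifier[:-1] if has_empty_fragment(identifier) else identifier


# Hypothetical identifiers, for illustration only
with_empty = "https://example.com/schemas/person#"
without = "https://example.com/schemas/person"
with_anchor = "https://example.com/schemas/person#name"
```

Under these assumptions, `strip_empty_fragment(with_empty)` gives the same string as `without`, while `with_anchor` is left alone, which matches the point that the empty fragment is redundant syntax rather than distinct behavior.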

Regarding @mwadams 's point about experience vs assertion and "numerous breaking changes over the years, mostly in corner cases": While removing support for the empty fragment from $id is technically a breaking change, it is one that we have laid the groundwork for and communicated for over three years, across three publications of the spec. We did not use the language of "deprecation", but we effectively deprecated empty fragments in $id as of 2019-09 (when we removed all other fragment support and split the declaration of plain name fragments into $anchor).

I'd argue that our approach to changing $id is almost a model for small tweaks to keywords that are technically incompatible but functionally consistent. Using clear in-text language of "deprecation" rather than a CREF would have been better, and is something we can take into account if a similar corner case tweak is required in the future.

I don't know if we're likely to have any more cases like this, but it's an interesting scenario distinct from true breaking changes (because no observable behavior becomes impossible, we're just removing redundant and confusing syntax). I'm curious as to whether this is the sort of allowable corner case breaking change @mwadams had in mind.

handrews · 2022-12-10T03:36:16Z

handrews
Dec 10, 2022

I have added my keyword assessments to the spreadsheet as a second sheet (use the tabs at the bottom of the screen to switch). Non-keyword features will be a separate sheet that I hope to have done sometime next week. Or at least start putting parts of it up. I will try to reply to all of the other threads where needed by Monday.

0 replies

jdesrosiers · 2022-12-12T19:42:34Z

jdesrosiers
Dec 12, 2022
Maintainer

Part of the proposal here is to release a spec without making decisions about the lifecycle of the spec. There are many details that we can defer, but there are several things that we need to at least tentatively define in order to put out anything. How will the spec be published? Will the release live forever (like current drafts), or will it be replaced, or can it be changed at some point? What does it mean for users and implementers when the spec changes or is replaced?

I suggest we continue work on defining a process, but keep it as minimal as possible for now. It would be just enough that users and implementers would know what they're getting into when they choose to use/implement this interim spec.

0 replies

jdesrosiers · 2022-12-12T20:28:56Z

jdesrosiers
Dec 12, 2022
Maintainer

Can we be more explicit about what kind of feedback we expect this interim spec to provide about our process? There are a few aspects of this that don't make sense to me.

First, if we put off making any process decisions, we aren't giving people an opportunity to provide feedback because we haven't given them anything to provide feedback on.

Second, process is orthogonal to the specification. Certain process decisions can limit what we can do with the spec, but the spec doesn't tell you anything explicitly about the process used to develop and release that spec. So, I'm not sure what producing a spec is going to tell people about our process that allows them to provide feedback.

Third, for the most part, people don't care about how we develop and release the spec. How many JavaScript programmers know or care about TC-39? Probably not many. All users want/need to know is, "how will I be affected when the spec changes?". Anything else only affects those of us working on the spec. We should do what we think will work best for us, not what others think will work for us.

I'm sure this is all mostly a strawman argument based on an unclear understanding of what is meant by "process" and "SDLC" in this proposal, so feel free to not directly address any parts that don't make sense as long as we can answer the initial question. What kind of feedback do we expect to get from an interim spec about our process?

4 replies

gregsdennis Dec 12, 2022
Maintainer Author

I'm not expecting feedback about the process at all. I'm expecting feedback about the spec and the set of features we decide to declare to be stable.

The purpose of this proposal is to publish a version of the spec that lists stability guarantees: keywords and other features that we promise won't change in subsequent versions of the spec. Everything else can be deferred.

When I say that, I mean that we should use whatever tools we need to publish the spec for now. We're not committing to that toolset or any process; we just need something that will get the job done.

jdesrosiers Dec 13, 2022
Maintainer

I'm expecting feedback about the spec and the set of features we decide to declare to be stable.

Fair enough. That's a clear scope that makes sense to me. But, ...

@gregsdennis I'm not expecting feedback about the process at all.
@handrews this interim publication will have an explicit goal of guiding readers to understand how these process changes will impact them

It seems there is some disagreement about the goals of the interim release and what we expect to learn from it.

gregsdennis Dec 15, 2022
Maintainer Author

I think you're reading into that comment by Henry a bit. There are no process changes in the vein of the SDLC going into the interim release. Of course, we will have to make some changes just to publish something, but we're not looking for feedback on that.

Henry's comment goes on to mention things that we are looking for feedback on, and these are most of the things I believe you've been most focused on recently. I expect this feedback would be provoked by explicitly stating these areas in the specification (or adjacent publications) as unresolved, undefined, or not fully formed (or whatever phrasing you want). This helps make clear to the reader what they can consider to be "set in stone."

handrews Jan 10, 2023

@gregsdennis yes, your reading of my comment is correct. The idea is for this to be kind of a non-process release, which will include some implications for future process which will hopefully provoke feedback.

@Relequestual 's comment today does a better job of conveying this by framing things around proposed stability guarantees rather than locked-in guarantees. With that framing, it should be more clear that a defined process is not a pre-requisite.

Relequestual · 2023-01-10T11:59:35Z

Relequestual
Jan 10, 2023
Maintainer

I'm in agreement with @gregsdennis's posting on json-schema-org/json-schema-spec#1368 (comment)

In regard to the conflation of process/policy, I suggest the following:

we focus on completing the charter discussion/proposal first.

we then create a set of policies that help fulfill that charter (e.g. compatibility guarantees and other promises)

finally, we can create a process that ensures we adhere to those policies (e.g. publication procedures and the mechanisms by which we implement compatibility guarantees)

I think this will help isolate the differences and let us focus on each.

I'll continue to push the charter work forward.

I feel like there's a middle ground, or at least a least problematic and most agreeable option, in relation to the next release.

It seems like we all want a release "soon", but also want or need a longer process to make promises.

What I'm proposing is slightly and subtly different from what we have seen so far.

I propose the following:

We aim to make an interim release
The interim release includes the changes we have made "so far" in the development branch
The interim release gets a Year-Month identifier as previous, but no "draft" identifier
We identify keywords and features we believe are good candidates for being called stable in the next release. The identification is an indication of intent, and not making any promises at this time.
There is no other classification of stability for anything else at this time. Effectively a NULL value. This is different from "experimental" or "not yet stable".
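
As a rough illustration of the "NULL value" idea, here is a small Python sketch. Everything in it is hypothetical: the `Stability` enum, the `KEYWORD_STABILITY` table, and the specific keyword assignments are placeholders I've made up, not decisions anyone has taken. The point is only that "no classification" is modeled as an absent value rather than as a second label such as "experimental".

```python
from enum import Enum
from typing import Dict, Optional


class Stability(Enum):
    # The only label this proposal would introduce (name is hypothetical)
    STABLE_CANDIDATE = "stable-candidate"


# Hypothetical assignments, for illustration only
KEYWORD_STABILITY: Dict[str, Optional[Stability]] = {
    "minLength": Stability.STABLE_CANDIDATE,
    "$id": None,  # no classification at all, which is not the same as "unstable"
}


def describe(keyword: str) -> str:
    """Render the stability signal for a keyword, if it has one."""
    marker = KEYWORD_STABILITY.get(keyword)
    if marker is None:
        return f"{keyword}: no stability classification"
    return f"{keyword}: candidate for stable in the next release (intent, not a promise)"
```

With this shape, tooling or documentation that reads the table cannot accidentally present unclassified keywords as discouraged, because there is no negative label to show.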

This would give us:

A release to show we're still alive and making progress
A release with updates we know the community want and have been waiting for a long time
A release which makes no specific stability or compatibility promises, allowing us further time to discuss at a slower pace
A release which doesn't directly or indirectly discourage people from implementing specific features or keywords
Signals about the things we see as very likely to be stable in the next release, making it clear people need to speak up if they have problems with that
Signals to implementers that now is the time to consider rebuilding their foundations if required
Signals to the community that we are on the path to making stability promises
Time to get our governance in order (starting with the charter inc TSC)

Based on the discussion I've seen, I think these are all the things we all want without huge compromise.
And, I think this will give us a better foundation off of which to make the next move to stability.

I know different people at different points have proposed parts of this previously. I'm not claiming 100% originality here. But I think this mix of things gives us the most things we all want, and allows us to move forward.

2 replies

handrews Jan 10, 2023

@Relequestual I agree with pushing the charter work, and will comment on that discussion today. Thanks for folding that into the priority plan here.

I think that these points are the key new goals enabled by your reformulated proposal:

A release which makes no specific stability or compatibility promises, allowing us further time to discuss at a slower pace

Many of my frustrations with the project come from the pressure to address hugely complex topics and ideas in time for a Q1 release. If we can buy ourselves a year, then I can see being able to address these topics within the time I have available to volunteer, which is not possible for me to do within Q1.

In particular, the questions around non-keyword features/capabilities will require a lot of work based on what folks have submitted to the spreadsheet so far.

A release which doesn't directly or indirectly discourage people from implementing specific features or keywords

This is absolutely critical, and addresses one of my biggest concerns that I've raised at each step of the SDLC discussions but has remained present in each iteration despite that. We need to encourage more engagement with not-yet-stable things, not less.

Signals about the things we see as very likely to be stable in the next release, making it clear people need to speak up if they have problems with that

This is the most important thing, and I like this middle ground between iron-clad guarantees (which we might mess up) and just publishing as-is with no attempt to move towards stability at all.

I would suggest that we include commentary somewhere, perhaps similar to CREFs but with more helpful rendering than either yellow highlighting or roll-over text, about the implications of stability and possible alternatives. For example, if/then/else makes a trade-off regarding runtime dependencies and parallelism. So we could:

mark it stable and accept that that specific kind of dependency is part of JSON Schema's capabilities (a.k.a. non-keyword features, some of which support specific keyword behaviors)
mark it unstable and replace the three keywords with a single keyword that does the same thing
deprecate it and add a new single keyword that does the same thing, which would allow us to restrict this specific type of runtime dependency as a capability

These are a couple of ways we might address the weird corner case nature of this group of keywords (paging @mwadams again- this is less corner-case-y than the no-empty-fragments-in-$id situation, but again changing or replacing these keywords with a single-keyword approach would not remove any functionality, and might provoke interesting feedback on whether folks care about this trade-off and how they'd like us to address it).
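
For anyone unfamiliar with the dependency in question, here is a toy Python sketch of how `if`/`then`/`else` ties sibling keywords together. It is not how any real implementation is structured: `is_valid` is a made-up evaluator that only understands `const`, `minLength`, and the three conditional keywords, and the example schema is invented. It is only meant to show that `then` and `else` cannot be decided until the result of `if` is known.

```python
def is_valid(schema: dict, instance) -> bool:
    """Toy evaluator covering only `const`, `minLength`, `if`, `then`, `else`."""
    if "const" in schema and instance != schema["const"]:
        return False
    # `minLength` only constrains strings; other types pass it untouched
    if "minLength" in schema and isinstance(instance, str):
        if len(instance) < schema["minLength"]:
            return False
    if "if" in schema:
        # The branch to apply depends on a sibling keyword's result, so
        # `then`/`else` cannot be evaluated independently of `if`.
        condition_held = is_valid(schema["if"], instance)
        branch = schema.get("then" if condition_held else "else", {})
        if not is_valid(branch, instance):
            return False
    return True


# Hypothetical schema, for illustration only
conditional = {
    "if": {"minLength": 5},
    "then": {"const": "hello"},
    "else": {"const": "hi"},
}
```

Here `"hello"` and `"hi"` validate while `"world"` and `"hey"` do not. A single-keyword replacement would carry the same three subschemas inside one value, which is why no functionality would be lost by that change.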

Signals to implementers that now is the time to consider rebuilding their foundations if required

This is where the non-keyword features/capabilities stuff becomes very important. We do not need to nail it all down to put out a release such as this, but we should indicate that this is an important part of what we are sorting out, and perhaps point to a location other than GitHub where we publish updates on what this looks like.

It is the underlying capabilities, rather than the individual keywords, that dictate the architecture of JSON Schema implementations. While most of the assertion and older applicator keywords are essentially stable, the internal capabilities required, including which ones are required to be available to extension keywords, are not well-defined, much less stabilized. This is the level at which JSON Schema is not close enough to "done" to be locked in, and we need to educate our community on the issues and implications here.

While not radically different from what I had originally proposed informally and what Greg first wrote up here, there are important tweaks and clarifying articulations that I think are crucial, and that I enthusiastically support. I just woke up so I might think of some more things later on, but this is my main impression on first read.

handrews Jan 10, 2023

Additional caveat: I will support this direction if and only if we next move to discussing individual topics needed to get to the next release in separate issues, not as a PR for a document, and not trying to resolve them significantly further in this sprawling discussion.

This discussion should be considered resolved if we have agreement on this (or another) direction, at which point we should move to issues.
