Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SchemaComparator #157

Merged
merged 32 commits into from
Apr 24, 2024
Merged

SchemaComparator #157

merged 32 commits into from
Apr 24, 2024

Conversation

ghik
Copy link
Contributor

@ghik ghik commented Apr 22, 2024

Required for softwaremill/tapir#3645

SchemaComparator compares two schemas for compatibility and returns a list of SchemaCompatibilityIssues.

Currently, the comparator only understands a fixed set of common schema patterns, likely to be generated by tapir from common Scala types:

  • trivial "any" and "nothing" schemas
  • primitive schemas (containing type which is not object or array + simple assertions, e.g. minimum, maximum etc.)
  • simple array types (type array with items + simple array assertions)
  • tuple-like array types (type array with prefixItems + simple array assertions)
  • map types (type object with additionalProperties + simple object assertions)
  • product types (type object with properties, possible required and dependentRequired)
  • discriminated coproduct types (no type, oneOf/anyOf with simple local references and discriminator)
  • union types and non-discriminated coproduct types (no type, oneOf/anyOf, no discriminator)

The following JSON Schema keywords are currently not understood at all:

  • $schema, $vocabulary, $id, $anchor, $anchor, $dynamicAnchor, $dynamicRef, $defs (maybe some of them can be ignored like annotations?)
  • allOf, not (except in "nothing" schema)
  • if, then, else
  • contains, maxContains, minContains, unevaluatedItems
  • patternProperties, unevaluatedProperties, dependentSchemas

If schemas don't fall into any of the above categories, or contain any of the mentioned unsupported keywords, they are considered opaque and compared only for plain equality (with annotations stripped). If they are not equal a generic "fallback" error is returned that indicates SchemaComparator's inability to determine compatibility between schemas (they may or may not be compatible).

@ghik ghik requested a review from adamw April 22, 2024 14:39
@adamw
Copy link
Member

adamw commented Apr 23, 2024

The most important missing feature is comparison of undiscriminated unions expressed with anyOf or oneOf - this is tricky because matching elements of anyOf/oneOf between two schemas can turn into a combinatorial explosion of comparisons.

Let's start with the simplest thing that works, and then we can refine :) I think for a start a very simplistic matching of schemas listed in anyOf will work (for each each schema on the left, there is a compatible schema on the right).

*
* Determining compatibility (or incompatibility) of arbitrary schemas with certainty is non-trivial, or outright
* impossible in general. For this reason, this method works in a "best effort" manner, assuming that the schemas
* match one of the typical schema patterns generated by frameworks like `tapir`. In more complex situations,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libraries! ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pardon

def noSchema: Nothing =
throw new NoSuchElementException(s"could not resolve schema reference ${s.$ref.get}")

normalize(named.getOrElse(name, noSchema), named)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we dereference all local schemas, then in an endpoint, if a schema is being referenced by many other schemas, a single change in that schema will cause a lot of incompatibility errors? unless we de-duplicate, e.g. by using a set, and they get combined?

Copy link
Contributor Author

@ghik ghik Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a cache that prevents wasting time on comparing the same pairs of schemas multiple times, but:

  • cached issues will be duplicated in the final list (or tree, actually) of issues, so for better presentation we would need some additional deduplication mechanism - it could be done with some form of "references" to errors similar to schema references, or alternatively - implemented purely in the presentation layer that displays issues to the user
  • the cache is currently not reusable between toplevel schema comparisons - it would need to be extracted to some higher layer for full OpenApi comparisons

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, for full OpenAPI comparisons (which is our goal), we will have to compare schemas for each endpoint. But won't simply adding the issues to a set solve the problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issues are not a flat list - they form a tree like structure that allows you to "track the path" to incompatibilities within schemas (see SubschemaCompatibilityIssue hierarchy)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I must have missed that ... checking again ;)

Copy link
Contributor Author

@ghik ghik Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, there are several different ways we could deal with this duplication. This largely depends on how exactly we want the incompatibilities to be presented to the user. So far, I have only implemented a simple human-readable description for every issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good start. We'll probably want to just return the list of the issues initially.

$comment = None,
title = None,
description = None,
default = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't defaults affect comparisons? if a writer has a schema with default = 5, and a reader with default = 7, the deserialisation might be different?

Copy link
Contributor Author

@ghik ghik Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to JSON Schema spec, the default keyword is an annotation, which is "for documentation and user interface display purposes".

Changing the default value cannot make any previously valid data invalid according to a new schema, so technically it's not an incompatibility. Whether this can be an incompatibility on a higher, semantic level is debatable. I can imagine situations where it is and where it isn't.

For example, an integer field in a request may have had a default value of 0, but then the server decided to make this field nullable, with a default value of null, so that it can distinguish between passing 0 explicitly and not passing anything. The server can do this in a fully compatible way. It depends on its implementation.

In general, compatibility on a "semantic" level can be broken in many ways just by changing the implementation of the server, even without changing the format at all. I assumed that here we focus purely on syntactic incompatibilities.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds reasonable, thanks for the explanation :)

@adamw
Copy link
Member

adamw commented Apr 23, 2024

Nice work - well beyond what I imagined originally, but very thorough :).

@ghik ghik marked this pull request as ready for review April 24, 2024 11:12
@ghik ghik requested a review from adamw April 24, 2024 14:35
@adamw adamw merged commit 19ef388 into master Apr 24, 2024
6 of 7 checks passed
@mergify mergify bot deleted the schema-comparator branch April 24, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants