SchemaComparator #157

ghik · 2024-04-22T14:36:59Z

SchemaComparator compares two schemas for compatibility and returns a list of SchemaCompatibilityIssues.

Currently, the comparator only understands a fixed set of common schema patterns, likely to be generated by tapir from common Scala types:

trivial "any" and "nothing" schemas
primitive schemas (containing type which is not object or array + simple assertions, e.g. minimum, maximum etc.)
simple array types (type array with items + simple array assertions)
tuple-like array types (type array with prefixItems + simple array assertions)
map types (type object with additionalProperties + simple object assertions)
product types (type object with properties, possible required and dependentRequired)
discriminated coproduct types (no type, oneOf/anyOf with simple local references and discriminator)
union types and non-discriminated coproduct types (no type, oneOf/anyOf, no discriminator)

The following JSON Schema keywords are currently not understood at all:

$schema, $vocabulary, $id, $anchor, $dynamicAnchor, $dynamicRef, $defs (maybe some of them can be ignored like annotations?)
allOf, not (except in "nothing" schema)
if, then, else
contains, maxContains, minContains, unevaluatedItems
patternProperties, unevaluatedProperties, dependentSchemas

If schemas don't fall into any of the above categories, or contain any of the mentioned unsupported keywords, they are considered opaque and compared only for plain equality (with annotations stripped). If they are not equal a generic "fallback" error is returned that indicates SchemaComparator's inability to determine compatibility between schemas (they may or may not be compatible).
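The opaque fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ToySchema`, `stripAnnotations` and `compareOpaque` are hypothetical names, and the real `sttp.apispec.Schema` has many more fields than this toy model.

```scala
// Hypothetical, heavily simplified schema model (not the real sttp.apispec.Schema).
final case class ToySchema(
    tpe: Option[String] = None,
    minimum: Option[BigDecimal] = None,
    // Annotations: they do not affect which values are valid.
    title: Option[String] = None,
    description: Option[String] = None,
    default: Option[String] = None
)

object OpaqueFallback {
  // Clears annotation fields so that they do not influence the equality check.
  def stripAnnotations(s: ToySchema): ToySchema =
    s.copy(title = None, description = None, default = None)

  // Returns no issues when the stripped schemas are equal. Otherwise returns a single
  // generic issue: the schemas may or may not be compatible, we just cannot tell.
  def compareOpaque(writer: ToySchema, reader: ToySchema): List[String] =
    if (stripAnnotations(writer) == stripAnnotations(reader)) Nil
    else List("cannot determine compatibility: schemas are opaque and not equal")
}
```

Note that a non-empty result here means "unknown", not "incompatible".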


adamw · 2024-04-23T14:44:52Z

The most important missing feature is comparison of undiscriminated unions expressed with anyOf or oneOf - this is tricky because matching elements of anyOf/oneOf between two schemas can turn into a combinatorial explosion of comparisons.

Let's start with the simplest thing that works, and then we can refine :) I think for a start a very simplistic matching of schemas listed in anyOf will work (for each schema on the left, there is a compatible schema on the right).
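A minimal sketch of that simplistic matching, assuming a toy notion of compatibility. `Alt`, `compatible` and `unmatched` are hypothetical names used only for illustration, and an alternative is modelled as the set of JSON type names it accepts rather than a real schema.

```scala
object UnionMatching {
  // Hypothetical stand-in for a schema: the set of JSON type names it accepts.
  type Alt = Set[String]

  // Toy compatibility check: everything the writer alternative accepts
  // must also be accepted by the reader alternative.
  def compatible(writer: Alt, reader: Alt): Boolean = writer.subsetOf(reader)

  // Returns the writer alternatives that have no compatible counterpart on the
  // reader side. An empty result means every left alternative found a match.
  def unmatched(writerAlts: List[Alt], readerAlts: List[Alt]): List[Alt] =
    writerAlts.filterNot(w => readerAlts.exists(r => compatible(w, r)))
}
```

This needs at most one comparison per (left, right) pair, so the cost is quadratic rather than combinatorial. The trade-off is that a left alternative covered only by a combination of several right alternatives would be reported as unmatched.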

adamw · 2024-04-23T14:48:55Z

apispec-model/src/main/scala/sttp/apispec/validation/SchemaComparator.scala

+   *
+   * Determining compatibility (or incompatibility) of arbitrary schemas with certainty is non-trivial, or outright
+   * impossible in general. For this reason, this method works in a "best effort" manner, assuming that the schemas
+   * match one of the typical schema patterns generated by frameworks like `tapir`. In more complex situations,


libraries! ;)


adamw · 2024-04-23T15:09:24Z

apispec-model/src/main/scala/sttp/apispec/validation/SchemaComparator.scala

+        def noSchema: Nothing =
+          throw new NoSuchElementException(s"could not resolve schema reference ${s.$ref.get}")
+
+        normalize(named.getOrElse(name, noSchema), named)


if we dereference all local schemas, then within an endpoint a single change in a schema that is referenced by many other schemas will cause a lot of incompatibility errors? unless we de-duplicate them, e.g. by collecting them in a set so that they get combined?

There's a cache that prevents wasting time on comparing the same pairs of schemas multiple times, but:

cached issues will be duplicated in the final list (or tree, actually) of issues, so for better presentation we would need some additional deduplication mechanism - it could be done with some form of "references" to errors similar to schema references, or alternatively - implemented purely in the presentation layer that displays issues to the user

the cache is currently not reusable between toplevel schema comparisons - it would need to be extracted to some higher layer for full OpenApi comparisons
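To make the caching idea concrete, here is a rough sketch of a cache keyed by (writer, reader) pairs. `ComparisonCache` and `compareUncached` are hypothetical names, not the PR's actual implementation; the comparison function is passed in so that the example stays self-contained.

```scala
import scala.collection.mutable

// Hypothetical cache wrapper around an arbitrary comparison function.
// S is the schema type and I is the issue type.
final class ComparisonCache[S, I](compareUncached: (S, S) => List[I]) {
  private val cache = mutable.Map.empty[(S, S), List[I]]

  // Runs the underlying comparison at most once per (writer, reader) pair.
  // Later calls for the same pair return the stored issues.
  def compare(writer: S, reader: S): List[I] =
    cache.getOrElseUpdate((writer, reader), compareUncached(writer, reader))
}
```

Sharing one such instance across all top-level comparisons is one way the cache could be lifted to a higher layer. It does not remove duplicates from the result, though: every caller still receives its own copy of the cached issues.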

yeah, for full OpenAPI comparisons (which is our goal), we will have to compare schemas for each endpoint. But won't simply adding the issues to a set solve the problem?

The issues are not a flat list - they form a tree like structure that allows you to "track the path" to incompatibilities within schemas (see SubschemaCompatibilityIssue hierarchy)
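A small sketch of why a plain set does not deduplicate a tree of issues, and of one possible presentation-layer grouping. `Leaf`, `Nested`, `flatten` and `byMessage` are hypothetical names; the actual SubschemaCompatibilityIssue hierarchy is not reproduced here.

```scala
object IssueTree {
  // Hypothetical issue tree: a leaf describes a problem, a nested node records
  // where in the schema the child issues were found.
  sealed trait Issue
  final case class Leaf(message: String) extends Issue
  final case class Nested(location: String, issues: List[Issue]) extends Issue

  // Flattens a tree into (path, message) pairs, one per leaf.
  def flatten(issue: Issue, path: List[String] = Nil): List[(List[String], String)] =
    issue match {
      case Leaf(message)           => List((path, message))
      case Nested(loc, children)   => children.flatMap(c => flatten(c, path :+ loc))
    }

  // Groups all paths by leaf message, so each underlying problem appears once
  // together with every location where it was found.
  def byMessage(issues: List[Issue]): Map[String, List[List[String]]] =
    issues.flatMap(i => flatten(i)).groupBy(_._2).map { case (m, ps) => m -> ps.map(_._1) }
}
```

Two `Nested` nodes that wrap the same leaf under different locations are different values, so putting them into a set keeps both. Grouping by leaf is just one option and assumes that equal leaf messages mean the same underlying problem.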

ah I must have missed that ... checking again ;)

Anyway, there are several different ways we could deal with this duplication. This largely depends on how exactly we want the incompatibilities to be presented to the user. So far, I have only implemented a simple human-readable description for every issue.

Yeah, that's a good start. We'll probably want to just return the list of the issues initially.

adamw · 2024-04-23T15:10:50Z

apispec-model/src/main/scala/sttp/apispec/validation/SchemaComparator.scala

+      $comment = None,
+      title = None,
+      description = None,
+      default = None,


don't defaults affect comparisons? if a writer has a schema with default = 5, and a reader with default = 7, the deserialisation might be different?

According to JSON Schema spec, the default keyword is an annotation, which is "for documentation and user interface display purposes".

Changing the default value cannot make any previously valid data invalid according to a new schema, so technically it's not an incompatibility. Whether this can be an incompatibility on a higher, semantic level is debatable. I can imagine situations where it is and where it isn't.

For example, an integer field in a request may have had a default value of 0, but then the server decided to make this field nullable, with a default value of null, so that it can distinguish between passing 0 explicitly and not passing anything. The server can do this in a fully compatible way. It depends on its implementation.

In general, compatibility on a "semantic" level can be broken in many ways just by changing the implementation of the server, even without changing the format at all. I assumed that here we focus purely on syntactic incompatibilities.

Ok, sounds reasonable, thanks for the explanation :)

adamw · 2024-04-23T15:15:03Z

Nice work - well beyond what I imagined originally, but very thorough :).


ghik added 4 commits April 22, 2024 13:38

refactored constants in Schema

0047fff

SchemaComparator: first version

1fa03d7

added simple test for ignoring annotations

6231470

stripping $comment from schemas for comparison

641e6f2

ghik requested a review from adamw April 22, 2024 14:39

ghik added 19 commits April 22, 2024 16:41

cosmetic

01d86bc

compilation fix

4b82ab1

added test for enum & const checking

2cc0172

added test for format checking

d0a8fa9

added test for nullable schemas

8807c3f

added test for numerical assertions

ab68a80

added test for string assertions

1e291b8

simplified assumptions about product schemas, added test for product property checking

4c5e88c

compilation fix

7d8b239

cosmetic - renaming tests

2900078

added test for collection schema checking

ea73309

added test for tuple-like array schema

4c0e068

improved error when comparing with empty schema

6cc33dd

added test for comparing map schemas

e974050

cosmetic

daf67f2

added tests for coproduct schemas

687b027

reorganized schema comparator tests

62c529d

cosmetic

7770eb8

SchemaComparator.compare scaladoc

3b557e6

adamw reviewed Apr 23, 2024


not insulting tapir

8a0b5e7

adamw reviewed Apr 23, 2024



improve naming of extractors

88e7a55

adamw reviewed Apr 23, 2024


ghik added 2 commits April 23, 2024 23:39

SchemaComparator class is public so that it can be reused for its cache

078f9a1

support for comparing untagged union schemas

d2f97cc

ghik marked this pull request as ready for review April 24, 2024 11:12

ghik added 5 commits April 24, 2024 14:33

fixed structural comparison of schemas so that it resolves references properly

a56c727

comment about opaque schemas

49a12b7

libidn11 -> libidn2

c5e4f54

compilation fixed

81bfbb4

compilation fixed

84e680b

ghik requested a review from adamw April 24, 2024 14:35

adamw merged commit 19ef388 into master Apr 24, 2024
6 of 7 checks passed

mergify bot deleted the schema-comparator branch April 24, 2024 16:11

ghik mentioned this pull request Apr 25, 2024

Validation of endpoints against a given OpenAPI schema softwaremill/tapir#3645

Open

ghik mentioned this pull request Jul 4, 2024

SchemaComparator does not handle references to $defs #172

Closed
