Code generation for JSON Schema (Draft 07).
NOTE: – from version 0.110, the underlying JSON and YAML libraries have been switched from
jsonutil
and yaml-simple
to
kjson-core
and kjson-yaml
.
The change should be transparent to most users.
Also, the Mustache processor has been switched from kotlin-mustache
to mustache-k
, but this change will almost
certainly not affect any users.
New in version 0.106 – the classNames
configuration option has been extended to allow the configuration of
generated nested class names.
See classNames
in the Configuration Guide for more details.
New in version 0.105 – the generator will optionally validate default
and examples
entries against the schema
in which they appear.
See the
examplesValidationOption
and defaultValidationOption
section in the Configuration Guide.
New in version 0.100 – the generator will produce classes that handle additionalProperties
and
patternProperties
.
See the additionalProperties
and patternProperties
guide for more details.
New in version 0.87 – the generator now recognises a special case of anyOf
or oneOf
to specify nullability.
See below.
New in version 0.84 – the generator will now recognise the not
schema for most validations, and will output
reversed validation checks.
For example, "not": { "const": "error" }
in a property sub-schema will test that a string is not equal to "error".
Added to the code generator – the ability to configure generation options using a JSON or YAML file. See the documentation at CONFIG.md.
Also build tool support – see below.
Also, the ability to add annotations to generated classes – see annotations
.
And from version 0.86 onward, the ability to force the output of a companion object for all or selected classes –
see companionObject
.
JSON Schema provides a means of describing JSON values – the properties of an object, constraints on values etc. – in considerable detail. Many APIs now use JSON Schema to specify the content of JSON parameters and response objects, either directly or as part of an OpenAPI specification, and one way of ensuring conformity to the specified schema is to generate code directly from the schema itself.
This is not always possible – some characteristics described by a schema may not be representable in the implementation language. But for a large subset of schema definitions a viable code representation is achievable, and this library attempts to provide conversions for the broadest possible range of JSON Schema definitions.
The library uses a template mechanism (employing Mustache templates), and templates are provided to generate classes in Kotlin and Java, or interfaces in TypeScript.
Simply create a CodeGenerator
, supply it with details like destination directory and package name, and invoke the
generation process:
val codeGenerator = CodeGenerator()
codeGenerator.baseDirectoryName = "output/directory"
codeGenerator.basePackageName = "com.example"
codeGenerator.generate(File("/path/to/example.schema.json"))
The resulting code will be something like this (assuming the schema is the one referred to in the introduction to json-kotlin-schema):
package com.example
import java.math.BigDecimal
data class Test(
/** Product identifier */
val id: BigDecimal,
/** Name of the product */
val name: String,
val price: BigDecimal,
val tags: List<String>? = null,
val stock: Stock? = null
) {
init {
require(price >= cg_dec0) { "price < minimum 0 - $price" }
}
data class Stock(
val warehouse: BigDecimal? = null,
val retail: BigDecimal? = null
)
companion object {
private val cg_dec0 = BigDecimal.ZERO
}
}
Some points to note:
- the generated class is an immutable value object (in Java, getters are generated but not setters)
- validations in the JSON Schema become initialisation checks in Kotlin
- nested objects are converted to Kotlin nested classes (or Java static nested classes)
- fields of type
number
are implemented asBigDecimal
(there is insufficient information in the schema in this case to allow the field to be considered anInt
orLong
) - non-required fields may be nullable and may be omitted from the constructor (the inclusion of
null
in thetype
array will allow a field to be nullable, but not to be omitted from the constructor) - a
description
will be converted to a KDoc comment in the generated code if available
The Code Generator can process a single file or multiple files in one invocation.
The generate()
function takes either a List
or a vararg
parameter array, and each item may be a file or a
directory.
In the latter case all files in the directory with filenames ending .json
or .yaml
(or .yml
) will be processed.
It is preferable to process multiple files in this way because the code generator can create references to other classes
that it knows about – that is, classes generated in the same run.
For example, if a properties
entry consists of a $ref
pointing to a schema that is in the list of files to be
generated, then a reference to an object of that type will be generated (instead of a nested class).
It is important to note that the output code will be “clean” – that is, it will not contain
annotations or other references to external libraries.
I recommend the use of the kjson
library for JSON serialisation and
deserialisation, but the classes generated by this library should be capable of being processed by the library of your
choice.
There is one exception to this – classes containing properties subject to “format
” validations will
in some cases cause references to the external library json-validation
to be generated, and this library must be included in the build of the generated code.
The format keywords that will lead to the requirement for this library are:
email
hostname
ipv4
ipv6
json-pointer
relative-json-pointer
uri-template
The idn-email
, idn-hostname
, iri
and iri-reference
format keywords are not currently implemented, and the use of
these keywords will have no effect.
The default in the JSON Schema specification for additionalProperties
is true
, meaning that any additional
properties in an object will be accepted without validation.
Many schema designers will be happy with this default, or will even explicitly specify true
, so that future extensions
to the schema will not cause existing uses to have problems.
Unfortunately, for a class to accept properties with names not known in advance requires very much more complex code
than for a simple class, so in normal usage the code generator takes the additionalProperties
setting to be false
,
even if it is specified otherwise.
To specify that the code generator is to interpret the additionalProperties
keyword strictly as specified (including
defaulting to true
), the configuration option additionalPropertiesOption
must be set to strict
(see the additionalProperties
and patternProperties
guide for more
information).
Most JSON deserialisation libraries have a means of specifying that additional properties are to be ignored; for the
kjson
library, the allowExtra
variable (Boolean
) in JSONConfig
must be set
to true
.
The code generator will create a data class
whenever possible.
This has a number of advantages, including the automatic provision of equals
and hashCode
functions, keeping the
generated code as concise and readable as possible.
Unfortunately, it is not always possible to use a data class
.
When the generated code involves inheritance, with one class extending another, the base class will be generated as an
open class
and the derived class as a plain class
.
In these cases the code generator will supply the missing functions – equals
, hashCode
, toString
, copy
and
the component[n]
functions that would otherwise be created automatically for a data class
.
To ensure that the full range of values can be accommodated, any property or array item of type number
or integer
will be generated as a java.math.BigDecimal
, unless it includes validations that restrict the range of values to those
that will fit in an Int
or a Long
(see Kotlin Multi-Platform (KMP) for non-JVM uses),
For example:
month:
type: integer
minimum: 1
maximum: 12
This will cause the property month
to be generated as an Int
, because the possible range of values will always fit
in 32 bits.
It is always good practice to specify minimum
and maximum
constraints on integer
properties to ensure that the
appropriate form of storage is used.
The standard way of specifying a value as nullable in JSON Schema is to use the type
keyword:
{ "type": [ "object", "null" ] }
When the code generator encounters a property or array item defined in this way, it will make the generated type nullable.
A problem arises when we consider the interaction of this declaration of nullability with the required
keyword.
What should the generator produce for an object property that does not include null
as a possible type, but is not in
the required
list?
The solution adopted by the code generator is to treat the property as if it had been defined to allow null
, and this
seems to work well for the majority of cases, although strictly speaking, it is not an accurate reflection of the
schema.
In particular, it helps with the case of utility sub-schema which is included by means of $ref
in multiple places, in
some cases nullable and in some cases not.
For example, an invoice may have a billing address and an optional delivery address, both of which follow a common
pattern defined in its own schema.
The shared definition will have "type": "object"
, but the delivery address will need to be nullable, so generating a
nullable type for a reference omitted from the required
list will have the desired effect.
But this solution does not work for all circumstances.
For example, it does not cover the case of an included sub-schema as an array item – there is no required
for
array items.
One way of specifying such a schema using the full capabilities of JSON Schema is as follows:
{
"type": "object",
"properties": {
"billingAddress": {
"$ref": "http://example.com/schema/address"
},
"deliveryAddress": {
"anyOf": [
{ "$ref": "http://example.com/schema/address" },
{ "type": "null" }
]
}
},
"required": [ "billingAddress", "deliveryAddress" ]
}
It is not easy to generate code for the general case of oneOf
or anyOf
, but the code generator will detect this
specific case to output the deliveryAddress
as nullable:
- The
anyOf
oroneOf
array must have exactly two sub-schema items - One of the items must be just
{ "type": "null" }
In this case, the code generator will generate code for the other sub-schema item (the one that is not
{ "type": "null" }
, often a $ref
), and treat the result as nullable.
(NOTE – the configuration file may be a simpler way to specify custom classes, particularly when combined with other configuration options. See the Configuration Guide.)
The code generator can use custom types for properties and array items. This can be valuable when, for example, an organisation has its own custom classes for what are sometimes called "domain primitives" – value objects representing a fundamental concept for the functional area.
A common example of a domain primitive is a class to hold a money value, taking a String
in its constructor and
storing the value as either a Long
of cents or a BigDecimal
.
There are three ways of specifying a custom class to the code generator:
- URI
- Custom
format
types - Custom keywords
An individual item in a schema may be nominated by the URI of the element itself.
For example, in the schema mentioned in the Quick Start section there is a field named price
.
To specify that the code generator is to use a Money
class for this field, use:
codeGenerator.addCustomClassByURI(URI("http://pwall.net/test#/properties/price"), "com.example.Money")
The base URI can be either the URL used to locate the schema file or the URI in the $id
of the schema.
A distinct advantage of this technique is that when a $ref
is used to share a common definition of a field type, the
destination of the $ref
can be specified to the code generator function shown above, and all references to it will use
the nominated class.
It is also the least obtrusive approach – it does not require modification to the schema or non-standard syntax.
The JSON Schema specification allows for non-standard format
types.
For example, if the specification of the property in the schema contained "format": "x-local-money"
, then the
following will cause the property to use a custom class:
codeGenerator.addCustomClassByFormat("x-local-money", "com.example.Money")
The JSON Schema specification also allows for completely non-standard keywords.
For example, the schema could contain "x-local-type": "money"
, in which case the following would invoke the use of the
custom class:
codeGenerator.addCustomClassByExtension("x-local-type", "money", "com.example.Money")
Most uses of the code generator target the JVM versions of Kotlin (and the code generator itself runs only on the JVM), but with the right configuration it can be used to generate code for any environment. The main areas that require configuration are:
- The default classes for strings with
format: date
,format: time
andformat:date-time
are classes from the JVMjava.time
package (java.time.LocalDate
,java.time.OffsetTime
andjava.time.OffsetDateTime
respectively). Schema files using these formats must includecustomClasses
configuration, specifying alternative classes (thekotlinx-datetime
library my be suitable, although it does not have exact matches for the JVM classes). - Decimal values use
java.math.BigDecimal
on the JVM, and this can not be configured usingcustomClasses
. Starting from version 0.107, thedecimalClassName
configuration setting allows the specification of an alternative class for decimal values.
This code generator targets the Draft-07 of the JSON Schema specification, and it includes some features from Draft 2019-09.
It also includes support for the int32
and int64
format types from the
OpenAPI 3.0 Specification.
A CodeGenerator
object is used to perform the generation.
It takes a number of parameters, many of which can be specified either as constructor parameters or by modifying
variables in the constructed instance.
targetLanguage
– aTargetLanguage
enum
specifying the target language for code generation (the options areKOTLIN
,JAVA
orTYPESCRIPT
– TypeScript coverage is not as advanced as that of the others at this time)templateName
– the primary template to use for the generation of a classenumTemplateName
– the primary template to use for the generation of an enumbasePackageName
– the base package name for the generated classes (if directories are supplied to thegenerate()
function, the subdirectory names are used as sub-package names)baseDirectoryName
– the base directory to use for generated output (in line with the Java convention, output directory structure will follow the package structure)derivePackageFromStructure
– a boolean flag (defaulttrue
) to indicate that generated code for schema files in subdirectories are to be output to sub-packages following the same structuregeneratorComment
– a comment to add to the header of generated filesmarkerInterface
– a “marker” interface to be added to every class
The configure()
function takes a File
or Path
specifying a configuration file.
See CONFIG.md for details of the configuration options.
There are two generate()
functions, one taking a List
of File
s, the other taking a vararg
list of File
arguments.
As described above, it is helpful to supply all the schema objects to be generated in a single operation.
While the generate()
functions take a file or files and convert them to an internal form before generating code, the
generateClass()
and generateClasses()
functions take pre-parsed schema objects.
This can be valuable in cases like an OpenAPI file which contains a set of schema definitions embedded in another file.
The generateAll()
function allows the use of a composite file such as an OpenAPI file containing several schema
definitions.
For example, an OpenAPI file will typically have a components
section which contains definitions of the objects input
to or output from the API.
Using the generateAll()
function, the set of definitions can be selected (and optionally filtered) and the classes
generated for each of them.
To simplify the use of the code generator in conjunction with the common build tools the following plugins will perform code generation as a pre-pass to the build of a project, allowing classes to be generated and compiled in a single operation:
The latest version of the library is 0.112, and it may be obtained from the Maven Central repository.
<dependency>
<groupId>net.pwall.json</groupId>
<artifactId>json-kotlin-schema-codegen</artifactId>
<version>0.112</version>
</dependency>
implementation 'net.pwall.json:json-kotlin-schema-codegen:0.112'
implementation("net.pwall.json:json-kotlin-schema-codegen:0.112")
Peter Wall
2024-11-13