A data validator for Zig
This is a work in progress and fairly opinionated. It's currently focused around validating JSON data. It supports nested objects and arrays, and attempts to generate validation messages that are both user-friendly (i.e. can be displayed to users as-is) and developer-friendly (e.g. can be customized as needed).
The first step is to create "validators" using a `Builder`:
```zig
const validate = @import("validate");

// If we want to validate a "movie" that looks like:
// {"title": "A Movie", "year": 2000, "score": 8.4, "tags": ["drama", "sci-fi"]}

// First, we create a new builder. In more advanced cases, we can pass application-specific data
// into our validation (so that we can do more advanced app-specific logic). In this example, we
// won't use a state, so our builder takes a state type of `void`:
var builder = try validate.Builder(void).init(allocator);

// Validators are often long lived (e.g. the entire lifetime of the program), so deinit'ing the builder
// here might not be what you want.
// defer builder.deinit(allocator);

// Next we use our builder to create a validator for each of the individual fields:
var year_validator = builder.int(u16, .{.min = 1900, .max = 2050, .required = true});
var title_validator = builder.string(.{.min = 2, .max = 100, .required = true});
var score_validator = builder.float(f64, .{.min = 0, .max = 10});
var tag_validator = builder.string(.{.choices = &.{"action", "sci-fi", "drama"}});

// An array validator is like any other validator, except the first parameter is an optional
// validator to apply to each element in the array.
var tags_validator = builder.array(&tag_validator, .{.max = 5});

// An object validator is like any other validator, except the first parameter is a list of
// fields, each field containing the input key and the validator:
const movieValidator = builder.object(&.{
    builder.field("year", year_validator),
    builder.field("title", title_validator),
    builder.field("score", score_validator),
    builder.field("tags", tags_validator),
}, .{});
```
Validators are thread-safe.
With validators defined, we can validate data. Validation happens (1) on an input, (2) using a validator built with a Builder and (3) with a validation context. The context collects errors and maintains various internal state (e.g. when validating values in an array, it tracks the array index so that a more meaningful error message can be generated).
A context is not thread-safe.
```zig
const std = @import("std");
const validate = @import("validate");

// validate.zig supports arbitrary application state, hence Context is a generic (and takes a type)
// and the 3rd parameter to init is the state. We'll cover this later. For now we use void and a
// void state of {}:
var context = try validate.Context(void).init(allocator, .{.max_errors = 10, .max_nesting = 4}, {});
defer context.deinit();

const jsonData = "{\"year\": \"nope\", \"score\": 94.3, \"tags\": [\"scifi\"]}";
const movie = try movieValidator.validateJsonS(jsonData, &context);
if (!context.isValid()) {
    try std.json.stringify(context.errors(), .{.emit_null_optional_fields = false}, SOME_WRITER);
    return;
}

// On success, validateJsonS returns a thin wrapper around typed.Value
// which lets us get values:
const title = movie.string("title").?; // ...
```
The above sample outputs the validation errors as JSON to stdout. Given the `jsonData` in the snippet, the output would be:
```json
[
  {"field": "year", "code": 4, "err": "must be an int"},
  {"field": "title", "code": 1, "err": "is required"},
  {"field": "score", "code": 14, "err": "cannot be greater than 10", "data": {"max": 1.0e+01}},
  {"field": "tags.0", "code": 10, "err": "must be one of: action, sci-fi, drama", "data": {"valid": ["action", "sci-fi", "drama"]}}
]
```
The `validate.Builder` is used to create validators. The full Builder API is described later. Validators are optimized for long-term / repeated use, so the Builder does more setup than might initially be obvious.
Validators don't require many allocations, but some of the allocations could be considered subtle. For example, if a "name" validator gets placed in a "user" object, a "user.name" value is created as part of the "Builder" phase. Doing this once upfront means that the "user.name" field is ready to be used (and re-used) when validation errors happen.
All this is to say that the Builder's `init` takes the provided `std.mem.Allocator` and creates and uses an `ArenaAllocator`. Individual validators do not have a `deinit` function. Only the `Builder` itself does.
In many applications, a single Builder will be created on startup and can be freed on shutdown.
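The arena-plus-precomputed-name idea can be sketched in isolation. This is a minimal illustration, not the library's code: `joinFieldPath` is a hypothetical helper, and the only assumption is that nested field names are joined with a `.` as in the "user.name" example.

```zig
const std = @import("std");

/// Hypothetical helper (not part of validate.zig): joins a parent path and a
/// field name once, at build time, so the result can be reused for every error.
fn joinFieldPath(allocator: std.mem.Allocator, parent: []const u8, name: []const u8) ![]const u8 {
    if (parent.len == 0) return try allocator.dupe(u8, name);
    return try std.fmt.allocPrint(allocator, "{s}.{s}", .{ parent, name });
}

test "field path is built once and freed with the arena" {
    // All build-time allocations go through one arena, mirroring how a single
    // deinit on the owner releases everything its validators needed.
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena.deinit();

    const path = try joinFieldPath(arena.allocator(), "user", "name");
    try std.testing.expectEqualStrings("user.name", path);
}
```

Because every path lives in the arena, there is nothing to free per validator, which is consistent with only the Builder having a `deinit`.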
When it comes time to actually validate data, a `validate.Context` is created. The context collects errors and the internal state necessary for validation as well as for generating meaningful errors.
In the simplest case, a context is created (or taken from a `validate.Pool`) and passed to a validator. However, in more advanced cases, particularly when a custom function is given to a validator, applications might interact with the context directly, either to access parts of the input and/or add custom validation errors.
When validating, this library attempts to minimize memory allocations as much as possible. In some cases, this is not possible. Specifically, when an error is added for an array value, the field name is dynamic, e.g. "user.favorites.4".
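To see why this case needs an allocation, consider a rough sketch in which the path prefix is known ahead of time but the index is only known while validating. `indexedField` is a hypothetical helper written for this illustration; how the library actually builds the name is not shown here.

```zig
const std = @import("std");

/// Hypothetical helper: appends a zero-based array index to a prebuilt path.
/// The index is only known at validation time, so this name cannot be
/// prepared in advance the way "user.name" can.
fn indexedField(allocator: std.mem.Allocator, path: []const u8, index: usize) ![]const u8 {
    return try std.fmt.allocPrint(allocator, "{s}.{d}", .{ path, index });
}

test "array element field name" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena.deinit();

    const name = try indexedField(arena.allocator(), "user.favorites", 4);
    try std.testing.expectEqualStrings("user.favorites.4", name);
}
```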
While this library has common validation rules for things like string length, min and max values, array size, etc., it also accepts custom validation functions. Sometimes these functions are simple and stateless. In other cases it can be desirable to have some application-specific data. For example, in a multi-tenancy application, data validation might depend on tenant-specific configuration.
The `Builder` and `Context` types explored above are generic functions which return a `Builder(T)` and `Context(T)`. The state, of type `T`, is supplied when the context is created and is then passed to custom validation functions:
```zig
// Our builder will build validators that expect a `*Customer` instance
// (`Customer` is a made-up application type)
var builder = try validate.Builder(*Customer).init(allocator);

// Our nameValidator specifies a custom validation function, `validateName`
var nameValidator = builder.string(.{.required = true, .function = validateName});
...
fn validateName(value: ?[]const u8, context: *validate.Context(*Customer)) anyerror!?[]const u8 {
    const customer = context.state;
    // can do custom validation based on the customer
    _ = customer;
    _ = value;
    return null; // returning null keeps the existing value
}
```
We then specify the same `*Customer` type when creating our Context and provide the instance:
```zig
const customer: *Customer = // TODO: the current customer
var context = try validate.Context(*Customer).init(allocator, .{}, customer);
```
Generated errors are meant to be both user-friendly and developer-friendly. At a minimum, every error has a `code` and `err` field. `err` is a user-safe English string describing the error. It's "user-safe" because the errors are still generic, such as "must have no more than 10 items" as opposed to a more app-specific "cannot pick more than 10 addons".
The `code` is an integer that is unique to each type of error. For example, `code=1` means that a required field was missing.
Most errors will also contain a `field` string. This is the full field name, including nesting and zero-based array indexes, such as `user.favorite.2.id`. This field is optional: sometimes validation errors don't belong to a specific field.
Some errors also have a `data` object. The inclusion and structure of the `data` object is specific to the `code`. For example, an error with `code=1` (required) always has a null `data` field (or no `data` field if the errors are serialized with the `emit_null_optional_fields = false` option). An error with `code=8` (string min length) always has a `data: {min: N}`.
Between the `code`, `data` and `field` fields, developers should be able to programmatically consume and customize the errors.
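As an illustration of consuming errors programmatically, here is a small sketch that swaps in an app-specific message for a known code and falls back to the generic `err` text otherwise. `ValidationError` and `customMessage` are hypothetical names made up for this example; the struct only mirrors the `field`, `code` and `err` fields described above and is not the library's own error type.

```zig
const std = @import("std");

// Hypothetical mirror of a serialized error, limited to the fields described
// above. The library's own error type may be shaped differently.
const ValidationError = struct {
    field: ?[]const u8 = null,
    code: i64,
    err: []const u8,
};

/// Returns an app-specific message for codes the app cares about and the
/// generic, user-safe `err` text for everything else.
fn customMessage(e: ValidationError) []const u8 {
    return switch (e.code) {
        1 => "please fill in this field", // code=1: a required field is missing
        else => e.err,
    };
}

test "customizing messages by code" {
    const required = ValidationError{ .field = "title", .code = 1, .err = "is required" };
    try std.testing.expectEqualStrings("please fill in this field", customMessage(required));

    const other = ValidationError{ .field = "year", .code = 4, .err = "must be an int" };
    try std.testing.expectEqualStrings("must be an int", customMessage(other));
}
```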
Validation of data happens by calling `validateJsonS` on an `object` validator. This function returns a `Typed` instance, which is a thin wrapper around `typed.Value` (see typed.zig).
The goal of `typed.Value` is to provide a user-friendly API to extract the input data safely.
The returned `Typed` object and its data are only valid for the lifetime of the `validate.Context` that was passed into `validateJsonS`.
Most validators accept a custom function. This custom function will only be called if all other validators pass. Importantly, if a value is `null` and `required = false`, the custom validator is called with `null`.
The signature of these functions is:
```zig
*const fn(value: ?T, context: *Context(S)) anyerror!?T
```
Where `T` is the type of value being validated and `S` is the state type. For a bool validator, `T` will be `bool`; for a string validator, `T` will be `[]const u8`.
There are a few important things to note about custom validators. First, as already mentioned, if the value is not required and is null, the custom validator is called with null. Thus, the type of `value` is `?T`. Second, custom validators can return a new value to replace the existing one, hence the return type of `?T`. Returning `null` will maintain the existing value. Finally, the provided `context` is useful for both simple and complex cases. At the very least, you'll need to call `context.add(...)` to add errors from your validator.
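The three rules above (null input, replacement value, errors through the context) can be sketched with a stand-in context. `FakeContext` and its `add` method are assumptions made only so the example compiles on its own; the arguments the real `context.add(...)` expects are not shown in this document and may differ.

```zig
const std = @import("std");

// Stand-in for the library's context, only so this sketch is self-contained.
// The real `Context(S)` and the arguments of `context.add(...)` may differ.
fn FakeContext(comptime S: type) type {
    return struct {
        state: S,
        error_count: usize = 0,

        const Self = @This();

        fn add(self: *Self, message: []const u8) void {
            _ = message; // a real context would store the error
            self.error_count += 1;
        }
    };
}

/// Custom string validator: trims spaces and rejects blank names.
fn trimName(value: ?[]const u8, context: *FakeContext(void)) anyerror!?[]const u8 {
    const name = value orelse return null; // not required and absent: nothing to do
    const trimmed = std.mem.trim(u8, name, " ");
    if (trimmed.len == 0) {
        context.add("must not be blank");
        return null;
    }
    return trimmed; // replaces the original value
}

test "custom validator behaviour" {
    var context = FakeContext(void){ .state = {} };

    try std.testing.expectEqualStrings("Leto", (try trimName("  Leto ", &context)).?);
    try std.testing.expect((try trimName(null, &context)) == null);
    try std.testing.expectEqual(@as(usize, 0), context.error_count);

    try std.testing.expect((try trimName("   ", &context)) == null);
    try std.testing.expectEqual(@as(usize, 1), context.error_count);
}
```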
The builder is used to create and own validators. When `deinit` is called on the builder, all of the validators created from it are no longer valid.
```zig
var builder = try validate.Builder(void).init(allocator);

// The builder must live as long as any validator it creates.
// defer builder.deinit();
```
An int validator is created via the `builder.int` function. This function takes an integer type and configuration structure. The full possible configuration, with default values, is shown below:
```zig
const age_validator = builder.int(u8, .{
    // whether the value is required or not
    .required = false,

    // the minimum allowed value (inclusive of min), null == no limit
    .min = null, // ?T

    // the maximum allowed value (inclusive of max), null == no limit
    .max = null, // ?T

    // when true, will accept a string input and attempt to convert it to an integer
    .parse = false,

    // a custom validation function that will receive the value to validate
    // along with a validate.Context.
    .function = null, // ?*const fn(value: ?T, context: *Context(S)) anyerror!?T
});
```
In rare cases (e.g. OOM) `builder.int` can panic. The `builder.tryInt` function can be used instead; it returns an error union which can be caught, unwrapped or propagated.
Typically, this validator is invoked as part of an object validator. However, it is possible to call `validateJsonValue` directly on this validator by providing a `std.typed.Value` and a validation Context.
A float validator is created via the `builder.float` function. This function takes a float type and configuration structure. The full possible configuration, with default values, is shown below:
```zig
const rating_validator = builder.float(f64, .{
    // whether the value is required or not
    .required = false,

    // the minimum allowed value (inclusive of min), null == no limit
    .min = null, // ?T

    // the maximum allowed value (inclusive of max), null == no limit
    .max = null, // ?T

    // when false, integers will be accepted and converted to an f64
    // when true, if an integer is given, validation will fail
    .strict = false,

    // when true, will accept a string input and attempt to convert it to a float
    .parse = false,

    // a custom validation function that will receive the value to validate
    // along with a validate.Context.
    .function = null, // ?*const fn(value: ?T, context: *Context(S)) anyerror!?T
});
```
In rare cases (e.g. OOM) `builder.float` can panic. The `builder.tryFloat` function can be used instead; it returns an error union which can be caught, unwrapped or propagated.
Typically, this validator is invoked as part of an object validator. However, it is possible to call `validateJsonValue` directly on this validator by providing a `std.typed.Value` and a validation Context.
A bool validator is created via the `builder.boolean` function. This function takes a configuration structure. The full possible configuration, with default values, is shown below:
```zig
const enabled_validator = builder.boolean(.{
    // whether the value is required or not
    .required = false,

    // when true, will accept a string input and attempt to convert it to a boolean
    // "1", "t", "true" == true (string comparison is case insensitive)
    // "0", "f", "false" == false (string comparison is case insensitive)
    .parse = false,

    // a custom validation function that will receive the value to validate
    // along with a validate.Context.
    .function = null, // ?*const fn(value: ?bool, context: *Context(S)) anyerror!?bool
});
```
In rare cases (e.g. OOM) `builder.boolean` can panic. The `builder.tryBool` function can be used instead; it returns an error union which can be caught, unwrapped or propagated.
Typically, this validator is invoked as part of an object validator. However, it is possible to call `validateJsonValue` directly on this validator by providing a `std.typed.Value` and a validation Context.
A string validator is created via the `builder.string` function. This function takes a configuration structure. The full possible configuration, with default values, is shown below:
```zig
const name_validator = builder.string(.{
    // whether the value is required or not
    .required = false,

    // the minimum length (inclusive of min), null == no limit
    .min = null, // ?usize

    // the maximum length (inclusive of max), null == no limit
    .max = null, // ?usize

    // a list of valid choices, this list is case sensitive
    .choices = null, // ?[]const []const u8

    // a regular expression pattern, currently using POSIX regex, but likely to change in the future
    .pattern = null, // ?[]const u8

    // a custom validation function that will receive the value to validate
    // along with a validate.Context.
    .function = null, // ?*const fn(value: ?[]const u8, context: *Context(S)) anyerror!?[]const u8
});
```
In rare cases (e.g. OOM) `builder.string` can panic. The `builder.tryString` function can be used instead; it returns an error union which can be caught, unwrapped or propagated.
Typically, this validator is invoked as part of an object validator. However, it is possible to call `validateJsonValue` directly on this validator by providing a `std.typed.Value` and a validation Context.
A UUID validator is created via the `builder.uuid` function. This function takes a configuration structure. The full possible configuration, with default values, is shown below:
```zig
const id_validator = builder.uuid(.{
    // whether the value is required or not
    .required = false,

    // a custom validation function that will receive the value to validate
    // along with a validate.Context.
    .function = null, // ?*const fn(value: ?[]const u8, context: *Context(S)) anyerror!?[]const u8
});
```
In rare cases (e.g. OOM) `builder.uuid` can panic. The `builder.tryUuid` function can be used instead; it returns an error union which can be caught, unwrapped or propagated.
Typically, this validator is invoked as part of an object validator. However, it is possible to call `validateJsonValue` directly on this validator by providing a `std.typed.Value` and a validation Context.
A type-less validator is created via the `builder.any` function. Unlike all other validators, this validator does not validate the type of the value. This validator is useful when the type of a field is only known at runtime and a custom validation function is used.
This function takes a configuration structure. The full possible configuration, with default values, is shown below:
```zig
const default_validator = builder.any(.{
    // whether the value is required or not
    .required = false,

    // a custom validation function that will receive the value to validate
    // along with a validate.Context.
    .function = null, // ?*const fn(value: ?typed.Value, context: *Context(S)) anyerror!?typed.Value
});
```
In rare cases (e.g. OOM) `builder.any` can panic. The `builder.tryAny` function can be used instead; it returns an error union which can be caught, unwrapped or propagated.
Typically, this validator is invoked as part of an object validator. However, it is possible to call `validateJsonValue` directly on this validator by providing a `std.typed.Value` and a validation Context.
An array validator is created via the `builder.array` function. The array validator can validate the array itself (e.g. its length) as well as each item within it. As such, this function takes both an optional validator to apply to the array values, as well as a configuration structure. The full possible configuration, with default values, is shown below:
```zig
// name_validator will be applied to each value in the array.
// null can also be provided, in which case array items will not be validated
// (but the array itself will still be validated based on the provided configuration)
const names_validator = builder.array(&name_validator, .{
    // whether the value is required or not
    .required = false,

    // the minimum length (inclusive of min), null == no limit
    .min = null, // ?usize

    // the maximum length (inclusive of max), null == no limit
    .max = null, // ?usize

    // a custom validation function that will receive the value to validate
    // along with a validate.Context.
    .function = null, // ?*const fn(value: ?typed.Array, context: *Context(S)) anyerror!?typed.Array
});
```
In rare cases (e.g. OOM) `builder.array` can panic. The `builder.tryArray` function can be used instead; it returns an error union which can be caught, unwrapped or propagated.
Typically, this validator is invoked as part of an object validator. However, it is possible to call `validateJsonValue` directly on this validator by providing a `std.typed.Value` and a validation Context.
The object validator is similar to the others, with one difference. Like the other validators, it's created via the `builder.object` function. And, like the other validators, it takes a configuration object that defines how the object value itself should be validated.
However, unlike the other validators (but a little like the array validator), `builder.object` takes a `name => validator` map which defines the validator to use for each value in the object.
```zig
var user_validator = builder.object(&.{
    builder.field("age", age_validator),
    builder.field("name", name_validator),
}, .{
    // whether the value is required or not
    .required = false,

    // a custom validation function that will receive the value to validate
    // along with a validate.Context
    .function = null, // ?*const fn(value: ?typed.Map, context: *Context(S)) anyerror!?typed.Map
});
```
In rare cases (e.g. OOM) `builder.object` can panic. The `builder.tryObject` function can be used instead; it returns an error union which can be caught, unwrapped or propagated.
Once created, either `validateJsonS` or `validateJsonV` is used to kick off validation. `validateJsonS` takes a `[]const u8`. `validateJsonV` takes a `?std.typed.Value`.
These return a `typed.Value` (see typed.zig) on success. On error, the context's `isValid()` will return false, and the errors can be fetched using `context.errors()`.
Oftentimes you'll need validators which only differ in their `required` configuration. For example, an HTTP API might require a `name` on create, but leave the name as optional on update.
For `int`, `float`, `bool`, `string` and `any` validators, `setRequired(bool, builder)` can be called to clone the original validator with the specified `required`.
For example, our above `user_validator` uses a non-required `name_validator`. We could change this to required by doing:
```zig
var user_validator = builder.object(&.{
    builder.field("age", age_validator),
    builder.field("name", name_validator.setRequired(true, builder)),
}, .{});
```
In simple cases, the context is an afterthought: it is created and passed to `validateJsonS` or `validateJsonV`.
To create a context with no custom state, use:
```zig
var context = try validate.Context(void).init(allocator, .{
    .max_errors = 20,
    .max_nesting = 10,
}, {});
```
To create a context with custom state, say a `*Customer`, use:
```zig
var context = try validate.Context(*Customer).init(allocator, .{
    .max_errors = 20,
    .max_nesting = 10,
}, the_customer);
```
`max_errors` limits how many errors will be collected. Additional errors will be silently dropped. (This is an optimization so that we can statically allocate an array to hold errors, which makes more sense since `validate.Pool` provides a re-usable pool of validation contexts).
`max_nesting` limits the depth of the object to validate, specifically with respect to arrays. Object and array validators can be nested in any combination, but array validators are difficult as they introduce dynamic field names (e.g. "users.5.favorites.193.name"). The context must keep a stack of array indexes. This is statically allocated (the stack is merely a `[]usize`, so setting this to a larger value should be fine).
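The "collect up to a limit, then drop" behaviour of `max_errors` can be sketched with a fixed-capacity list. `ErrorList` is a hypothetical type written for this illustration. It takes the limit at compile time for brevity, whereas the context described above receives it as a runtime option.

```zig
const std = @import("std");

/// Hypothetical bounded error list: storage is sized up front and errors
/// beyond the limit are ignored rather than triggering an allocation.
fn ErrorList(comptime max_errors: usize) type {
    return struct {
        items: [max_errors][]const u8 = undefined,
        len: usize = 0,

        const Self = @This();

        fn add(self: *Self, message: []const u8) void {
            if (self.len == max_errors) return; // silently dropped
            self.items[self.len] = message;
            self.len += 1;
        }

        fn errors(self: *const Self) []const []const u8 {
            return self.items[0..self.len];
        }
    };
}

test "errors beyond the limit are dropped" {
    var list = ErrorList(2){};
    list.add("is required");
    list.add("must be an int");
    list.add("cannot be greater than 10"); // over the limit: ignored

    try std.testing.expectEqual(@as(usize, 2), list.errors().len);
    try std.testing.expectEqualStrings("must be an int", list.errors()[1]);
}
```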
A thread-safe re-usable pool of Contexts can be created and used:
```zig
var pool = try validate.Pool(S).init(allocator, .{
    // how many contexts to keep in the pool
    .size = 50, // u16

    // Configuration for each individual context
    .max_errors = 20,
    .max_nesting = 10,
});
```
Contexts can thus be acquired from the pool and released back into the pool:
```zig
var context = try pool.acquire();
defer pool.release(context);
```
The pool is non-blocking. If empty, a context is dynamically created. The pool will never grow beyond the configured size.
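One way to get these three properties (thread-safe, non-blocking, bounded) is sketched below. It is an illustration only and not `validate.Pool`'s implementation: `Ctx` and `CtxPool` are made-up names, and the sketch assumes `std.Thread.Mutex` as found in recent Zig releases.

```zig
const std = @import("std");

// Stand-in for a validation context; the real type holds errors and state.
const Ctx = struct { uses: usize = 0 };

/// Hypothetical fixed-size, non-blocking pool.
const CtxPool = struct {
    allocator: std.mem.Allocator,
    mutex: std.Thread.Mutex = .{},
    slots: []*Ctx,
    available: usize = 0,

    fn init(allocator: std.mem.Allocator, size: usize) !CtxPool {
        const slots = try allocator.alloc(*Ctx, size);
        var pool = CtxPool{ .allocator = allocator, .slots = slots };
        errdefer pool.deinit(); // frees whatever was created so far
        while (pool.available < size) : (pool.available += 1) {
            const created = try allocator.create(Ctx);
            created.* = .{};
            slots[pool.available] = created;
        }
        return pool;
    }

    fn deinit(self: *CtxPool) void {
        for (self.slots[0..self.available]) |pooled| self.allocator.destroy(pooled);
        self.allocator.free(self.slots);
    }

    /// Never waits for a release: an empty pool falls back to a fresh allocation.
    fn acquire(self: *CtxPool) !*Ctx {
        self.mutex.lock();
        if (self.available > 0) {
            self.available -= 1;
            const pooled = self.slots[self.available];
            self.mutex.unlock();
            return pooled;
        }
        self.mutex.unlock();
        const fresh = try self.allocator.create(Ctx);
        fresh.* = .{};
        return fresh;
    }

    /// Puts a context back, or frees it when the pool is already full.
    fn release(self: *CtxPool, ctx: *Ctx) void {
        self.mutex.lock();
        if (self.available < self.slots.len) {
            self.slots[self.available] = ctx;
            self.available += 1;
            self.mutex.unlock();
            return;
        }
        self.mutex.unlock();
        self.allocator.destroy(ctx);
    }
};

test "pool never grows beyond its size" {
    var pool = try CtxPool.init(std.testing.allocator, 1);
    defer pool.deinit();

    const a = try pool.acquire(); // taken from the pool
    const b = try pool.acquire(); // pool is empty, so this one is allocated
    pool.release(a);
    pool.release(b); // pool is full again, so b is freed
    try std.testing.expectEqual(@as(usize, 1), pool.available);
}
```

The mutex is only held while a slot is taken or returned, so a caller is never parked waiting for another caller to release a context.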