
Added Schema validator specification #116

Open
wants to merge 4 commits into
base: main


liuggio

@liuggio liuggio commented Feb 25, 2013

see #76

TOLS is like XSD, only more readable and simpler.

There are 3 simple rules to follow:

  • A TOLS file is itself a valid TOML file.
  • Each element can be validated.
  • Explicit validation: if an element has no validation scheme, it is valid by default.
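A minimal Python sketch of how a validator could apply these three rules. It assumes the TOLS file has already been parsed into a dict by a TOML parser, that each element's rules sit at the same path as the element, and that only `primitive` and `required` are enforced. `validate`, `check_primitive` and the type mapping are illustrative names, not part of the proposal.

```python
from datetime import date, datetime, time

# Assumed mapping from TOLS primitive names to the Python types a TOML parser returns.
PRIMITIVES = {
    "String": str,
    "Integer": int,
    "Float": float,
    "Boolean": bool,
    "Datetime": (datetime, date, time),
    "Array": list,
    "Hash": dict,
}

# Rule keywords handled by this sketch; any other key in a schema table is a child element.
RULE_KEYS = {"primitive", "required"}


def check_primitive(value, name):
    """Return True if value matches the named TOLS primitive."""
    if isinstance(value, bool) and name != "Boolean":
        return False  # bool is a subclass of int in Python
    return isinstance(value, PRIMITIVES[name])


def validate(doc, schema, path=""):
    """Validate a parsed TOML dict against a parsed TOLS dict; return a list of errors."""
    errors = []
    for key, rules in schema.items():
        if key in RULE_KEYS or not isinstance(rules, dict):
            continue  # rule keywords describe the parent, not a child element
        where = f"{path}.{key}" if path else key
        if key not in doc:
            if rules.get("required", False):
                errors.append(f"{where}: missing required element")
            continue
        value = doc[key]
        primitive = rules.get("primitive")
        if primitive is not None and not check_primitive(value, primitive):
            errors.append(f"{where}: expected {primitive}")
        elif isinstance(value, dict):
            errors.extend(validate(value, rules, where))
    # Rule 3 (explicit validation): keys in doc without a schema entry are never visited.
    return errors


SCHEMA = {
    "owner": {
        "primitive": "Hash",
        "required": True,
        "name": {"primitive": "String", "required": True},
    }
}
```

With this schema, `validate({"owner": {"name": "Tom", "extra": 1}}, SCHEMA)` returns an empty list: `extra` has no rule, so rule 3 lets it through.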

Todo:

  • gather feedback about Specification
  • gather feedback about Keywords
  • gather feedback about Usages

@mojombo
Member

mojombo commented Feb 25, 2013

This is pretty awesome, but you're not using keygroup nesting as intended. Nested keygroups must contain the full keygroup name under which they are nested.

[owner]

  [owner.name.scheme]
  primitive = "String"
  required = true

And there doesn't seem to be a need for scheme in the keygroup name. More thoughts later.

@liuggio
Author

liuggio commented Feb 25, 2013

Oops, my fault. I fixed the keygroup names and squashed.

@liuggio
Author

liuggio commented Feb 25, 2013

About removing scheme

Pro:

  • With the full keygroup name it is possible to remove scheme; this reduces the number of keywords to 7 👍
  • scheme was another Obey word 👍

Cons:

  • Readability: knowing that you are in the schema validation and not in plain TOML is good.
  • Mixed types: it is possible to ship TOML and TOLS in one file (sounds creepy?).

@liuggio
Author

liuggio commented Feb 25, 2013

Tomorrow I'll try to specify the equivalent of the XSD sequence and occurs (minOccurs/maxOccurs) rules:

    <xsd:complexType name="client">
        <xsd:sequence>
            <xsd:element name="dsn" type="xsd:string" minOccurs="1" maxOccurs="unbounded" />
            <xsd:element name="options" type="client-options" minOccurs="0" maxOccurs="1" />
        </xsd:sequence>
        <xsd:attribute name="type" type="client-type" />
        <xsd:attribute name="alias" type="xsd:string" use="required" />
        <xsd:attribute name="logging" type="xsd:boolean" />
    </xsd:complexType>
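One possible way to carry XSD's `minOccurs`/`maxOccurs` over to TOML is to count how often an element appears: zero if absent, the item count for an array (including an array of tables), otherwise one. This is a hedged sketch; the `min_occurs`/`max_occurs` parameters and the array convention are assumptions, since TOLS does not define them yet.

```python
UNBOUNDED = None  # stands in for XSD's maxOccurs="unbounded"


def occurrences(table, key):
    """Count how many times key occurs: 0 if absent, len() for arrays, else 1."""
    if key not in table:
        return 0
    value = table[key]
    return len(value) if isinstance(value, list) else 1


def check_occurs(table, key, min_occurs=1, max_occurs=1):
    """Return an error string, or None if the occurrence count is within bounds."""
    count = occurrences(table, key)
    if count < min_occurs:
        return f"{key}: expected at least {min_occurs}, found {count}"
    if max_occurs is not UNBOUNDED and count > max_occurs:
        return f"{key}: expected at most {max_occurs}, found {count}"
    return None


# Mirrors the "client" complexType above as (key, minOccurs, maxOccurs).
CLIENT_RULES = [("dsn", 1, UNBOUNDED), ("options", 0, 1)]


def check_client(client):
    errors = [check_occurs(client, key, lo, hi) for key, lo, hi in CLIENT_RULES]
    return [e for e in errors if e is not None]
```

Note that TOML tables are unordered, so `xsd:sequence` ordering has no direct counterpart here; only the counts carry over.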

@liuggio
Author

liuggio commented Feb 26, 2013

While we wait for #127 to be accepted, I made the TOLS examples valid TOML.

@liuggio
Author

liuggio commented Feb 26, 2013

the funny thing is that it's starting to make sense.

@liuggio
Author

liuggio commented Mar 5, 2013

Any news on this? I think a validator is useful: some web frameworks implemented their own validators in order to parse YAML files. You could create a skeleton TOML file; another use case is that you could write a short TOML file and rely on the default options from TOLS.

@pygy
Contributor

pygy commented Mar 9, 2013

If type homogeneity in arrays were to be abandoned, this lightweight syntax
would do the trick:

[section]
foo = ["required", "", 0] #either a string or an int.
bar = [0000-00-00T00:00:00Z] #optional date
baz = ["required", [0.0]] # a float array.

There's no need to specify whether sections are required. It is dependent
on the necessity of its keys.
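A rough Python sketch of how this prototype syntax could be checked: every value other than the `"required"` marker is an example whose exact type must match, and a nested list such as `[0.0]` describes the array items. Assumptions: `"required"` is only ever a marker, the date prototype is a placeholder (0000-00-00 is not a valid Python datetime), and the float array is keyed `baz` because TOML forbids defining `bar` twice.

```python
from datetime import datetime

# Each key maps to a list of example values ("prototypes"); "required" is a marker, not a type.
SCHEMA = {
    "foo": ["required", "", 0],     # either a string or an int
    "bar": [datetime(1970, 1, 1)],  # optional datetime (placeholder prototype)
    "baz": ["required", [0.0]],     # required float array
}


def same_kind(value, prototype):
    """True if value has the same TOML kind as the example prototype."""
    if isinstance(prototype, list):
        # [0.0] means "an array whose items each look like one of the listed prototypes".
        return isinstance(value, list) and all(
            any(same_kind(item, p) for p in prototype) for item in value
        )
    # Exact type identity, so True (bool) never passes for 0 (int).
    return type(value) is type(prototype)


def check_key(table, key, spec):
    required = "required" in spec
    prototypes = [p for p in spec if p != "required"]
    if key not in table:
        return f"{key}: required" if required else None
    if any(same_kind(table[key], p) for p in prototypes):
        return None
    return f"{key}: unexpected type {type(table[key]).__name__}"


def validate_section(table, schema=SCHEMA):
    errors = (check_key(table, key, spec) for key, spec in schema.items())
    return [e for e in errors if e is not None]
```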

-- Pierre-Yves

@BurntSushi
Member

@pygy - I love the idea, but can we please have better syntax? Namely, be consistent with TOLS. It specifies types as strings like Integer or Datetime.

To check the type of arrays, why not just extend the type language allowed by TOLS? Assuming #154 isn't accepted: Array would still be valid, but so would Array String, Array Datetime or even Array Array Float.

It seems like your proposal also accounts for the fact that a key could take on one of N types. I think this is orthogonal to homogeneity and should be considered separately.
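A small Python sketch of the extended type language, assuming type words are space-separated and read left to right, so `Array Array Float` means an array of arrays of floats, and a bare `Array` still accepts any array. The scalar mapping and the function names are illustrative.

```python
from datetime import datetime

SCALARS = {"String": str, "Integer": int, "Float": float, "Boolean": bool, "Datetime": datetime}


def matches(value, type_expr):
    """Check value against a type expression such as 'Array Array Float'."""
    return _matches(value, type_expr.split())


def _matches(value, words):
    head, rest = words[0], words[1:]
    if head == "Array":
        if not isinstance(value, list):
            return False
        if not rest:
            return True  # bare "Array": element types are not checked
        return all(_matches(item, rest) for item in value)
    if rest:
        raise ValueError(f"unexpected words after {head}: {rest}")
    expected = SCALARS[head]
    if isinstance(value, bool) and expected is not bool:
        return False  # bool is a subclass of int in Python
    return isinstance(value, expected)
```

Because each `Array` word checks every item with the remaining words, homogeneity falls out of the type expression itself.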

@pygy
Contributor

pygy commented Mar 11, 2013

> I love the idea, but can we please have better syntax? Namely, be consistent with TOLS. It specifies types as strings like Integer or Datetime.

I find TOLS excessively verbose. You can define the type structure using the scheme described above. Show, don't tell.

It does not allow specifying other constraints, though, such as a range. I suggest below an extension to my format, from now on "TOMLS" (TOML Schema). Providing structural validation already goes a long way, though, because it ensures that the application will get the expected type for each value. Validating values (range, odd?, even?, prime?, set) is thus simpler.

TOLS, as is, has at least one major flaw: in [cache.clients.client], how can we tell that you don't want a hash named client?

Here's a revised version of the cache.tols example.

# Double the control chars to escape them. `**` means one asterisk.
[cache.clients.* @occurence(1+)] 
               type    = ["", "@required", 
                              "@+set:", ["redis","memcache"], 
                              "@-set:", ["mysql"]]
               alias   = ["", "@required"]
               dsn     = [ ["", "@regex:", ["^(?:[0-9]{1,3}\\.){3}[0-9]{1,3}$"]] ]
               logging = [true]
[cache.clients.*.options]
                 connection_persistent = [true, "@default:", [true]]
                 connection_timeout    = [0,    "@range:", [0, 30000]]

It's by no means perfect, but it conveys the same information as the TOLS in one third of the size, and, most importantly, the schema looks like the target document.

"@Validation commands" are of two types: either they end with a colon, and the argument follows in the next value, or they don't, which means that they don't take arguments.

Section names can be prefixed by @commands. A bare section name is actually a @structure. Actually, we may make the @structure mandatory. Simpler is better. (perhaps just @struct?)

[@structure.server] # could be just [server]
            host     = ["", "@either"]
            protocol = ["", "@either"]
            hostname = ["", "@either"]
            port     = [0,  "@range:", [0, 65536], "@either"]
[@restriction.server.@either(host|protocol,hostname,port)]
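One possible reading of `@either(host|protocol,hostname,port)`, sketched in Python: `|` separates alternatives, `,` groups keys that must appear together, exactly one alternative must be fully present, and keys from the other alternatives must be absent. These semantics are an assumption; the comment does not spell them out.

```python
def parse_either(expr):
    """'host|protocol,hostname,port' -> [['host'], ['protocol', 'hostname', 'port']]"""
    return [alt.split(",") for alt in expr.split("|")]


def check_either(table, expr):
    """Return None if exactly one alternative is fully present, else an error string."""
    alternatives = parse_either(expr)
    present = [alt for alt in alternatives if all(k in table for k in alt)]
    if len(present) != 1:
        return f"expected exactly one of {expr!r}, found {len(present)}"
    chosen = present[0]
    # Keys belonging only to the other alternatives must not appear alongside the chosen one.
    stray = [k for alt in alternatives for k in alt if k not in chosen and k in table]
    if stray:
        return f"keys {stray} cannot be combined with {chosen}"
    return None
```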

> To check the type of arrays, why not just extend the type language allowed by TOLS? Assuming #154 isn't accepted: Array would still be valid, but so would Array String, Array Datetime or even Array Array Float.

> It seems like your proposal also accounts for the fact that a key could take on one of N types.

No, look again at my previous post, specifically this line.

foo = ["required", "", 0] #either a string or an int.

Using the extended syntax proposed here, you can set the constraints for a given type right after it. Global constraints, like "@required", can go anywhere.

foo = [
    "", "@regex:",["a|b"],
    0, "@min:", [10],
    "@required"
]
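A Python sketch of how such a value list could be read mechanically. Assumptions, none of which are fixed by the proposal: a string starting with `@` is a command; one ending with `:` consumes the next value as its argument; `@required` and `@either` are global; every other command binds to the preceding type prototype; only `@range` (inclusive), `@min`, `@regex`, `@+set` and `@-set` are enforced; the `@@` escaping from the cache example is not handled.

```python
import re

# Assumption: these commands apply to the key as a whole; all others bind to the
# type prototype that precedes them.
GLOBAL_COMMANDS = {"@required", "@either"}


def parse_spec(spec):
    """Split a TOMLS value list into (global commands, [(prototype, {command: arg})])."""
    global_cmds, alternatives = set(), []
    i = 0
    while i < len(spec):
        item = spec[i]
        if isinstance(item, str) and item.startswith("@"):
            if item.endswith(":"):
                name, arg = item[:-1], spec[i + 1]  # the argument is the next value
                i += 2
            else:
                name, arg = item, None
                i += 1
            if name in GLOBAL_COMMANDS:
                global_cmds.add(name)
            elif alternatives:
                alternatives[-1][1][name] = arg
            else:
                raise ValueError(f"{name} appears before any type prototype")
        else:
            alternatives.append((item, {}))
            i += 1
    return global_cmds, alternatives


def satisfies(value, prototype, commands):
    """The type must match the prototype exactly, then every known command must hold."""
    if type(value) is not type(prototype):
        return False
    for name, arg in commands.items():
        if name == "@range" and not arg[0] <= value <= arg[1]:
            return False
        if name == "@min" and value < arg[0]:
            return False
        if name == "@regex" and re.search(arg[0], value) is None:
            return False
        if name == "@+set" and value not in arg:
            return False
        if name == "@-set" and value in arg:
            return False
    return True  # unknown commands such as @default are ignored here


def check(table, key, spec):
    global_cmds, alternatives = parse_spec(spec)
    if key not in table:
        return f"{key}: required" if "@required" in global_cmds else None
    if any(satisfies(table[key], proto, cmds) for proto, cmds in alternatives):
        return None
    return f"{key}: no alternative matched"


FOO = ["", "@regex:", ["a|b"], 0, "@min:", [10], "@required"]
```

For `FOO`, a string must match `a|b` and an integer must be at least 10; the key itself is required.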

> I think this is orthogonal to homogeneity and should be considered separately.

The proposed schema requires mixed arrays (or mixed tuples), so this proposal is only viable if either are allowed.


Lastly, a drawback of either TOLS or TOMLS is that they can't describe themselves, because you can't describe a section with a variable number of children. It could be handled by an @command, but I've yet to find something that works well.

@liuggio
Author

liuggio commented Mar 11, 2013

Thanks for the comment, I really appreciate it.

> I find TOLS excessively verbose. You can define the type structure using the scheme described above. Show, don't tell.

I totally agree with you, but your syntax is a little ambiguous and needs a lot of documentation.
TOLS has the same objective as TOML: being easy to read thanks to obvious semantics.

> the schema looks like the target document.

👍 this is a big pro, that's why I proposed the #127.

> It does not allow specifying other constraints, though, like range, for example.

That's not true, or I don't understand the point.
For example:

[owner.dob.range]
min = 1913-05-27T07:32:00Z
max = 2013-05-27T07:32:00Z

> TOLS, as is, has at least one major flaw: in [cache.clients.client], how can we tell that you don't want a hash named client?

Sorry for the lack of documentation but, as the comment says, the prototype name is not important,
so just give the section a new, proper name:

[cache.clients.ME]              # defines a new prototype, its name is not important
primitive = "Hash"
[cache.clients.ME.range]
notin = ["client"]        

I don't like the @ syntax: the first rule of TOLS is that a TOLS file is itself valid TOML.

Your idea is good, though maybe not very readable. Maybe we should think about merging your ideas and mine in order to deliver a better product.

> Lastly, a drawback of either TOLS or TOMLS is that they can't describe themselves, because you can't describe a section with a variable number of children

This is not true; that's why I introduced occurrence:
it has in, notin, min and max for the number of children.
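A Python sketch combining the two replies above: a wildcard prototype checked against every child of a table whose child names are not known in advance, plus an occurrence rule. Assumptions: `min`/`max` bound the number of children, `in`/`notin` restrict the child names, and the prototype is passed as a callable. `check_children` and `client_prototype` are hypothetical names.

```python
def check_children(table, occurrence, prototype_check):
    """Validate a table whose child names are not known in advance.

    occurrence: optional min/max (number of children) and in/notin (allowed names).
    prototype_check: callable applied to every child; returns a list of error strings.
    """
    errors = []
    count = len(table)
    if "min" in occurrence and count < occurrence["min"]:
        errors.append(f"expected at least {occurrence['min']} child table(s), found {count}")
    if "max" in occurrence and count > occurrence["max"]:
        errors.append(f"expected at most {occurrence['max']} child table(s), found {count}")
    for name, child in table.items():
        if "in" in occurrence and name not in occurrence["in"]:
            errors.append(f"{name}: name not allowed")
        if "notin" in occurrence and name in occurrence["notin"]:
            errors.append(f"{name}: name is reserved")
        errors.extend(f"{name}.{e}" for e in prototype_check(child))
    return errors


def client_prototype(client):
    """Hypothetical prototype for cache.clients.*: alias must be a string."""
    if not isinstance(client.get("alias"), str):
        return ["alias: required String"]
    return []
```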

@BurntSushi
Member

I think having some sort of schema validation is a worthy goal, but I'd like to mark this as a post 1.0 feature. I would like to move expediently toward TOML 1.0, so I think it would be unwise to try and solidify a schema validator spec at the same time.

@mojombo
Member

mojombo commented Jun 26, 2014

Agreed, schema validation will be awesome, but we need to nail down TOML 1.0 first.

@liuggio
Author

liuggio commented Jun 16, 2015

👍

@WiSaGaN

WiSaGaN commented Mar 12, 2016

Are we progressing toward 1.0? It seems to me if we are not going to be ready for 1.0 soon, this schema validator would be something useful to add.

@liuggio
Author

liuggio commented Jun 2, 2016

@WiSaGaN if you think this will be merged, I'll do a code review, update it and speed this up.
But I think this will not be merged anyway :)

@HelloGrayson

Any chance of getting renewed interest here?

@golddranks

I wonder if there would be any value in using a ready-made solution for JSON, like JSON Schema? I'm not familiar enough with TOML to know whether it round-trips through JSON well enough, but if it does, I think that would be a simple way to add schema validation.

@golddranks

Anyway, having a schema language would be great in the sense that it would allow configuration file validation and autocompletion using external, generalised tools. I'd be more than happy to have that!

@adamvoss
Contributor

adamvoss commented Aug 4, 2017

You can now validate TOML (v0.4) documents against a JSON Schema using pajv. I suppose you could even write the schema in TOML if you want then use any-json to convert it to JSON.

I have documented the implementation status of JSON Schema validation for TOML. The main issue I can think of with the available implementations is that they are JavaScript-based and thus cannot differentiate between number and integer.

@golddranks You can use any-json to experiment with round-tripping.

@vietlq

vietlq commented Feb 9, 2018

How is this going? I'm quite interested in using schema for TOML :)

@AndreiPashkin

What does everybody think about being able to specify, in the schema, default values that reference other values in the config?

@eksortso
Contributor

@AndreiPashkin Well, defaults could be useful. But including defaults goes beyond simple schema validation. So I wouldn't recommend it for TOLS, at least right now.

But I can imagine an extension that could contain all sorts of data properties that are not natively handled by TOML (such as defaults, cardinality requirements, valid alternatives, etc.). A document with such features could be fed into a meta-parser, which then would handle some of the common but bothersome tasks that processing config files involves.

Save defaults for later, I'd say. Stick to the essentials in your spec, and release that as TOLS v1.0. Once it's released, you can then see how validations are being done with it, then add useful stuff later on.

@LongTengDao
Contributor

LongTengDao commented Mar 16, 2019

Could we use something TypeScript-like? Nowadays most API docs are described in TypeScript format, which is expressive for type checking:

import { String, Integer } from 'toml-spec';

type section = {
  'key-a': String | Integer | true;
  'key-b'?: object;
};

Or:

[section]
key-a  = String | Integer | true
key-b? = Table

I think we can't get a nice validator unless we add the | operator and a ? or ! operator, and change [table] and [[array-of-tables]] to inline notation, because the schema needs to show branches.
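A Python sketch of the union-and-optional idea, independent of the concrete syntax. Assumptions: members of a union are separated by `|`, `true`/`false` are literal members that accept only that exact boolean, a trailing `?` on a key marks it optional, and the type names follow the TOML-flavoured example (`Table` rather than `Hash`).

```python
from datetime import datetime

PRIMITIVES = {"String": str, "Integer": int, "Float": float, "Boolean": bool,
              "Datetime": datetime, "Table": dict, "Array": list}
LITERALS = {"true": True, "false": False}


def accepts(type_expr, value):
    """Check value against a union such as 'String | Integer | true'."""
    for part in (p.strip() for p in type_expr.split("|")):
        if part in LITERALS:
            if value is LITERALS[part]:  # a literal member accepts only that exact value
                return True
        elif part in PRIMITIVES:
            expected = PRIMITIVES[part]
            if isinstance(value, bool) and expected is not bool:
                continue  # bool is a subclass of int in Python
            if isinstance(value, expected):
                return True
        else:
            raise ValueError(f"unknown type {part!r}")
    return False


def check_table(table, schema):
    """schema maps 'key' or 'key?' (optional) to a union type expression."""
    errors = []
    for raw_key, type_expr in schema.items():
        optional = raw_key.endswith("?")
        key = raw_key.rstrip("?")
        if key not in table:
            if not optional:
                errors.append(f"{key}: missing")
        elif not accepts(type_expr, table[key]):
            errors.append(f"{key}: expected {type_expr}")
    return errors


SECTION = {"key-a": "String | Integer | true", "key-b?": "Table"}
```

Here `true` is accepted for `key-a` but `false` is not, which is the kind of branch a plain type name cannot express.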

@verdie-g

verdie-g commented Sep 6, 2019

Hello, I'm a big fan of XSD and I'm looking for something similar for TOML. To test the robustness of TOLS, I tried writing a TOLS schema for TOLS itself.

Maybe I missed something, but the first problem I encountered was that I don't know how to write a schema that validates dynamic key names like the ones found in TOLS:

[owner] # the key can be anything
[database]
...

So I've skipped this part and jumped right into the validation rules of an element:

[primitive]
  primitive = "String"
  [primitive.range]
  in = ["String", "Integer", "Float", "Boolean", "Datetime", "Array", "Hash"]

[default]
# primitive = ??

[required]
  primitive = "Boolean"
  default = false

[length]
  primitive = "Hash"
  [length.min]
  primitive = "Integer"
  default = 0
  [length.max]
  primitive = "Integer"

[range]
  primitive = "Hash"
  [range.min]
  # primitive = ??
  [range.max]
  # primitive = ??
  [range.in]
    primitive = "Array"
    [range.in.content]
    # primitive = ??

[pattern]
  primitive = "String"
  pattern = "/^((?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>)?(?1)??\)|\(\?(?:R|[+-]?\d+)\))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*)$/" # https://stackoverflow.com/a/172316/5407910

[content]
  primitive = "Hash"
  [content.content]
  # recursive. need a way to reference a custom type

[occurence]
  primitive = "Hash"
  [occurence.min]
  primitive = "Integer"
  default = 1 # not documented so I chose 1 like the XSD
  [occurence.max]
  primitive = "Integer"
  default = 1

TOLS has some limitations and can't validate some parts of TOLS itself:

  1. Length only applies to String, Array, Hash
     [foo]
     primitive = "Integer"
     [foo.length] # ???
  2. Range only applies to Integer, Float, Datetime
     [foo]
     primitive = "Array"
     [foo.range] # ???
  3. Pattern only applies to String
     [foo]
     primitive = "Array"
     pattern = "/.../" # ???
  4. Range, default and in/notin values whose type differs from the primitive
     [foo]
     primitive = "Integer"
     default = "xyz" # ???

     [foo]
     primitive = "Integer"
     in = ["xyz"] # ???
  5. Array subtables whose keys are not integers
     [foo]
     primitive = "Array"
     [foo.abc] # ???
  6. String, Integer, Float, Boolean having subtables
     [foo]
     primitive = "Integer"
     [foo.0] # no effect
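Most of these cases could be caught by a separate "lint" pass over the schema itself rather than by TOLS. A hedged Python sketch, assuming each rule has already been parsed into a dict; only scalar primitives get their literal values checked, because liuggio's earlier example uses `range.notin` on a Hash for child names. `lint_rule` and the facet tables are illustrative.

```python
from datetime import date, datetime, time

SCALARS = {"String": str, "Integer": int, "Float": float, "Boolean": bool,
           "Datetime": (datetime, date, time)}
LENGTH_OK = {"String", "Array", "Hash"}     # item 1
ORDERED = {"Integer", "Float", "Datetime"}  # item 2: min/max need an ordering


def value_fits(value, primitive):
    if isinstance(value, bool) and primitive != "Boolean":
        return False  # bool is a subclass of int in Python
    return isinstance(value, SCALARS[primitive])


def lint_rule(name, rule):
    """Report facets and literal values that contradict a rule's primitive."""
    primitive = rule.get("primitive")
    if primitive is None:
        return []  # no primitive declared, nothing to cross-check
    problems = []
    rng = rule.get("range", {})
    if "length" in rule and primitive not in LENGTH_OK:
        problems.append(f"{name}: length does not apply to {primitive}")
    if "pattern" in rule and primitive != "String":  # item 3
        problems.append(f"{name}: pattern does not apply to {primitive}")
    if ("min" in rng or "max" in rng) and primitive not in ORDERED:
        problems.append(f"{name}: range.min/max does not apply to {primitive}")
    if primitive in SCALARS:  # item 4
        literals = [rule["default"]] if "default" in rule else []
        literals += rng.get("in", []) + rng.get("notin", [])
        for value in literals:
            if not value_fits(value, primitive):
                problems.append(f"{name}: {value!r} does not match {primitive}")
    return problems
```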

Also, there are several points that are missing or that bother me:
7. As I said, I don't know how to validate a table whose key names are unknown
8. Occurrence defaults are not documented
9. Hash should be renamed to Table, as in TOML's spec
10. Min and max are exclusive, which I think is extremely confusing
11. Local Date-Time, Local Date and Local Time from TOML are missing
12. Primitive could be renamed to type
13. Recursive structures are not supported, so it is not possible to write TOLS' own schema
14. Inheritance could be a nice feature; it is something I use a lot with XSD
15. "If an element has no validation scheme, it is valid by default." Does this mean that if there is a typo in a key, the validator won't raise an error because no validation rules are found?
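One way to address point 15 without dropping the "valid by default" rule would be an opt-in strict check that lists keys present in the document but absent from the schema. A hypothetical Python sketch, assuming both files are parsed into dicts and that schema tables mirror the document's nesting; it does not try to tell rule keywords apart from element names.

```python
def unknown_keys(doc, schema, path=""):
    """List dotted paths present in doc but absent from schema (likely typos)."""
    found = []
    for key, value in doc.items():
        where = f"{path}.{key}" if path else key
        if key not in schema:
            found.append(where)
        elif isinstance(value, dict) and isinstance(schema[key], dict):
            found.extend(unknown_keys(value, schema[key], where))
    return found
```

A validator could then report these paths as warnings, or as errors only when the user asks for strict mode.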

@muuvmuuv

muuvmuuv commented Oct 1, 2020

If it hasn't been mentioned already, I would also like to see editor autocompletion, like with JSON's $schema: a URL pointing to the XSD/TOLS file to validate against.

@marzer marzer mentioned this pull request Dec 4, 2020
@ChristianSi ChristianSi mentioned this pull request Oct 28, 2022