Will custom type syntax be good for TOML health? #603
It's an intriguing idea (definitely post-1.0). May I make a suggestion, though? Custom types would need to be expected by parsers. Perhaps it would be better to put the custom type after the key's name, to associate that type with the key?
The parser, given the smarts to handle them, would produce converted output and handle specified constraints. One example that intrigues me is the use of units, given this. (See #514.)
@eksortso Oh, I think that's better and more thorough! Hmm... How would you deal with the types of inline array items? Array items are all the same type, I know, but inner tables/arrays in an inline array could be different...
@LongTengDao Well, we could have both types of syntax. Consider two different things: keys with custom types, and values with custom types. Here's an example that might please a few people, and for good reason.
Key types and value types could have different meanings for the same type name, although in practice those meanings would be related. For instance, without using units, we could write (with modified syntax; we could allow value types to go before or after the value, but not both):
All of these uses of custom types would be application-specific, but their widespread adoption would suggest updates to the TOML standard in the future. The type expression in parentheses would not conflict with either the key names or the values. Both would be expressed using traditional TOML syntax, unless the type significantly modifies the allowed syntax. This is far from complete, but it's a start. We just need to remember that, for configurations, good documentation and proper templates that include custom key types would need to be written so that those special types carry minimal, obvious meaning to naïve readers. Update: I realized too late that the SI abbreviation for minutes is "min", not "m". Can't help it, though. My point was that key tags may mean something different from value tags with the same names.
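To make the unit-of-measure idea concrete, here is a minimal Python sketch of what an application might do after parsing, given a key tagged with a time unit such as `(min)`. The unit table and `apply_time_tag` are hypothetical; the sketch deliberately runs outside the parser, which would only report the tag while the application applies the conversion.

```python
# Hypothetical table: seconds per unit for a time-unit key tag.
SECONDS_PER_UNIT = {"s": 1, "min": 60, "h": 3600}


def apply_time_tag(value: int, tag: str) -> int:
    """Convert a value written in the tagged unit into seconds."""
    if tag not in SECONDS_PER_UNIT:
        raise ValueError(f"unknown time-unit tag: {tag!r}")
    return value * SECONDS_PER_UNIT[tag]


# A hypothetical `timeout (min) = 5` would reach the application as 300 seconds.
print(apply_time_tag(5, "min"))  # 300
```

Rejecting unknown units keeps a typo such as "m" from silently passing through as some other unit.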
This is what I had in mind, with some misgivings, but:
This is a really good inspiration! It makes things look O&M! Then what do you think about the syntax in
We can simply specify
How do you feel? I'm suddenly a little afraid of things going towards:
This is definitely a post-1.0 discussion. It's certainly intriguing.
@LongTengDao What syntax would
;; Part of a naively revised ABNF might look like this.
key = key-name [ ws type ]
type = "(" type-name ")"
type-name = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _
key-name = simple-key / dotted-key
This syntax would allow tables, elements of table arrays, and keys within inline tables to have custom types.
[dictionary (ordered)]
To be generous, we could permit multiple type phrases with literal-string-like syntax (except no parentheses and no commas), separated with commas and whitespace.
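The ABNF fragment above can be prototyped with a regular expression. This Python sketch is an approximation under stated assumptions: it accepts only bare keys and dotted bare keys (quoted keys are ignored), treats `ws` as zero or more spaces or tabs as TOML's own grammar does, and `split_key_type` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# type-name = 1*( ALPHA / DIGIT / %x2D / %x5F ), as in the ABNF above
TYPE_NAME = r"[A-Za-z0-9_-]+"
# Simplification: bare keys and dotted bare keys only (no quoted keys).
KEY_NAME = r"[A-Za-z0-9_-]+(?:[ \t]*\.[ \t]*[A-Za-z0-9_-]+)*"
# key = key-name [ ws type ], where TOML's ws is zero or more spaces/tabs
KEY_WITH_TYPE = re.compile(
    rf"[ \t]*(?P<key>{KEY_NAME})(?:[ \t]*\((?P<type>{TYPE_NAME})\))?[ \t]*"
)


def split_key_type(text: str) -> Tuple[str, Optional[str]]:
    """Split 'key (type)' into the key name and the optional type name."""
    match = KEY_WITH_TYPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a key with an optional type: {text!r}")
    return match.group("key"), match.group("type")


print(split_key_type("dictionary (ordered)"))  # ('dictionary', 'ordered')
```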
Value type syntax can be done in a similar fashion. We should probably just stick to key-like strings for custom type names. Also, I don't think parameters are strictly necessary. If they're needed, then tables can be used to provide them. Here's an example of how we can put a CSV table value (i.e. a string) in the key:
[guys (csv)]
header = ['name', 'age', 'sex']
rows = [
['Manny', '100', 'male'],
['Moe', '100', 'male'],
['Jack', '100', 'male'],
]
Stuff like the
To this end, perhaps we can use pragmas, or some variant thereof (see #522), to specify which types are to be accepted by the parser. In combination, these things make specifying the behavior of the types more objective. For instance, imagine that over time, units of measure are gradually accepted into the standard. Say that there's an external standard called "units-of-measure" that most parsers will acknowledge. We could see documents written like this:
Then later, in a nifty parallel universe after unit dimensions are adopted and variant unit type names are included in the spec:
Thoughts?
Edit: Fixed the tag for minutes.
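To show what a `(csv)` key tag might trigger, here is a hedged Python sketch of an application-side handler for the `[guys (csv)]` table above. It assumes the parser has already returned that table as a plain dict with `header` and `rows` keys, and that the application (not the parser) decides to render it as CSV text. `csv_tag_to_text` is a hypothetical name; `csv.writer` and `io.StringIO` come from Python's standard library.

```python
import csv
import io


def csv_tag_to_text(table: dict) -> str:
    """Render a {'header': [...], 'rows': [[...], ...]} table as CSV text."""
    buffer = io.StringIO()
    # "\n" line endings keep the output identical on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(table["header"])
    writer.writerows(table["rows"])
    return buffer.getvalue()


guys = {
    "header": ["name", "age", "sex"],
    "rows": [
        ["Manny", "100", "male"],
        ["Moe", "100", "male"],
        ["Jack", "100", "male"],
    ],
}
print(csv_tag_to_text(guys), end="")
```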
@eksortso You always come up with good ideas and beautiful examples! Wow.
In my humble opinion, this is far too complex for TOML, post-1.0 or not. Remember what the 'M' in the name stands for?
@ChristianSi This may depend on whether the complexity is caused by this syntax or is inherent in actual use. If the latter, then the main purpose of this syntax is precisely to keep TOML from becoming more complex. :)
To be clear, I see this as similar, if not equivalent, to YAML's tags, so I am fairly wary of this. I don't want to block any discussion on this, but I do think it'll be a not-so-easy task to convince me on this, FWIW.
@pradyunsg But these sorts of tags (I prefer that name, personally) are not defined the way that YAML's tags are. In all the cases covered so far, the usage of custom types is entirely parser-dependent. I'd be fine if that's all they ever are. They serve one specific purpose, defined by the app with the parser, and that's it. But I hope that during our discourse, you can see value in some of these use cases. I'm pretty pleased with the unit-of-measure tags, and those didn't use any types other than TOML's integers. Is it not simpler to say
But I would never shove
@eksortso Shall we discuss some edge cases?
A. How do we tag the array itself, not the item table?
How do you like this? I also want to know why you prefer
I think that whatever the final choice, our basis should be consistent first: being intuitive is the premise, and then being reasonable and unified?
B. How do we tag a table that doesn't appear directly?
Or do we just forbid it and require
C. How do we tag the root table?
D. What's the order of tag processing?
Just in the reverse of the order in which they occur, as above? Or always from inner to outer, as below?
Or from inner to outer, but handing same-level tags that carry different meanings to the parser at the same time:
Sample in a JS parser:
function tagProcessorForEach(parent, keyOrIndex, keyOrIndexTag, value, valueTag) {
  // receives the key (or index) tag and the value tag of one entry together
}
Or, only these (key tags) are valid:
And then the order is always from inner to outer.
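The inner-to-outer option can be sketched as a post-order traversal. Assumptions: the parser hands the application a tree in which tagged values are wrapped in a hypothetical `Tagged` object, and each tag name maps to a handler function. Neither is an existing parser API; the `log` list only records the order in which tags were applied.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tagged:
    """Hypothetical parser output: a value together with the tag written on it."""
    tag: str
    value: Any


def resolve(node: Any, handlers: Dict[str, Callable[[Any], Any]], log: List[str]) -> Any:
    """Apply tag handlers from inner to outer (a post-order traversal)."""
    if isinstance(node, Tagged):
        inner = resolve(node.value, handlers, log)  # resolve children first
        log.append(node.tag)
        return handlers[node.tag](inner)
    if isinstance(node, dict):
        return {key: resolve(value, handlers, log) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, handlers, log) for item in node]
    return node


handlers = {"upper": str.upper, "joined": ",".join}
order: List[str] = []
doc = Tagged("joined", [Tagged("upper", "a"), Tagged("upper", "b")])
print(resolve(doc, handlers, order), order)  # A,B ['upper', 'upper', 'joined']
```

Because children are resolved first, the outer `joined` handler sees already-converted items, which matches the argument that an outer level can inspect inner results but not the other way around.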
@LongTengDao You gave me a lot to think about. Here's my take on the subjects that you raised. TL;DR
A. How do we tag the array itself, not the item table?
My preference is to bind a key tag to its key's name. The syntax
That said, an exception needs to be made for table arrays. The table array syntax defines the name of the array, but there's no simple way to explicitly separate the array from its elements.
If it's needed, perhaps we allow a tag after the double brackets on the first element table of the array. It would be invalid after any element table line beyond the first one.
This is very much an exception to the norm, as you're about to see.
B. How do we tag a table that doesn't appear directly?
My preference is: if we can't refer to the key, we can't tag it. So something like
By the way, despite my earlier, slightly enthusiastic comments, I've come to prefer single bare-key-like tag names. So I'd actually prefer
C. How do we tag the root table?
My preference: we can't. The root table is never explicitly specified, so per B, it can't be tagged. Besides, the application defines the top level's significance, so that shouldn't change.
D. What's the order of tag processing?
I would prefer
Collection types may affect how the contents are processed. Consider the example
So I would prefer
Tag ordering should not change depending on whether a tag appears before or after a value. Please recall: tags can only go on one side of a value. So this is invalid:
In any case, we handle all the tags on each key-value pair or table header in the order that they appear in the document.
I think tags on non-collection values can be permitted to go on either side of the value (only on one side per value though). And I wouldn't want to exclude their use.
But I do think tags on collection values ought to come before the collection, and it's easy to show why.
Other Stuff
Returning to an aside that I made earlier, I mentioned the idea of a tag-set registry. This would include, among other things, an online reference of the meanings of various related tags, blessed by the TOML community for each set's merits. Such a registry would value obviousness, minimalism, clean syntax, a high degree of usability, and very little screwing with stuff that doesn't need to be screwed with. Such a registry would use URLs, and I do advocate for bare-key-like tag names, which would require little conversion if they're typed in blindly by human beings, or by IDEs that are just trying to be useful. Thoughts on any of these things?
@eksortso If discussing just one feature is this complex, imagine how difficult it was for Tom to invent TOML! XD
The items I checked look good to me.
4 & 5
Processing tags in forward order may not be possible; I found this when I tried to implement it in my parser (ltd/j-toml/xOptions/tag)... Consider this (I broke the example inline table across lines for clarity):
When processing the tags at each level, an inner level cannot read outer-level information more than one layer up, at least not easily (it would need something like the dom.parentNode API...), but an outer level can easily get information from inner levels at any depth if it needs to. So I think inner tags are just preliminary preparation, and tags should be handled from last to first.
1 & 8 & 9
I think the latter one looks clearer (it avoids the overwhelming conspicuousness of
10
Did you mention the idea of a tag-set registry before? Sorry, I didn't see it and can't find it... I'm not sure what you mean. If it's used for parsers, I think that's good; if it's used for
If we design a type syntax, I would like to have a symbol or term that indicates which lines were defined manually.
What do you mean? Currently, all the examples in this issue use
I mean it would be good if hand-written types were distinguishable from annotation types that are auto-generated and written in place. That way, we could simplify the generated types more easily.
@drunkwcodes Sorry, I think I need some help. @eksortso Hi, do you understand what the comments above mean?
Parentheses have too many useful meanings besides denoting types. I have an idea.
Personally, I find it hard to distinguish between a type and a calculation, like below:
It's a type, but also a calculation.
It would be something like this after the first pass.
It has canonical types to describe the data, so we know at first glance that it's a 2-by-3 string table with a one-line header. A delimiter like
Couldn't we just use
In TOML, parentheses don't have many meanings, so something like this wouldn't be ambiguous:
[a] # table because of the brackets
head (1x3 string array) = [...]
body (2x3 string matrix) = [...]
@drunkwcodes Hi! The colon is semantically very close to the equals sign, and data file formats generally avoid using both: YAML and JSON use the colon, while INI and TOML use the equals sign. But it reminded me of TypeScript, which might help with #116 (comment):
But it also means that the colon gives me the sense of a validator comment expressing "equivalence" rather than an "extra transform", similar to the following but with grammatical effect:
[a] # table
head = [ ] # 1×3 str array
body = [ ] # 2×3 str matrix
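Read as a validator-style annotation, a label such as `2x3 str matrix` only asserts a shape and an element type; it does not transform the value. A minimal Python sketch of that check, assuming the annotation has already been parsed into a row count, a column count, and the element type `str`. The function name is hypothetical, and the sample rows are borrowed from the earlier CSV example.

```python
from typing import Any


def is_string_matrix(value: Any, rows: int, cols: int) -> bool:
    """True if value is a rows-by-cols list of lists whose cells are all strings."""
    return (
        isinstance(value, list)
        and len(value) == rows
        and all(
            isinstance(row, list)
            and len(row) == cols
            and all(isinstance(cell, str) for cell in row)
            for row in value
        )
    )


body = [["Manny", "100", "male"], ["Moe", "100", "male"]]
print(is_string_matrix(body, 2, 3))  # True
print(is_string_matrix(body, 3, 3))  # False
```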
Exactly. It's all about readability.
I think by "readness" you meant "readability"?
Currently, it's mainly meant for exploring new types, which may not be good to add wholesale to the spec. The four date-time types are examples: they obviously differ from the other types (primitive types and structural types). They're useful, but time durations are also useful, and there are many types that are useful in various situations, which are more like syntactic sugar (for
TOML v0.5, without any sugar:
TOML v0.5, with custom type syntax:
TOML v50, with everything added to the spec:
Hello there. Here are my thoughts after all this reading.
Type inference
Adding types to a language is something that has been thought about a great deal. Recent programming languages like Kotlin, Swift, and TypeScript are all typed languages with type inference, and I think there is a good reason why. Types bring stability and clarity. Type inference makes programming easier for humans.
About the syntax
What about using the same syntax as Swift, TypeScript, and Kotlin instead of a C-like syntax?
Are explicit types necessary?
Kotlin and Swift have type inference, but also explicit types when necessary. But they are programming languages, not configuration/object notation languages. Does TOML need explicit types?
When I look at this code, I have a feeling of "That's cool" mixed with another feeling: "That's complicated" :p
That's not as cool, I agree. The human has to convert minutes to seconds themselves. But it works fine. There is no ambiguity, thanks to the key name or the comment. And of course, parsing is a lot easier and faster. There is another issue with those kinds of conversions: if you create an object from TOML and then convert the object back to TOML (with a stringify function), you will lose all your type information. About that kind of code:
I think type inference is best. For me, the
Now, my favorite point:
OK. Here I see real potential for tags.
User-defined classes
In this example, the user has a CSV object that they want to convert to and from TOML.
We would get a true CSV object by passing the resulting Map object to the CSV constructor.
It can work not only with CSV, but with any object you use in your project, as long as you've defined a valid constructor. This constructor just has to be accessible to the parser. Another example, with a user who needs to work with Books:
or...
Advantages of this idea:
@Lepzulnag I like this comparison. TOML is familiar and formal by now, and this is a type syntax which will be superior to those in programming languages right here. But I like it. I just googled those type syntaxes. I may be mistaken.
Because it is TOML.
Indeed, "type inference" and "custom types" (or "user-defined classes") are two different things. Whether
BTW: "type inference" is intended for variable reassignment, computation, and passing values to APIs, which only happen in programming languages, because a configuration language is static (and its complexity is exactly the same as with type syntax; to a computer they are no different):
let a: string = 'abc';
// Without the actions and possible errors below,
// there is no need to hand-write a type,
// because type syntax is almost the same as value syntax
a = {}; // reassignment error: an object is not a string
a = a * 2; // compute error: arithmetic on a string
!function (p: boolean) { }(a); // passing error: a string is not a boolean
@Lepzulnag So at present, the question may be: is the thing, whether we call it a "custom type", a "tag", or a "user-defined class", available for inline elements? Inline tables and inline arrays are also objects, and even string literals like date-times and URLs are going to become objects; how do we express them if using
Which of the following do you want?
And should we require that custom types be returned as object types by the parser's plugins (which are mainly used for configuration formats), in exchange for better support for stringification (which still has many other unresolved issues, like whether an object is inline or not, and how to preserve dotted keys)?
A few suggestions for alternatives to parentheses were made. I don't like the colon-based syntax; visually, it's too discreet. Because of their applications, tags ought to stand out! I do like angle brackets, as it turns out. Something like
Tags should follow the same format that keys follow. If you can use a tag's name as if it were a key, then it ought to be good. Thoughts?
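To illustrate the rule that anything usable as a bare key is a valid tag name, here is a Python sketch for the angle-bracket form placed before a value (as in `x = <tag-x> 'value'`). It assumes the text to the right of `=` has already been isolated; `split_value_tag` and the choice to raise an error for invalid names are hypothetical.

```python
import re
from typing import Optional, Tuple

# A tag name is accepted only if it would also be a valid bare key.
BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")
# Leading "<...>" tag plus any spaces/tabs that separate it from the value.
LEADING_TAG = re.compile(r"<([^<>]*)>[ \t]*")


def split_value_tag(raw: str) -> Tuple[Optional[str], str]:
    """Split '<tag> value' into (tag, value text); untagged text passes through."""
    match = LEADING_TAG.match(raw)
    if match is None:
        return None, raw
    name = match.group(1)
    if not BARE_KEY.fullmatch(name):
        raise ValueError(f"tag name is not a valid bare key: {name!r}")
    return name, raw[match.end():]


print(split_value_tag("<tag-x> 'value'"))  # ('tag-x', "'value'")
```

Reusing the bare-key character set means a tag name never needs quoting or escaping of its own.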
@LongTengDao, let me split up your last post into a few different posts. A number of things that you mentioned need to be addressed. With the mention of "optional" types, it may be worth revisiting the notion of a
Also, this suggests that tag names ought to allow for a
This raises the question of whether tags
Sure. It's designed for custom features.
I think the spec should never interfere with the tag feature, to guarantee that custom tags will never conflict with official features when the spec version is upgraded, unless the spec states from the beginning which tag formats are reserved. But I still suggest that official features use
Yes. In my parser's experimental implementation, it's easy to combine plugins:
const TOML = require('@ltd/j-toml');
const toml_plugin_a = require('...');
const toml_plugin_b = require('...');
const sourceContent = `
x = <tag-x> 'value'
y = <tag-y> 'value'
z = <tag-z> 'value'
`;
const rootTable = TOML.parse(sourceContent, 0.5, '\n', true, {
  mix: true,
  // Receives each tag found while parsing and routes it to the plugin that understands it.
  tag ({ table, key, tag }) {
    switch (tag) {
      case 'tag-x':
      case 'tag-y':
        toml_plugin_a({ table, key, tag });
        break;
      case 'tag-z':
        toml_plugin_b({ table, key, tag });
        break;
      default:
        // Unknown tags are rejected instead of being silently ignored.
        throw Error('Unknown TOML tag: <'+tag+'>.');
    }
  },
});
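The same dispatch pattern, written as a Python sketch for readers who don't use JavaScript. It does not use `@ltd/j-toml`; `plugin_a`, `plugin_b`, and `TAG_HANDLERS` are hypothetical stand-ins, and the `(value, tag)` handler signature is a simplification of the `{ table, key, tag }` object passed in the JavaScript above.

```python
from typing import Any, Callable, Dict


# Hypothetical plugins; real ones would convert or validate the value.
def plugin_a(value: Any, tag: str) -> Any:
    return f"A<{tag}>:{value}"


def plugin_b(value: Any, tag: str) -> Any:
    return f"B<{tag}>:{value}"


# Explicit registry: only tags the application opts into are accepted.
TAG_HANDLERS: Dict[str, Callable[[Any, str], Any]] = {
    "tag-x": plugin_a,
    "tag-y": plugin_a,
    "tag-z": plugin_b,
}


def handle_tag(value: Any, tag: str) -> Any:
    """Route a tagged value to its plugin, rejecting tags nobody registered."""
    handler = TAG_HANDLERS.get(tag)
    if handler is None:
        raise ValueError(f"Unknown TOML tag: <{tag}>.")
    return handler(value, tag)


print(handle_tag("value", "tag-z"))  # B<tag-z>:value
```

Keeping the registry explicit means the application, not the parser, decides which tags are ever acted on.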
Personally, I'm fine either way, whether we use it or not. I'll leave this point to the other participants.
Yeah, if only one. Because
LGTM. In the future when attributes are necessary, it will be
I can't see why we need this; is it related to the tag topic at all? Tags are intended for type conversion, not validation.
One more question. This rule is more unified (always before the value):
Do you still think tags after keys are better, once we stop using
Just found this and marked it for reference: @vagoff Welcome to the discussion! Time flies and good days come~
Maybe the custom syntax is also suitable for the variable-reference requirement:
Collection of other related issues:
Pardon me, @LongTengDao, for not responding earlier.
Well, the spec would set some expectations for how the parser handles the tags. In line with a minimal approach, a parser can assign a tag to a key or a value on a first pass, and then, later on, apply special typing or other features based on how the tags are interpreted by the parser plugin. The assignment part would be part of the TOML standard. The interpretation goes beyond the standard. (I said "plugin" because your experimental parser uses plugins, but a plugin isn't strictly necessary to offer special functionality.)
Agreed. That's another issue now. I'd rather save multi-tagging for later discussion.
It wasn't my intention to use Some languages have "optional" types that are identical to a simpler type except that they allow null values. A tag application like But when someone writes and presents you with a configuration template like
My conception of key tags is that they put expectations on the values assigned to those keys. So it's like
I don't see that at all. Maybe I'm revisiting the same example too much, but
I think the conversation on this topic is going well. Just some feedback from an outsider to confirm some things and point out some other things (hopefully this will be helpful).
There are two things here:
I think the second item makes things too complex, stuff like this:
Is pretty hard to understand. Even things like this seem too complex to me:
And you can just use a different
Which I find much more obvious. Personally I think the
Also maybe adding multiple tags might be a good idea:
On the other hand, all of this will be implementation-defined, and an implementation can already do the same with just regular strings:
Which avoids having to add any syntax. I'm not so sure if the
I agree with all this.
But this creates a big string escaping problem. What if we want the value to literally be the string 'compute: 5 * 60 * 60'? We then have to define an escaping mechanism like 'string: "compute: 5 * 60 * 60"', and now basically all strings that contain colons need to use that syntax, which can be a painfully sharp non-standard edge case. That problem is the main reason I'm advocating for tags: without tags, there are hard-coded assumptions and painful edge cases, on top of a lack of standards and custom/manual parsing.
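A hedged Python sketch of the escaping problem described above. It models a hypothetical `kind: payload` convention inside ordinary strings (dropping the inner quotes from the `string: "..."` example for brevity). The last call shows an ordinary sentence being misread as kind `Note`, which is the sharp edge that a dedicated tag syntax would avoid.

```python
from typing import Tuple


def decode_prefixed(raw: str) -> Tuple[str, str]:
    """Hypothetical 'kind: payload' convention inside ordinary TOML strings."""
    kind, sep, payload = raw.partition(": ")
    if not sep:
        return "string", raw  # no prefix at all: a plain string
    if kind == "string":
        return "string", payload  # escape hatch: keep the rest literally
    return kind, payload


print(decode_prefixed("compute: 5 * 60 * 60"))          # ('compute', '5 * 60 * 60')
print(decode_prefixed("string: compute: 5 * 60 * 60"))  # ('string', 'compute: 5 * 60 * 60')
print(decode_prefixed("Note: bring snacks"))            # ('Note', 'bring snacks')
```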
Ah yeah, that's a good point; and every application that wants something like this will have to figure out escaping as well.
I don't think the ability of the TOML parser to parse further according to tags should be in the specification. In theory, the syntax is obvious and minimal, but in real life it won't be. In my opinion, after the parser has parsed the TOML file, it should simply return primitive values native to most programming languages to the application. The application can then decide how to interpret and use these values. The application will know what it needs to do. If a TOML parser implementation is allowed to automatically parse input further based on tags, you are taking control away from the main application and tying/locking it to the implementation of the parser. This actually makes everything infinitely more complex and brings potential security vulnerabilities. With tags, you could trick parsers into parsing complicated things, and every parser implementation will support different tags. For example, with a
What if the parser implementation decides that if you put a
Tags will have different behaviours between different programming languages and different implementations.
This proposal opens up the ability for the TOML parser to arbitrarily call other parsers (or any code) based on what the TOML parser you're using implemented. That is a terrible and dangerous idea. What all these points have in common is undefined behaviour. Adding tags by definition adds undefined behaviour to the spec, because implementations can do whatever they want when they encounter a tag. Adding undefined behaviour is a really, really bad idea.
@tintin10q At the heart of your criticisms is your sentiment, which I find myself agreeing with wholeheartedly:
This is appealing because it enforces the principles of obviousness and minimalism. (This actually states your case more strongly than bringing up "undefined behavior," which is arguably bad practice in programming language specs, but TOML is not a programming language. But I digress.) In our discussions, we talked about arbitrary things that tags could do, which certainly falls outside the scope of TOML and which I must admit I was speculating about at length without security concerns. So I am changing my tone, but I have a different approach now, which I will elaborate on below. But first, let me see if I understand your point of view. Allowing the possibility for arbitrary behavior in the specification could make it seem like we encourage abuses of the syntax. We certainly don't. However, violations are already possible without changes to the spec, because some parsers read and preserve comments, and some consumers may read those comments and make changes to their configurations. Currently, we do not make explicit that comments ought to be ignored by parsers, because format-preserving parsers and TOML document encoders need that room to maneuver. I'll be opening a PR which is intended to curb this abuse. But I'll also open a separate issue around the syntax that's been discussed here, because the idea of parenthetical comments may be worth considering. Neither of these things is intended to address notions of type syntax (which in TOML is completely determined by existing value syntaxes), but I will still refer to this issue when I make them since these proposals stem from the exploration conducted here. |
I agree, TOML is not a programming language and it should not be. It is an input language. However, the parsers are written in programming languages. I believe that if arbitrary tags were added, TOML could become a programming language, because with arbitrary behavior it would essentially be undefined what a parser should do when encountering a tag. That means it would be up to the parser to decide, and the parser could decide to do anything. But perhaps this is not the same definition of "undefined" as on the Wikipedia page I linked.
You understand my view, although I don't think "encourage" is the right word. Even if you explicitly discouraged abuse in the spec, the ability to do so would be there, and that will go wrong at some point with people wanting to do 'clever things' with their parsers, and then we get to
About the comment violations: I agree that somehow preserving the comments is nice. Otherwise, they would all be removed from the file when you read and write back a TOML file. However, this is less of an issue than arbitrary tags, because comments do not have to be parsed any further; they are just strings and should stay strings. Still, clearly defining how parsers should deal with comments is of course a good idea.
By "consumers", do you mean a parser implementation or an application? If you mean an application, then this is not that bad. Although I wouldn't say that it is a good idea, it is still the application making the choices, not the parser. If you do mean the parser, then I would say that the parser is just not compliant with the TOML spec and is being too clever. With something like a
@tintin10q You said:
That ability for abuse is still there, but any such abuse would make the abusing parser non-conformant. That's the most that we can do, really. If such abuse persists, then we could either adopt their changes into the standard or refuse to condone them, making appropriate modifications in either case. We can make it more difficult for "clever" solutions to take root. If #950 gets merged, for instance, then parsers cannot mess with configurations by looking for and reading comments. So any "clever" solution would have to rely on non-standard syntax (like type tags) or unusual naming conventions or some such voodoo to do clever things, for better or worse.
I meant post-parsing end-user applications when I said "consumers." Let's stop repeating ourselves. I don't know what will happen with the tag discussions posed here. I had an idea which may be more confusing than it needs to be, but it may serve an important purpose. What if we took the parenthetical syntax, the words in round brackets like
"Clever" users might be tempted to write
I don't think
I think the best option is just to ignore
Inline comments by themselves might be a good idea. But I would not use another syntax for it with the
A better way to do inline comments is to just say that comments end when you encounter another
So like this:
Although this does make parsing harder, because now you have to keep track of when you are in a comment. I also think that
Bracketing comments between hash signs is a non-starter because it will break any comment with a
I was trying to use a simple example to explain how a template writer could put units as comments after key names. There are more complicated key names than
I have a meta question.
What should this get?
If this is asking for the output of the TOML parser, even assuming tags were implemented, I would expect/hope that the output structure is still
The point, or what I believe makes tags useful, is precisely that they don't change the structure. A number that happens to be a unit of time is still structurally a number (not a table or a list), so if we want to keep the structure but add the additional info of
As is true for most current YAML parsers handling documents with tags, the program still receives the plain/normal structure by default. For compatibility across TOML parsers, it wouldn't make sense for TOML to interpret the tags and manipulate the structure. If the program wants non-structural information, whether it's tags or comments (for round-tripping), it would make sense for that info to be separate. E.g.:
doc = toml.parseDocument("thing.toml")
doc.data # { "size": 1 }
doc.tagForValue([ "size" ]) # "M"
doc.tagForKey(["size"]) # "K"
Without tags, two programs must "just know" that the timeout is in seconds. Tags don't change the fundamental need for interpretation; both programs still need to "just know" (e.g. coordinate) that "ms" means milliseconds and not microseconds. But, on top of being human-visible, the difference is that it's easier for two programs to coordinate on what an "ms" tag means compared to coordinating on the interpretation of every single
So, if this is asking for the program output (instead of the TOML parser output), it's like asking what units the program should get for
It just doesn't matter: the program could interpret the 300 as an enum value, or as 300 kelvin, or the timeout value could be entirely ignored. Same for the
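The side-channel idea can be sketched in Python: the structure a program receives stays plain (`data`), while key tags and value tags live in separate lookup tables keyed by the path to each entry. `TaggedDocument`, `tag_for_key`, and `tag_for_value` are hypothetical names echoing the `doc.tagForKey` / `doc.tagForValue` pseudocode; no existing TOML library is implied.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Sequence, Tuple

Path = Tuple[Any, ...]


@dataclass
class TaggedDocument:
    """Plain parsed data plus tags stored in side tables keyed by path."""
    data: Dict[str, Any]
    key_tags: Dict[Path, str] = field(default_factory=dict)
    value_tags: Dict[Path, str] = field(default_factory=dict)

    def tag_for_key(self, path: Sequence[Any]) -> Optional[str]:
        return self.key_tags.get(tuple(path))

    def tag_for_value(self, path: Sequence[Any]) -> Optional[str]:
        return self.value_tags.get(tuple(path))


doc = TaggedDocument(
    data={"size": 1},
    key_tags={("size",): "K"},
    value_tags={("size",): "M"},
)
print(doc.data, doc.tag_for_value(["size"]), doc.tag_for_key(["size"]))  # {'size': 1} M K
```

A program that ignores the tag tables sees exactly the structure it would get without tags.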
I think the real question is whether the TOML maintainers want to allow non-structural information. If yes, then a human-readable syntax can be debated (and probably solved), and a write-with-tag method can be devised. If no, then this issue should just be closed.
@tintin10q I don't entirely agree with your take on non-structural information; my reasons would take too long to explain succinctly here. But with all due respect to @LongTengDao, who opened this suggestion, we need to start fresh. Let's close this issue; any of the various topics that we discussed here, if they're worth reintroducing, can be given better focus in new issues.
Based on reviewing the discussion here, I don't think tag-style rich information is a good idea. Quoting from the objectives of the language:
Neither of these is feasible with tag information. You need to either (a) modify the serialised data or (b) provide tag-like information via a side channel. Both of those are no-gos from my perspective.
An error? I think any behaviour other than an error here is going to be non-trivial to explain.
I agree. If someone wants to pick out a specific piece from the discussions here, please open a new issue for that with a specific proposal for what you want to change (or at least specific use cases to focus on) so that we can have a less meandering discussion. :) As always, thanks for a productive discussion here, folks! Even though the conclusion here seems to be "no action, and more discussion", a lot of what has been discussed here is quite useful. :)
I don't mean that the custom type syntax is a replacement for standard types. I'm just wondering whether exploring de facto standards might facilitate the development of standard types, with less of the discussion that is hard to settle, and keep these requirements from becoming a dialect that conflicts with the spec in the future?