-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split units and values in type definitions #121
Comments
Parsing numbers and units from a string is trivial and performant in every programming language. I personally quite like the readability of having one string represent amount + unit, and it doesn’t impact tooling. So I think this hurts readability without adding any benefit (I don’t think this is easier to implement). Further, for duration, there’s only one unit (considering
In a design system, time and space are distinct concepts. Numbers can quantify both, but since they have no direct relationship I don’t think there’s a point in tokenizing the number (e.g. |
Agreed. I think that what the spec defines is our best option right now, unless maybe perhaps we could agree on something like "all durations shall be in milliseconds forever so let's specify durations as just a |
I want to drop a little “it depends” to this conversation, but more for consideration in future issues. Given the current character restrictions, I agree we should be able to reliably parse units from numbers in the JSON format.
I only want to highlight one exception, specific to this Design Tokens Format Module. These character restrictions do not pair well outside of quoted strings in non JSON file formats like CSS. Specifically, the reservation of As niche or hypothetical as this sounds, I believe it has already surfaced in the wild. We can see the parsing conflict in this video for a new and very impressive library that implements the Design Tokens Format Module in CSS (at 00:43:31). In that clip, there is a I think this is worth pointing out as folks experiment with DTFM outside of JSON. These are complex problems. And I have big time admiration for everyone I’ve just replied to or invoked. In fact, in the same clip (at 00:50:44), the same folks educate viewers on these same kinds of parsing issues. Y’all are good people who know your stuff. 🙂 |
Sorry, I don't follow how that really applies here. This spec isn't about defining tokens in CSS. If someone, like the people in that video, wants to invent their own method of specifying tokens in some CSS-like syntax, they're welcome to. But that would be a different format, not the one in this spec (albeit perhaps one meant to interoperate with this one 1:1)! And the author of that format could solve that problem however they wish*. This issue is saying "hey, parsing strings to get numbers is dumb; we should just separate the data in the format itself instead of parsing a string." If in the future there were some syntax where (*If I were inventing such a format, I would probably follow the pattern that |
No one said anything like this or used a tone like this. The main issue I see with this specification proposal is that it is focussed on creating an interface between multiple programs but too many provisions are made for reading and writing by humans. This does have a short term advantage which I fully understand.
This might be true in practice but this specification will still need to define exactly how values must be parsed. Any specification which has micro syntax does this to avoid interop issues. This places extra burden on editors of the spec and implementers. Avoiding micro syntaxes altogether and choosing to create a data format without any possible ambiguity is possible. I am only advocating for making the choice in favour of something that works best as an interface between multiple programs. |
Apologies; I didn't mean to imply that you were being rude. I hope no one else took it that way. I agree with your central points! Parsing strings to get numbers is d—to use nicer-sounding words, best avoided—for the reasons you list and more. But I think it makes the most sense for this spec only since, in my view, this format is already clearly leaning strongly toward human-editable rather than optimizing for machines. We agree on the "best" way, just not the best way for this particular spec. The format's design philosophy is perhaps something that needs to be more explicitly stated. If I've misinterpreted it and this is indeed a format primarily to be edited and read by code, then I will wholeheartedly agree with this issue and give it all my upvotes. |
I've spend more time considering this and still think it is best to avoid micro syntaxes if possible. My main concern is that regexp will be used or that we will see everyone writing custom parsers for these. |
@romainmenke said:
I agree with this concern. As written, I think the spec will need to define the grammar for each string field. For example, dimensions are currently limited to If we separate the number and units, we can fall back on JSON's number parsing. For example: |
Hi @romainmenke, there is actually an ISO format for expressing times, intervals, and durations: ISO 8601 For durations, here is the Wikipedia description that we also use in tools like Camunda. Is it set upfront that durations will always be expressed in milliseconds? |
I fully agree :) This issue specifically is focussed on splitting numbers and units into multiple JSON fields. Not about the actual notation of the number or unit part. |
@rdlopes said:
From https://tr.designtokens.org/format/#duration:
Probably forking the conversation, but if all durations are milliseconds, could the spec omit the |
@romainmenke @KyleWpppd agreed, will fork, out of scope here. |
I haven't written a translator, so admittedly I'm not the best judge of computational complexity. That said ... So far as I can tell in the current spec, for single-value (ie not composite) tokens, the
In the case of 3, it seems pretty straightforward to split the string on the first non-numeric character, resulting in the value and unit. The parser will need to use a regular expression to handle these cases, but it seems computationally cheap. To me, it's a good tradeoff for the better ergonomics of writing/editing. |
What about :
Is that :
or
A parser is not a regular expression. This is about simplicity. Inventing new micro syntaxes requires much more work to implement this. Having implemented multiple tokenizers and parsers I can tell you that the hand wavy list of rules above are completely insufficient for the purpose of extracting values from strings. A good reference to get a feel for the complexity involved is the CSS syntax specification. https://www.w3.org/TR/css-syntax-3/#tokenizer-algorithms All this complexity can be avoided by splitting units and numeric values in two fields in the JSON structure. |
I think this could be avoided by disallowing scientific notation — I can't think of any cases where values would be so large or small to make the tradeoff worth the complexity. |
That would mean that this specification needs to define its own number type completely.
* specifying what a number is can not be done be referring to an existing specification because the format is different. |
I don't see why that's the case, since we're discussing this in the context of spec-defined types like I should have been more clear when I said "disallowing scientific notation." Let me rephrase: According to the proposed definition of possible values for
Do you foresee this causing any problems? |
What if you want to express a string value that starts with I think this rule already is an oversimplification.
Ok
Negative numbers? What if a string literal starts with a number? How do I express string literals that contain only numbers? These examples mainly show that a dimension value can not be separated from its type field.
I think the main issue is vagueness. I can just pass
There are only two options :
A detailed specification includes error handling. |
Limiting the number type to make it easier to parse in implementations would also conflict with this resolution : #149 (comment) Easy editing by humans is now an important aspect of this specification. That number values from a different context can not be copy/pasted to a design token file doesn't help people edit these manually. If any program produces numbers with scientific notation it forces people to convert them first before using them in design token files. I would prefer it if this specification did not enforce arbitrary limits like that. |
Before going deeper down the rabbit hole, I want to enthusiastically agree that splitting numbers and units would make the spec simpler and make it easier for parsers to rely on languages' built-in types (like numbers and strings). On the other hand, I think we can agree that So, given that, along with the stated goal of making token files easy for people to edit, the 2 questions I'm trying to answer are:
Obviously my opinion on both of these is "yes," but as we go through @romainmenke 's excellent counterpoints I'm having doubts. But I do think it's worth continuing to iterate to see if we can get to clarity around the spec's currently-implied microformat before moving to splitting units and numbers. Ok, now down the rabbit hole.
Yes, good point. It might be productive to limit our discussion to types like
Another good point, my microformat proposal didn't account for negative numbers. I'll iterate at the end of this comment.
I think this hypothetical is a little too hypothetical. But I think it would be reasonable to understand If you want
That would be something like a (currently undefined)
Agree. I think we're discussing a microformat for any tokens that have numbers+units, not for every single type/token.
I agree that changing numbers from scientific notation is an extra step. I think we have to consider tradeoffs here: how often is someone copying-and pasting numbers into a token file from a tool that produces scientific notation? My estimation is that it is pretty rare. If we are making files harder to write and/or harder to parse to acommodate this workflow, is it worth it? Here's an updated proposal for the format based on these ideas: The
Here's a reference implementation: function parse(str) {
const regex = /^([-+]?\d*\.?\d+)+\s?([A-Za-z]+)$/;
const matches = str.match(regex);
if (!matches) {
return new Error('Input string does not follow the format');
}
const number = matches[1];
const unit = matches[2];
return { number, unit };
}
@romainmenke I appreciate you continuing to push on this and having an exacting attention to detail and edge cases. |
I would strongly advice not to use regexp for this, not even for a reference implementation. If I had a dime for each bug that existed because someone used regexp where they should have used a true tokenizer and parser :) Doing this work both in the specification and in implementations from day one will spare end users so much issues and bugs. The example with White space must not be allowed between a value and a unit.
I think it is confusing that there are arbitrary differences between a JSON number and a number that is part of a dimension. Why can't we adopt the definition of a number from JSON? https://www.json.org/json-en.html A parsing algorithm can use look ahead because values are contained within JSON fields. This algorithm assumes that the token type is It requires that the allowed units are known.
function parseUnitAndValue(input, allowedUnits) {
allowedUnits.sort((a, b) => b.length - a.length); // can be optimized by sorting outside this function
// 1. compare the end of the string value with the known units
const unit = allowedUnits.find((candidate) => {
return input.slice(input.length-candidate.length) === candidate;
});
// 2. if the string value does not end in any of the known units
if (!unit) {
// 2.a. this is parsing error
throw new Error('Parse error');
}
// 3. trim the unit from the string value
let inputWithoutUnit = input.slice(0, input.length - unit.length);
let value;
try {
// 4. parse the remaining string value as JSON
value = JSON.parse(inputWithoutUnit);
} catch (err) { // 5. if parsing as JSON fails
// for debugging maybe?
console.log(err);
// 5.a. this is parsing error
throw new Error('Parse error');
}
// 6. if the parsed value is not a number
if (typeof value !== 'number') {
// 6.a this is parsing error
throw new Error('Parse error');
}
// 7. return the parsed value as the `value` and the found unit as `unit`
return {
value: value,
unit: unit,
};
}
console.log(parseUnitAndValue('-10rem', ['em', 'rem', 'px'])); Implementers might prefer to use a regexp to find and trim the unit from the end. An algorithm of this kind has these benefits :
The downside is that it can only be applied if you already know that something is a unit and value tuple and what the allowed units are. If a field in a composite token allows multiple types then it requires a second algorithm. For this theoretical composite type :
valid:
This works well up until a new unit is added to This parsing algorithm doesn't have any issues for It fails to detect syntactically valid microsyntax with unknown units. This affects the parsing of composite tokens. A different approach would to require implementers to write a custom version of the JSON number parsing algorithm. can be found here : https://www.json.org/json-en.html
This places a much higher burden on implementers but it is able to distinguish these :
Being able to make that distinction makes it much easier to extend composite types in the future. (in the specification, not in implementations) I think it is important to constantly look ahead and balance these aspects :
Simply splitting the unit and value has non of these issues, challenges or drawbacks. |
A good way to get a sense of the complexity involved would be to explore how dimension tokens are parsed in CSS. In our tokenizer this is done here : https://github.com/csstools/postcss-plugins/blob/postcss-preset-env--v8/packages/css-tokenizer/src/consume/numeric-token.ts#L19 number specifically : https://github.com/csstools/postcss-plugins/blob/postcss-preset-env--v8/packages/css-tokenizer/src/consume/number.ts CSS is not JSON so the exact implementation is different. oversimplified :
|
Great points on the parsing algorithm. I think any recommendation by the spec here would be non-normative, but it's good to see how different strategies to parsing apply. Our microformat definition could be as simple as "A string containing a number (defined by JSON's Then, implementers can use whatever parsing algorithm they want. To state the issues:
So, if the spec is written so that the first type of algorithm can work for all cases, and as you said it's "simple to implement," then it seems like it might be a viable way to enable authors to write number+unit strings instead of breaking them out. |
I am now unsure if these are generally normative or not. Can they be normative without requiring implementations to follow them exactly?
These are all correct :) There are multiple ways to remove that ambiguity.
This has the extra nuance that this list is not a constant.
All tools do not update simultaneously and users of tools do not update all tools at the same time. This will easily create cases where It must be defined what tool Y must do when encountering I will open a separate issue to define a general error handling strategy. |
It may be of interest that the Typed OM JavaScript API separates units and values. |
Yes, any sane object model separates these :) |
On the topic of readability, the spec says “Groups are arbitrary and tools SHOULD NOT use them to infer the type or purpose of design tokens.” Could the spec be changed to allow people to define the type at the group level? Would that address the issues raised here but keep it readable? |
Sorry for pinging an old thread, but I’ve just put up a proposal ✅ accepting this original proposal: #244. This is NOT an official decision yet! This is only a way to source feedback, and push this forward. I tried to distill the thoughts & opinions expressed in this thread into a formal spec change (also, for the eagle-eyed, you may spot an earlier comment from me initially against this proposal. The arguments were compelling. I came around. People grow 🙂). Any/all feedback would be welcome. Thanks all for pushing for this change 🙏
|
I have some concerns regarding this proposal. It's simpler?For the tool to parse it, yes perhaps, but it's making the value more complex by making it of type object. Objects in objectsIf we make very simple token types object types (e.g. dimension, duration etc.), it means we get more instances where already composite type tokens will be allowed to contain nested objects: {
shadow_token: {
$value: {
"xOffset": {
"number": 0,
// is unit required? can it be left out if number is 0?
},
"yOffset": {
"number": 5,
"unit": "px"
}
}
}
} I'm not saying it's not technically feasible to deal with composite value nesting, there's already a case where this is in the spec: Multiple valuesI have a small concern related to #239 which is actually quite a similar issue to this issue, the summary is whether we want to allow array values e.g. for shadows, animations, gradients/colors, to be layered within a single design token, as this is often conceptually a single "design decision". That issue doesn't have much feedback at all yet but if it were to be received positively, it would maybe clash a bit with this proposal because both issues propose to complicate the token Summarizing, I feel like this proposal smells of over-engineering. I am not convinced that the benefits of splitting dimensions in syntax-tree like objects is worth it. The downsides are that tokens become more verbose/heavy, more complex (e.g. nested objects), less easy to author/read (if those are goals of the spec, perhaps debatable). My suggestion is to keep this open but to not push this forward until there is more of a consensus on the actual need of this to land in the spec. I just don't see the need right now. Edit: I don't mean to be harsh btw, I'm not absolutely/strongly against the proposal, just wanted to make sure I voice the thoughts I have against it (I think the arguments for it are clear already) |
Alternative idea (feel free to upvote/downvote, I'm impartial to this idea and won't be offended, just throwing it out there) Have the spec publish a regex pattern recommendation for tools to use for interpreting dimension tokens, e.g.: |
no, it is the opposite :) This proposal aims to use what is already there in JSON, what is freely and readily available in the underlying tech. Not splitting units and values into separate fields however might very well lead to over-engineering:
This proposal is to "under-engineer" the dimension type. I can understand and sympathize that an extra object that is visibly present in a tokens file is perceived as extra complexity. At the end of the day what I truly find important is that this specification is precise and exact and doesn't leave room for implementors to make the wrong assumptions. If this is done with a strict number, unit and parsing definitions, fine by me, but this is work that needs to be done. It can't be assumed that it is obvious how to parse Separating units and numbers adds perceived complexity but reduces very real specification and implementation complexity. |
We need to do all of these things regardless of whether the value is a string or an object so I'm not sure why you're bringing this up. Especially the first two, what does it matter if the unit/number are already split from one another, you have to decide whether you want to restrict the units/numbers regardless?
And that's exactly the mindset/philosophy that imo leads to over-engineering the spec. You can't realistically do this - pragmatism is just as important as correctness, if not more important.
As a specification implementor (style-dictionary, sd-transforms and various other tools that consume DTCG tokens), my experience is that I just don't agree with this statement. The current string value isn't complex 😅. I also disagree with @jonathantneal 's analysis, I don't see how their examples are relevant to the design token JSON format, which @TravisSpomer also pointed out. Edit: Let me be helpful and mention what would change my mind on this matter: real world use cases, that are actually applicable to design tokens in JSON, where string parsing dimensions to split value from unit is actually error-prone when done by a simple regex. |
This is untrue and a trivial example of that is the max value that can be expressed. If however a microsyntax is chosen you need to at least define how to parse a string into a number. The current specification doesn't do that. Keep in mind that not everything is JavaScript. Even without defining it you might get good results in your context, but the goal is that anyone can get the correct outcome.
I am not sure how to respond to that. A specification, by its very nature, is a document that helps and guides multiple implementors realize different implementations that still behave in the same way. A primary purpose of specifications is to have interop between various tools. This is not a mindset or philosophy... 😕 I also don't understand what isn't realistic about this. To clarify, such a description wouldn't tell you how to do it. A specification typically only describes the high level steps in pseudo code. It is up to you to actually write code that matches those high level steps. No specification tells you to over-engineer your code. I am not arguing against you or the use of a micro syntax, I am advocating for clarity, completeness, ease of implementation and interop. I am absolutely sure that an object notation is the most direct path towards those attributes, but I will follow whatever is decided by the spec editors. |
https://design-tokens.github.io/community-group/format/#duration
This assumes that all consumers agree on the unit for a given value type or forces implementers to do error prone parsing before passing on transformed values.
A better sub type might be a general
Dimension
typeThis reduces the amount of micro syntaxes in the spec and makes it easier to implement.
$value: "100ms"
becomes
$value: { "number": "100", "unit": "ms" }
The text was updated successfully, but these errors were encountered: