[SUGGESTION] Simplify how lists separators are used to simplify toolability (code generation and parsing) #115
Comments
Thanks! Summarizing, the suggestion is to allow a trailing I understand the convenience, I've encountered it occasionally... but the compiler will tell you if there's a stray comma (so AFAIK we don't need to teach anything about this today), and if we allow meaningless trailing commas won't that add a thing to document and explain (because people will ask)? Also: We have experience with this in various languages so we can see if that experience can guide us... are there any known downsides based on that experience? For example, JavaScript allows trailing commas in some places, but doesn't allow it in other places (e.g., when using a Re toolability: Every popular language and lots of popular tools deal with this for JSON, so is the lack of support for extra trailing commas in JSON a significant problem in practice for them? I'm not pushing back necessarily, it's easy to implement... I'm curious about the actual benefit and learning from the industry experience, and a little concerned that if I allow the extra comma then I may actually add a thing I have to teach (i.e., it may be an example where something "simpler" is not actually simpler for users and tools), but I could be wrong so I'm interested in data. Thanks! |
From a toolability perspective, this:
becomes:
Which is certainly nicer. It can also significantly reduce the size of diffs, which is good for code review. This:
becomes this:
It's a more compelling argument for something like enum values than function parameters, but I see the value in keeping comma-separated lists consistent. If you're really concerned about explaining the "meaningless" comma, you could do like Go and make it mandatory. After all, we don't consider the last semi-colon in a function to be "meaningless". It does feel pretty icky to me to put a trailing comma after the last parameter in a single-line function declaration, although I admit that that's entirely based in cultural expectations about commas. |
@hsutter: About your concern about teaching, I consider that, as of today, it is a mess. You have to teach/learn why enums, initializer lists, and possibly others I forgot allow a trailing ',', but literal arrays, argument/parameter lists, and others I forgot can't have one. You have to learn and understand why it was allowed (in some constructions, but not in all of them), mostly because of the interactions with the macro system. You have to teach/learn that when using ';' to terminate a declaration/expression, it is mandatory. Even if the compiler teaches/guides the user, there is a learning curve for each construction. With my proposal, it becomes a single point of teaching like: "A list of lexical elements has to be separated using a delimiter (',' or ';'). A trailing delimiter can be used to ease human expressiveness and tooling." About the JavaScript syntax, the only place where it seems to be really forbidden is either:
About the JSON argument, I don't think it is a fair argument. What I mean is that it is a very strict format meant for data interchange. The fact that it is human readable/editable is only a side effect of the decision to base the format on a very small subset of JavaScript. At the end of the day, it means that humans are not the primary users of JSON, the machines are. In practice, AFAIK, most JSON parsers that are targeted at human input allow extras (mostly comments, and possibly other deviations). We are designing a language meant for users that also eases tooling. So it is basically the other way around. It does not mean it has to be strict in some places. But this proposal tries to allow some flexibility for users that also benefits tooling. So it seems to be a win/win situation. @Skrapion:
|
I feel like I recently learned that you can have trailing commas on an Let's say you (re)learn that trailing commas are a thing. You try it in some context and it doesn't work. So you forget that's a thing, because why bother with the mental burden, and move on. |
There seems to have been enough pressure to make some parsers accept it despite being forbidden in the spec: https://docs.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializeroptions.allowtrailingcommas?view=net-7.0 And to even make an alternative spec that focuses on human editing and maintenance of files (which are non-goals of JSON, but are very much goals of a programming language): https://json5.org/ Given that JSON itself doesn't have a versioning mechanism built in, the surprising thing to me would be for the spec to evolve over time in a way that causes currently-deployed parsers to break. |
@JohelEGP: Consider that you delete the last line/entry of your enum, and the compiler doesn't complain about the trailing comma. You learned "nothing", the compiler didn't complain, and it felt natural. In a stricter mode, the compiler complains, you have to come back to your edited line, see that you forgot to remove the trailing ',', edit the line, and possibly lose 30 seconds to 5 minutes understanding, editing, and restarting the compilation. Even with a fancy code editor, you lose time because you have to wait for the language server to kick in and you have to triage what it spits out. |
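For reference, here is a minimal sketch of where today's C++ (Cpp1) already tolerates a trailing comma and where it does not, which is the inconsistency discussed above. The names `Color`, `primes`, and `prime_count` are made up for illustration and do not come from the thread.

```cpp
#include <cstddef>

// Enumerator lists accept a trailing comma (since C++11), so deleting or
// appending the last entry touches only one line.
enum class Color {
    red,
    green,
    blue,   // trailing comma: accepted
};

// Braced initializer lists accept one as well.
constexpr int primes[] = {
    2,
    3,
    5,      // trailing comma: accepted
};

// Number of elements actually initialized above.
constexpr std::size_t prime_count = sizeof(primes) / sizeof(primes[0]);

// Function parameter and argument lists do not accept one in Cpp1:
//   int add(int a, int b,);   // error
//   add(1, 2,);               // error
```

The trailing comma does not add an element in either accepted case; it only changes how the list can be edited.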
Requests for trailing commas in a C++ json library (that was actually proposed for standardization at one point). |
I don't think json is a valid argument here at all. json is a communication format and thus lives in between many libraries and languages. allowing the comma would create a dialect and you would have separation, which is a very bad idea for a communication format imho. while I do think it would have been good to allow trailing commas in json I also think it should not change now. |
Thanks again, everyone. I've given this more thought and spent some time implementing some variations, and I'll keep this in mind going forward but for now I'm not going to make this change. I understand that will be disappointing, but FWIW here are my thoughts on it (trying not to repeat what I've already said above). I agree that allowing extra commas/semicolons has some advantages for editing code (including diffs) and generating code.
So, I do think I get it. ... However, I also think the advantage is minor:
On the other hand, I think that allowing empty expressions/statements and ignoring them:
That said, I'm willing to revisit this in the future based on more experience, it's something that can always be added later. But for the reason @JohelEGP gave, if I do allow extraneous commas/semis in the future, I will almost certainly insist on allowing them consistently everywhere, not just in some places. There would have to be an even stronger argument for allowing them in some places but not others. (*) Finally, and maybe I shouldn't point this out, but: For better or worse, at least Cpp2 is currently more consistent than today's Cpp1 because cppfront currently doesn't allow empty statements
I don't hit this often, but when I do it sometimes costs me way more time than it should in debugging because it takes a long time to see the problem. The above class of bugs isn't possible in Cpp2 because I require |
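The snippet that accompanied the comment above is not preserved here, so the following is only a hypothetical illustration of the kind of Cpp1 bug that an accepted empty statement makes possible. It is not necessarily the case the author had in mind, and the function names are invented for this sketch.

```cpp
// Counts the positive values in [first, last).
// The stray ';' after the 'if' condition is an empty statement that becomes
// the entire body of the 'if', so the braced block below runs every time.
inline int count_positive_buggy(const int* first, const int* last) {
    int count = 0;
    for (const int* p = first; p != last; ++p) {
        if (*p > 0);      // BUG: the empty statement is the 'if' body
        {
            ++count;      // executes for every element
        }
    }
    return count;
}

// Same loop without the stray ';': the block is now the 'if' body.
inline int count_positive(const int* first, const int* last) {
    int count = 0;
    for (const int* p = first; p != last; ++p) {
        if (*p > 0) {
            ++count;
        }
    }
    return count;
}
```

Both versions compile as Cpp1 (many compilers only warn about the first), which is why this kind of mistake can take a while to spot.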
Thanks for giving it an honest try @hsutter. I'm curious if this finding you mention:
includes issues with commas (not semicolons). In my mind semicolons are a completely separate topic, as (1) they're not used to enumerate anything, (2) most modern languages (and importantly, efforts analogous to CppFront - Swift and Kotlin) have just let go of them. On the issue of removing material from guidelines, I think most veteran C++ programmers have encountered the "comma at the beginning of the line" guideline (or seen it applied by others). Yes, the lack of support of trailing commas by the language cannot cause bugs. But the guideline is still taught because no-trailing-commas causes unnecessary annoyance to yourself and your teammates. That piece of advice could be removed. |
@jcanizales Actually it was a semicolon. The first thing I hit was that my current declaration grammar is name which allows code like
because both of those things on the right of If I allowed empty statements and did nothing more, then this
would also be allowed, but wouldn't make sense (well, we could always make it mean something, maybe default construction, but that feels weird to me, especially to have that meaning in just one place unless we also allow assignment expressions like In short, one could view the above as an example of "allowing nulls [to be created] creates the problem that you then have to test for nulls [in probably more places, and later]"... :) (In this case a "null" statement.) One could also view it as an example of "allowing multiple spellings for the same thing can create ambiguities/inconsistencies/special-cases." |
Picking this up again after more than a year... I said:
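As promised, I've kept thinking about this, and I've come around to agreeing with a lot of the above.
Re allowing omitting trailing ;: I'm still concerned that allowing omitting a trailing ; is problematic. Though I have to acknowledge one caveat: In the meantime I've actually added the tersest function expressions, which have now allowed that for a while (e.g., for_each( v, :(e) use(e) )). I'll still keep thinking about it, including an optional semicolon on the last statement of a braced function body (including for @enum).
Re allowing adding redundant trailing ,: I'm sold. I now think allowing this has value, and I no longer believe it's two ways to say the same thing.
What pushed me over the edge was @oldnewthing's especially persuasive recent post "On the virtues of the trailing comma", including the side-by-side language comparisons. After reading that, rereading the great comments in the thread above really resonates now. (Note that Raymond comes to the same conclusion about the So I'm about to push a commit that will make the second work. It's not everything this issue asked for, but it's a big part of it I think. Thanks again for the suggestion! And thanks for your patience as I kept thinking about it and getting used to the idea. This code will now work: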
|
See also #115, and the comment there that goes with this commit: #115 (comment)
Nice to hear that.
While it's true it is not a full implementation of what I originally
proposed, it is a nice improvement in that direction.
We will see if, at the end of your journey, you reach the same conclusion
as me (and others). I can't force you to believe me, but I toyed enough
with a personal language to see the value of the full proposal.
|
Thanks. I'm open to being convinced, and I want to learn from your experience with that language, so let me ask...
Benefits
As I understand it, a (the?) major advantage of allowing adding an "extra" trailing delimiter is to make code more robust to change. It does that by:
For example, given:
So my question is, in your experience: Aren't those weaknesses of allowing omitting a trailing delimiter? For example (as Raymond's article points out in the last postscript):
Two perspectives, subtly different
With those examples in mind, I think what you wrote here might capture the difference well:
I understand that perspective of "always optional." From that view, the final
The perspective I'm coming from is (subtly) the opposite, "always allowed." From that view, the final
Do you see the difference? How does that resonate with you?
Language evolution, and closing doors
Specifically for omitting
So one concern I have with doing (1) (allowing omission as innocuous) today is that it actively closes the door to doing (2) (giving omission a meaning) in the future, because allowing (1) now means any code that uses it will be broken if we ever changed to (2). Does that make sense? That's a brain dump, anyway... How does that match with your experience in your language, and what users reported from using it? I appreciate the feedback and insights! |
I've now captured this at Design note: Commas. There, I also added: "Additionally, Cpp2 supports reflection and code generation, which is done by source code generation. Allowing trailing commas in lists makes it easier to generate source code without special cases." |
From my limited experience and understanding of it, I would say that the main
advantage is that it lets the writer be lazy.
When editing a "list of things" you don't have to think "I'm writing a foo
list, I MUST add/omit the closing delimiter". Consequently, we naturally
start to always add a closing delimiter. And since it appears naturally, it
has consequences for style and diffs.
The fact that it is optional makes it compatible with the old style of
writing, making it transparent for old code/writers. But it also enables
these benefits for users who are aware of them, and allows old users to
learn them "transparently" by trial and error.
Since it only really conflicts with the empty entry case, it
does not matter 99.99...% of the time. Unless we start to do pattern
recognition/application, it should not really matter. And if there is a
compelling need for an empty expression, there could be a new dedicated token, or in the
worst case some kind of cast to void... (Or being able to construct a void, but
that is a completely different topic.)
|
One particularly nasty example of where things can go wrong is with string
literals.
const char *foo[] = {
"bar",
"baz"
};
then you hastily add one more:
const char *foo[] = {
"bar",
"baz"
"qux"
};
and it still compiles but it doesn't mean what you intended.
Because of adjacent string literal concatenation, it's {"bar", "bazqux"}.
Michael
…On Sat, Mar 16, 2024 at 1:29 AM Herb Sutter ***@***.***> wrote:
Picking this up again after more than a year... I said:
I'll keep this in mind going forward but for now I'm not going to make
this change.
As promised, I've kept thinking about this, and I've come around to
agreeing with a lot of the above.
*Re allowing omitting trailing ;:* I'm still concerned that allowing
omitting a trailing ; is problematic. Though I have to acknowledge one
caveat: In the meantime I've actually added that the tersest function
expressions which have now allowed that for a while (e.g., for_each( v,
:(e) use(e) )). I'll still keep thinking about it, including an optional
semicolon on the last statement of a braced function body (including for
@enum).
*Re allowing adding redundant trailing ,:* I'm sold. I now think allowing
this has value, and I no longer believe it's two ways to say the same
thing. Especially persuasive was @oldnewthing
<https://github.com/oldnewthing>'s recent post "On the virtues of the
trailing comma"
<https://devblogs.microsoft.com/oldnewthing/20240209-00/?p=109379>,
including the side-by-side language comparisons. After reading that,
rereading the great comments in the thread above really resonate now.
So I'm about to push a commit that will make the second work. It's not
everything this issue asked for, but it's a big part of it I think.
Thanks again for the suggestion! And thanks for your patience as I kept
thinking about it and getting used to the idea.
This code will now work:
f: (a, b, ) a + b;
g: <T, U, > (a: T, b: U) a + b;
doubler: (a: int,) -> (i : int,) = {
    i = a * 2;
}
vals: @struct type = { i: int; }
main: () -> int = {
    (copy a := 42,) while false { a++; }
    _ = g(1, 2,);
    grouping: std::initializer_list = (0, 1, 2,);
    array: std::array = (0, 1, 2,);
    v: vals = 21;
    v.i;
    _ = array;
    _ = grouping;
}
|
This is probably the subject of another issue. But this makes me wonder if
the concatenation of string literals is still necessary after the removal
of macros.
|
You used
|
Yes, it is. This functionality was not included originally and then added later due to discussions in this repo. I don't have a pointer handy to share with you, but if you're interested, I can probably dig them up. |
As far as "should the trailing ; in a block be optional", I would say that if it's not, then there is an inconsistency between expression lists and statement lists. These are consistent, there is one item/statement, and one
These are inconsistent, there is one item/statement, and one
I expect that this will be the primary use case for "leave off the trailing Now imagine if we hadn't excluded the
Well, if we can already leave off the
A final bit on consistency:
All of these forms would be valid. I expect that people WILL do |
Almost: String literal concatenation has been requested, and I personally would find it useful in
I agree that's a consequence of "always optional"... but please see this comment above and the design note again: I'm arguing that there are two (subtly) opposite★ ways to look at this that lead to different conclusions, "always optional" and "always allowed," and I agree with what you say about the former, but I'm arguing instead for (and implemented) the latter. ★ BTW, I totally realize it's a subtle viewpoint difference, which is why I've been calling it "(subtly) opposite"... and the irony of those two words is not lost on me, it's been jarring to me every time I've written it in this thread, because normally "subtly" and "opposite" don't go together (and I've kept resisting making the note longer by pointing that out, but this has pushed me over the edge 😄 ). Does that help? |
Thanks for the clarification. I thought it was done after that conversation.
Yes, I understand. I am arguing that "always allowed" still results in an undesired inconsistency, and am arguing for "always optional" to resolve that inconsistency. We've moved from "two of the four are allowed" to "three of the four are allowed", and I think that we should now move to "all four are allowed". |
Don't all or most C++ style guides forbid multi-statement lines? Can't cppfront just drop the |
I don't know. I don't generally use them, but it's certainly possible to do it if it's fairly simple.
I'm not sure that it's always unambiguous. I'm sure we can come up with cases where the |
Here's the discussion I was thinking of: #861 |
I guess the reason in my mind the |
Line breaks happen in auto-formatted code with long lines. |
Brief answers as we're about to resume meeting sessions:
Yes, Raymond alludes to this at the end of his article in pointing out that when Pascal allows omitting the last Re long lines: There are LOTS of long lines. For example, any line that contains a lambda... we don't want that all on one line, right? If we did that, soon we'd ask for meaningful whitespace indentation... speaking of which... Re dropping |
Off topic, but why can't operator + also be used for string literals?
I mean, concatenation has always been spelled that way for std::string, and it
always bugged me that 2 consecutive string literals were automatically
concatenated without an operator.
|
This would be on-topic for #861
It can if they're made into
You could think of it not as two consecutive literal strings but as a way to break a long string literal across multiple lines. This ability comes from C, and far predates |
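To make the two spellings concrete, here is a small sketch in today's C++. It assumes the standard `s` literal suffix from `std::string_literals` as one way to get `operator+` to apply; the names `adjacent` and `joined` are invented for this example.

```cpp
#include <string>

using namespace std::string_literals;  // enables the "..."s suffix

// Adjacent literals are merged by the compiler into a single literal,
// which is the classic way to break a long string across lines.
constexpr const char* adjacent =
    "a long message "
    "split across two lines";

// operator+ applies once at least one operand is a std::string.
inline std::string joined() {
    return "a long message "s + "split across two lines";
}

// Note: "a" + "b" on two plain literals does not compile, because both
// operands decay to pointers and two pointers cannot be added.
```

The first form costs nothing at run time; the second builds a `std::string`, which is the trade-off behind keeping both.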
Please continue resisting! |
I suggest looking into how languages that have dropped the
Usually, but not necessarily. And in particular, none of the three languages above are indentation-significant. Correlation is not causation here. I consider these three the analogues to Cpp2 because they successfully did for Objective-C, Java, and JavaScript what you're now trying to do for C++. |
Ah, thanks! That's good information, appreciated. One reason I went into more detail in my answer was to provide enough of the thinking and background that folks could point out if I was missing something. Now that you mention it, yes Swift and Kotlin and TypeScript do use both braces and optional semicolons. Doing some brief googling, one of the first things I found for TS was StackOverflow questions about semicolons that referenced the TypeScript Deep Dive which includes this section:
Disclaimer: I don't know how popular/authoritative that document is, and I haven't asked the TS language designers this semicolons question. I'll try to remember the next time I talk to them. So at least Pascal and TypeScript allow omitting But you've given me great things to think about and look into more, and I will. Thanks! |
The code is basically only lists of consecutive elements. The idea behind this suggestion is to have a more uniform syntax for all of them, to help laziness for users and simplify tooling.
There are basically 3 forms of list separators in code:
While these have their purpose, they are separate cases that need to be learned for each usage. This adds a small cognitive burden and represents a learning curve.
In the case of code generation, it means that we also have 2 different ways to generate these lists. A simple/dumb for loop cannot be used in the case of in-between separators; there is always a small exception to skip one separator.
For maximum compatibility, I suggest that all the grammatical lists be "in-between separator with an optional trailing separator". (The only exception left should be the statement block, since it does "replace" a ; in most cases.)
When writing/generating, it adds laziness and generalizes the trend that was introduced with (at least) the enum case.
Parsing is now a little bit more complex (since there is now always the possibility of a trailing separator). But since the construction is more generalized, vectorization of grammar parsing is even more encouraged.
Since the old macro usage is banned for the future, it should not conflict with this proposal. It was the only usage (that I can think of) that would produce a different number of arguments in the presence of a blank/empty argument.
Other syntaxes could be chosen, but they are more strict and do not encourage expressiveness. While the enum case was most probably meant to simplify enum interaction with macros, it shows a trend for users to (de)activate list parts using comments in live coding. When producing diffs (like with git), it should/does reduce the noise that would be induced if the in-between separator syntax was chosen.
It would allow something like:
While it is not a syntax that I would encourage, in the case of diffs it would allow things like (with a specific style format):
It should help in some cases to make diffs less busy and ease readability (and therefore help to focus more on what really changes).
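The code-generation point above can be sketched roughly as follows. This is only an illustration written in plain C++, not part of any real generator; `emit_in_between` and `emit_trailing` are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// In-between separators: the loop must special-case one element
// (here the first) to avoid emitting a stray separator.
inline std::string emit_in_between(const std::vector<std::string>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) {
            out += ",\n";
        }
        out += "    " + items[i];
    }
    if (!items.empty()) {
        out += "\n";
    }
    return out;
}

// Optional trailing separator: every element is emitted the same way,
// so a "dumb" loop is enough.
inline std::string emit_trailing(const std::vector<std::string>& items) {
    std::string out;
    for (const auto& item : items) {
        out += "    " + item + ",\n";
    }
    return out;
}
```

The second form also produces one-line diffs when an item is appended, since no existing line has to gain a separator.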
PS: Sorry to propose a small syntax change. But considering how it can help to simplify the tooling, I think it is worth giving it a try.