[SUGGESTION] Simplify how lists separators are used to simplify toolability (code generation and parsing) #115

Closed
mhermier opened this issue Nov 14, 2022 · 34 comments

@mhermier

Code is basically just lists of consecutive elements. The idea behind this suggestion is to give all of them one more uniform syntax, which lets users be lazier and simplifies tooling.

There are basically 3 forms of list separators in code:

  • The in-between separator (like the argument/parameter separator)
  • The in-between separator with an optional trailing separator (like enum declarations)
  • The mandatory terminal separator (like statement blocks, with some exceptions)

While these each have their purpose, they are separate cases that need to be learned for each usage. This adds a small cognitive burden and a learning curve.

For code generation, it means that we also need 2 different ways to generate these lists. A simple/dumb for loop cannot be used for in-between separators: there is always a small exception to skip one separator.

For maximum compatibility, I suggest that all grammatical lists be "in-between separator with an optional trailing separator". (The only exception left should be the statement block, since it does "replace" a ; in most cases.)
When writing/generating code, this allows more laziness, and it generalizes the trend that was introduced with (at least) the enum case.
Parsing becomes a little more complex (since a trailing separator is now always possible). But since the construction is more general, factoring the list parsing out of the individual grammar rules is even more encouraged.
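
A minimal sketch of what that extra parsing step could look like, assuming a toy grammar of parenthesized integer lists. The name `parse_int_list` and the grammar are illustrative assumptions, not cppfront code. After each item the parser accepts either a separator or the closing parenthesis, and after a separator it accepts the closing parenthesis again, so `(1, 2,)` parses while `(,)` and `(1,,2)` do not:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper, not cppfront code.
// Grammar sketch: list := '(' [ item { ',' item } [ ',' ] ] ')'
std::optional<std::vector<int>> parse_int_list(std::string_view text) {
    std::size_t pos = 0;
    auto is_digit = [&] {
        return pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]));
    };
    auto skip_spaces = [&] {
        while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
    };
    auto accept = [&](char c) {
        skip_spaces();
        if (pos < text.size() && text[pos] == c) { ++pos; return true; }
        return false;
    };

    std::vector<int> items;
    if (!accept('(')) return std::nullopt;
    while (!accept(')')) {
        // An item is required here, so "(,)" and "(1,,2)" are rejected.
        skip_spaces();
        if (!is_digit()) return std::nullopt;
        int value = 0;
        while (is_digit()) {
            value = value * 10 + (text[pos] - '0');
            ++pos;
        }
        items.push_back(value);
        // After an item: a separator (possibly the trailing one) or ')'.
        if (!accept(',')) {
            if (!accept(')')) return std::nullopt;
            break;
        }
    }
    skip_spaces();
    assert(pos <= text.size());  // invariant: the cursor never passes the end
    if (pos != text.size()) return std::nullopt;
    return items;
}

// Usage sketch: the trailing separator is accepted, an empty item is not.
void parse_int_list_examples() {
    assert(parse_int_list("(1, 2, 3,)").has_value());
    assert(!parse_int_list("(,)").has_value());
}
```

The only difference from a strict in-between grammar is the loop condition: the closing parenthesis is checked again after every separator, and that one check can be shared by every list-like construct.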

Since the old macro usage is banned for the future, it should not conflict with this proposal. It was the only usage (that I can think of) that would produce a different number of arguments in the presence of a blank/empty argument.

Other syntaxes could be chosen, but they are stricter and do not encourage expressiveness. While the enum case was most probably meant to simplify enum interaction with macros, it shows a trend for users to (de)activate list parts using comments in live coding. When producing diffs (like with git), it should/does reduce the noise that would be induced if the in-between separator syntax was chosen.

It would allow something like:

doubler: (a: int,) -> (i : int,) = {
    i = a * 2;
    return
}

main: () -> int = {
    grouping := (0, 1, 2,);
    array := [0, 1, 2,];
    v := vals(21);
    v.i
}

While it is not a syntax that I would encourage, in diffs it would allow things like this (with a specific style format):

 myfunction(
     a : _,
     b : _,
-    obsolete_parameter: _
   ) -> (
    i : _,
+   super_important_parameter: _,
   ) = {
    ...
  }

It should help in some cases to make diffs less busy and easier to read (and therefore help to focus on what really changes).

PS: Sorry to propose a small syntax change. But considering how it can help to simplify the tooling, I think it is worth a try.

@hsutter hsutter self-assigned this Nov 26, 2022
@hsutter
Owner

hsutter commented Nov 26, 2022

Thanks! Summarizing, the suggestion is to allow a trailing , in a list before the closing ), and treat it as whitespace/innocuous, correct?

I understand the convenience, I've encountered it occasionally... but the compiler will tell you if there's a stray comma (so AFAIK we don't need to teach anything about this today), and if we allow meaningless trailing commas won't that add a thing to document and explain (because people will ask)?

Also: We have experience with this in various languages so we can see if that experience can guide us... are there any known downsides based on that experience? For example, JavaScript allows trailing commas in some places, but doesn't allow them in other places (e.g., when using a ... element), and JSON which is derived from JavaScript doesn't allow trailing commas at all, and it's a super popular interchange format used by (and in) many different languages and yet there doesn't seem to have been pressure to support trailing commas in JSON. Do we know why JSON chose not to support them, and hasn't felt pressure to add them?

Re toolability: Every popular language and lots of popular tools deal with this for JSON, so is the lack of support for extra trailing commas in JSON a significant problem in practice for them?

I'm not pushing back necessarily, it's easy to implement... I'm curious about the actual benefit and learning from the industry experience, and a little concerned that if I allow the extra comma then I may actually add a thing I have to teach (i.e., it may be an example where something "simpler" is not actually simpler for users and tools), but I could be wrong so I'm interested in data. Thanks!

@Skrapion

Skrapion commented Nov 26, 2022

From a toolability perspective, this:

isFirst = true
foreach paramName
  if not isFirst
    print ",\n"
  isFirst = false
  print paramName
print "\n"

becomes:

foreach paramName
  print paramName + ",\n"

Which is certainly nicer.
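
The same comparison as a C++ sketch. The function names `emit_in_between` and `emit_trailing` are hypothetical, and plain strings stand in for whatever the generator actually emits:

```cpp
#include <cassert>
#include <string>
#include <vector>

// In-between separators: the loop needs a special case for the first element.
std::string emit_in_between(const std::vector<std::string>& names) {
    std::string out;
    bool is_first = true;
    for (const auto& name : names) {
        if (!is_first) out += ",\n";
        is_first = false;
        out += name;
    }
    out += "\n";
    return out;
}

// Trailing separator allowed: every element is emitted the same way.
std::string emit_trailing(const std::vector<std::string>& names) {
    std::string out;
    for (const auto& name : names) out += name + ",\n";
    return out;
}

// Usage sketch: same input, two output shapes.
void emit_examples() {
    assert(emit_in_between({"a", "b"}) == "a,\nb\n");
    assert(emit_trailing({"a", "b"}) == "a,\nb,\n");
}
```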

It can also significantly reduce the size of diffs, which is good for code review. This:

- Previous last param
+ Previous last param,
+ New last param

becomes this:

+ New last param,

It's a more compelling argument for something like enum values than function parameters, but I see the value in keeping comma-separated lists consistent.

If you're really concerned about explaining the "meaningless" comma, you could do like Go and make it mandatory. After all, we don't consider the last semi-colon in a function to be "meaningless".

It does feel pretty icky to me to put a trailing comma after the last parameter in a single-line function declaration, although I admit that that's entirely based in cultural expectations about commas.

@mhermier
Author

@hsutter:
Your summary is only a fraction of what I propose. In the current state of the language, there are 2 delimiters/separators: ',' and ';'. What I suggest is to formalize that, if there is no ambiguity (and this has to be checked/understood/evaluated), a trailing separator is always optional in any list of any kind (even in the unlikely event that a new delimiter is added in the future). It means that a list of struct members or a block of expressions would also allow an optional trailing ';'.

About your concern about teaching, I consider that as of today it is a mess. You have to teach/learn why enum, initializer list, and possibly others I forgot allow a trailing ',', but literal array, argument/parameter list, and others I forgot can't have one. You have to learn and understand why it was allowed (in some constructions, but not in all of them), mostly because of the interactions with the macro system. You have to teach/learn that when using ';' to terminate a declaration/expression, it is mandatory. Even if the compiler teaches/guides the user, there is a learning curve for each construction. With my proposal, it becomes a single point of teaching like: "A list of lexical elements has to be separated using a delimiter (',' or ';'). A trailing delimiter can be used to ease human expressiveness and tooling."

About the JavaScript syntax, the only places where it seems to be really forbidden are:

  • (,) which logically makes no sense. If it had a meaning, it would mean that the empty element has a meaning, making the rest of the usages collapse.
  • (...foo,) which seems to be an arbitrary decision. In C++ templates, there is a counterargument, in the sense that template <typename ...RetArgs, typename ...CallArgs> std::tuple<RetArgs...> foo(CallArgs...); is a valid declaration. It is not exactly the same, but I think it proves it can be legal if we say so. So, unless I am missing something, I think it is just an arbitrary decision.

About the JSON argument, I don't think it is a fair one. What I mean is that it is a very strict format meant for data interchange. The fact that it is human readable/editable is only a side effect of the decision to base the format on a very small subset of JavaScript. At the end of the day, it means that humans are not the primary users of JSON, the machines are. In practice, AFAIK, most JSON parsers that target human input allow extra stuff (mostly comments, and possibly other deviations).

We are designing a language meant for users that also eases tooling. So it is basically the other way around. It does not mean it has to be strict in some places. But this proposal tries to allow some flexibility for users that also benefits tooling. So it seems to be a win/win situation.

@Skrapion:
That culture is based on 3 facts, I think:

  • We try to replicate human grammar: an expression is a sentence. So, just as a sentence ends with a '.', an expression must be terminated with a ';'. But a language does not need to be that way to be unambiguous.
  • The fact that an empty entry can mean something. This was the case with the C macro system. There is nothing fundamentally wrong with that idea. But when it was expanded, that system started to show some fatal flaws (basically because invoking FOO() has 0 and 1 argument at the same time).
  • Because of the second point, a conservative approach was probably taken (because we can't envision the future). By being strict, any debate is eliminated and it becomes the historical way of doing things. But the patching that was done to the language for enum and probably other constructs seems to prove that it was possibly not the only/right way to do it. (Maybe there is a mathematical reason behind that, but it can probably be slightly modified/expanded/corrected.)

@JohelEGP
Contributor

You have to teach/learn why enum, initializer list, and possibly others I forgot allow a trailing ',', but literal array, argument/parameter list, and others I forgot can't have one.

I feel like I recently learned that you can have trailing commas on an enum. Probably because I consistently omit them as that's what works everywhere. And so does the code I usually deal with.

Let's say you (re)learn that trailing commas are a thing. You try it in some context and it doesn't work. So you forget that's a thing, because why bother with the mental burden, and move on.

@jcanizales

there doesn't seem to have been pressure to support trailing commas in JSON. Do we know why JSON chose not to support them, and hasn't felt pressure to add them?

There seems to have been enough pressure to make some parsers accept it despite being forbidden in the spec: https://docs.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializeroptions.allowtrailingcommas?view=net-7.0

And to even make an alternative spec that focuses on human editing and maintenance of files (which are non-goals of JSON, but are very much goals of a programming language): https://json5.org/

Given that JSON itself doesn't have a versioning mechanism built in, the surprising thing to me would be for the spec to evolve over time in a way that makes currently-deployed parsers break.

@mhermier
Author

@JohelEGP: Consider that you delete the last line/entry of your enum, and the compiler doesn't complain about the trailing comma. You learned "nothing", the compiler didn't complain, and it felt natural. In a stricter mode, the compiler complains, you have to come back to your edited line, see that you forgot to remove the trailing ',', edit the line, and possibly lose 30s to 5 minutes understanding, editing, and restarting the compilation. Even with some fancy code editor, you lose time because you have to wait for the language server to kick in and you have to triage what it spits out.

@gregmarr
Contributor

gregmarr commented Nov 29, 2022

Requests for trailing commas in a C++ json library (that was actually proposed for standardization at one point).
nlohmann/json#1787
nlohmann/json#1429
nlohmann/json#150

@maddanio

I don't think json is a valid argument here at all. json is a communication format and thus lives in between many libraries and languages. allowing the comma would create a dialect and you would have separation, which is a very bad idea for a communication format imho. while I do think it would have been good to allow trailing commas in json I also think it should not change now.
an argument for allowing trailing commas imho is easier editability of lists, and allowing easy swapping around

@hsutter
Owner

hsutter commented Dec 22, 2022

Thanks again, everyone.

I've given this more thought and spent some time implementing some variations, and I'll keep this in mind going forward but for now I'm not going to make this change. I understand that will be disappointing, but FWIW here are my thoughts on it (trying not to repeat what I've already said above).

I agree that allowing extra commas/semicolons has some advantages for editing code (including diffs) and generating code.

  • For editing, I've felt that inconvenience myself when editing multiline member initializer lists in today's C++, and it's why I generally put the commas first in such lists in today's C++.
  • For code generation, well cppfront itself is all about that :) and yes I've felt the "getting the commas right" work when I generate Cpp1 code in this compiler.

So, I do think I get it. ... However, I also think the advantage is minor:

  • For editing, compilers do warn/error when writing an extra comma. That's why this doesn't appear in coding guidelines - because you can't make the mistake, and we virtually never need to publish a guideline about code the compiler rejects (except in rare cases where it's a surprise and the reason isn't obvious, but in this case the reason and diagnostic are clear and direct).
  • For code generation, I don't find it to be a significant issue, and even if Cpp1 did allow extraneous commas/semis so that I wouldn't have to worry about suppressing extra ones in the generated Cpp1 code, I still have to do basically the same thing for other cases even in cppfront. For example, cppfront has similar code for natural language support to get the English grammar right when I format lists of line numbers in error messages, such as to add "s" if there's more than one element in the list, and to add commas and "and" in lists of multiple elements. See for instance load.h:264-283; when reporting unmatched braces' line numbers I emit "line 1", "lines 1 and 2", "lines 1, 2, and 3" etc. (Yes, I currently emit an Oxford comma. There's no perfect "allow redundant commas" or "disallow redundant commas" answer even for English, both create different potential problems.)
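
For illustration, a minimal sketch of that kind of natural-language list formatting. This is not the code in load.h. The helper name `format_lines` is an assumption, and it only reproduces the output shapes quoted above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not cppfront's implementation): formats
// {1} as "line 1", {1, 2} as "lines 1 and 2", {1, 2, 3} as "lines 1, 2, and 3".
std::string format_lines(const std::vector<int>& lines) {
    if (lines.empty()) return "";
    std::string out = lines.size() == 1 ? "line " : "lines ";
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (i > 0) {
            const bool is_last = (i + 1 == lines.size());
            if (lines.size() > 2) out += ",";   // separator, including the Oxford comma
            out += is_last ? " and " : " ";
        }
        out += std::to_string(lines[i]);
    }
    return out;
}

// Usage sketch covering the three shapes from the text.
void format_lines_examples() {
    assert(format_lines({1}) == "line 1");
    assert(format_lines({1, 2}) == "lines 1 and 2");
    assert(format_lines({1, 2, 3}) == "lines 1, 2, and 3");
}
```

Even in this small helper, the separator logic depends on both the position and the total count, which is the "special case" work under discussion.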

On the other hand, I think that allowing empty expressions/statements and ignoring them:

  • Can be a potential pitfall, because if the programmer writes a stray comma/semi but intended to write something more, we now can't diagnose that if we allow and ignore the stray comma/semi.
  • Can make things less consistent. As I was trying out an implementation, I did come across cases where this would have made my own parsing (and potentially the Cpp2 programmer's code reading) harder because allowing a grammar element to have an extraneous extra comma/semicolon in one place turned out to create a problem where the same grammar element was used in another place. (*)
  • And therefore it can increase, rather than decrease, what we have to teach for the reason @JohelEGP said:

Let's say you (re)learn that trailing commas are a thing. You try it in some context and it doesn't work. So you forget that's a thing, because why bother with the mental burden, and move on.

That said, I'm willing to revisit this in the future based on more experience, it's something that can always be added later. But for the reason @JohelEGP gave, if I do allow extraneous commas/semis in the future, I will almost certainly insist on allowing them consistently everywhere, not just in some places. There would have to be an even stronger argument for allowing them in some places but not others.


(*) Finally, and maybe I shouldn't point this out, but: For better or worse, at least Cpp2 is currently more consistent than today's Cpp1 because cppfront currently doesn't allow empty statements ;. I realize that those who would like more empty commas/semicolons will view this as a regression from C and C++! But then I will point out that today's empty ; statements are a known bug farm that people do hit by accident and we have to learn to avoid:

// this compiles cleanly, but doesn't do what the programmer intended
if( some long condition goes here && more things || maybe some exceptional clauses );
{
    oops, this code gets executed even when condition is false
}

I don't hit this often, but when I do it sometimes costs me way more time than it should in debugging because it takes a long time to see the problem.

The above class of bugs isn't possible in Cpp2 because I require { } around all branch bodies (which also allows making the ( ) around the condition optional) and I don't allow empty ; statements... and I like the fact that I can't write such bugs in Cpp2.

@hsutter hsutter closed this as completed Dec 22, 2022
@jcanizales

Thanks for giving it an honest try @hsutter .

I'm curious if this finding you mention:

I did come across cases where this would have made my own parsing harder because allowing a grammar element to have an extraneous extra comma/semicolon in one place turned out to create a problem where the same grammar element was used in another place.

includes issues with commas (not semicolons). In my mind semicolons are a completely separate topic, as (1) they're not used to enumerate anything, (2) most modern languages (and importantly, efforts analogous to CppFront - Swift and Kotlin) have just let go of them.

On the issue of removing material from guidelines, I think most veteran C++ programmers have encountered the "comma at the beginning of the line" guideline (or seen it applied by others). Yes, the lack of support of trailing commas by the language cannot cause bugs. But the guideline is still taught because no-trailing-commas causes unnecessary annoyance to yourself and your teammates. That piece of advice could be removed.

@hsutter
Owner

hsutter commented Dec 22, 2022

@jcanizales Actually it was a semicolon. The first thing I hit was that my current declaration grammar is

name : type = statement

which allows code like

x: std::string = "xyzzy";

f: () = { do_stuff(); }

because both of those things on the right of = are valid statements.

If I allowed empty statements and did nothing more, then this

x: std::string = ;

would also be allowed, but wouldn't make sense (well, we could always make it mean something, maybe default construction, but that feels weird to me, especially to have that meaning in just one place unless we also allow assignment expressions like a = ; which is weirder still). So I'd want to either complicate the grammar to eliminate this as a legal parse, else add a semantic rule to reject it (i.e., reject it later in compilation, and then make sure that in between the rest of the compiler correctly handles/ignores that new 'empty' case -- i.e., it can create more downstream work). Not the end of the world, but it's an example I encountered.

In short, one could view the above as an example of "allowing nulls [to be created] creates the problem that you then have to test for nulls [in probably more places, and later]"... :) (In this case a "null" statement.)

One could also view it as an example of "allowing multiple spellings for the same thing can create ambiguities/inconsistencies/special-cases."

@hsutter
Owner

hsutter commented Mar 16, 2024

Picking this up again after more than a year... I said:

I'll keep this in mind going forward but for now I'm not going to make this change.

As promised, I've kept thinking about this, and I've come around to agreeing with a lot of the above.

Re allowing omitting trailing ;: I'm still concerned that allowing omitting a trailing ; is problematic. I do have to acknowledge one caveat: In the meantime I've actually added that for the tersest function expressions (e.g., for_each( v, :(e) use(e) )). But omitting the trailing ; statement terminator seems to work against easily reordering lines of code and minimizing diffs, rather than for it. (See also the article reference below.)

Re allowing adding redundant trailing ,: I'm sold. It's not problematic, and it has greater value than omitting the ; because it works for easily reordering lines of code and minimizing diffs. I no longer believe allowing a redundant , is two ways to say the same thing, it's just part of the one way to say something, and still in the spirit of "omit the parts of the syntax you aren't currently using" in that we're omitting adding another list entry we don't need but the comma is still the same syntax we'd use if we didn't omit it; we can now just omit the next entry, and separately also its comma.

What pushed me over the edge was @oldnewthing's especially persuasive recent post "On the virtues of the trailing comma", including the side-by-side language comparisons. After reading that, the great comments in the thread above really resonate on rereading. (Note that Raymond comes to the same conclusion about the ; statement terminator not being optional, though.)

So I'm about to push a commit that will make the second work. It's not everything this issue asked for, but it's a big part of it I think.

Thanks again for the suggestion! And thanks for your patience as I kept thinking about it and getting used to the idea.

This code will now work:

f: (a, b, ) a + b;

g: <T, U, > (a: T, b: U) a + b;

doubler: (a: int,) -> (i : int,) = {
    i = a * 2;
}

vals: @struct type = { i: int; }

main: () -> int = {
    (copy a := 42,) while false { a++; }
    _ = g(1, 2,);

    grouping: std::vector = (0, 1, 2,);

    array: std::array = (0, 1, 2,);

    _ = array;
    _ = grouping;
}

hsutter added a commit that referenced this issue Mar 16, 2024
See also #115, and the comment there that goes with this commit: #115 (comment)
@mhermier
Author

mhermier commented Mar 16, 2024 via email

@hsutter
Owner

hsutter commented Mar 16, 2024

Thanks. I'm open to being convinced, and I want to learn from your experience with that language, so let me ask...

Benefits

As I understand it, a (the?) major advantage of allowing adding an "extra" trailing delimiter is to make code more robust to change. It does that by:

  • enabling reordering entire lines of code for simpler refactoring and maintenance
  • minimizing diffs and merge conflicts for simpler maintenance

For example, given:

data1: vector = (
    111,
    -99,
    42,
);

data2: vector = (
    111,
    -99,
    42
);
  • to change the value order, data1 can always just reorder whole lines, data2 can't
  • to append a new value, data1 can just add a new whole line without changing any existing lines, data2 can't
  • if two commits each append a new value, in data1 it's an easy merge conflict to resolve by accepting both edits in full, in data2 it requires hand-editing

Also: In languages like C++ that only allow the data2 form in a given place, there's pressure to put the separator on the following line to avoid the problem:
mytype::mytype()
: member1{ value1 }
, member2{ value2 }
, member3{ value3 }
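
A minimal compilable sketch of that situation in today's C++. The names `Color`, `make_data`, and the three-int `mytype` are assumptions for illustration only. Enumerator lists and braced initializer lists accept a trailing comma, while a member initializer list does not, which is where the leading-comma style comes from:

```cpp
#include <cassert>
#include <vector>

// Today's C++ already accepts a trailing comma in these two places...
enum class Color {
    red,
    green,
    blue,   // trailing comma is allowed in an enumerator list
};

inline std::vector<int> make_data() {
    return {
        111,
        -99,
        42,   // trailing comma is allowed in a braced initializer list
    };
}

// ...but not in a member initializer list, hence the leading-comma style:
// each line after the first carries its own separator and can be moved freely.
struct mytype {
    int member1;
    int member2;
    int member3;

    mytype(int value1, int value2, int value3)
        : member1{ value1 }
        , member2{ value2 }
        , member3{ value3 }
    {}
};

// Usage sketch.
void trailing_comma_examples() {
    assert(make_data().size() == 3);
    mytype m{1, 2, 3};
    assert(m.member3 == 3);
}
```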

So my question is, in your experience: Aren't those weaknesses of allowing omitting a trailing delimiter? For example (as Raymond's article points out in the last postscript):

func: () = {
    x = 1;
    y = 2;
    z = 3;
}

gunc: () = {
    x = 1;
    y = 2;
    z = 3
}
  • to change the order of lines of code, func can always just reorder whole lines, gunc can't
  • to append a new statement, func can just add a new whole line without changing any existing lines, func can't
  • if two commits each append a new statement, in func it's an easy merge conflict to resolve by accepting both edits in full, in func it requires hand-editing

Two perspectives, subtly different

With those examples in mind, I think what you wrote here might capture the difference well:

What I suggest is to formalize that, if there is no ambiguity (and this has to be checked/understood/evaluated), a trailing separator is always optional

I understand that perspective of "always optional." From that view, the final , or ; can always be omitted, so we would make it optional in grammar productions that currently require it.

The perspective I'm coming from is (subtly) the opposite, "always allowed." From that view, the final , or ; can always be added, so we would make it allowed in grammar productions that currently don't allow it.

Do you see the difference? How does that resonate with you?

Language evolution, and closing doors

Specifically for omitting ; on the final statement/expression of a function body, my understanding is that experience with the feature in other major languages that allow that is:

  1. In languages where omitting it is innocuous (usually because the language has function bodies that contain a list of expressions, not statements), the feature isn't generally used. As Raymond pointed out, Pascal allows the gunc-like form, but Pascal programmers generally don't use it -- as far as I know?
  2. In languages where omitting it is meaningful (changes the meaning of the code), typically it's to make the final statement be really an expression which is the implicit return-expression of the function.

So one concern I have with doing (1) (allowing omission as innocuous) today is that it actively closes the door to doing (2) (giving omission a meaning) in the future, because allowing (1) now means any code that uses it will be broken if we ever changed to (2). Does that make sense?

That's a brain dump, anyway... How does that match with your experience in your language, and what users reported from using it? I appreciate the feedback and insights!

@hsutter
Owner

hsutter commented Mar 17, 2024

I've now captured this at Design note: Commas.

There, I also added: "Additionally, Cpp2 supports reflection and code generation, which is done by source code generation. Allowing trailing commas in lists makes it easier to generate source code without special cases."

@mhermier
Author

mhermier commented Mar 18, 2024 via email

@MichaelCook

MichaelCook commented Mar 18, 2024 via email

@mhermier
Author

mhermier commented Mar 18, 2024 via email

@gregmarr
Contributor

I've now captured this at Design note: Commas.

There, I also added: "Additionally, Cpp2 supports reflection and code generation, which is done by source code generation. Allowing trailing commas in lists makes it easier to generate source code without special cases."

You used func for both cases in both of these statements. One of each should be gunc.

  • to append a new statement, func can just add a new whole line without changing any existing lines, func can't
  • if two commits each append a new statement, in func it's an easy merge conflict to resolve by accepting both edits in full, in func it requires hand-editing

@gregmarr
Contributor

This is probably the subject of another issue. But this makes me wonder if the concatenation of literal strings is still necessary after the removal of macros.

Yes, it is. This functionality was not included originally and then added later due to discussions in this repo. I don't have a pointer handy to share with you, but if you're interested, I can probably dig them up.

@gregmarr
Contributor

As far as "should the trailing ; in a block be optional", I would say that if it's not, then there is an inconsistency between expression lists and statement lists.

These are consistent, there is one item/statement, and one ;:

    x := { 1 };
    f : (e) use(e);
    g : () = { return 1 };

These are inconsistent, there is one item/statement, and one ; in two cases and two in the other:

    x := { 1 };
    f : (e) use(e);
    g : () = { return 1; };

I expect that this will be the primary use case for "leave off the trailing ;".

Now imagine if we hadn't excluded the ; at the end of a terse function:

   f : (e) use(e); /* this ; ends the function declaration */ ; /* this ; ends the declaration of f */

Well, if we can already leave off the ; with terse function syntax, why not write your function in terse syntax? You can only leave it off for a deduced return type. You can also only do it for a single statement. You might have a "one-liner" that needs two statements but is still small enough to be written in one line:

    e.transform(:(e) = { [auto x, _] = do_something(e); return x }); 

A final bit on consistency:

func: () = {
    x = 1;
    y = 2;
    z = 3;
}

gunc: () = {
    x = 1;
    y = 2;
    z = 3
}

bar := (
    1,
    2, 
    3,
);

baz := (
    1,
    2,
    3
);

All of these forms would be valid. I expect that people WILL do func and bar most of the time, but if they want to, they CAN do gunc and baz, because the rules are the same. When the rules are not the same (like now), they can do func, bar, and baz, but not gunc.

@hsutter
Owner

hsutter commented Mar 18, 2024

This is probably the subject of another issue. But this makes me wonder if the concatenation of literal strings is still necessary after the removal of macros.

Yes, it is. This functionality was not included originally and then added later due to discussions in this repo. I don't have a pointer handy to share with you, but if you're interested, I can probably dig them up.

Almost: String literal concatenation has been requested, and I personally would find it useful in reflect.h2 when I write a metafunction that generates long code lines. But I haven't added it yet. I'm still thinking about it. If I do it, it will be in the grammar, not in a preprocessor of course.

As far as "should the trailing ; in a block be optional", I would say that if it's not, then there is an inconsistency between expression lists and statement lists.

I agree that's a consequence of "always optional"... but please see this comment above and the design note again: I'm arguing that there are two (subtly) opposite★ ways to look at this that lead to different conclusions, "always optional" and "always allowed," and I agree with what you say about the former, but I'm arguing instead for (and implemented) the latter.

★ BTW, I totally realize it's a subtle viewpoint difference, which is why I've been calling it "(subtly) opposite"... and the irony of those two words is not lost on me, it's been jarring to me every time I've written it in this thread, because normally "subtly" and "opposite" don't go together (and I've kept resisting making the note longer by pointing that out, but this has pushed me over the edge 😄 ).

Does that help?

@gregmarr
Contributor

Almost: String literal concatenation has been requested, and I personally would find it useful in reflect.h2 when I write a metafunction that generates long code lines. But I haven't added it yet. I'm still thinking about it. If I do it, it will be in the grammar, not in a preprocessor of course.

Thanks for the clarification. I thought it was done after that conversation.

"always optional" and "always allowed,"

Yes, I understand. I am arguing that "always allowed" still results in an undesired inconsistency, and am arguing for "always optional" to resolve that inconsistency. We've moved from "two of the four are allowed" to "three of the four are allowed", and I think that we should now move to "all four are allowed".

@jcanizales

Don't all or most C++ style guides forbid multi-statement lines? Can't cppfront just drop the ; altogether like its analogues?

@gregmarr
Contributor

Don't all or most C++ style guides forbid multi-statement lines?

I don't know. I don't generally use them, but it's certainly possible to do it if it's fairly simple.

Can't cppfront just drop the ; altogether like its analogues?

I'm not sure that it's always unambiguous. I'm sure we can come up with cases where the ; is required.

@gregmarr
Contributor

Here's the discussion I was thinking of: #861

@jcanizales

I guess the reason in my mind the , and ; are completely different cases is that I don't see writing statements as "enumerating a list of statements". I've always worked in companies where Google's C++ style guide is enforced, and ; is just what you type before hitting the return key...

@JohelEGP
Contributor

Line breaks happen in auto-formatted code with long lines.
You're suggesting breaking a lot of reasonable code that can't be ported to Cpp2 without having single overly-long lines.

@hsutter
Owner

hsutter commented Mar 18, 2024

Brief answers as we're about to resume meeting sessions:

I guess the reason in my mind the , and ; are completely different cases is that I don't see writing statements as "enumerating a list of statements".

Yes, Raymond alludes to this at the end of his article in pointing out that when Pascal allows omitting the last ; it's at least in part because in Pascal that's a list of expressions, not statements.

Re long lines: There are LOTS of long lines. For example, any line that contains a lambda... we don't want that all on one line, right? If we did that, soon we'd ask for meaningful whitespace indentation... speaking of which...

Re dropping ;: The languages that use end-of-line as statement separator are usually whitespace indentation-significant languages that also don't have braces... those things usually go together. Tim Sweeney was in the middle of early design for what became Verse, which chose that path (this was before the pandemic, before Simon Peyton Jones joined Epic), and Tim urged me to allow that style also in Cpp2 (then called Cppx). I understood the attraction, because it's a clean style; but I resisted switching to that (or, worse, allowing that in addition to braces) because I couldn't see that it solved a known problem in C++. Note that Python similarly resists going the other way... for a fun Easter egg, try from __future__ import braces in Python. A language should choose one path or the other, I think.

@mhermier
Author

mhermier commented Mar 19, 2024 via email

@gregmarr
Contributor

Off topic

This would be on-topic for #861

but why can't the operator + also be used for literal strings? I mean the concatenation operation was always it for std::string, and

It can if they're made into std::string, but it means that it's a runtime concat instead of a compile-time concat.

it always bugged me that two consecutive string literals were automatically concatenated without an operator.

You could think of it not as two consecutive literal strings but as a way to break a long string literal across multiple lines. This ability comes from C, and far predates std::string::operator+.

@gregmarr
Contributor

The languages that use end-of-line as statement separator are usually whitespace indentation-significant languages that also don't have braces... those things usually go together. ... Tim urged me to allow that style also in Cpp2 (then called Cppx). I understood the attraction, because it's a clean style; but I resisted switching to that

Please continue resisting!

@jcanizales

jcanizales commented Mar 19, 2024

Line breaks happen in auto-formatted code with long lines.
You're suggesting breaking a lot of reasonable code that can't be ported to Cpp2 without having single overly-long lines.

I suggest looking into how languages that have dropped the ; still allow breaking long statements into multiple lines. This isn't a new theoretical idea: Swift, Kotlin and TypeScript are all a decade old already. None of them require you to write single overly-long lines. (I think in TypeScript people still usually add the ; by convention, even though they're not needed; I'm less familiar with it than with the mobile ones).

The languages that use end-of-line as statement separator are usually whitespace indentation-significant languages that also don't have braces... those things usually go together.

Usually, but not necessarily. And in particular, none of the three languages above are indentation-significant. Correlation is not causation here. I consider these three the analogues to Cpp2 because they successfully did for Objective-C, Java, and JavaScript what you're now trying to do for C++.

@hsutter
Owner

hsutter commented Mar 19, 2024

Ah, thanks! That's good information, appreciated. One reason I went into more detail in my answer was to provide enough of the thinking and background that folks could point out if I was missing something.

Now that you mention it, yes Swift and Kotlin and TypeScript do use both braces and optional semicolons.

Doing some brief googling, one of the first things I found for TS was StackOverflow questions about semicolons that referenced the TypeScript Deep Dive which includes this section:

Semicolons

  • Use semicolons.

Reasons: Explicit semicolons help language formatting tools give consistent results.
Missing ASI (automatic semicolon insertion) can trip new devs e.g. foo() \n (function(){}) will be a single statement (not two). Recommended by TC39 as well.

Disclaimer: I don't know how popular/authoritative that document is, and I haven't asked the TS language designers this semicolons question. I'll try to remember the next time I talk to them.

So at least Pascal and TypeScript allow omitting ; (some less, some more), but their developers generally don't use the feature (Pascal) and/or have at least some style guides/rules that recommend against using the feature, citing pitfalls.

But you've given me great things to think about and look into more, and I will. Thanks!


8 participants