This issue was moved to a discussion.

[SUGGESTION] Array Literals with the support of both Multidimensional and Jagged Arrays #424

Closed
msadeqhe opened this issue May 6, 2023 · 24 comments

@msadeqhe

msadeqhe commented May 6, 2023

Preface

The idea of this suggestion is gathered from discussion in this issue.

I have to mention that (...) is already used for calling constructors, grouping expressions and initializing lists.

Now, consider the following ambiguities:

  • (1) as an expression: is it a parenthesized value, or an array of one item?
  • () is not an empty array; in variable declarations it calls the default constructor.
  • (1, 2) in declarations: is it the argument list of a constructor, or an array of two items?
x0: std::vector<int> = (1, 2);

Yes, I know that std::vector has a bad API design, but I ask myself why Cpp2 (like Cpp1) would allow libraries to have this ambiguity in the first place.
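
For reference, the same ambiguity can be reproduced in today's C++ (Cpp1). The sketch below is only an illustration; the function names are made up for this example, and the only difference between the two functions is the kind of bracket used.

```cpp
#include <cassert>
#include <vector>

// Parentheses select the (count, value) constructor: one element equal to 2.
std::vector<int> make_with_parens() {
    return std::vector<int>(1, 2);
}

// Braces select the std::initializer_list<int> constructor: elements 1 and 2.
std::vector<int> make_with_braces() {
    return std::vector<int>{1, 2};
}

// Checks the two readings of "1, 2" described above.
void demo_vector_ambiguity() {
    assert(make_with_parens().size() == 1);
    assert(make_with_parens()[0] == 2);
    assert(make_with_braces().size() == 2);
}
```

Calling demo_vector_ambiguity() from main exercises both cases.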

Having array literals with a different syntax will solve those three ambiguities. I suggest using [...] for array literals:

x0: = [1, 2, 3];

Nested (...)s, ;s, and so on will create multidimensional arrays: they don't create a new array, because they are used for mathematical grouping (to group expressions and to change the precedence of operators). Nested [...]s, on the other hand, will create jagged arrays, because each one creates a new array:

// Multidimensional Array
x0: = [(1, 2, 3), (4, 5, 6)];
// Or alternatively one of the following syntax:
// x0: = [1, 2, 3; 4, 5, 6];
// x0: = [(1, 2, 3); (4, 5, 6);];
r0: = x0[0, 1] == 2; // true

// Jagged Array
x1: = [[1, 2, 3], [4, 5, 6]];
r1: = x1[0][1] == 2; // true

I currently do not suggest supporting multidimensional arrays, but it's a possibility to consider in the future.
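
As a rough C++ analogy (not the proposed Cpp2 syntax), the difference between the two shapes can be sketched with standard containers. The names Grid2x3 and demo_rectangular_vs_jagged are invented for this illustration; note that C++ indexes both forms with chained [][] rather than x0[0, 1].

```cpp
#include <array>
#include <cassert>
#include <vector>

// Rectangular 2x3 storage: every row has exactly three elements.
using Grid2x3 = std::array<std::array<int, 3>, 2>;

void demo_rectangular_vs_jagged() {
    // The shape is part of the type, so all rows have the same length.
    Grid2x3 grid = {{{1, 2, 3}, {4, 5, 6}}};
    assert(grid[0][1] == 2);
    assert(grid[0].size() == grid[1].size());

    // Each row is its own array, so row lengths may differ (jagged).
    std::vector<std::vector<int>> jagged = {{1, 2, 3}, {4, 5}};
    assert(jagged[0][1] == 2);
    assert(jagged[0].size() != jagged[1].size());
}
```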

Suggestion Detail

Three options are available instead of () for array literals:

  • <...> is already for template parameters/arguments. It's not a good choice, because:
    • It doesn't have any known relation with arrays.
    • It looks like the less-than and greater-than operators. Because of this similarity, it's not a good choice for arrays of mathematical expressions, such as a vector of boolean values, e.g. <a < b, c > 2>.
  • [...] is already for accessing items of an array. It seems to be a good choice.
  • {...} is already for function/statement blocks and type definitions. It can be considered as a good choice.

Now, it's the time to compare both [...] and {...} for array literals:

x0: /*...*/ = [1];
x1: /*...*/ = {1};

OK. Both of them look good. So what if we want to write an empty array?

x0: /*...*/ = [];
x1: /*...*/ = {};

[] is clearly an empty array, but {} can be either an empty function/statement block or an empty array, depending on the declaration. For example:

x0         :       std::vector<int> = {};  // empty array
x1         : () -> void             = {}   // empty statement block
x2         : () -> std::vector<int> = {}   // ERROR: It doesn't work, although visually it's the same as above.
x3: = call(: () -> std::vector<int> = {}); // SURPRISE! It works, although visually it's the same as above.

{} is visually surprising and inconsistent for x1, x2 and x3, although they look the same:

  • In x0 declaration, {} is an empty array.
  • In x1 declaration, {} is an empty statement block.
  • But in x2 declaration, {} is an error, because it must end with ;.
  • But in x3 declaration, {} is an empty array!

So [...] is more expressive than {...} for array literals.

Now let's consider this situation in the following example:

x0: = [1, 2, 3][1]; // It's equal to 2

The first [...] creates an array, and the second [...] accesses an item from it. A sequence of [...]s is not ambiguous, because it behaves like parentheses:

x0: = (call() + something)(1);

The first (...) groups the operands of operator+, and the second (...) calls operator() on the result.
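
A comparable "create, then index" expression already works in C++17, which suggests the sequence is not inherently confusing. This is only an analogy using std::array, not the proposed literal syntax, and the function name is made up.

```cpp
#include <array>
#include <cassert>

int second_of_temporary() {
    // std::array{1, 2, 3} builds a temporary array (C++17 deduces
    // std::array<int, 3>); the trailing [1] then indexes into it.
    return std::array{1, 2, 3}[1];
}

void demo_second_of_temporary() {
    assert(second_of_temporary() == 2);
}
```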

Your Questions

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

Yes. If a bad API design can suddenly change the meaning of code, it's going to be a security vulnerability. This suggestion is a way to prevent it by separating arrays from constructors and expressions.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

Yes. It's not needed to learn if user-defined constructors are ambiguous with initializer lists, because it prevents ambiguous situation completely. It allows more API choices.

Considered Alternatives

An alternative solution was that if a type has ambiguous constructor with initializer list, it should be a syntax error. By the way, this approach wouldn't fix the bad API design of std::vector.

Another alternative was a little more complicated. The idea was to favor constructors over initializer lists, and to treat a parenthesized, comma-separated list as an initializer list:

x0: = (1, 2); // x0 is an initializer list

With the help of unnamed variable declaration and indirect initialization, it could be used like this:

// `: = (1, 2)` is an initializer list
x0: std::vector<int> = : = (1, 2);

// This calls the constructor to create a vector of one element with value 2.
x1: std::vector<int> = (1, 2);

But I gave up on this idea, because it would encourage unnamed variable declarations more than necessary.

Finally I considered to use literal templates syntax:

// `(1, 2)<int>` is an initializer list
x0: std::vector<int> = (1, 2)<int>;

// `list` is a user-defined literal suffix which creates an initializer list
x1: std::vector<int> = (1, 2)list;

// This calls the constructor to create a vector of one element with value 2.
x2: std::vector<int> = (1, 2);

But I gave up on this idea too, because (1, 2)<int> would always require specifying the type, and (1, 2)list would make user-defined literal suffixes look like constructors.

Edits

  • I've added one more alternative solution that I had considered.
@AbhinavK00

Herb points out that [1,2,3] would be hard to parse to see if it's the indexing operator or an array literal here, so here's a suggestion for an unambiguous syntax. It would be

x0: = [,1, 2, 3];  //notice the comma between the [ and 1

These arrays would effectively translate to std::array in cpp1 or some cpp2::array if Herb wants a different type.
With that syntax, std::vector constructors would look like this

x1 : std::vector = (4,2);  // vector of [2,2,2,2]
x2 : std::vector = ([,4,2]) // vector of [4,2]
x3 : std::vector = [,4,2]  //error!

There won't be any initialiser lists in cpp2, they'll be just an implementation detail on cpp2 side.

@msadeqhe

msadeqhe commented May 6, 2023

Herb points out that [1,2,3] would be hard to parse to see if it's the indexing operator or an array literal here, ...

So "hard to parse" is the only complaint against [...]. I hope it will be revisited. If there is an identifier before [...] then it's a subscript operator, otherwise it's an array literal:

x0: = identifier[1]; // [1] is a subscript operator.
x1: = [1]; // [1] is an array literal.

It's like parentheses in Cpp2 and Cpp1:

x0: = identifier(1); // (1) is a function call operator.
x1: = (1); // (1) is an expression group.

The only difference is that [...] (without identifier before it) has to be transformed to {...} in Cpp1. But other cases don't need this transformation.

@msadeqhe

msadeqhe commented May 6, 2023

Herb points out that [1,2,3] would be hard to parse to see if it's the indexing operator or an array literal here, ...

It seems Herb's answer was to this comment:

Could we use square [ ] brackets for lists/aggregates instead?

I have to clarify that I don't suggest also using [...] for aggregates and the like. I suggest using [...] only for array literals. IMO parentheses are best for aggregates, because aggregates are in a sense constructors, similar to designated initializers, which look like named arguments (both visually and semantically), and there won't be any conflict between constructors and designated initializers with parentheses, as described in this comment.

@HALL9kv0

HALL9kv0 commented May 6, 2023

I think it is important for initializer-list/array-type constructors to have a different symbol/syntax from everything else, as they are a special case. We also need to avoid the double paren ((...)) nightmare.
[...] is the most sensible choice, as it is intuitive and very familiar to people coming from Python/JavaScript (two of the most popular languages).

@AbhinavK00

I think initialiser list constructors should be viewed as simple constructors to which we pass arrays as parameters; they should not be a special case. So if [...] is chosen for array literals, then

x : std::vector = ([1,2,3,54]);

should translate to initialiser list syntax instead of

x : std::vector = [1,2,3,54]; //does not feel right

@HALL9kv0

HALL9kv0 commented May 6, 2023

The issue with that is that for the case of a 2D vector, the ('s and ['s become too many:
v : std::vector<std::vector<int>> = ([([1,2,3]),([4,5,6])]) ;

The () feel redundant, and a pure [[1,2,3],[4,5,6]] syntax is way less prone to typing errors. Also, it more closely resembles the math vectors everyone learns in school.

@AbhinavK00

AbhinavK00 commented May 6, 2023

That's not what I recommend. The syntax would be

v : std::vector<std::vector<int>> = ([[1,2,3],[4,5,6]]);

Like I said, just view it as passing an array to a constructor. Here we are passing the array [[1,2,3],[4,5,6]] as an argument to the constructor.
One more thing, I said that the following would be an error

x : std::vector = [1,2,3,54];

But I think this should be allowed if the constructor is marked implicit, what do you guys think?

@msadeqhe

msadeqhe commented May 6, 2023

Parenthesis around [...] are not necessary, because [...] is an array literal. It's just like other literals in which we pass arguments to the constructor without implicit:

#include <iostream>

taip: type = {
    operator=: (out this, i: int) = {
        std::cout << i << "\n";
    }
}

main: () = {
    t0: taip = 1;

    // Parenthesis are not necessary.
    t1: taip = (1);
}

So 1 is a literal with type int, in the same way [1, 2, 3] is a literal with type std::initializer_list<int>, that's similar to how "text" is a literal with type std::string. For example:

#include <initializer_list>
#include <string>
#include <iostream>

taip: type = {
    operator=: (out this, text: std::string) = {
        std::cout << "OK. That's a string literal.\n";
    }
    operator=: (out this, list: std::initializer_list<int>) = {
        std::cout << "OK. That's an array literal.\n";
    }
}

main: () = {
    s0: taip = "text";
    t0: taip = [1, 2, 3];

    // Parenthesis are not necessary.
    s1: taip = ("text");
    t1: taip = ([1, 2, 3]);
}
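
For comparison, here is a Cpp1 sketch of the same overload pair. It is only an illustration under a few assumptions: the `from` enum and the `source` member are invented to record which constructor ran, and direct-initialization is used for the string case because `taip s0 = "text";` would need two user-defined conversions in C++.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

struct taip {
    enum class from { string, list };
    from source;  // records which constructor was selected

    taip(std::string) : source(from::string) {}
    taip(std::initializer_list<int>) : source(from::list) {}
};

void demo_taip_overloads() {
    taip s0("text");      // selects the std::string constructor
    taip t0 = {1, 2, 3};  // selects the std::initializer_list<int> constructor

    assert(s0.source == taip::from::string);
    assert(t0.source == taip::from::list);
}
```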

@HALL9kv0

HALL9kv0 commented May 6, 2023

If I understand correctly, the inner vectors will only need [...] for their initializer lists, but the outer one will need ([...])?
Personally, even though it is much better than ([]), I still find the () redundant and prefer the Python/JavaScript way.

@AbhinavK00

Parenthesis around [...] are not necessary, because [...] is an array literal. It's just like other literals in which we pass arguments to the constructor without implicit

I thought constructors were supposed to be explicit? you guys can ignore my comments cuz it seems my understanding was wrong

@msadeqhe

msadeqhe commented May 8, 2023

1. Multidimensional Arrays

I have to clarify that currently I don't suggest multidimensional arrays, but they are a possible feature to explore.

Multiple syntax choices are available for multidimensional arrays:

a: = [((1, 2), (3, 4)), ((5, 6), (7, 8))];
b: = [1, 2; 3, 4;; 5, 6; 7, 8];
c: = [((1, 2); (3, 4);); ((5, 6); (7, 8););];
d: = [((1; 2); (3; 4)); ((5; 6); (7; 8))];
e: = ...

Here, I'm explaining the first one of those possible syntax for multidimensional arrays:

a: = [((1, 2), (3, 4)), ((5, 6), (7, 8))];

This syntax requires loosely multidimensional arrays when the type of the array literal is not specified in the declaration (e.g. the type of variable a in the example above is not specified).

1.1. Loosely Multidimensional Arrays

1-dimensional arrays are loosely multidimensional arrays. This is useful in generic programming. That's because there are two kinds of dimensions within arrays:

  • Required dimensions: They are added with parentheses which contain commas.
    • These parentheses must have identical commas in all corresponding elements.
// These parentheses don't contain commas:
[(0), (1), (2), (3)]     // OK. It's equal to [0, 1, 2, 3]
[0, (1), ((2)), (((3)))] // OK. It's equal to [0, 1, 2, 3]

// These parentheses contain commas:
[(0, 1), 2, 3] // ERROR! They must have identical commas in all corresponding elements.

// This is a combination:
[(0, (1)), (2, (3))] // OK. It's equal to [(0, 1), (2, 3)]
  • Optional dimensions: They are extra dimensions whose length is always 1, therefore we use index 0 to access them.
    • Optional dimensions are always after required dimensions.
      • We have to specify the index for required dimensions.
      • Optionally we can specify index 0 for optional dimensions.
list: /*something*/ = [1, 2, 3];

// Only `1` is required dimension. These zero-value indexes are optional dimensions.
s0: = list[1] == 2; // true
s1: = list[1, 0] == 2; // true
s2: = list[1, 0, 0] == 2; // true

The opposite is not true. Multidimensional arrays cannot be treated as 1-dimensional arrays.
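
The optional-dimension rule can be approximated in today's C++ with a variadic accessor. This is a minimal sketch, assuming a hypothetical loose_array type that exists in neither Cpp2 nor any library; it models only the 1-dimensional case, where every index after the first must be 0.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <vector>

// Hypothetical type, not part of Cpp2 or any library: a 1-dimensional array
// whose accessor tolerates extra trailing indexes as long as each one is 0.
class loose_array {
    std::vector<int> items_;

public:
    loose_array(std::initializer_list<int> items) : items_(items) {}

    // `index` is the required dimension; `extra` are optional dimensions.
    template <typename... Extra>
    int at(std::size_t index, Extra... extra) const {
        // Optional dimensions always have length 1, so only index 0 is valid.
        const bool all_zero = ((extra == 0) && ...);
        if (!all_zero) {
            throw std::out_of_range("optional dimensions only accept index 0");
        }
        return items_.at(index);
    }
};

void demo_loose_array() {
    loose_array list{1, 2, 3};
    assert(list.at(1) == 2);
    assert(list.at(1, 0) == 2);
    assert(list.at(1, 0, 0) == 2);
}
```

A non-zero optional index, such as list.at(1, 1), throws std::out_of_range in this sketch; the proposal itself does not say whether that would be a compile-time or run-time error.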

I have to explain that these rules let us ignore parentheses when they are used only to group expressions inside arrays, without any intention of adding another dimension:

// It's a 1-dimensional array.
x: = [(1 + 2), (3 + 4)];

// But optionally it can be used as a two-dimensional array.
a: = x[1];    // OK. It's equal to (3 + 4)
b: = x[1, 0]; // OK. It's equal to (3 + 4)

// It's a two-dimensional array.
y: = [(1, 2)];

// But it cannot be used as a 1-dimensional array.
c: = y[0];    // ERROR! The second index is required.
d: = y[0, 1]; // OK. It's equal to `2`

1.2. More Examples

point: type = {
    operator=: (out this, x: int, y: int) = {}
}

zero_point: () -> point = {
    return (0, 0);
}

zero_array: () -> std::initializer_list<int> = {
    return [0, 0, 0/*, ...*/];
}

a0: =     []; // An empty array
a1: =     [ [] ]; // An empty array in array
a2: =     [1, 2, 3]; // A 1-dimensional array
a3: =   [ [1, 2, 3] ]; // A 1-dimensional array of 1-dimensional arrays
a4: =    [(1, 2, 3)]; // A two-dimensional array
a5: = [ [ [1, 2, 3] ] ]; // A 1-dimensional array of 1-dimensional arrays of 1-dimensional arrays
a6: =   [((1, 2, 3))]; // A three-dimensional array
a7: =  [ [(1, 2, 3)] ]; // A 1-dimensional array of two-dimensional arrays
a8: =  [( [1, 2, 3] )]; // A 1-dimensional array of 1-dimensional arrays

a8 (the last line) can also be seen visually as "a two-dimensional array of 1-dimensional arrays", and thanks to optional dimensions we can use it either way without any problem:

a8: =  [( [1, 2, 3] )]; // A 1-dimensional array of 1-dimensional arrays
// It can be visually seen as a two-dimensional array of 1-dimensional arrays

x0: = a8[0][1];      // OK. A 1-dimensional array of 1-dimensional arrays
x1: = a8[0, 0][1]; // OK. A two-dimensional array of 1-dimensional arrays

@msadeqhe

msadeqhe commented May 9, 2023

2. Dictionaries

I have to clarify that currently I don't suggest dictionaries, but they are a possible feature to explore.

No additional syntax is required for dictionaries, if Cpp2 could support type inference for unnamed variable declarations.

2.1. Dictionaries are arrays of objects.

With the help of unnamed variable declarations, we can write arrays of objects:

x0: = [: std::pair<std::string, int> = ("a", 1),
       : std::pair<std::string, int> = ("b", 2),
       : std::pair<std::string, int> = ("c", 3)];

But if Cpp2 could support type inference for unnamed variable declarations, it would be so much simpler that a distinct syntax would not be needed for dictionaries:

x0: std::vector<std::pair<std::string, int>> = [: = ("a", 1),: = ("b", 2),: = ("c", 3)];

// It's a dictionary.
x1: = [: = ("a", 1),: = ("b", 2),: = ("c", 3)];

Please, see the next comment for construction with (...) in declarations.

2.2. Do "arrays of objects" conflict with "multidimensional arrays"?

No. There isn't any conflict, because parentheses have loose semantics, unlike [...]. It's possible to have multidimensional arrays of objects, so the two don't conflict with each other. For example:

// A multidimensional array of numbers
x0: matrix<int, 2, 3> = [(1, 2, 3), (4, 5, 6)];

// A multidimensional array of objects
x1: matrix<std::pair<std::string, int>, 2, 3> = [
    (: = ("a", 1),: = ("b", 2),: = ("c", 3)),
    (: = ("d", 4),: = ("e", 5),: = ("f", 6))
];
r1: = x1[0, 1]; // It is `: = ("b", 2)`.

In a nutshell, we use (a, b) to add a new dimension to the array, and : = (a, b) to call the constructor of an element.

But in this case, the type of array literal is known for variable x1, therefore with the help of constructor syntax in declarations, all : = (...)s can be replaced with (...)s:

// A multidimensional array of objects
x1: matrix<std::pair<std::string, int>, 2, 3> = [
    (("a", 1), ("b", 2), ("c", 3)),
    (("d", 4), ("e", 5), ("f", 6))
];

Please, see the next comment for explanation.

@msadeqhe

msadeqhe commented May 9, 2023

3. Array Literal of Constructors

If the type of an array literal is specified in the declaration, object construction is much easier and more readable.

First, I have to mention that parentheses can mean calling a constructor in declarations in Cpp2:

point: type = {
    operator=: (out this, x: int, y: int) = {}
}

main: () = {
    // These parenthesis call constructor.
    x0: point = (0, 0);
}

So with the help of this feature, we may create an array literal of constructors:

// This is not a multidimensional array,
// because the parentheses call the constructor. They don't add a new dimension.
// It's not ambiguous: the meaning follows from the type in the declaration.
x0: std::vector<point> = [(0, 0), (1, 1), (2, 2)];

// This is a two-dimensional array,
// because the innermost parentheses call the constructor. They don't add a new dimension.
// It's not ambiguous: the meaning follows from the type in the declaration.
x1: matrix<point, 2, 2> = [((0, 0), (1, 1)), ((2, 2), (3, 3))];

Because the type is specified in the declaration, "Parentheses for Constructors" won't conflict with "Parentheses for Dimensions". Finally, a dictionary will look like this when its type is specified in the declaration:

c0: std::vector<std::pair<std::string, int>> = [("a", 1), ("b", 2), ("c", 3)];

// It's a dictionary.
c1: dictionary<int> = [("a", 1), ("b", 2), ("c", 3)];

It's context-free, and : = (...) is not needed for object construction (see part 2. Dictionaries from the previous comment).
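
Today's C++ already allows something close to this when the element type is known from the declaration, with braces playing both roles (list and element constructor). The sketch below uses only standard containers; std::map stands in for the hypothetical dictionary<int> type, which is an assumption of this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

void demo_dictionary() {
    // Each inner {"a", 1} constructs a std::pair<std::string, int>,
    // because the element type is known from the declaration.
    std::vector<std::pair<std::string, int>> c0 = {{"a", 1}, {"b", 2}, {"c", 3}};

    // std::map is used here as a stand-in for the hypothetical dictionary<int>.
    std::map<std::string, int> c1 = {{"a", 1}, {"b", 2}, {"c", 3}};

    assert(c0.size() == 3);
    assert(c0[1].first == "b");
    assert(c0[1].second == 2);
    assert(c1.at("b") == 2);
}
```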

@msadeqhe

msadeqhe commented May 9, 2023

Readability examples

C++ already has operator[] to access arrays. If only operator() was available without operator[], it would be hard to understand the meaning of code. For example:

// `something` is defined somewhere...
sum: = 0;
(copy i: = 0) while i < 10 next i++ {
    // Is there any array here?
    sum += thing(something(i))(i);
}

What does it do? OK. operator[] (aka brackets) will make it readable:

// `something` is defined somewhere...
sum: = 0;
(copy i: = 0) while i < 10 next i++ {
    // OK. This is an array of function objects.
    sum += thing[something(i)](i);
}

operator[] helps both "the compiler to parse" and "humans to read" the code (context-free).

In a similar manner, array literals with [...] will simplify the guidance of Cpp2 in a consistent way:

  • Unnamed variable declarations will call the constructor:
    • : id = arg
    • : id = (), default constructor
    • : id = (args...)
  • Parentheses will group expressions. They cannot be empty:
    • (expr)
  • Function call, operator():
    • id(), without arguments
    • id(args...)
  • Brackets will create an array:
    • [], an empty array
    • [items...]
  • Accessing arrays, operator[]:
    • id[], ??
    • id[index...]
  • Braces are for the body of declarations and control structures:
    • {}, an empty statement block
    • { statements... }
  • Templates:
    • id<>, ??
    • id<args...>

They lead to more readable and easier to understand Cpp2 code, especially in a function call:

If we use (...) for arrays, consider how this example would be difficult to understand:

call((: t = (1, (f() + 2)), : t = (0, ())));

Even if you didn't lose track of the parentheses, the compiler would still face ambiguous meanings.

Now if we use [...] for array literals, the previous example would look like this:

call([: t = (1, [f() + 2]), : t = (0, [])]);

Consider how you already understand what it does without my explanation (context-free).

@jcanizales

I'm sympathetic to the suggestion, but this misunderstands the questions:

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

Yes. If a bad API design can suddenly change the meaning of code, it's going to be a security vulnerability. This suggestion is a way to prevent it by separating arrays from constructors and expressions.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

Yes. It's not needed to learn if user-defined constructors are ambiguous with initializer lists, because it prevents ambiguous situation completely. It allows more API choices.

Bolding mine: The key word in those questions is "current". Current C++ code, current C++ guidance.

@msadeqhe

msadeqhe commented May 10, 2023

Yes. My answers are for current Cpp1. I have to explain them with this example:

#include <iostream>

class parts {
    public:
    parts(int num) {
        std::cout << R"({ "ID": 1 })";
    }
};

int main() {
    parts p{0};
}

OK. This compiles fine. The output { "ID": 1 } is in JSON format, and other applications would read it to decide what to do.

Now, someday someone decides to add another constructor to parts:

#include <initializer_list>
#include <iostream>

class parts {
    public:
    parts(int num) {
        std::cout << R"({ "ID": 1 })";
    }

    parts(std::initializer_list<int> args) {
        /* statements... */
    }
};

int main() {
    parts p{0};
}

OOPS! We didn't change the first constructor, but adding the second constructor changes the behaviour of object construction in the main function: parts p{0} now selects the initializer-list constructor, as if the first constructor had been indirectly replaced by the second. Consider that thousands of functions could depend on the first constructor, and the behaviour of all of them would change suddenly. Also, the applications that used to read { "ID": 1 } will get nothing after this change, so at best they would do nothing instead of crashing. Such unintended impacts can lead to security vulnerabilities, just as integer overflows and out-of-bounds indexes can make things go wrong.

Programmers have to learn, and keep in mind, that they must account for initializer lists whenever they write constructors. This adds to the effort of maintaining code and makes the guidance literature bigger.

Also in current Cpp2 the same problem happens.

In current Cpp1, one can avoid the problem by using () for constructors and {} for initializer lists. This rule contrasts with current C++ guidelines that recommend {} everywhere. In current Cpp2, however, no solution is available yet other than falling back to Cpp1.
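
The effect can be checked directly in Cpp1. In this sketch the used_list member is an addition made for the example (the original class prints JSON instead); it records which constructor was selected for each bracket style.

```cpp
#include <cassert>
#include <initializer_list>

struct parts {
    bool used_list = false;  // illustration only: set by the list constructor

    parts(int) {}
    parts(std::initializer_list<int>) : used_list(true) {}
};

void demo_parts_constructors() {
    parts braces{0};  // braces prefer the std::initializer_list<int> constructor
    parts parens(0);  // parentheses select parts(int)

    assert(braces.used_list);
    assert(!parens.used_list);
}
```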

@msadeqhe

msadeqhe commented May 10, 2023

An alternative solution was that if a type has ambiguous constructor with initializer list, it should be a syntax error. By the way, this approach wouldn't fix the bad API design of std::vector.

std::initializer_list<T> is not the way to get a safe variable argument list. It can be completely replaced with an integrated language feature. For example:

run: (args...: int) = {}
// : (args: int, ...) = {}

Now, syntactically Cpp2 can disallow a function (or constructor) to have the following overloads at the same time:

run: (arg1: int) = {}
run: (args...: int) = {} // ERROR!
// : (args: int, ...) = {} // ERROR!

The above example will be like the following example with std::initializer_list<int>:

run: (arg1: int) = {}
run: (args: std::initializer_list<int>) = {}

But because the std::initializer_list<int> type is different from int, syntactically all those overloads have to be allowed, which leads to bad API design.

Cpp2 is about C++20/23/... as stated in README.md of this repository. Let's accept that std::initializer_list<T> is not the variable argument list it was supposed to resemble. It's an object like std::string, and it needs a literal just as "..." is a literal for std::string. Uniform initialization with braces {...} in Cpp1, and with parentheses (...) in Cpp2, instead treats std::initializer_list<T> as a variable argument list. Initialization with parentheses (...) in Cpp1 is only a partial solution to the problem. A better way is for arrays to have their own literals with brackets [...] in Cpp2.
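
For what it's worth, a homogeneous variable argument list can be approximated in C++17 with a parameter pack. The sketch below is a stand-in for the proposed `run: (args...: int)`, not an implementation of it: the name sum_ints is invented, and C++ would not reject a coexisting single-int overload the way the proposal suggests Cpp2 could.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical C++17 stand-in for the proposed `run: (args...: int)`:
// accepts any number of arguments, all of which must be exactly int.
template <typename... Ts>
int sum_ints(Ts... args) {
    static_assert((std::is_same_v<Ts, int> && ...),
                  "every argument must be an int");
    return (0 + ... + args);
}

void demo_sum_ints() {
    assert(sum_ints() == 0);
    assert(sum_ints(5) == 5);
    assert(sum_ints(1, 2, 3) == 6);
}
```

No std::initializer_list object is involved here: the arguments are ordinary function parameters.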

@AbhinavK00

Some of this is getting confusing for me, so here's some feedback.
For multidimensional or jagged arrays, I think we should just start by introducing one-dimensional arrays and then generalize them to include both multidimensional and jagged arrays. Having two kinds of "higher" arrays just increases concept count while it'll still boil down to "this one is the answer most of the time, use the other one only when necessary".

Secondly, I really like the idea of arrays of constructors but

x1: = [: = ("a", 1),: = ("b", 2),: = ("c", 3)]

this is just an abomination, the other way is much more pleasant.

Thirdly, about the initialiser list thing. I think the simplest solution would be to first require ALL constructors to use parentheses

x : class_name = (constructing, parameters);

Cppfront will NEVER call any initialiser list constructors with this syntax. Now, from a cppfront point of view, initialiser list constructors could be viewed as just simple constructors taking arrays as parameters

x : class_name = ([initialiser, list, parameters]);

This will call the initialiser list constructors, and it even makes sense, as initialiser lists behave like arrays. Someone who's new to cppfront will not have to know what initialiser lists are; this is intuitive.

@msadeqhe

msadeqhe commented May 11, 2023

Thanks for your feedback.

I agree about multidimensional arrays, but jagged arrays are currently supported in C++.

Secondly, I really like the idea of arrays of constructors but

x1: = [: = ("a", 1),: = ("b", 2),: = ("c", 3)]

this is just an abomination, the other way is much more pleasant.

You're right. That syntax is for when the type is not known, but to support it, the following notation would technically have to deduce a std::pair type:

x: = ("a", 1);
// Its type is std::pair<std::string, int>

It's not a general solution, and it's not needed if Cpp2 supports the following syntax.

On the other hand, as you've mentioned, the following syntax is both general and expressive when the type is known, thanks to calling constructors directly within arrays:

x1: vector<pair<string, int>> = [("a", 1), ("b", 2), ("c", 3)];

Thirdly, about the initialiser list thing.

In a variable declaration, if the constructor has one argument in addition to this, parentheses will be optional. So array literals are consistent with other literals:

a: obj = 1;
b: obj = (1);
c: obj = "text";
d: obj = ("text");
e: obj = [1, 2];
f: obj = ([1, 2]);

But parentheses are required for multiple arguments:

a: obj = (10, 2);
b: obj = (10, "text");
c: obj = (10, [1, 2]);

Consider that [1, 2] could just as well be "1, 2" or '1, 2': they are all literals. On the other hand, parentheses are for grouping expressions and passing arguments to function calls.

x : class_name = (constructing, parameters);

Cppfront will NEVER call any initialiser list constructors with this syntax.

Yes, that's it. Arrays will be treated like a literal in this way.

Parentheses won't call the constructor with an initializer list parameter.

@AbhinavK00

AbhinavK00 commented May 19, 2023

One thing I realised a few days ago is that jagged arrays could be thought of as a tuple of arrays.
a : std::tuple<int[3], int[5],int[8]>;
I don't know how close this is to ACTUAL jagged arrays: are arrays of the same element type but different lengths considered the same type or not?
Anyways, so maybe we don't need explicit support for jagged arrays. They would be as good as right there if we have first class support for tuples (or still available even if not.)

@msadeqhe

msadeqhe commented May 19, 2023

Jagged arrays are different from a tuple of arrays, because the length of a tuple is fixed. But yes, no extra syntax is needed for array literals to have jagged arrays in Cpp2:

a: vector<int> = [1, 2, 3];
b: vector<int> = [1, 2];
c: vector<int> = [1, 2, 3, 4, 5];
x: vector<vector<int>> = [a, b, c];

d: vector<int> = [1, 2, 3, 4];
x.push_back(d);
// x == [a, b, c, d]
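
The same contrast can be written in today's C++. This is a sketch with invented names (fixed_rows, make_jagged): the tuple's row count and row lengths are fixed in its type, while the vector of vectors can grow at run time.

```cpp
#include <array>
#include <cassert>
#include <tuple>
#include <vector>

// Tuple of arrays: the number of rows and every row length are part of the type.
using fixed_rows = std::tuple<std::array<int, 3>, std::array<int, 2>>;
static_assert(std::tuple_size_v<fixed_rows> == 2, "row count is fixed");

// Vector of vectors: rows may differ in length and can be added at run time.
std::vector<std::vector<int>> make_jagged() {
    std::vector<int> a = {1, 2, 3};
    std::vector<int> b = {1, 2};
    std::vector<std::vector<int>> x = {a, b};
    x.push_back({1, 2, 3, 4});
    return x;
}

void demo_jagged_rows() {
    const auto x = make_jagged();
    assert(x.size() == 3);
    assert(x[1].size() == 2);
    assert(x[2].size() == 4);
}
```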

@msadeqhe

msadeqhe commented May 19, 2023

Jagged arrays are different from a tuple of arrays, because the length of a tuple is fixed. But yes, no extra syntax is needed for array literals to have jagged arrays in Cpp2:

a: vector<int> = [1, 2, 3];
b: vector<int> = [1, 2];
c: vector<int> = [1, 2, 3, 4, 5];
x: vector<vector<int>> = [a, b, c];

d: vector<int> = [1, 2, 3, 4];
x.push_back(d);
// x == [a, b, c, d]

Or with new constructor syntax as described in this issue, it would be like this:

x: = [[1, 2, 3]vector<int>,
      [1, 2]vector<int>,
      [1, 2, 3, 4, 5]vector<int>,
      [1, 2, 3, 4]vector<int>
]vector<vector<int>>;

And with type deduction for template arguments:

x: = [[1, 2, 3]vector,
      [1, 2]vector,
      [1, 2, 3, 4, 5]vector,
      [1, 2, 3, 4]vector
]vector;

With the help of either type aliases or UDLs (aka Non-member Constructors):

// Either type alias
v: <T> type == std::vector<T>;
// or Non-member Constructor
v: <T> (list: std::initializer_list<T>) -> type == std::vector<T> = {
    return list;
}

x: = [[1, 2, 3]v,
      [1, 2]v,
      [1, 2, 3, 4, 5]v,
      [1, 2, 3, 4]v
]v;

@msadeqhe

msadeqhe commented May 29, 2023

OK. Both of them look good. So what if we want to write an empty array?

x0: /*...*/ = [];
x1: /*...*/ = {};

[] is clearly an empty array, but {} can be either an empty function/statement block or an empty array, depending on the declaration. For example:

x0         :       std::vector<int> = {};  // empty array
x1         : () -> void             = {}   // empty statement block
x2         : () -> std::vector<int> = {}   // ERROR: It doesn't work, although visually it's the same as above.
x3: = call(: () -> std::vector<int> = {}); // SURPRISE! It works, although visually it's the same as above.

{} is visually surprising and inconsistent for x1, x2 and x3, although they look the same:

  • In x0 declaration, {} is an empty array.
  • In x1 declaration, {} is an empty statement block.
  • But in x2 declaration, {} is an error, because it must end with ;.
  • But in x3 declaration, {} is an empty array!

So [...] is more expressive than {...} for array literals.

For {}, Cpp2 has to look within {} to find out if it's a list or a block statement. How much of this visually unclear meaning (opinion-based) is acceptable?

a: = call(: () -> std::vector<int> = {1, 2, 3});          // {} is a list.
b: = call(: () -> std::vector<int> = { /*statements*/ }); // {} is a block statement.
c: = call(: () = {});                                     // {} is a block statement.
d: = call(: () -> _ = {});                                // {} is a list.

The only thing against {} for array literals is this behaviour, otherwise {} is also a good choice like []. Either of them will resolve issues.

Another Considered Alternative

Also () can be used for array literals if Cpp2 could support typed expressions as described in issue #463, in this way () always calls the direct constructor:

x: vector<int> = (1, 2);      // This is an array of one `int` with value 2.
y: vector<int> = (1, 2):list; // This is an array of two `int`s with values 1 and 2.

list<T> is a type alias to initializer_list<T>.

But the problem with this approach is that we have to write :list every time.

// This is an array of two `int`s with values 1 and 2.
z: = (1, 2):list:vector<int>;

On the other hand, if () may call either a direct constructor or the initializer_list<T> constructor, Cpp2 may favor direct constructors over the initializer_list<T> constructor, so that :list is needed only for ambiguous cases:

x: vector<int> = (1, 2);      // This is an array of one `int` with value 2.
a: = (1, 2):vector;           // The same

y: vector<int> = (1, 2):list; // This is an array of two `int`s with values 1 and 2.
b: = (1, 2):list:vector;      // The same

z: vector<int> = (1, 2, 3);   // This is an array of three `int`s with values 1, 2 and 3.
c: = (1, 2, 3):vector;        // The same

So the constructor with initializer_list<T> is like a general constructor, and constructors with multiple arguments of type T are like specialization:

Abc: <T> type = {
    // General constructor for sequence of `T`s
    operator=: (out this, initializer_list<T>) = { /*statements*/ }

    // This is a specialized constructor for one item of `T`
    operator=: (out this, a: T) = { /*statements*/ }

    // This is a specialized constructor for three items of `T`s
    operator=: (out this, a: T, b: T, c: T) = { /*statements*/ }
}

main: () = {
// It calls the specialized constructor for one item of `int`.
    a: Abc<int> = (1);
// It calls the general constructor for sequence of `int`s.
    b: Abc<int> = (1, 2);
// It calls the specialized constructor for three items of `int`s.
    c: Abc<int> = (1, 2, 3);
// It calls the general constructor for sequence of `int`s.
    d: Abc<int> = (1, 2, 3, 4);
}

By the way, it can be a bad API if those specialized constructors have completely different behaviours, as vector<int> has a bad API. If Cpp2 didn't use the same notation for direct constructor calls and list initialization, it would completely eliminate the possibility of this bad API. Using [...] or {...}, or even forcing (...):list to be written for every list initialization, will eliminate that possibility.
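
For contrast, here is how the same three constructors resolve in today's C++, where the preference runs the other way: braces pick the initializer_list constructor whenever it is viable, and only parentheses reach the "specialized" ones. The `ctor` enum and `used` member are additions made for this sketch.

```cpp
#include <cassert>
#include <initializer_list>

template <typename T>
struct Abc {
    enum class ctor { list, one, three };
    ctor used;  // records which constructor was selected

    Abc(std::initializer_list<T>) : used(ctor::list) {}
    Abc(T) : used(ctor::one) {}
    Abc(T, T, T) : used(ctor::three) {}
};

void demo_abc_constructors() {
    Abc<int> braces_one{1};
    Abc<int> parens_one(1);
    Abc<int> braces_three{1, 2, 3};
    Abc<int> parens_three(1, 2, 3);

    // Braces prefer the initializer_list constructor in C++.
    assert(braces_one.used == Abc<int>::ctor::list);
    assert(braces_three.used == Abc<int>::ctor::list);

    // Parentheses never select it from a plain argument list.
    assert(parens_one.used == Abc<int>::ctor::one);
    assert(parens_three.used == Abc<int>::ctor::three);
}
```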

@AbhinavK00

Jagged arrays are different from a tuple of arrays, because the length of a tuple is fixed.

Sorry for the misunderstanding, I thought the suggestion was about fixed-length arrays. But I think cpp2 should support fixed-length arrays as a built-in, as one can always resort to std::vector.

Repository owner locked and limited conversation to collaborators Aug 30, 2023
@hsutter hsutter converted this issue into discussion #637 Aug 30, 2023