This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
[SUGGESTION] Array Literals with the support of both Multidimensional and Jagged Arrays #424
Comments
Herb points out that
These arrays would effectively translate to std::array in cpp1 or some cpp2::array if Herb wants a different type.
There won't be any initialiser lists in cpp2, they'll be just an implementation detail on cpp2 side. |
So "hard to parse" is the only complaint against
x0: = identifier[1]; // [1] is a subscript operator.
x1: = [1]; // [1] is an array literal.
It's like parentheses in Cpp2 and Cpp1:
x0: = identifier(1); // (1) is a function call operator.
x1: = (1); // (1) is an expression group.
The only difference is that |
It seems Herb's answer was to this comment:
I have to clarify that I don't suggest to also use |
I think it is important for initializer list/array-type constructors to have a different symbol/syntax from everything else, as they are a special case. We also need to avoid the double paren |
I think initialiser list constructors should be viewed as simple constructors where we pass arrays as parameters; they should not be a special case. So if
should translate to initialiser list syntax instead of
|
The issue with that is that, for the case of 2D vectors, the ('s and ['s become too many. The |
That's not what I recommend. The syntax would be
Like I said, just view it as passing an array to a constructor. Here we are passing the array
But I think this should be allowed if the constructor is marked implicit, what do you guys think? |
Parentheses around
#include <iostream>
taip: type = {
operator=: (out this, i: int) = {
std::cout << i << "\n";
}
}
main: () = {
t0: taip = 1;
// Parentheses are not necessary.
t1: taip = (1);
}
So
#include <string>
#include <iostream>
taip: type = {
operator=: (out this, text: std::string) = {
std::cout << "OK. That's a string literal.\n";
}
operator=: (out this, list: std::initializer_list<int>) = {
std::cout << "OK. That's an array literal.\n";
}
}
main: () = {
s0: taip = "text";
t0: taip = [1, 2, 3];
// Parentheses are not necessary.
s1: taip = ("text");
t1: taip = ([1, 2, 3]);
} |
If I understand correctly, the inner vectors will only need |
I thought constructors were supposed to be explicit? You guys can ignore my comments, because it seems my understanding was wrong. |
1. Multidimensional Arrays
I have to clarify that currently I don't suggest multidimensional arrays, but they are a possible feature to explore. Multiple syntax choices are available for multidimensional arrays:
a: = [((1, 2), (3, 4)), ((5, 6), (7, 8))];
b: = [1, 2; 3, 4;; 5, 6; 7, 8];
c: = [((1, 2); (3, 4);); ((5, 6); (7, 8););];
d: = [((1; 2); (3; 4)); ((5; 6); (7; 8))];
e: = ...
Here, I'm explaining the first of those possible syntaxes for multidimensional arrays:
a: = [((1, 2), (3, 4)), ((5, 6), (7, 8))];
This syntax requires loosely multidimensional arrays when the type of the array literal is not specified within the declaration (e.g. the type of variable
1.1. Loosely Multidimensional Arrays
1-dimensional arrays are loosely multidimensional arrays. It's useful in generic programming. That's because there are two types of dimensions within arrays:
// These parentheses don't contain commas:
[(0), (1), (2), (3)] // OK. It's equal to [0, 1, 2, 3]
[0, (1), ((2)), (((3)))] // OK. It's equal to [0, 1, 2, 3]
// These parentheses contain commas:
[(0, 1), 2, 3] // ERROR! They must have identical commas in all corresponding elements.
// This is a combination:
[(0, (1)), (2, (3))] // OK. It's equal to [(0, 1), (2, 3)]
list: /*something*/ = [1, 2, 3];
// Only `1` is required dimension. These zero-value indexes are optional dimensions.
s0: = list[1] == 2; // true
s1: = list[1, 0] == 2; // true
s2: = list[1, 0, 0] == 2; // true
The opposite is not true. Multidimensional arrays cannot be treated as 1-dimensional arrays. I have to explain that these rules help us to get rid of parentheses when they are used only to group expressions inside arrays, without any intention to add another dimension:
// It's a 1-dimensional array.
x: = [(1 + 2), (3 + 4)];
// But optionally it can be used as a two-dimensional array.
a: = x[1]; // OK. It's equal to (3 + 4)
b: = x[1, 0]; // OK. It's equal to (3 + 4)
// It's a two-dimensional array.
y: = [(1, 2)];
// But it cannot be used as a 1-dimensional array.
a: = y[0]; // ERROR! The second index is required.
b: = y[0, 1]; // OK. It's equal to `2`
1.2. More Examples
point: type = {
operator=: (out this, x: int, y: int) = {}
}
zero_point: () -> point = {
return (0, 0);
}
zero_array: () -> std::initializer_list<int> = {
return [0, 0, 0/*, ...*/];
}
a0: = []; // An empty array
a1: = [ [] ]; // An empty array in array
a2: = [1, 2, 3]; // A 1-dimensional array
a3: = [ [1, 2, 3] ]; // A 1-dimensional array of 1-dimensional arrays
a4: = [(1, 2, 3)]; // A two-dimensional array
a5: = [ [ [1, 2, 3] ] ]; // A 1-dimensional array of 1-dimensional arrays of 1-dimensional arrays
a6: = [((1, 2, 3))]; // A three-dimensional array
a7: = [ [(1, 2, 3)] ]; // A 1-dimensional array of two-dimensional arrays
a8: = [( [1, 2, 3] )]; // A 1-dimensional array of 1-dimensional arrays
Although
a8: = [( [1, 2, 3] )]; // A 1-dimensional array of 1-dimensional arrays
// It can be visually seen as a two-dimensional array of 1-dimensional arrays
x0: = a8[0][1]; // OK. A 1-dimensional array of 1-dimensional arrays
x1: = a8[0, 0][1]; // OK. A two-dimensional array of 1-dimensional arrays |
2. Dictionaries
I have to clarify that currently I don't suggest dictionaries, but they are a possible feature to explore. No additional syntax is required for dictionaries, if Cpp2 could support type inference for unnamed variable declarations.
2.1. Dictionaries are arrays of objects.
With the help of unnamed variable declarations, we can write arrays of objects:
x0: = [: std::pair<std::string, int> = ("a", 1),
: std::pair<std::string, int> = ("b", 2),
: std::pair<std::string, int> = ("c", 3)];
But if Cpp2 could support type inference for unnamed variable declarations, it would be so much simpler that a distinct syntax would not be needed for dictionaries:
x0: std::vector<std::pair<std::string, int>> = [: = ("a", 1),: = ("b", 2),: = ("c", 3)];
// It's a dictionary.
x1: = [: = ("a", 1),: = ("b", 2),: = ("c", 3)];
Please, see the next comment for construction with
2.2. Do "arrays of objects" conflict with "multidimensional arrays"?
No. There isn't any conflict, because parentheses have loose semantics, unlike
// A multidimensional array of numbers
x0: matrix<int, 2, 3> = [(1, 2, 3), (4, 5, 6)];
// A multidimensional array of objects
x1: matrix<std::pair<std::string, int>, 2, 3> = [
(: = ("a", 1),: = ("b", 2),: = ("c", 3)),
(: = ("d", 4),: = ("e", 5),: = ("f", 6))
];
r1: = x1[0, 1]; // It is `: = ("b", 2)`.
In a nutshell, we use
But in this case, the type of array literal is known for variable
// A multidimensional array of objects
x1: matrix<std::pair<std::string, int>, 2, 3> = [
(("a", 1), ("b", 2), ("c", 3)),
(("d", 4), ("e", 5), ("f", 6))
];
Please, see the next comment for explanation. |
3. Array Literal of Constructors
If the type of an array literal is specified within the declaration, object construction is much easier and more readable. First, I have to mention that parentheses can have the meaning of calling a constructor in declarations in Cpp2:
point: type = {
operator=: (out this, x: int, y: int) = {}
}
main: () = {
// These parentheses call the constructor.
x0: point = (0, 0);
}
So with the help of this feature, we may create an array literal of constructors:
// This is not a multidimensional array.
// Because parentheses call the constructor. They don't add a new dimension.
// It's not ambiguous. It's expressed from the type within declaration.
x0: std::vector<point> = [(0, 0), (1, 1), (2, 2)];
// This is a two-dimensional array.
// Because the innermost parentheses call the constructor. They don't add a new dimension.
// It's not ambiguous. It's expressed from the type within declaration.
x1: matrix<point, 2, 2> = [((0, 0), (1, 1)), ((2, 2), (3, 3))];
Because the type is specified within the declaration, "Parentheses for Constructors" won't conflict with "Parentheses for Dimensions". Finally, a dictionary will look like this when its type is specified within the declaration:
c0: std::vector<std::pair<std::string, int>> = [("a", 1), ("b", 2), ("c", 3)];
// It's a dictionary.
c1: dictionary<int> = [("a", 1), ("b", 2), ("c", 3)];
It's context-free, and |
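For comparison, here is a minimal C++ sketch of how the same data is usually written in today's Cpp1, where nested braces stand in for both the array and the constructor calls. The helper names `make_pairs` and `make_map` are hypothetical, and `std::map` is only an assumed stand-in for the `dictionary<int>` type, which is not defined anywhere in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the outer braces play the role of [...],
// the inner braces play the role of the (...) constructor calls.
inline std::vector<std::pair<std::string, int>> make_pairs() {
    return {{"a", 1}, {"b", 2}, {"c", 3}};
}

// Assumption: std::map stands in for the thread's `dictionary<int>`.
inline std::map<std::string, int> make_map() {
    return {{"a", 1}, {"b", 2}, {"c", 3}};
}

// Small self-check: both containers hold the same three entries.
inline void check_dictionary_sketch() {
    assert(make_pairs().size() == std::size_t{3});
    assert(make_map().size() == std::size_t{3});
}
```

In Cpp1 the reader has to know the target type to tell which braces build a pair and which build the container; the proposal above separates those two roles into `(...)` and `[...]`.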
Readability examples
C++ already has
// `something` is defined somewhere...
sum: = 0;
(copy i: = 0) while i < 10 next i++ {
// Is there any array here?
sum += thing(something(i))(i);
}
What does it do? OK.
// `something` is defined somewhere...
sum: = 0;
(copy i: = 0) while i < 10 next i++ {
// OK. This is an array of function objects.
sum += thing[something(i)](i);
}
In a similar manner, array literals with
They lead to more readable and easier to understand Cpp2 code, especially in a function call: If we use
call((: t = (1, (f() + 2)), : t = (0, ())));
Even if you didn't lose track of the meaning of the parentheses, the compiler will face ambiguous meanings. Now if we use
call([: t = (1, [f() + 2]), : t = (0, [])]);
Consider how you already understand what it does without my explanation (context-free). |
I'm sympathetic to the suggestion, but this misunderstands the questions:
Bolding mine: The key word in those questions is "current". Current C++ code, current C++ guidance. |
Yes. My answers are for current Cpp1. I have to explain them with this example:
#include <iostream>
class parts {
public:
parts(int num) {
std::cout << R"({ "ID": 1 })";
}
};
int main() {
parts p{0};
}
OK. This compiles fine. The output
Now, someday someone decides to add another constructor to
#include <iostream>
class parts {
public:
parts(int num) {
std::cout << R"({ "ID": 1 })";
}
parts(std::initializer_list<int> args) {
/* statements... */
}
};
int main() {
parts p{0};
}
OOPS! We didn't change the first constructor, but the second constructor would change the behaviour of object construction in
Programmers have to learn and keep in mind that they have to care about initializer lists when they create constructors. It leads to more thinking for code management, and it will make guidance literature bigger. Also in current Cpp2 the same problem happens. In current Cpp1 to avoid the problem, one can use |
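The effect described in this comment can be observed directly. The following is a minimal C++ sketch, not the commenter's code: `parts_v1` and `parts_v2` are hypothetical names for the class before and after the `std::initializer_list` constructor is added, and each constructor records which overload was selected instead of printing.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical "before" version: only the int constructor exists.
struct parts_v1 {
    std::string chosen;
    parts_v1(int) : chosen("int") {}
};

// Hypothetical "after" version: an initializer_list constructor is added.
struct parts_v2 {
    std::string chosen;
    parts_v2(int) : chosen("int") {}
    parts_v2(std::initializer_list<int>) : chosen("initializer_list") {}
};

// The call site `p{0}` is spelled identically in both cases.
inline std::string braces_v1() { parts_v1 p{0}; return p.chosen; }
inline std::string braces_v2() { parts_v2 p{0}; return p.chosen; }

// Parenthesized initialization never selects the initializer_list overload here.
inline std::string parens_v2() { parts_v2 p(0); return p.chosen; }

// Small self-check of the three cases.
inline void check_parts_sketch() {
    assert(braces_v1() == "int");
    assert(braces_v2() == "initializer_list");
    assert(parens_v2() == "int");
}
```

The unchanged `p{0}` silently switches constructors once the second overload exists, which is the maintenance hazard the comment describes.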
run: (args...: int) = {}
// : (args: int, ...) = {}
Now, syntactically Cpp2 can disallow a function (or constructor) to have the following overloads at the same time:
run: (arg1: int) = {}
run: (args...: int) = {} // ERROR!
// : (args: int, ...) = {} // ERROR!
The above example will be like the following example with
run: (arg1: int) = {}
run: (args: std::initializer_list<int>) = {}
But because Cpp2 is about C++20/23/... as stated in |
Some of this is getting confusing for me, so here's some feedback. Secondly, I really like the idea of arrays of constructors but
this is just an abomination, the other way is much more pleasant. Thirdly, about the initialiser list thing. I think the simplest solution would be to first require ALL constructors to use parentheses
Cppfront will NEVER call any initialiser list constructors with this syntax. Now, from a cppfront point of view, initialiser list constructors could be viewed as just simple constructors taking in arrays as parameters
This will call the initialiser list constructors and it even makes sense, as initialiser lists behave like arrays. Someone who's new to cppfront will not have to know what initialiser lists are, this is intuitive. |
Thanks for your feedback. I agree about multidimensional arrays, but jagged arrays are currently supported in C++.
You're right. That syntax is for when the type is not known, but technically to support that syntax, it has to make the following notation to have a
x: = ("a", 1);
// Its type is std::pair<std::string, int>
It's not a general solution, and it's not needed if Cpp2 supports the following syntax. On the other hand, as you've mentioned, the following syntax is both general and expressive when the type is known, with the help of directly calling constructors within arrays:
x1: vector<pair<string, int>> = [("a", 1), ("b", 2), ("c", 3)];
In variable declaration, if the constructor has one argument in addition to
a: obj = 1;
b: obj = (1);
c: obj = "text";
d: obj = ("text");
e: obj = [1, 2];
f: obj = ([1, 2]);
But parentheses are required for multiple arguments:
a: obj = (10, 2);
b: obj = (10, "text");
c: obj = (10, [1, 2]);
Consider
Yes, that's it. Arrays will be treated like a literal in this way. Parentheses won't call the constructor with initializer list parameter. |
One thing I realised a few days ago is that jagged arrays could be thought of as a tuple of arrays. |
Jagged arrays are different from a tuple of arrays, because the length of a tuple is fixed. But yes, no extra syntax is needed for array literals to have jagged arrays in Cpp2:
a: vector<int> = [1, 2, 3];
b: vector<int> = [1, 2];
c: vector<int> = [1, 2, 3, 4, 5];
x: vector<vector<int>> = [a, b, c];
d: vector<int> = [1, 2, 3, 4];
x.push_back(d);
// x == [a, b, c, d] |
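As a point of reference, the same jagged structure can be built in current Cpp1 with `std::vector<std::vector<int>>`. This is a minimal sketch; the function name `make_jagged` is hypothetical, and braces are used where the proposal would use `[...]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the Cpp2 example above.
inline std::vector<std::vector<int>> make_jagged() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{1, 2};
    std::vector<int> c{1, 2, 3, 4, 5};

    // Rows of different lengths: this is what makes the array jagged.
    std::vector<std::vector<int>> x{a, b, c};

    // Unlike a tuple, the outer container can still grow at run time.
    std::vector<int> d{1, 2, 3, 4};
    x.push_back(d);
    return x;
}

// Small self-check: four rows with lengths 3, 2, 5 and 4.
inline void check_jagged_sketch() {
    const auto x = make_jagged();
    assert(x.size() == std::size_t{4});
    assert(x[0].size() == std::size_t{3});
    assert(x[1].size() == std::size_t{2});
    assert(x[2].size() == std::size_t{5});
    assert(x[3].size() == std::size_t{4});
}
```

The `push_back` call is the part a tuple of arrays cannot express, since a tuple's element count is fixed at compile time.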
Or with new constructor syntax as described in this issue, it would be like this:
x: = [[1, 2, 3]vector<int>,
[1, 2]vector<int>,
[1, 2, 3, 4, 5]vector<int>,
[1, 2, 3, 4]vector<int>
]vector<vector<int>>;
And with type deduction for template arguments:
x: = [[1, 2, 3]vector,
[1, 2]vector,
[1, 2, 3, 4, 5]vector,
[1, 2, 3, 4]vector
]vector;
With the help of either type aliases or UDLs (aka Non-member Constructors):
// Either type alias
v: <T> type == std::vector<T>;
// or Non-member Constructor
v: <T> (list: std::initializer_list<T>) -> type == std::vector<T> = {
return list;
}
x: = [[1, 2, 3]v,
[1, 2]v,
[1, 2, 3, 4, 5]v,
[1, 2, 3, 4]v
]v; |
For
a: = call(: () -> std::vector<int> = {1, 2, 3}); // {} is a list.
b: = call(: () -> std::vector<int> = { /*statements*/ }); // {} is a block statement.
c: = call(: () = {}); // {} is a block statement.
d: = call(: () -> _ = {}); // {} is a list.
The only thing against
Another Considered Alternative
Also
x: vector<int> = (1, 2); // This is an array of one `int` with value 2.
y: vector<int> = (1, 2):list; // This is an array of two `int`s with values 1 and 2.
But the problem of this approach is that we have to write
// This is an array of two `int`s with values 1 and 2.
z: = (1, 2):list:vector<int>;
On the other hand, if
x: vector<int> = (1, 2); // This is an array of one `int` with value 2.
a: = (1, 2):vector; // The same
y: vector<int> = (1, 2):list; // This is an array of two `int`s with values 1 and 2.
b: = (1, 2):list:vector; // The same
z: vector<int> = (1, 2, 3); // This is an array of three `int`s with values 1, 2 and 3.
c: = (1, 2, 3):vector; // The same
So the constructor with
Abc: <T> type = {
// General constructor for sequence of `T`s
operator=: (out this, initializer_list<T>) = { /*statements*/ }
// This is a specialized constructor for one item of `T`
operator=: (out this, a: T) = { /*statements*/ }
// This is a specialized constructor for three items of `T`s
operator=: (out this, a: T, b: T, c: T) = { /*statements*/ }
}
main: () = {
// It calls the specialized constructor for one item of `int`.
a: Abc<int> = (1);
// It calls the general constructor for sequence of `int`s.
b: Abc<int> = (1, 2);
// It calls the specialized constructor for three items of `int`s.
c: Abc<int> = (1, 2, 3);
// It calls the general constructor for sequence of `int`s.
d: Abc<int> = (1, 2, 3, 4);
}
By the way, it can be a bad API if those specialized constructors had completely different behaviours as |
Sorry for misunderstanding, I thought the suggestion was about fixed length arrays. But I think cpp2 should support fixed length arrays as inbuilt as one can always resort to std::vector. |
Preface
The idea of this suggestion is gathered from discussion in this issue.
I have to mention that (...) is already for calling constructors, grouping expressions and initializing lists.
Now, consider the following ambiguities:
(1) as an expression, are they parentheses around a value? Or is it an array of one item?
() is not an empty array, but it calls the default constructor in variable declarations.
(1, 2) in declarations, is it the arguments of a constructor? Or is it an array of two items?
Yes I know that std::vector has a bad API design, but I ask myself why would Cpp2 (like Cpp1) allow libraries to have this ambiguity in the first place?
Having array literals with a different syntax will solve those three ambiguities. I suggest to use [...] for array literals:
Also nested (...)s or ;s etc. will create multidimensional arrays, because they don't create a new array, and they are for mathematical grouping (as they are used to group expressions and to change the precedence of operators). On the other hand, nested [...]s will create jagged arrays, because they create a new array:
I currently do not suggest to support multidimensional arrays, but it's a possibility to consider in the future.
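The `(1, 2)` ambiguity mentioned above is easy to reproduce with `std::vector` in current Cpp1. The sketch below is only an illustration; the function names `with_parens` and `with_braces` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parentheses select the (count, value) constructor: one element with value 2.
inline std::vector<int> with_parens() {
    std::vector<int> v(1, 2);
    return v;
}

// Braces select the initializer_list constructor: two elements, 1 and 2.
inline std::vector<int> with_braces() {
    std::vector<int> v{1, 2};
    return v;
}

// Small self-check of both spellings.
inline void check_vector_sketch() {
    assert(with_parens().size() == std::size_t{1});
    assert(with_parens()[0] == 2);
    assert(with_braces().size() == std::size_t{2});
    assert(with_braces()[0] == 1);
    assert(with_braces()[1] == 2);
}
```

The two spellings differ by a single pair of delimiters yet produce containers of different sizes, which is the kind of surprise a dedicated array-literal syntax is meant to rule out.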
Suggestion Detail
Three options are available instead of () for array literals:
<...> is already for template parameters/arguments. It's not a good choice, because:
<a < b, c > 2>.
[...] is already for accessing items of an array. It seems to be a good choice.
{...} is already for function/statement blocks and type definitions. It can be considered as a good choice.
Now, it's the time to compare both [...] and {...} for array literals:
OK. Both of them look good. So what if we want to write an empty array?
[] is clearly an empty array, but {} can be either an empty function/statement block or an empty array, depending on the declaration. For example:
{} is visually surprising and inconsistent for x1, x2 and x3, although they look the same:
x0 declaration, {} is an empty array.
x1 declaration, {} is an empty statement block.
x2 declaration, {} is an error, because it must end with ;.
x3 declaration, {} is an empty array!
So [...] is more expressive than {...} for array literals.
Now let's consider this situation in the following example:
The first [...] creates an array, and the second [...] accesses an item from it. A sequence of [...]s is not ambiguous, because its behaviour is similar to parentheses:
x0: = (call() + something)(1);
The first (...) groups the operands of operator+, and the second (...) calls operator() on the result.
Your Questions
Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?
Yes. If a bad API design can suddenly change the meaning of code, it's going to be a security vulnerability. This suggestion is a way to prevent it by separating arrays from constructors and expressions.
Will your feature suggestion automate or eliminate X% of current C++ guidance literature?
Yes. It's not needed to learn if user-defined constructors are ambiguous with initializer lists, because it prevents ambiguous situation completely. It allows more API choices.
Considered Alternatives
An alternative solution was that if a type has an ambiguous constructor with initializer list, it should be a syntax error. By the way, this approach wouldn't fix the bad API design of std::vector.
Another alternative solution was a little complicated. The idea was to favor constructors over initializer list, and to consider a comma-separated list with parentheses to be an initializer list:
With the help of unnamed variable declaration and indirect initialization, it could be used like this:
But I gave up on this idea, because it would encourage unnamed variable declaration more than necessary.
Finally I considered to use literal templates syntax:
But I gave up on this idea too, because (1, 2)<int> would require to always specify the type, and (1, 2)list would make user-defined literal suffixes look like constructors.
Edits