[SUGGESTION] in cpp2 mode, string literals should be string_literal by default #45
Comments
I think this is a great idea, but I propose a slightly different approach. Instead of having string literals be The reason for this is because, in general, Consider how you would interface with a C library assuming Cpp2 string literals are
str:= "a Cpp2 string literal"; // actually a std::string_view
some_c_function(str.data()); // OK, str.data() is null-terminated
You have no option but to call the member function
str: std::string = "a regular Cpp string";
substr: std::string_view = (str.begin(), std::next(str.begin(), 3));
some_c_function(substr.data()); // ¯\_(ツ)_/¯
I suggest implementing strongly-typed string literals in Cpp2. This type could have an implicit conversion operator to The code above then becomes
str:= "a Cpp2 string literal"; // actually a cpp2::string_literal
str2: cpp2::string_literal = "another Cpp2 string literal"; // the type can also be given explicitly
some_c_function(str.c_str()); // Guaranteed to be null-terminated
Of course, this doesn't prevent the user from passing a non null-terminated string to a C function, but it does a much better job at educating users to call In fact, this could be pushed even further and have cppfront emit a warning (or error?) whenever it sees |
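To make the idea above concrete, here is a minimal C++ sketch of what a strongly typed literal could look like. Everything in it is an assumption for illustration: the namespace `sketch`, the class layout, and the choice of `std::string_view` as the conversion target are not taken from cppfront, and `some_c_function` is just a stand-in for any C API. Note that a constructor taking a `const char` array reference would also accept an ordinary, possibly unterminated, character array, which is why the proposal wants the compiler itself to produce the type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

namespace sketch {

// Hypothetical stand-in for the proposed cpp2::string_literal.
class string_literal {
public:
    // Assumes the argument is a string literal, so N counts the
    // terminating '\0' and the visible length is N - 1.
    // A plain char array would bind here too; only compiler support
    // could rule that out.
    template <std::size_t N>
    constexpr string_literal(const char (&text)[N]) noexcept
        : data_(text), size_(N - 1) {}

    // Null-terminated under the assumption above, so it can be
    // handed to a C API.
    constexpr const char* c_str() const noexcept { return data_; }
    constexpr std::size_t size() const noexcept { return size_; }

    // Assumed conversion target; the thread does not spell out the type.
    constexpr operator std::string_view() const noexcept {
        return std::string_view(data_, size_);
    }

private:
    const char* data_;
    std::size_t size_;
};

}  // namespace sketch

// Stand-in for a C function that expects a null-terminated string.
inline std::size_t some_c_function(const char* s) { return std::strlen(s); }
```

With this shape, `sketch::string_literal str = "a Cpp2 string literal";` followed by `some_c_function(str.c_str())` mirrors the Cpp2 snippet above, and the object still converts to a view when one is needed.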
I think this is well said... is it string literals themselves, or the pointer arithmetic, that is the source of bugs?
So given that Cpp2 bans pointer arithmetic in safe code, is there a remaining problem with string literals that forcing them to be strongly typed will solve, and reduce a class of CVEs or reduce a class of things we have to teach? I'm not against it, I'm interested, I just want to be sure it's solving a problem not already solved... What do you think?
Yes, that's a major reason we can't use it generally for stringlike things. This isn't a criticism of
Thanks. If we do need a strongly typed string literal, that sounds promising. |
You are right that the bad things will effectively happen in cpp mode, as cpp2 bans pointer arithmetic. But from a safety perspective it is nevertheless bad to teach everybody that strings are char*. Consider this code here:
It will compile happily in cpp2 mode, but bad things happen. Of course the real problem is the type of myprint, not the string literal per se. But as long as string literals are simple pointers people will pass strings like that.
But that is probably just a limitation of the current cppfront. Mario's suggestion of introducing a |
I do support this idea as well. The proposed It would also seamlessly integrate with I would also suggest an explicit cast to EDIT: Made a quick mock-up as an experiment on godbolt. |
I just realized that my example will fail (as in: compile and crash) even if the signature of |
While, to my knowledge, there are no CVEs directly caused by string literals themselves, discouraging the use of raw pointers is a good idea and would force developers to avoid pointer operations and C-style string APIs, which can avoid sudden bugs as @neumannt has mentioned. Another advantage of using a custom literal type is that it would be an actual |
I agree with all this. However, do we really want to be forced to wrap string literals with
If all Cpp2 functions that require a string parameter are forced to take it as a The same is true if we create a string literal and immediately need to navigate it, without passing it to a function. We have no alternative but to wrap it in a From your recent ABI design note in the wiki you mention that
Whenever we write the following str:= "I want a string literal, please and thank you" we are asking for a string literal, but in return we get a pointer 😢. It seems like the code we wrote doesn't mean what we wanted it to mean. Suppose we didn't have a
@switch-blade-stuff, I was actually thinking of a slightly different implementation. In my opinion this doesn't have to be a special type like
str:= "a cpp2 string literal" // deduced as cpp2::string_literal
c_str: *const char = "a C-style string literal" // explicit type provided
Alternatively, you could instead completely ban C-style strings. Also, I don't think we should have the conversion operator to I have an implementation of Here's a very simplified implementation on godbolt with a very incomplete example. I'd be happy to submit a PR with an implementation of |
This suggestion, that the language generate a library type for a problematic language object, follows a pattern in recent C++ evolution. The case of
Now, cpp2 can change the semantics of syntax2, so it is free to break from the expeditious ISO approach (library proposals are easier to land than language proposals and not so constrained by compatibility). Why not instead improve the language array that is built in to cpp2, making all arrays better and safer, string literals included?
I believe this can be done without any new syntax, i.e. with only semantic changes to existing syntax. First, disallow the implicit eager array decay to pointer. Second, add array-array copy semantics as proposed in P1997 (cpp2 can go further to address formal array parameters). Third, allow array-array comparisons (lexicographic as usual, following element comparability). Note that string literals are already special-cased with copy-init semantics, in C and C++. These semantic changes make the cpp2 built-in array a regular type, copyable and equality comparable, and get rid of C array decay (if needed, a pointer can be explicitly extracted as
Language-provided string literals are special. They have important implementation-defined behavior such as possible 'interning' - sharing of storage - that means their ids cannot be guaranteed unique. Cpp2 could make this language-defined behavior. On the other hand, it's probably best left as is for implementers to decide.
Thornier is the issue of what to do about null termination. It's needed for compatibility yet also still useful and a good choice in many cases. This might argue for a bifurcation. One way to do this would be with a language span type that only binds to constexpr null-terminated char arrays. Let's strive to fix underlying language issues in C++ and its C subset first. |
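The three semantic changes listed above (no implicit decay, whole-array copy, element-wise comparison) can be previewed in today's C++ with `std::array`, which already behaves like a regular type. This is only an illustration of the intended semantics, not of how cpp2 would implement them; the alias `word` and the helper names are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// std::array as a stand-in for a "regular" built-in array:
// copyable, comparable, and with no implicit decay to a pointer.
using word = std::array<char, 4>;  // three characters plus '\0'

inline bool copies_compare_equal() {
    word a{'a', 'b', 'c', '\0'};
    word b = a;     // whole-array copy; initializing one C array from another is ill-formed
    return a == b;  // compares elements; with C arrays this would compare addresses
}

inline bool ordering_is_lexicographic() {
    word a{'a', 'b', 'c', '\0'};
    word b{'a', 'b', 'd', '\0'};
    return a < b;   // decided by the first differing element
}

// A pointer is only available when asked for explicitly.
inline const char* explicit_pointer(const word& w) { return w.data(); }
```

The design point is that none of this needs new syntax: the same declarations mean something safer once decay is removed and copy and comparison are defined.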
Related to the part of #159 that proposes a prefix (e.g., |
Have you ever seen anyone do this? Why would you need to navigate an immutable literal? You already know at compile time every detail about it... |
This is an implementation-defined trick, but what you can do is pass The resulting type name constant will also be completely stripped of the rest of the function signature fluff. |
Just to note, compile-time code execution is indeed part of the Cpp2 roadmap. |
Compile-time code execution is great; what I have doubts about is the need for compile-time string literal slicing. If you've got several parts of interest in a single string literal such that you need to parse it and slice it to get the relevant parts, it'd just be easier and more readable to put these parts in separate variables to begin with... Unless there's overlap between the slices, but that's a very niche scenario, which doesn't justify adding a language feature IMO. |
Does this trick require the use of pointer arithmetic? If so, how bad would it be if we wrapped this parameter in a string_view in order to parse and trim it? How often would we encounter such code? |
You would wrap it in a string_view-like container for parsing; parsing pointer strings isn't the most convenient thing to do. As for frequency, it definitely isn't something you'll see every day. |
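As a rough illustration of the "wrap it in a string_view to parse and trim it" approach discussed above, the following C++ sketch trims spaces using only `std::string_view` member functions, so no pointer arithmetic appears in user code. The function name `trim_spaces` is hypothetical, and the sketch says nothing about the specific trick mentioned earlier in the thread.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: returns the input without leading or
// trailing spaces. Usable at compile time since C++17.
constexpr std::string_view trim_spaces(std::string_view text) {
    const auto first = text.find_first_not_of(' ');
    if (first == std::string_view::npos) {
        return {};  // input was empty or contained only spaces
    }
    const auto last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}
```

Because the function is `constexpr`, a call such as `trim_spaces("  int  ")` can also be evaluated in a constant expression, which ties in with the compile-time execution point raised above.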
Thanks again for this suggestion. After re-reading through the thread, and trying out a small When I started to implement it, I found that I wanted to do it right including supporting |
Raw pointers are fundamentally problematic if we have to do pointer arithmetic or offset-based access. A major source of pointers, inherited from C, is strings. There are safer alternatives to C strings (e.g., string_view), but they are not used by default.
We still need classic C strings for compatibility reasons, but the default should be a safe construct.
Thus, I would suggest that a string literal in cpp2 mode is considered a string_view by default. Traditional C strings could be constructed by using, e.g., the c suffix:
cpp2: "abc" -> cpp: "abc"sv
cpp2: "abc"c -> cpp: "abc"
That would make string handling much cleaner and safer. Compatibility with existing code is a concern, but as cpp2 code is by definition new anyway, we can always add the 'c' suffix if needed to get a traditional C string.
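For reference, this is what the two lowered forms look like on the C++ side today. The `c` suffix itself is only the spelling suggested above and does not exist in C++; the sketch just checks the types and lengths of `"abc"sv` and `"abc"`, and the helper names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <type_traits>

using namespace std::string_view_literals;

// Proposed default: cpp2 "abc" lowers to "abc"sv, a std::string_view.
static_assert(std::is_same_v<decltype("abc"sv), std::string_view>);

// Proposed opt-in: cpp2 "abc"c lowers to a plain literal, an array of
// four chars (three characters plus '\0') that decays to const char*.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);

// The view carries its length with it.
inline std::size_t view_length() { return "abc"sv.size(); }

// The C string has to be scanned for the terminator.
inline std::size_t c_length() { return std::strlen("abc"); }
```

One practical difference shows up with embedded nulls: `"a\0b"sv.size()` is 3, while `std::strlen("a\0b")` stops at the first `'\0'` and returns 1.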