[SUGGESTION] Change string interpolation result from a string to a lazy-evaluated parameterized-string #627
Replies: 10 comments
-
This feels like the wrong level for security. First, currently, an interpolated string literal results in a Second, I think sanitization should be orthogonal to concatenation.
-
Lazy-evaluated interpolated strings would be like capture by reference within string literals. Consider if we could use
x: = 10;
// cpp2::to_string(x)
a: = "(x)$";
// lazy-evaluated, capture by reference
// cpp2::embed(x&)*
b: = "(x&)$*";
-
I think the concept of capture by reference in string interpolation is problematic. You introduce functions with side effects which can be overlooked quite easily, thus introducing new errors. If you want to create lazy interpolation strings, then I would expect a function object which is then called accordingly:
-
I was thinking of just using a function expression,
-
Thanks for the feedback. I'll try to answer all of the points given.
Wrong level of security: Sanitization of input is NOT a solution, and is in fact an anti-pattern. It assumes we know at the point of input what sanitization will suffice for output, even though we usually have multiple output sinks for the same input, e.g. db, log, screen, network.
Introducing the same security vulnerabilities that come with using lambdas that capture by reference: my intent is actually to copy values, on the assumption that in most cases this is no more expensive than generating a string with to_string. However, after the remarks given here, I suppose support for move and by_ref will need to be provided.
"I was thinking of just using a function expression": Sounds like a great idea for capturing without additional library constructs like my parameterized_string. To be honest, I wrote the parameterized_string simply for POC purposes and then thought "why not use it in other places"...
-
@MaxSagebaum I now see I misunderstood your suggestion of a function call. You suggested generating a lazily evaluated binding as a function that later accepts input parameters and fills in the blanks, which is similar to std::format/{fmt}. So, if it were to be a function, then it would be a function accepting a renderer object or a pair of callbacks.
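To make the "pair of callbacks" variant concrete, here is a rough C++ sketch. Everything in it (`lazy_sketch`, `callback`, `make_lazy_text`) is a made-up name for illustration and is not part of cppfront or pasteur. It assumes the argument is copied into the closure, and the output sink decides what to do with literals versus arguments when it finally calls the function.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical sketch: these names do not exist in cppfront or pasteur.
namespace lazy_sketch {

using callback = std::function<void(std::string_view)>;

// Returns a callable that replays the interpolated text piece by piece.
// The argument is copied (moved) into the closure, so nothing dangles.
inline auto make_lazy_text(std::string name) {
    return [name = std::move(name)](const callback& on_literal,
                                    const callback& on_argument) {
        on_literal("username = ");  // trusted text from the source code
        on_argument(name);          // untrusted value, encoded by the sink
    };
}

}  // namespace lazy_sketch
```

A sink would call the result with its own two callbacks, for example one that appends literals unchanged and one that escapes or binds arguments.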
-
Ok, now I understand it better. I doubt that a new string class will immediately be used everywhere. It needs to be introduced in the libraries, and users need to be made aware of it. If you forbid the automatic cast to std::string, a lot of errors will be generated. In most cases the user will use the available It might be nice to have such a thing, but it might send a counterproductive signal.
-
The question is: why would you need a conversion to string?
BTW, conversion to string is always possible via stringstream; it's just inconvenient. This is intentional: it makes writing dangerous code longer and more verbose, so it isn't the first option taken. And again, input arguments are impossible to sanitize correctly, since you cannot predict what output they will be used for (attempting to match the sanitization to the expected output sinks creates brittle coupling between input sanitization and output). (BTW, this is in contrast to validation, which is definitely possible and a must.)
-
This is somewhat alleviated by Cpp2 not having default captures.
-
How about starting by supporting explicit programmer buy-in to this feature via a string prefix? i.e.:
-
**The idea:**
For the interpolated string:
"select email from demo.useremails where username = (name)$ and type=(emailType)$"
Instead of the interpolated string generating the following cpp1 code:
"select email from demo.useremails where username = " + cpp2::to_string(name) + " and type=" + cpp2::to_string(emailType)
it will generate this code:
"select email from demo.useremails where username = " + cpp2::embed(name) + " and type=" + cpp2::embed(emailType)
This results in a parameterized_string type, which lazily holds the string literals and embedded arguments for later encoding. It effectively implements the recommendation in "LangSec revisited: input security flaws of the 2nd kind": delay input sanitization and encoding to the output phase, where the output encoding is known.
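A minimal C++ sketch of what such a type could look like, under stated assumptions: `segment`, `embedded`, `add`, `parts` and `render` are hypothetical names and are not the actual pasteur or cppfront API. Only `embed`, `parameterized_string` and the `+` chaining come from the description above. This `embed` accepts only string-like values and copies them.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch: not the real pasteur/cppfront implementation.
namespace sketch {

// One piece of the text: trusted literal or untrusted embedded argument.
struct segment {
    std::string text;
    bool is_argument;
};

// Marks a value as an embedded argument; the value is copied, not referenced.
struct embedded {
    std::string value;
};

inline embedded embed(std::string_view v) { return embedded{std::string(v)}; }

class parameterized_string {
public:
    parameterized_string& add(std::string_view text, bool is_argument) {
        parts_.push_back(segment{std::string(text), is_argument});
        return *this;
    }

    const std::vector<segment>& parts() const { return parts_; }

    // Literals pass through unchanged; each argument goes through the
    // encoder that the output sink chooses at output time.
    std::string render(const std::function<std::string(std::string_view)>& encode) const {
        assert(encode && "the output sink must supply an encoder");
        std::string out;
        for (const auto& p : parts_) {
            out += p.is_argument ? encode(p.text) : p.text;
        }
        return out;
    }

private:
    std::vector<segment> parts_;
};

// "literal" + embed(x) starts a parameterized_string.
inline parameterized_string operator+(std::string_view literal, const embedded& e) {
    parameterized_string ps;
    ps.add(literal, false).add(e.value, true);
    return ps;
}

// Further literals and arguments are appended without being flattened.
inline parameterized_string operator+(parameterized_string ps, std::string_view literal) {
    ps.add(literal, false);
    return ps;
}

inline parameterized_string operator+(parameterized_string ps, const embedded& e) {
    ps.add(e.value, true);
    return ps;
}

}  // namespace sketch
```

With this, `"username = " + sketch::embed(name) + " and type=" + sketch::embed(emailType)` yields four segments rather than one flat string, and nothing is encoded until some sink calls `render` with its own encoder.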
Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?
This has the potential to transparently and implicitly reduce or eliminate the following Top 25 CWE categories (by rank):
(2) CWE-79 - improperly neutralizing input when generating web pages (cross-site scripting)
(3) CWE-89 - improperly neutralizing special elements in SQL commands (SQL injection)
(6) CWE-78 - improperly neutralizing special elements in operating system commands (OS command injection)
(not in Top 25) CWE-117 - improper output neutralization for logs (log pollution/log forging)
to a lesser degree:
(8) CWE-22 - improperly limiting pathnames to restricted directories (path traversal)
(17) CWE-77 - improperly neutralizing special elements in commands (command injection).
(24) CWE-611 - improperly restricting XML external entity references.
(25) CWE-94 - improper control of code generation (code injection)
Note that these are all security vulnerabilities typically created from string concatenation with poorly sanitized/validated user input.
Describe alternatives you've considered.
Add a special interpolated-string prefix to control whether the generated code creates a string or a parameterized string, but this becomes unnecessary by simply adding an operator<< for parameterized strings.
Current methods rely on programmers being aware of the above-mentioned CWEs and explicitly avoiding them.
By making lazy concatenation and evaluation the default for interpolated strings, we remove the necessity for programmer awareness. Instead, the onus falls on library implementers to stop accepting strings as input and instead support parameterized strings and perform appropriate validation, sanitization and encoding on them.
A good example is automatically generating a parameterized-sql-query from a parameterized string.
Another example is removing cr/lf characters when logging embedded arguments (but leaving them in the string literals, thus supporting multiline log output while preventing unsanitized user input from causing log pollution or forgery).
Same goes for concatenating hard-coded html code with user supplied inputs, passing arguments to an os command by concatenating with user supplied inputs, etc.
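The SQL and logging examples above can be sketched in C++ as two output sinks that walk the same list of segments. This is only an illustration under assumptions: `segment`, `prepared_query`, `to_prepared_query`, `strip_newlines` and `render_for_log` are hypothetical names, not pasteur's API, and the SQL part assumes a driver that uses `?` as its positional placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of two output sinks; not the pasteur library's API.
namespace sink_sketch {

// One piece of a parameterized string: literal text or embedded argument.
struct segment {
    std::string text;
    bool is_argument;
};

struct prepared_query {
    std::string sql;                    // literal text with '?' placeholders
    std::vector<std::string> bindings;  // argument values, in order
};

// SQL sink: arguments never reach the SQL text, they become bound values.
inline prepared_query to_prepared_query(const std::vector<segment>& parts) {
    prepared_query q;
    for (const auto& p : parts) {
        if (p.is_argument) {
            q.sql += '?';
            q.bindings.push_back(p.text);
        } else {
            q.sql += p.text;
        }
    }
    return q;
}

// Removes every CR and LF character from s.
inline std::string strip_newlines(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) { return c == '\r' || c == '\n'; }),
            s.end());
    return s;
}

// Log sink: literals keep their line breaks, arguments lose theirs.
inline std::string render_for_log(const std::vector<segment>& parts) {
    std::string out;
    for (const auto& p : parts) {
        out += p.is_argument ? strip_newlines(p.text) : p.text;
    }
    return out;
}

}  // namespace sink_sketch
```

The same segments can be handed to either sink, which is the point of delaying encoding: the caller never has to guess at input time which neutralization the eventual output needs.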
**Change to cppfront:**
The change to cppfront is minimal: simply replace the output of "cpp2::to_string" with "cpp2::embed" when handling interpolated strings, and add an include for "parameterized.hpp", which will contain the parameterized string implementation.
**POC:**
Check out the fork at:
https://github.com/yarivtal/cppfront_parameterized_strings
The "Example" folder contains both cpp2 samples and the generated cpp files (so you don't have to generate them yourself).
The parameterized strings are implemented in a small library I created : pasteur
Its code is included in the fork for easier use, but you can find the lib here: https://github.com/SecureFromScratch/pasteur