-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: spec: disallow LTR/RTL characters in string literals? #20209
Comments
String literals must be able to hold any byte sequence whatsoever. Using a string to hold binary data is not uncommon and this proposal will introduce a forbidden byte sequence. I understand the problem (nice trick BTW), but I don't think this is a way to solve it. |
Yes, but not in their raw form. Only if they're escaped. Even today: s := "`\"`" ... you need to escape the double quote. We could require that RTL/LTR characters also need to be escaped. |
Yes, but that's unfortunately only a partial solution: https://play.golang.org/p/ZL_QF7xc-e. |
It would need to be forbidden from `` as well.
…On May 2, 2017 19:01, "cznic" ***@***.***> wrote:
We could require that RTL/LTR characters also need to be escaped.
Yes, but that's unfortunately only a partial solution:
https://play.golang.org/p/ZL_QF7xc-e.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#20209 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAH6Gbwbo-lFuOIrMxjxn-Vhz7D-cgn0ks5r11N1gaJpZM4NOMYM>
.
|
@karalabe, yes. |
I don't like to say that, but then we are back to square 1 (modulo the backtick). To make things even worse, there's also the Go 1 compatibility promise and I'm quite unsure if such breaking change qualifies under it. |
That's the catch 22 :) |
Perhaps a nice compromise (alas it makes for ugly parsing rules and introduces corner cases) is to forbid the two characters in single line strings (irrelevant of the type) and only allow it in multiline strings for lines that are fully inside (i.e. not the same line as the opening or closing tick is on). In reality I think it's enough to disallow for the closing tick's line. These "forbidden" cases would already even now screw up the code so I cannot imagine anyone benevoletly using the in such a way. Yet by limiting to the closing mark only, we can retain the functionality of rtl characters that some may have in backtick code. All in all personally I'd just forbid everywhere but I'm not accustomed to rtl use cases so I can't really comment on how people use them in completely valid and meaningful ways. |
This could be handled by a linter of some kind, but then people need to be aware of that tool and run it. I know this doesn't match all the criteria for go vet, but adding it there may be best. If go vet is added to go test per #18084 it would be run by default in a lot of situations making it harder to sneak this into a codebase. Of course, that doesn't close the hole: it just makes it smaller. Given how unlikely this is to be done on purpose without malicious intent, breaking Go 1 (which is allowed when it comes to security) and making it illegal across the board probably wouldn't cause anyone any pain. |
I think solving this problem in the language is the wrong place. If you're not doing code reviews, you have bigger problems. |
GitHub flat out stated they don't care about these kinds of attack vectors. Given that most Go code is hosted on GitHub, most reviews will be done there too. Also these two characters are not homoglyphs, they are invisible. |
They are currently invisible. I am proposing to make them visible in a code review tool. |
Developers use GitHub, whether you like it or not. Telling the entire ecosystem to go use something else isn't really feasible. But even if one project or another would actually go and use something else, you still have the issues when vendoring in code. Do you have the capacity to go over every line of every package you depend on to see if there's something malicious accidentally added to it? In a perfect world, the answer is yes, but we don't live in such a world. It is easy to blame the user for using it wrong, but if something is easy to use wrong, is really the user to blame? In this instance it's easy to say "just use another tool", but this requires users to be aware of the issue in detail, and to know exactly what other tool to use, why and even then what to look for. Compared to requiring everyone to change their entire tooling, a minor spec change can solve it elegantly for all, without needing to make the community aware of one more complexity that can end very badly if taken lightly. |
While the minor spec change may solve this particular problem, the root cause stays: If you don't properly review [untrusted] code with the proper tools you are to blame yourself for the consequences. There are many other ways how to sneak malicious code into a repository in a way which is obfuscated/hidden/confusing on the same level as what this issue is about. Resolving only one of the cases is not a solution. |
That's what I'm trying to accomplish here :) Add support for avoiding this into the proper tool, the language.
Please provide examples for such. I've yet to see anything so deceitful as this one.
It is exactly infinitely better than resolving 0. |
For example, without going into details, any exec.Command(var1, var2, ...), ie. not using string literals. Level 0, admitted, but many people will simply never look at the code at all, less on where the values come from and/or how they are computed at runtime. rot13 would be enough to just move on for many out of sheer laziness. (I've recently published something with a string encoded blob of gzip of gob of VM code of a C program, source of which is not even included in the repository. Moreover, gob outputs for the same input are not reproducible so is the gzip. Here not even a proper code review
Well, the difference is 1, yes, but the ratio is actually NaN ;-) |
These things really seem more appropriate for code review or diff viewing tools. Git already marks up trailing whitespace in screaming red, it could do the same for LTR/RTL marks. Restricting Go doesn't fullly solve the problem anyway: there are lots of shell scripts and Makefiles in people's repos that you could use instead. |
The open questions for this issue, #20210, and #20115 are:
I don't think we know the answers to any of this. Without careful thought, it seems like it might be OK for vet to contain these kinds of checks, provided the scope is limited to Unicode similarity problems and the implementation scope can be cleanly restricted. But I don't know. |
Without discounting the dangers, I don't believe the language is the place to solve this, plus it's a very slippery slope to start stepping around the rich list of funny Unicode characters that can be exploited. The specification could become a mess. Tooling perhaps, but not the language specification. |
/cc @SamWhited who also works on golang.org/x/text/secure/precis, amongst other things. |
If there's a clean, stable, simple specification of what to avoid, I'm open to bradfitz's idea of requiring escaping: "\u200F" for U+200F RIGHT-TO-LEFT MARK. If there isn't, I agree with robpike's concerns. I don't know whether there is, though. @mpvl? |
@cznic I'm confused as to how prohibiting RTL inside backtick string literals (and regular string literals) brings us "back to square 1". Can you elaborate? |
Personally, I agree with Rob that the language itself should not deal with this. I suppose I could be convinced that there was a simple set of rules to apply, or a certain class of character that should never be allowed in a literal, but I suspect there won't be or you'll just run into a number of other places RTLO's or other characters could be put into the code and made to confuse people (maybe you could do something similar with an inline comment block, or with a rune literal, etc.) This is a matter of display, not syntax, and security concerns related to the display of code are best addressed by your favorite editor or vetting tool (Vim for instance shows me that as |
I think my "I don't like to say that, but then we are back to square 1 (modulo the backtick)." refers ty my earlier "String literals must be able to hold any byte sequence whatsoever. Using a string to hold binary data is not uncommon and this proposal will introduce a forbidden byte sequence.". TBH, I'm not sure if I have maybe thought about something different at that time as now my post seems confusing to me too, or at least not clear. |
Maybe something - perhaps the language, perhaps vet - should simply forbid literal non-printing characters above U+007E. |
Following a discussion with some colleagues, some notes in no particular order: It's not merely RTL overrides. https://play.golang.org/p/ifwGIJbjBA looks confusing without having non-printing characters. That example is: For "go get" path confusion, BiDi in general is a pain, let alone punycode. There's RFC 3987 Example 5: https://bugs.chromium.org/p/chromium/issues/detail?id=683314 is a Chromium bug about "аррӏе.com" where that first term is 5 Cyrillic letters, not the 5 latin letters "apple". In Go: https://play.golang.org/p/j-50CRKctO. I believe that, in Chromium, they have a narrowly defined filter for Cyrillic or Greek homographs for Latin glyphs. For homograph attacks, not just LTR/RTL, there's http://www.unicode.org/reports/tr39/#Confusable_Detection although IIUC it may be too aggressive, as it marks "m" (em) and "rn" (arr en) as confusable. The Unicode Cf "Other, Format" category is http://www.fileformat.info/info/unicode/category/Cf/list.htm In summary, it's complicated, and therefore the response is probably in the tools rather than the language. |
Thanks @nigeltao. It definitely sounds to me like we should be experimenting in a tool (maybe vet) and not in the spec. Does someone want to come up with a list of what a vet "unicode" check would check? |
Updated my comment above - meant 20115 not 20215. |
Does someone want to come up with a list of what a vet "unicode" check would check? |
In #40717 I propose 3 codepoints to add to that list for a vet check: U+115F, U+1160, U+3164 |
"‘Trojan Source’ Bug Threatens the Security of All Code": https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/ Apparently some researchers with the University of Cambridge have given this (type of) bug a cool-sounding name now. |
This PR adds a linter to disallow the use of directional formatting characters to help prevent them being used to get malicious code past code review. Ideally our code-review tool would highlight such characters for us since such characters might routinely appear in binary artifacts. See also: - https://www.trojansource.codes/ - https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html - golang/go#20209 Release note: None
TL;DR https://play.golang.org/p/LPkPTRF7fC
The above code looks quite plain and obvious, except it does something completely different than you'd expect (feel free to run it). The obvious thing that should happen is that it counts the number of bits set in the given string. The non-obvious thing that happens is that the mask is actually
0
, not0x01
.So, what happened there? The abuse in the above code is around the invisible Unicode characters that mark following text to be right-to-left or left-to-right. Since Go permits arbitrary Unicode characters to be present in string literals, it also allows me to have a string of the form "bla bla blaabc". Since we're dealing with valid Unicode sequences here, any modern editor/website will actually interpret those special characters, causing the content in between the two special marks to be reversed to the end of the line (alas still part of the string literal).
In my playground code this is abused by having the following source code:
Which will be displayed by all modern editors/websites as:
The security aspect of this issue is social engineering attacks. The "display" line of my sample code is obvious beyond doubt, so noone will ever inspect such a thing; however it managed to flip one bit in a mask (imagine doing this for file permission umasks). The issue is that such code could easily get past reviews and into a final product.
The tricky part is how to avoid these issues. My only meaningful suggestion would be to disallow these two special characters in string literals. This does break Go 1.0 compatibility guarantees, however I think it's worth it (I can't really figure out a meaningful use case for it). Using it in a single line string will screw up the display of the source code, so it doesn't make sense to do it, and using it in a multi line string is questionable at best. I think requiring users to explicitly use
\x
notation for adding these characters it a good compromise to protect source code sanity.The text was updated successfully, but these errors were encountered: