This repository has been archived by the owner on Jul 24, 2024. It is now read-only.
Proposal
Close unclosed Unicode RTL sections, remove trailing unclosed RTL control characters in strings and comments, and warn the user about their presence, in order to mitigate CVE-2021-42574.
CVE-2021-42574, a.k.a. the Trojan Source vulnerability, results from unclosed Unicode RTL control characters. It lets an attacker alter the displayed order of the parts of a line, so that a section inside a comment or string appears to the right of the comment or string, as if it were outside it.
Input
{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><LRI>*/
  })
Output
{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><LRI>*/
  })
Desired output
{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><PDF>*/
  })
The trick
The input evaluates to the same result as the desired output, which is likely to surprise reviewers. The cause is the explicit formatting support in the Unicode bidirectional (BiDi) algorithm, which is driven by control characters. See https://www.unicode.org/reports/tr9/tr9-42.html for more detail.
When an explicit directional section is not closed, it may affect the characters that follow it on the same line, and that is what happens on line 8 of the Output:
/*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><LRI>*/
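One way to spot such unclosed sections is to track openers and closers on each line with a stack. The following is a minimal Python sketch, not part of nixpkgs-fmt: the function name `unclosed_openers` is hypothetical, and the matching rules are a simplification of the Unicode BiDi algorithm (for instance, nesting depth limits are ignored).

```python
# Explicit directional formatting characters (see UAX #9).
EMBEDDING_OPENERS = {"\u202a": "LRE", "\u202b": "RLE", "\u202d": "LRO", "\u202e": "RLO"}
ISOLATE_OPENERS = {"\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI"}
PDF = "\u202c"  # closes an embedding or override
PDI = "\u2069"  # closes an isolate


def unclosed_openers(line: str) -> list[tuple[int, str]]:
    """Return (column, name) for every opener that is not closed on this line."""
    stack = []  # entries are (column, name, is_isolate)
    for col, ch in enumerate(line):
        if ch in EMBEDDING_OPENERS:
            stack.append((col, EMBEDDING_OPENERS[ch], False))
        elif ch in ISOLATE_OPENERS:
            stack.append((col, ISOLATE_OPENERS[ch], True))
        elif ch == PDI:
            # PDI closes the nearest open isolate and anything opened inside it.
            isolates = [i for i, entry in enumerate(stack) if entry[2]]
            if isolates:
                del stack[isolates[-1]:]
        elif ch == PDF:
            # PDF closes the innermost embedding, but never crosses an isolate.
            if stack and not stack[-1][2]:
                stack.pop()
    return [(col, name) for col, name, _ in stack]


# Line 8 of the Output, written with escapes instead of invisible characters.
line8 = "/*Replace the source with a secure one\u202e\u2066src = srcSecure;\u2069\u2066*/"
for col, name in unclosed_openers(line8):
    print(f"warning: unclosed <{name}> at column {col}")
```

For line 8 this reports the `<RLO>` and the trailing `<LRI>`, the two characters that the proposal below deals with.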
The solution is to:
1. Remove any <RLE>, <LRE>, <RLO>, <LRO>, <RLI>, <LRI>, or <FSI> that is immediately followed by ", '', */, or the end of the line.
2. Otherwise, add the corresponding <PDF> or <PDI> before the ", '', */, or the end of the line.
3. Show warnings about the presence of RTL control characters, especially when they are not closed in the unformatted file.
After that, the line should become
/*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><PDF>*/
(with the trailing <LRI> removed and <RLO> closed with <PDF>)
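The removal and closing steps could be sketched in Python as follows, for the text of a single comment or string with its closing delimiter already stripped. This is only an illustration under that assumption: `close_bidi_sections` is a hypothetical helper, not an existing nixpkgs-fmt function, and the matching rules are simplified.

```python
EMBEDDING_OPENERS = "\u202a\u202b\u202d\u202e"  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = "\u2066\u2067\u2068"          # LRI, RLI, FSI
PDF = "\u202c"  # closes an embedding or override
PDI = "\u2069"  # closes an isolate


def close_bidi_sections(body: str) -> str:
    """Sanitize the text of one comment or string (delimiters excluded)."""
    # Drop openers that sit directly before the closing delimiter.
    body = body.rstrip(EMBEDDING_OPENERS + ISOLATE_OPENERS)

    # Track what is still open: True for an isolate, False for an embedding.
    stack = []
    for ch in body:
        if ch in EMBEDDING_OPENERS:
            stack.append(False)
        elif ch in ISOLATE_OPENERS:
            stack.append(True)
        elif ch == PDI and True in stack:
            # Close the nearest isolate and everything opened inside it.
            last_isolate = len(stack) - 1 - stack[::-1].index(True)
            del stack[last_isolate:]
        elif ch == PDF and stack and not stack[-1]:
            stack.pop()

    # Append the matching closers, innermost section first.
    return body + "".join(PDI if is_isolate else PDF for is_isolate in reversed(stack))


comment = "Replace the source with a secure one\u202e\u2066src = srcSecure;\u2069\u2066"
fixed = close_bidi_sections(comment)
print(fixed.encode("unicode_escape").decode("ascii"))
```

Applied to the comment body from line 8, the trailing `<LRI>` is dropped and a `<PDF>` is appended for the `<RLO>`, which matches the line shown above.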
I'm not a Unicode expert. Someone more familiar with this topic would need to refine the proposal.
Reproduction of the example Nix expressions
Prepare the following Python scripts:
gen_code.py
#!/usr/bin/env python3
# Note the unclosed RTL section
code_string = '''{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one\N{RLO}\N{LRI}src = srcSecure;\N{PDI}\N{LRI}*/
  })'''

if __name__ == '__main__':
    print(code_string)
gen_code_clean.py
#!/usr/bin/env python3
# The RTL sections are now closed
code_string = '''{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one\N{RLO}\N{LRI}src = srcSecure;\N{PDI}\N{PDF}*/
  })'''

if __name__ == '__main__':
    print(code_string)
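Because the control characters are invisible in most editors and terminals, it can help to print the generated expressions with the characters replaced by visible markers. The sketch below is one way to do that; the function name `show_bidi_controls` is hypothetical and is not part of the scripts above.

```python
# Map each explicit directional formatting character to its abbreviation.
BIDI_CONTROL_NAMES = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def show_bidi_controls(text: str) -> str:
    """Replace every BiDi control character with a visible <NAME> marker."""
    return "".join(
        f"<{BIDI_CONTROL_NAMES[ch]}>" if ch in BIDI_CONTROL_NAMES else ch
        for ch in text
    )


# The comment line produced by gen_code.py, written with escapes.
line = "/*Replace the source with a secure one\u202e\u2066src = srcSecure;\u2069\u2066*/"
print(show_bidi_controls(line))
```

Passing `code_string` from either script through this function should make the difference between the unclosed and the closed variant visible in plain text.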
Summary: This has to be addressed in editors/code review tools, not at lower levels of the tooling stack.
The main point in the two links is that the fix shouldn't be done in compilers (interpreters, in the case of Nix). This formatter may be used as part of editors or review tools (e.g. the jnoortheen.nix-ide VSCode extension), so the implementation may benefit developers who use those tools.
I'm fine having this in nixpkgs-fmt, we just need somebody to send in a PR. It would also be nice to have that as part of the Nix language, and of the nixpkgs CI. It doesn't have to be an either/or situation.
https://github.com/nickboucher/trojan-source
https://www.trojansource.codes/
Boucher and Anderson (2021). Trojan Source: Invisible Vulnerabilities. https://trojansource.codes/trojan-source.pdf