Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation of is_whitespace inside rustc on Windows-msvc (with -Zdylib-lto) #109067

Closed
kpreid opened this issue Mar 13, 2023 · 22 comments
Closed
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-critical Critical priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Milestone

Comments

@kpreid
Copy link
Contributor

kpreid commented Mar 13, 2023

Code

fn main() {
    println!(
        "Hello \
    • world!"
    );
}

Current output

warning: non-ASCII whitespace symbol '\u{2022}' is not skipped
 --> experiment\src\main.rs:3:16
  |
3 |           "Hello \
  |  ________________^
4 | |     • world!"
  | |     ^ non-ASCII whitespace symbol '\u{2022}' is not skipped
  | |_____|
  |

Desired output

No warning, since U+2022 "" is not whitespace.

Anything else?

stable-x86_64-pc-windows-msvc - rustc 1.68.0 (2c8cc3432 2023-03-06)

Weirdly, this warning only occurs on Windows. Unfortunately, I have no Windows systems to test on other than GitHub Actions, so it isn't convenient to test further. I hope this bug report is useful anyway.

@kpreid kpreid added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 13, 2023
@Noratrieb Noratrieb added the A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. label Mar 13, 2023
@Teapot4195
Copy link
Contributor

I have a spare windows system, I'll test further once I get home

@ehuss
Copy link
Contributor

ehuss commented Mar 13, 2023

Hm, I can't seem to see a way to reproduce this. There isn't any platform-specific code involved in that warning. Are the logs from the GItHub Actions run publicly available?

@kpreid
Copy link
Contributor Author

kpreid commented Mar 13, 2023

Hm, I can't seem to see a way to reproduce this.

To clarify, do you mean that you compiled the above code using a Windows system and saw no warnings?

There isn't any platform-specific code involved in that warning.

I'm not very familiar with Windows, but isn't it the case that Git checkout puts CRLF in text files on Windows, by default? That could have an influence while not being actually due to platform dependence in rustc, because rustc would be seeing '\\' CR LF spaces '•' instead of '\\' LF spaces '•'

Are the logs from the GItHub Actions run publicly available?

Yep — https://github.com/kpreid/all-is-cubes/actions/runs/4400743620/jobs/7706316716 is the run I used to test the above code. (It has a lot of other noise because it's just a quick hack upon my regular CI — if I have to I could set up a whole new repository with a custom setup.)

@ChrisDenton
Copy link
Member

ChrisDenton commented Mar 13, 2023

I can reproduce locally using stable msvc rustc:

Z:\test> rustc --version --verbose
rustc 1.68.0 (2c8cc3432 2023-03-06)
binary: rustc
commit-hash: 2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74
commit-date: 2023-03-06
host: x86_64-pc-windows-msvc
release: 1.68.0
LLVM version: 15.0.6
Z:\test> cat whitespace.rs
fn main() {
    "\
£";
}
Z:\test> rustc whitespace.rs
warning: non-ASCII whitespace symbol '\u{a3}' is not skipped
 --> whitespace.rs:2:6
  |
2 |       "\
  |  ______^
3 | | £";
  | | ^ non-ASCII whitespace symbol '\u{a3}' is not skipped
  | |_|
  |

warning: 1 warning emitted

Switching between Unix and Windows line endings does not seem to make a difference.

Using windows-gnu does not have this warning. But that makes no sense to me.

@ChrisDenton
Copy link
Member

Ok, this is super weird. It only appears to affect x86_64-msvc:

Z:\test> rustc +1.68-i686-pc-windows-msvc whitespace.rs
Z:\test> rustc +1.68-x86_64-pc-windows-msvc whitespace.rs
warning: non-ASCII whitespace symbol '\u{a3}' is not skipped
 --> whitespace.rs:2:6
  |
2 |       "\
  |  ______^
3 | | £";
  | | ^ non-ASCII whitespace symbol '\u{a3}' is not skipped
  | |_|
  |

warning: 1 warning emitted

@ehuss
Copy link
Contributor

ehuss commented Mar 13, 2023

Ah, I see, I was using aarch64-pc-windows-msvc. I tried x86_64-pc-windows-msvc, and that reproduces. How strange!

@ehuss
Copy link
Contributor

ehuss commented Mar 13, 2023

I did a quick bisect, and it points to #103591 as where the regression started (cc @lqd).

@ehuss
Copy link
Contributor

ehuss commented Mar 13, 2023

I did a little bit of poking around. Inside rustc, it definitely believes '£'.is_whitespace() is true. I don't particularly enjoy investigating inlining or llvm issues, so I probably won't poke around much more.

@ehuss ehuss added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-prioritize Issue: Indicates that prioritization has been requested for this issue. and removed A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. A-diagnostics Area: Messages for errors, warnings, and lints labels Mar 13, 2023
@ehuss ehuss changed the title False "non-ASCII whitespace" warning on Windows Miscompilation of is_whitespace inside rustc on Windows-msvc (with thin-LTO) Mar 13, 2023
@ehuss ehuss added the A-LTO Area: Link-time optimization (LTO) label Mar 13, 2023
@ehuss
Copy link
Contributor

ehuss commented Mar 13, 2023

I removed the diagnostic labels since this isn't an error in the diagnostics themselves. I'm assuming this is a miscompilation during thin-LTO, and thus implying it is related to LLVM, but the underlying cause could be elsewhere.

To anyone wanting to reproduce the necessary build settings, I think the only thing needed is rust.lto = "thin" and x.py build --stage=2 std.

@jyn514 jyn514 added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Mar 13, 2023
@jyn514
Copy link
Member

jyn514 commented Mar 13, 2023

Adding I-unsound since I see no reason to believe the impact is limited to is_whitespace.

@Noratrieb
Copy link
Member

I'll put up a revert of using thin LTO in the meantime.

@jyn514
Copy link
Member

jyn514 commented Mar 13, 2023

cc @lqd @Mark-Simulacrum @bjorn3 @Kobzol as the people involved in LTO for MSVC

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 13, 2023
…jyn514

Revert "enable ThinLTO for rustc on x86_64-pc-windows-msvc dist builds"

This lead to a miscompilation in at least `char::is_whitespace` and probably in more unknown places.....

See rust-lang#109067

This reverts commit 684663e, PR rust-lang#103591.
@ehuss
Copy link
Contributor

ehuss commented Mar 13, 2023

Here's a partially minimized sample. It can maybe get smaller than this, but after a point it seems to be fairly sensitive to the code shifting around.

use std::ops::Range;
use std::str::Chars;

#[derive(Debug)]
enum EscapeError {
    UnskippedWhitespaceWarning,
}

fn scan_escape(chars: &mut Chars<'_>) -> Result<char, EscapeError> {
    let res = match chars.next().unwrap() {
        _ => panic!("invalid"),
    };
    // Ok(res)
}

fn unescape_str_or_byte_str<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut chars = src.chars();

    // The `start` and `end` computation here is complicated because
    // `skip_ascii_whitespace` makes us to skip over chars without counting
    // them in the range computation.
    while let Some(c) = chars.next() {
        let start = src.len() - chars.as_str().len() - c.len_utf8();
        let res = match c {
            '\\' => {
                match chars.clone().next() {
                    Some('\n') => {
                        // Rust language specification requires us to skip whitespaces
                        // if unescaped '\' character is followed by '\n'.
                        // For details see [Rust language reference]
                        // (https://doc.rust-lang.org/reference/tokens.html#string-literals).
                        skip_ascii_whitespace(&mut chars, start, callback);
                        continue;
                    }
                    _ => scan_escape(&mut chars),
                }
            }
            _ => Ok(c)
        };
        let end = src.len() - chars.as_str().len();
        callback(start..end, res);
    }

    fn skip_ascii_whitespace<F>(chars: &mut Chars<'_>, start: usize, callback: &mut F)
    where
        F: FnMut(Range<usize>, Result<char, EscapeError>),
    {
        let tail = chars.as_str();
        println!("tail={tail:?}");
        let first_non_space = tail
            .bytes()
            .position(|b| b != b' ' && b != b'\t' && b != b'\n' && b != b'\r')
            .unwrap_or(tail.len());
        println!("first_non_space={first_non_space:?} start={start:?}", );
        if tail[1..first_non_space].contains('\n') {
            // The +1 accounts for the escaping slash.
            // let end = start + first_non_space + 1;
            // callback(start..end, Err(EscapeError::MultipleSkippedLinesWarning));
        }
        let tail = &tail[first_non_space..];
        println!("tail={tail:?}");
        if let Some(c) = tail.chars().nth(0) {
            // For error reporting, we would like the span to contain the character that was not
            // skipped. The +1 is necessary to account for the leading \ that started the escape.
            // println!("{:?}", '£'.is_whitespace());
            println!("first char is {c:?}");
            let end = start + first_non_space + c.len_utf8() + 1;
            println!("end is {end:?}");
            if c.is_whitespace() {
                println!("{c:?} is whitespace, err range is {:?}", start..end);
                callback(start..end, Err(EscapeError::UnskippedWhitespaceWarning));
            }
        }
        *chars = tail.chars();
    }
}

fn main() {
    unescape_str_or_byte_str("\\\n£",  &mut |_range, result| {
        eprintln!("cb={result:?}", );
    });
}

Build with rustc -Zdylib-lto -Clto=thin -Cprefer-dynamic=yes -Copt-level=2 main.rs
You should see:

tail="\n£"
first_non_space=1 start=0
tail="£"
first char is '£'
end is 4
'£' is whitespace, err range is 0..4
cb=Err(UnskippedWhitespaceWarning)
cb=Ok('£')

@jyn514 jyn514 added this to the 1.68.0 milestone Mar 13, 2023
@jyn514 jyn514 added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Mar 13, 2023
@workingjubilee workingjubilee added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Mar 14, 2023
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-critical

@rustbot rustbot added the P-critical Critical priority label Mar 14, 2023
@rustbot rustbot removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 14, 2023
@Noratrieb Noratrieb changed the title Miscompilation of is_whitespace inside rustc on Windows-msvc (with thin-LTO) Miscompilation of is_whitespace inside rustc on Windows-msvc (with -Zdylib-lto) Mar 14, 2023
@Noratrieb
Copy link
Member

Noratrieb commented Mar 14, 2023

I've split up this issue "we have shipped a miscompiled rustc" from "-Zdylib-lto is broken on windows-msvc (#109114)" so that we can close this issue soon after the backport while still keeping track of the other one.

@Noratrieb
Copy link
Member

#108978, a crash in rusc seen by multiple people was also caused by the same issue.

@Mark-Simulacrum
Copy link
Member

If folks here could verify this is fixed on the dev-static build (https://internals.rust-lang.org/t/rust-1-68-1-pre-release-testing/18547 for instructions) that would be great - we're expecting a point release on Thursday with what is believed to be a fix.

@lqd
Copy link
Member

lqd commented Mar 21, 2023

There's no warning on the MCVE in the OP on 1.68.1.

@kpreid
Copy link
Contributor Author

kpreid commented Mar 21, 2023

I can confirm that the program which I originally found the problem in now compiles without the warning.

@mac198442
Copy link

Seems to me the entire idea of doing this for dist builds before testing it on nightly builds was a bad idea.

@Noratrieb
Copy link
Member

Noratrieb commented Mar 26, 2023

"dist builds" refers to all distributed builds including stable, beta, nightly and CI artifacts from master or try builds. This went through the normal nightly/beta release cycle, but apparently no one found issues there.

@jyn514
Copy link
Member

jyn514 commented Mar 26, 2023

I'm going to close this since we've published a 1.68.1 point release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-critical Critical priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests