Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[performance][windows][stdio] write_valid_utf8_to_console should almost certainly call MultiByteToWideChar #107092

Closed
strega-nil-ms opened this issue Jan 19, 2023 · 12 comments · Fixed by #107110
Labels
O-windows Operating system: Windows T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@strega-nil-ms
Copy link

(and similarly, read_u16s() should call WideCharToMultiByte)

Windows provides these nice functions which performantly implement utf-8 <-> utf-16 conversions. We should probably call them in the standard library when possible.

@ChrisDenton ChrisDenton added O-windows Operating system: Windows T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 20, 2023
@the8472
Copy link
Member

the8472 commented Jan 20, 2023

Do we get measurable performance wins that make calling more unsafe C APIs worth it? Especially with the inlining barrier across languages.
And I would assume console printing is bottlenecked on the console rendering part anyway, not on string conversions.

@ChrisDenton
Copy link
Member

Do we have any good benchmarks for console I/O perf?

@the8472
Copy link
Member

the8472 commented Jan 20, 2023

A simple main writing 100MB worth of lines to stdout and measuring the time it takes should do?

@ChrisDenton
Copy link
Member

ChrisDenton commented Jan 20, 2023

For sure. I was just wondering if we had something already. I mean, we'd probably want to be testing a few different languages as well to make sure we're not just optimizing for ASCII only.

@the8472
Copy link
Member

the8472 commented Jan 20, 2023

I don't think we do because most benchmarks suppress console output.

@strega-nil-ms
Copy link
Author

strega-nil-ms commented Jan 21, 2023

I did do a benchmark, and interestingly, ASCII text was about twice as fast with MultiByteToWideChar as with rust-native code; however, Chinese text was slightly slower. Given that most of the actual cost of printing to the screen is in the actual printing, I no longer think it's a good idea to do this.

See benchmarks at strega-nil/rust-bench-cvt-utf8-utf16.

Run on M1 chip:

test tests::ascii_windows_rust_equal ... ignored
test tests::chinese_windows_rust_equal ... ignored
test tests::bench_ascii_rust      ... bench:      32,544 ns/iter (+/- 2,549)
test tests::bench_ascii_windows   ... bench:      17,571 ns/iter (+/- 835)
test tests::bench_chinese_rust    ... bench:       7,710 ns/iter (+/- 561)
test tests::bench_chinese_windows ... bench:       8,556 ns/iter (+/- 669)

someone may want to check on an x64 chip, since that might be faster.

@the8472
Copy link
Member

the8472 commented Jan 21, 2023

Just to check my understanding: The utf16 conversion only happens when one writes to an actually rendered console. There isn't some stuff like virtual consoles that act more like NUL where speeding up the conversion could help because the data will be discarded by the OS. And output to files is binary anyway.

@ChrisDenton
Copy link
Member

There are now (since Windows 10, 1809) pseudo consoles which are consoles without the rendering part. Though the intent would be for the rendering to be implemented by another program, it would also be possible to skip or delay (e.g. by writing a log) rendering.

@the8472
Copy link
Member

the8472 commented Jan 23, 2023

In that case a speedup may be useful, but is the optimization potential of the built-in conversion already squeezed dry?

@strega-nil-ms
Copy link
Author

@the8472 yeah, I think it's unlikely that the added complexity is worth it given what the benchmarks show; I would really appreciate if someone ran the benchmark on x64, since it may be that Windows implements faster algorithms on that platform.

@ChrisDenton
Copy link
Member

I don't think anyone has worked on optimizing this so I'm sure there's plenty of optimization potential. The results from I got from an AMD Ryzen 5 3600 machine are:

running 6 tests
test tests::ascii_windows_rust_equal ... ignored
test tests::chinese_windows_rust_equal ... ignored
test tests::bench_ascii_rust      ... bench:      19,443 ns/iter (+/- 244)
test tests::bench_ascii_windows   ... bench:       5,689 ns/iter (+/- 63)
test tests::bench_chinese_rust    ... bench:       8,925 ns/iter (+/- 107)
test tests::bench_chinese_windows ... bench:       8,065 ns/iter (+/- 214)

@strega-nil-ms
Copy link
Author

Mmh, thanks @ChrisDenton! I'll also push some benchmarks for testing other cases later, since there are plenty of other cases than just "bulk-converting a bunch of text". Thanks!

Nicole

compiler-errors added a commit to compiler-errors/rust that referenced this issue Feb 26, 2023
[stdio][windows] Use MBTWC and WCTMB

`MultiByteToWideChar` and `WideCharToMultiByte` are extremely well optimized, and therefore should probably be used when we know we can (specifically in the Windows stdio stuff).

Fixes rust-lang#107092
@bors bors closed this as completed in 3fe4023 Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-windows Operating system: Windows T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants