-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assertion `(expected_utf16_length) == (utf16_length)' failed #47457
Comments
Out of curiosity, can you test with node v19.6.1 and see if the same problem exists? |
No error.
|
/cc @anonrig This seems to be related to simdutf. |
cc @lemire |
Seems to be working 🙄 |
I don't use Windows, but on macOS, it does not crash. @lemire, do you see any particular reason why would this fail? |
This could fail if the input is not valid UTF-8, in which case |
I have opened a PR that might help with this issue: #47471 This being said, it is not clear to me where the error comes from. Something in Node expects UTF-8 and tries to convert it to UTF-16 but the UTF-8 is not valid. Where does the invalid UTF-8 comes from? Does it come from Node itself? |
Could it be associated with the result preview in the CLI? |
Almost certainly but I see the previews in my terminals as well. It is weird because Date() will result in pure ASCII. That's not the kind of string that should be problematic. There might be an issue elsewhere in Node.js. If you use a different terminal, does the same problem persists? Could you try Windows Terminal? |
Yes, on all the ones I have installed. I tried CMD, git bash, and power shell. I will try to instal WSL and test in there |
It works in Ubuntu wsl
|
The issue also occurs in Node 18.16.0 on Windows 11, but not in Node 18.15.0. |
Same problem here. Same circumstances and conditions. Using Windows 11 with Node v18.16.1. It happens under both PowerShell and Command Prompt. It doesn't happen when running inside WSL. Same message:
|
@lemire Your suspicions are right. I've just tried out with 20.3.1 and had no issue whatsoever. 🚀 Do you know if there is any chance to have this bugfix implemented in v18? |
Welcome to Node.js v20.3.0 versão |
And also for v18.17.1 and Windows 11. |
The issue also occurs for Windows 11 and Node.js v20.6.1 (works with v20.5.1). |
The error can also be provoked on Windows 11 and Node v20.5.1 by typing
The offending character is |
I can verify the problem with Node 20.7.0 under Windows 11. In fact, it appears that typing any non-ASCII character in the Node shell as the first character seems to trigger the check and lead to an abort. Just 'é' will suffice. The problem only occurs while typing directly in the Node shell. If you execute a script, it is fine. If I build Node from source under Windows, using the current Node code, and simply executing vcbuild.bat, and running the generated node.exe, I cannot reproduce the problem. So the bug is real, but it seems to have to do with how Node is built for windows as part of the release process. I would normally consider the possibility that the problem was recently fixed in the code, but Node 20.7.0 is a very recent release, and I find it unlikely that something happened in the last three days to the code to fix this issue. |
cc @nodejs/platform-windows @nodejs/platform-windows-arm @nodejs/build |
Not any non-ASCII character: Typing
|
Interesting. Again, if I build Node from source (Visual Studio 2022), I don’t see this effect. It would be important to know how the releases for Windows are built. |
Currently on Windows 2012r2 with Visual Studio 2019: https://github.com/nodejs/node/blob/main/BUILDING.md#official-binary-platforms-and-toolchains |
@richardlau When trying to build for vs2019, I hit #48987 That is, doing Is there some documentation on how the code is built for releases? Precisely? In a way that we can reproduce locally? Building for vs2022, I don't have the problem. |
As for what is happening... every time you press a key in the REPL, a JSON document is generated of the form... {"id":1,"method":"Runtime.evaluate","params":{"expression":"v","throwOnSideEffect":true,"timeout":333}} E.g., that is what is generated if I type the letter Importantly, the JSON is encoding in UTF-8, as it should. Now, this gets parsed in An easy way to debug this problem without any sophistication would be to patch the Node code like so... index 0f780f46c8..8cbe6ce05f 100644
--- a/src/inspector/node_string.cc
+++ b/src/inspector/node_string.cc
@@ -34,16 +34,27 @@ void builderAppendQuotedString(StringBuilder& builder,
}
std::unique_ptr<Value> parseJSON(const std::string_view string) {
+ printf("parsejson\n");
+ printf(" input: '%.*s'\n",(int)string.size(), string.data());
+ printf(" input: ");
+ for(size_t i = 0; i < string.size(); i++) {
+ printf(" %02x ", (unsigned)string[i]&0xff);
+ }
+ printf(" (%zu words)\n", string.size());
if (string.empty())
return nullptr;
size_t expected_utf16_length =
simdutf::utf16_length_from_utf8(string.data(), string.length());
+ printf(" expected_utf16_length %zu \n", expected_utf16_length);
+
MaybeStackBuffer<char16_t> buffer(expected_utf16_length);
// simdutf::convert_utf8_to_utf16 returns zero in case of error.
size_t utf16_length = simdutf::convert_utf8_to_utf16(
string.data(), string.length(), buffer.out());
// We have that utf16_length == expected_utf16_length if and only
// if the input was a valid UTF-8 string.
+ printf(" utf16_length %zu \n", utf16_length); This would print out the offending input and we could figure things out from there. |
Precisely, no, but the Jenkins job that does the release builds executes @StefanStojanovic added some documentation for creating local custom Windows VMs (e.g. VirtualBox) that reuse much of the Ansible setup we use to set up the CI (both test and release) machines but I've not personally run these myself https://github.com/nodejs/build/blob/main/doc/create-windows-custom-vm.md |
Ok, so I have dug more into what is happening in this code.
Seemingly, this is where things go bad under the released binaries under Windows but I cannot reproduce the problem with the binaries that I build. |
I have created PR #49780 where I attempt to remove this back and forth between UTF-8 and UTF-16 at each key press in the REPL. |
@lemire I might have some additional debug info that would related to this bug, We started seeing consistent failure in our CI for
The failure started after updating our branch from
As a next step, I tried to repro the failure locally and I was able to confirm the failure with VS 2019 while test passes when building Node.js with VS 2022. I added the debug logs to node/src/inspector/node_string.cc Line 61 in fce8fba
With this data, I tried to create a minimal repro outside Node.js and ended up with the following test case #include <memory>
#include <uchar.h>
#include <iostream>
#include "simdutf.h"
int main() {
const uint16_t buf[] = {123,34,105,100,34,58,49,44,34,109,101,116,104,111,100,34,58,34,82,117,110,116,105,109,101,46,101,118,97,108,117,97,116,101,34,44,34,112,97,114,97,109,115,34,58,123,34,101,120,112,114,101,115,115,105,111,110,34,58,34,99,111,110,115,116,32,120,50,32,61,32,39,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,39,34,44,34,116,104,114,111,119,79,110,83,105,100,101,69,102,102,101,99,116,34,58,116,114,117,101,44,34,116,105,109,101,111,117,116,34,58,51,51,51,125,125};
const char16_t* source =
reinterpret_cast<const char16_t*>(buf);
const size_t length = std::size(buf);
size_t expected_size =
simdutf::utf8_length_from_utf16(source, length);
std::unique_ptr<char[]>buffer(new char[expected_size]);
size_t out_size =
simdutf::convert_utf16_to_utf8(source, length, buffer.get());
std::string final_string(buffer.get(), out_size);
std::cout << final_string << std::endl;
printf("%zu : %zu\n", expected_size, out_size);
assert(expected_size == out_size);
return 0;
} The & 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64\cl.exe' simd_test.cpp C:\Users\demohan\github\vscode-node\deps\simdutf\simdutf.cpp /I C:\Users\demohan\github\vscode-node\deps\simdutf /EHsc /Zc:__cplusplus -std:c++17 /W3 /link /MACHINE:x64 /LIBPATH:'C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\lib\x64' /LIBPATH:'C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\um\x64' /LIBPATH:'C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64'
& 'C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.37.32822\bin\Hostx64\x64\cl.exe' simd_test.cpp C:\Users\demohan\github\vscode-node\deps\simdutf\simdutf.cpp /EHsc /I C:\Users\demohan\github\vscode-node\deps\simdutf /FC /Zc:__cplusplus -std:c++17 /W3 /link /MACHINE:x64 /LIBPATH:'C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.37.32822\lib\x64' /LIBPATH:'C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\um\x64' /LIBPATH:'C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64' Results from VS 2019: > simd_test.exe
611 : 611 Results from VS 2022: > simd_test.exe
611 : 611 The length are as expected in both cases, so clearly a build flag is triggering the differences. I then copied the flags Node.js uses to build Results from VS 2019: > simd_test.exe
611 : 575
Assertion failed: expected_size == out_size, file simd_test.cpp, line 19 Results from VS 2022: > simd_test.exe
611 : 611 Additional data, between |
@deepak1556 That's a fantastic analysis. I'll investigate soon and, hopefully, I'll find a bug fix or workaround. Should be reasonably easy now that you have done all this work. |
@deepak1556 I reproduce the issue, but only on the icelake (AVX-512) kernel and only with Visual Studio 2019. Is the hardware that you are testing this on capable of AVX-512? E.g., Zen 4 or Ice Lake, something of the sort? Could you run the modified version of your code that looks like this : #include "simdutf.h"
#include <iostream>
#include <memory>
#include <uchar.h>
int main() {
const uint16_t buf[] = {
123, 34, 105, 100, 34, 58, 49, 44, 34, 109,
101, 116, 104, 111, 100, 34, 58, 34, 82, 117,
110, 116, 105, 109, 101, 46, 101, 118, 97, 108,
117, 97, 116, 101, 34, 44, 34, 112, 97, 114,
97, 109, 115, 34, 58, 123, 34, 101, 120, 112,
114, 101, 115, 115, 105, 111, 110, 34, 58, 34,
99, 111, 110, 115, 116, 32, 120, 50, 32, 61,
32, 39, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
39, 34, 44, 34, 116, 104, 114, 111, 119, 79,
110, 83, 105, 100, 101, 69, 102, 102, 101, 99,
116, 34, 58, 116, 114, 117, 101, 44, 34, 116,
105, 109, 101, 111, 117, 116, 34, 58, 51, 51,
51, 125, 125};
const char16_t *source = reinterpret_cast<const char16_t *>(buf);
const size_t length = std::size(buf);
for (auto &e : simdutf::get_available_implementations()) {
if (!e->supported_by_runtime_system()) {
continue;
}
std::cout << "testing " << e->name() << std::endl;
size_t expected_size = simdutf::utf8_length_from_utf16(source, length);
std::unique_ptr<char[]> buffer(new char[expected_size]);
size_t out_size =
simdutf::convert_utf16_to_utf8(source, length, buffer.get());
std::string final_string(buffer.get(), out_size);
std::cout << final_string << std::endl;
printf("%zu : %zu\n", expected_size, out_size);
}
return 0;
}
|
@anonrig I expect that bumping simdutf to 3.2.18 might fix this. Note that 3.2.18 is a patch release that merely disables the icelake kernel under Visual Studio 2019. The rationale is in the release notes. https://github.com/simdutf/simdutf/releases/tag/v3.2.18 |
@anonrig ❤️ |
@lemire yes the hardware supports some of the AVX-512 instructions, here is the trimmed output from coreinfo utility
Also, the output from the modified test file with VS2019
I was able to confirm that change from simdutf/simdutf#328 fixes the test case and building it with node also makes the |
@deepak1556 Thanks. So I think we nailed it down. @anonrig has already pushed the simdutf update to node, so the fix should be part of future releases. It is less important right now, but my code was in error, this is the code I wanted you to try... #include "simdutf.h"
#include <iostream>
#include <memory>
#include <uchar.h>
int main() {
const uint16_t buf[] = {
123, 34, 105, 100, 34, 58, 49, 44, 34, 109,
101, 116, 104, 111, 100, 34, 58, 34, 82, 117,
110, 116, 105, 109, 101, 46, 101, 118, 97, 108,
117, 97, 116, 101, 34, 44, 34, 112, 97, 114,
97, 109, 115, 34, 58, 123, 34, 101, 120, 112,
114, 101, 115, 115, 105, 111, 110, 34, 58, 34,
99, 111, 110, 115, 116, 32, 120, 50, 32, 61,
32, 39, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
39, 34, 44, 34, 116, 104, 114, 111, 119, 79,
110, 83, 105, 100, 101, 69, 102, 102, 101, 99,
116, 34, 58, 116, 114, 117, 101, 44, 34, 116,
105, 109, 101, 111, 117, 116, 34, 58, 51, 51,
51, 125, 125};
const char16_t *source = reinterpret_cast<const char16_t *>(buf);
const size_t length = std::size(buf);
for (auto &e : simdutf::get_available_implementations()) {
if (!e->supported_by_runtime_system()) {
continue;
}
std::cout << "testing " << e->name() << std::endl;
size_t expected_size = e->utf8_length_from_utf16(source, length);
std::unique_ptr<char[]> buffer(new char[expected_size]);
size_t out_size =
e->convert_utf16_to_utf8(source, length, buffer.get());
std::string final_string(buffer.get(), out_size);
std::cout << final_string << std::endl;
printf("%zu : %zu\n", expected_size, out_size);
}
return 0;
} |
For people reading this, here is the gist of the story so far. In recent versions of Node, we use the library I anticipate that we will be closing this issue soon. |
Version
v19.8.1
Platform
Microsoft Windows NT 10.0.22621.0 x64
Subsystem
No response
What steps will reproduce the bug?
How often does it reproduce? Is there a required condition?
Every time I try this in cli.
What is the expected behavior? Why is that the expected behavior?
Display current date.
Date Thu Apr 06 2023 21:06:02 GMT-0300
What do you see instead?
Additional information
No response
The text was updated successfully, but these errors were encountered: