src: added big-endian check and support to UTF-16 encode #3410
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,7 @@ | |
|
||
#include <limits.h> | ||
#include <string.h> // memcpy | ||
#include <vector> | ||
|
||
// When creating strings >= this length v8's gc spins up and consumes | ||
// most of the execution time. For these cases it's more performant to | ||
|
@@ -406,9 +407,7 @@ size_t StringBytes::Write(Isolate* isolate, | |
reinterpret_cast<uintptr_t>(buf) % sizeof(uint16_t); | ||
if (is_aligned) { | ||
uint16_t* const dst = reinterpret_cast<uint16_t*>(buf); | ||
for (size_t i = 0; i < nchars; i++) | ||
dst[i] = dst[i] << 8 | dst[i] >> 8; | ||
break; | ||
SwapBytes(dst, dst, nchars); | ||
It was pointed out in #7645 that this change introduces a bug: the
Thanks for pointing that out! Is anyone already working on a fix? If not, I can take it on.
#7645 should fix this when it lands, but I've made a note in case that PR stalls. |
||
} | ||
|
||
ASSERT_EQ(sizeof(uint16_t), 2); | ||
|
@@ -857,7 +856,16 @@ Local<Value> StringBytes::Encode(Isolate* isolate, | |
const uint16_t* buf, | ||
size_t buflen) { | ||
Local<String> val; | ||
|
||
std::vector<uint16_t> dst; | ||
if (IsBigEndian()) { | ||
// Node's "ucs2" encoding expects LE character data inside a | ||
// Buffer, so we need to reorder on BE platforms. See | ||
// http://nodejs.org/api/buffer.html regarding Node's "ucs2" | ||
// encoding specification | ||
dst.resize(buflen); | ||
SwapBytes(&dst[0], buf, buflen); | ||
Just curious, since we know
I don't think it's necessary. A simple "new uint16_t[buflen]" is sufficient, and using a vector would just increase overhead. What do you think, @bnoordhuis?
Use of |
||
buf = &dst[0]; | ||
} | ||
if (buflen < EXTERN_APEX) { | ||
val = String::NewFromTwoByte(isolate, | ||
buf, | ||
|
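The hunks above call `IsBigEndian()` and `SwapBytes()` without showing their definitions. Below is a minimal sketch of what such helpers might look like. The bodies are assumptions for illustration, not the code from this PR, and `ToHostOrder` is a hypothetical stand-in for the new prologue of `StringBytes::Encode`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed implementation: reports whether the host is big-endian by
// inspecting the first byte of a known 16-bit value.
static inline bool IsBigEndian() {
  const uint16_t probe = 0x0102;
  unsigned char first;
  std::memcpy(&first, &probe, sizeof(first));
  return first == 0x01;
}

// Assumed implementation: writes each 16-bit unit of src to dst with its
// two bytes exchanged. Both pointers must be aligned for uint16_t.
static inline void SwapBytes(uint16_t* dst, const uint16_t* src, size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));
  for (size_t i = 0; i < n; i++) {
    const uint16_t v = src[i];  // read before write, so dst == src works
    dst[i] = static_cast<uint16_t>((v << 8) | (v >> 8));
  }
}

// Hypothetical stand-in for the prologue of Encode: returns the units that
// would be handed to V8, reordered only when the host is big-endian.
static std::vector<uint16_t> ToHostOrder(const uint16_t* buf, size_t buflen) {
  std::vector<uint16_t> dst(buf, buf + buflen);
  if (IsBigEndian()) SwapBytes(dst.data(), buf, buflen);
  return dst;
}
```

Because `SwapBytes` reads each unit before writing it, the in-place call `SwapBytes(dst, dst, nchars)` from the `Write` hunk behaves the same as the loop it replaces, and applying it twice restores the original data.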
Why this change? It looks like a semantic null change and it makes the diff noisier than it needs to be.
@bnoordhuis
In this file, there are four cases that must be considered:
At present, "buf" is reinterpreted in place only in the first case; the other cases (2, 3 and 4) copy the data to produce an aligned version of the unaligned input. That copy has the side effect of swapping endianness, but ONLY when run on a big-endian system. As a result, on a big-endian system the data ends up stored with the wrong endianness: the alignment operation swaps the little-endian data to big endian, and then Encode runs its own swapping function, which swaps it back to little endian.
The proposed change applies the alignment correction only to case 2 (little endian, misaligned). Case 1 (little endian, aligned) needs neither endianness swapping nor alignment correction, while cases 3 and 4 are handled by the swap operation in StringBytes::Encode (which also takes care of alignment issues).
All cases need buf to be reinterpreted as uint16_t data, so we reinterpret either way.
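The point of the cases above is that little-endian UCS-2 bytes should be swapped exactly once on a big-endian host and never on a little-endian one, regardless of alignment. Here is a rough sketch of that invariant. It is not the code in this PR: `Ucs2LeToHost` and this `IsBigEndian` body are hypothetical, and the sketch always copies through `memcpy` instead of reinterpreting aligned input in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true when the host stores the high byte first.
static bool IsBigEndian() {
  const uint16_t probe = 0x0102;
  unsigned char first;
  std::memcpy(&first, &probe, sizeof(first));
  return first == 0x01;
}

// Hypothetical helper: converts little-endian UCS-2 bytes into host-order
// 16-bit units. memcpy makes the read safe at any alignment.
static std::vector<uint16_t> Ucs2LeToHost(const char* bytes, size_t nchars) {
  assert(nchars == 0 || bytes != nullptr);
  std::vector<uint16_t> out(nchars);
  if (nchars == 0) return out;
  std::memcpy(out.data(), bytes, nchars * sizeof(uint16_t));
  if (IsBigEndian()) {
    // Single swap: LE data becomes host (BE) order. A second swap
    // elsewhere would undo this, which is the double-swap bug described.
    for (uint16_t& u : out) u = static_cast<uint16_t>((u << 8) | (u >> 8));
  }
  return out;
}
```

Given the bytes `41 00 3c d8`, this yields the units `0x0041, 0xd83c` on either kind of host, whether or not the input pointer is 2-byte aligned.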
Okay, that makes sense. Thanks.