bug: encodeInto implementation is not correct #9006

lucacasonato · 2021-01-05T19:09:36Z

You can enable the failing WPT tests by adding this snippet to cli/tests/wpt.jsonc:

{
   "name": "encodeInto",
   "expectFail": [
     // TODO(lucacasonato): enable once we support MessageChannel
     "encodeInto() and a detached output buffer"
   ]
},

Errors from the failing tests look like this:

encodeInto() into ArrayBuffer with �A�A¥Hi and destination length 10, offset 4, filler random
assert_equals: expected 239 but got 237
Error
    at AssertionError.get_stack (file:///tmp/wpt-bundle-J6kGw.js:3617:21)
    at new AssertionError (file:///tmp/wpt-bundle-J6kGw.js:3610:27)
    at assert (file:///tmp/wpt-bundle-J6kGw.js:3600:19)
    at assert_equals (file:///tmp/wpt-bundle-J6kGw.js:1225:9)
    at Test.<anonymous> (file:///tmp/wpt-bundle-J6kGw.js:4320:13)
    at Test.step (file:///tmp/wpt-bundle-J6kGw.js:2041:25)
    at test (file:///tmp/wpt-bundle-J6kGw.js:569:30)
    at file:///tmp/wpt-bundle-J6kGw.js:4286:7
    at Array.forEach (<anonymous>)
    at file:///tmp/wpt-bundle-J6kGw.js:4285:42

The text was updated successfully, but these errors were encountered:

ghost · 2021-01-17T20:37:46Z

I set up a quick test using a detached buffer and Deno threw an exception,

error: Uncaught TypeError: Cannot perform %TypedArray%.prototype.entries on a detached ArrayBuffer
    const iter = value.entries();

const memory = new WebAssembly.Memory({
	initial: 1,
	maximum: 1,
	shared: false
});

const array = new Uint8Array(memory.buffer, 0, 100);

memory.grow(0);

const result = new TextEncoder().encodeInto("Hello, world!", array);

console.log(array, result);

result is { read: 0, written: 0 } in Chromium.

Is this even related to the WPT test assertion?

I've updated Deno's TextEncoder#encodeInto to match Chromium's behavior, can you try the WPT tests again?

lucacasonato · 2021-01-19T00:55:25Z

#9143 did not fix the error WPT errors.

ghost · 2021-01-19T01:10:54Z

Unfortunate. Can I get more details on this WPT test in particular? i.e. where are the WPT tests sourced?

lucacasonato · 2021-01-19T08:09:13Z

Can't you run it? uncomment / comment the tests you care about in //cli/tests/wpt.jsonc, then run cargo test web_platform_tests. You can find sources for all the tests in //test_util/wpt.

ghost · 2021-01-19T23:48:40Z

Checking out the actual test, the problem isn't what it seems at all, as far as I can tell, Deno's implementation is okay.

Besides that WebAssembly.Memory#grow trick that I did there, Deno lacks any other method for buffer detachment.

Deno is missing globalThis.MessageChannel, and that is how they detach the buffer. Technicially this issue is incorrect.

If the test could've successfully detached the buffer, then it would've encountered the errors that I had, so I guess that PR did fix the implementation at least.

tarruda · 2021-04-11T00:46:22Z

@lucacasonato is this issue still present? I cannot find cli/tests/wpt.jsonc, do you have up-to-date instructions on how I can run this test locally?

lucacasonato · 2021-04-11T00:48:33Z

Yes the issue is still present. The expectations file is now in tools/wpt/expectations.json. Instructions for running WPT: https://deno.land/[email protected]/contributing/web_platform_tests#web-platform-test. If you have more questions feel free to also ask on https://discord.gg/deno :-)

TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it more efficient by lowering pressure on memory allocator. The current implementation was doing a lot of allocations under the hood: - UTF8Encoder is instantiated. - UTF8Encoder allocates arrays as the return value - The string is essentially duplicated by the stringToCodePoints function and then wrapped into a Stream. The implementation also failed some wpt tests (denoland#9006). To fix all of these issues, reimplement UTF-8 encoding in a function that 0 allocations and receives offset states from the caller, while also correctly handling boundaries of the input buffer.

tarruda · 2021-04-11T02:43:21Z

@lucacasonato I've managed to fix the ArrayBuffer failures in #10129. The SharedArrayBuffer tests are still failing though, and doesn't seem to be related to the UTF-8 encoding implementation.

tarruda · 2021-04-11T03:12:53Z

It seems the reason SharedArrayBuffer tests are failing is because the constructor is throwing this:

WebAssembly.Memory does not support shared:true
Error: WebAssembly.Memory does not support shared:true
    at file:///tmp/wpt-bundle-d116c909.js:4250:15
    at Test.<anonymous> (file:///tmp/wpt-bundle-d116c909.js:4346:24)
    at Test.step (file:///tmp/wpt-bundle-d116c909.js:2101:25)
    at test (file:///tmp/wpt-bundle-d116c909.js:572:30)
    at file:///tmp/wpt-bundle-d116c909.js:4339:7
    at Array.forEach (<anonymous>)
    at file:///tmp/wpt-bundle-d116c909.js:4338:42
    at Array.forEach (<anonymous>)
    at file:///tmp/wpt-bundle-d116c909.js:4337:5
    at Array.forEach (<anonymous>)

tarruda · 2021-04-11T03:28:33Z

The problem is in test_util/wpt/common/sab.js. It uses new WebAssembly.Memory({ shared:true, initial:0, maximum:0 }).buffer.constructor as the constructor for SharedArrayBuffer, but throws when it sees that the constructor name is not SharedArrayBuffer. Using SharedArrayBuffer constructor directly fixes the tests:

diff --git a/common/sab.js b/common/sab.js
index d40dd7cca..f1df4f93c 100644
--- a/common/sab.js
+++ b/common/sab.js
@@ -5,10 +5,7 @@ const createBuffer = (() => {
     if (type === "ArrayBuffer") {
       return new ArrayBuffer(length);
     } else if (type === "SharedArrayBuffer") {
-      if (sabConstructor.name !== "SharedArrayBuffer") {
-        throw new Error("WebAssembly.Memory does not support shared:true");
-      }
-      return new sabConstructor(length);
+      return new SharedArrayBuffer(length);
     } else {
       throw new Error("type has to be ArrayBuffer or SharedArrayBuffer");
     }

Though I assume the correct fix is in ensuring that new WebAssembly.Memory({ shared: true }) returns a SharedArrayBuffer.

TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it more efficient by lowering pressure on memory allocator. The current implementation was doing a lot of allocations under the hood: - UTF8Encoder is instantiated. - UTF8Encoder allocates arrays as the return value - The string is essentially duplicated by the stringToCodePoints function and then wrapped into a Stream. The implementation also failed some wpt tests (denoland#9006). To fix all of these issues, reimplement UTF-8 encoding in a function that 0 allocations and receives offset states from the caller, while also correctly handling boundaries of the input buffer.

ghost · 2021-04-26T16:18:02Z

@bartlomieju According to @tarruda, the Wasm shared memory was only one part of the problem here, should this be re-opened?

tarruda · 2021-04-26T18:02:49Z

the Wasm shared memory was only one part of the problem here

Yes. the failing ArrayBuffer tests were caused by a buggy implementation of encodeInto which is fixed by #10129.

The failing SharedArrayBuffer tests should be fixed by #10116, but some should still fail until #10129 is merged.

TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it more efficient by lowering pressure on memory allocator. The current implementation was doing a lot of allocations under the hood: - UTF8Encoder is instantiated. - UTF8Encoder allocates arrays as the return value - The string is essentially duplicated by the stringToCodePoints function and then wrapped into a Stream. The implementation also failed some wpt tests (denoland#9006). To fix all of these issues, reimplement UTF-8 encoding in a function that 0 allocations and receives offset states from the caller, while also correctly handling boundaries of the input buffer.

tarruda · 2021-04-26T18:45:21Z

I've rebased #10129 to fix some merge conflicts and confirm that most encodeInto tests are fixed. There's still one failure though: " encodeInto() and a detached output buffer". This test requires MessageChannel which Deno does not implement yet: #6691

TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it more efficient by lowering pressure on memory allocator. The current implementation was doing a lot of allocations under the hood: - UTF8Encoder is instantiated. - UTF8Encoder allocates arrays as the return value - The string is essentially duplicated by the stringToCodePoints function and then wrapped into a Stream. The implementation also failed some wpt tests (denoland#9006). To fix all of these issues, reimplement UTF-8 encoding in a function that 0 allocations and receives offset states from the caller, while also correctly handling boundaries of the input buffer.

lucacasonato added bug Something isn't working correctly ext/web related to the ext/web crate labels Jan 5, 2021

ghost mentioned this issue Jan 17, 2021

fix(op_crates/web) let TextEncoder#encodeInto accept detached ArrayBuffers #9143

Merged

ghost mentioned this issue Jan 22, 2021

Worker MessageChannel #6691

Closed

tarruda mentioned this issue Apr 11, 2021

fix(op_crates/web) Make TextEncoder.encodeInto more efficient. Also fixes #9006 #10129

Merged

caspervonb mentioned this issue Apr 11, 2021

feat(core): enable wasm threading support #10116

Merged

bartlomieju closed this as completed in #10116 Apr 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: encodeInto implementation is not correct #9006

bug: encodeInto implementation is not correct #9006

lucacasonato commented Jan 5, 2021

ghost commented Jan 17, 2021 •

edited by ghost

Loading

lucacasonato commented Jan 19, 2021

ghost commented Jan 19, 2021 •

edited by ghost

Loading

lucacasonato commented Jan 19, 2021

ghost commented Jan 19, 2021 •

edited by ghost

Loading

tarruda commented Apr 11, 2021

lucacasonato commented Apr 11, 2021

tarruda commented Apr 11, 2021

tarruda commented Apr 11, 2021

tarruda commented Apr 11, 2021

ghost commented Apr 26, 2021

tarruda commented Apr 26, 2021

tarruda commented Apr 26, 2021

bug: encodeInto implementation is not correct #9006

bug: encodeInto implementation is not correct #9006

Comments

lucacasonato commented Jan 5, 2021

ghost commented Jan 17, 2021 • edited by ghost Loading

lucacasonato commented Jan 19, 2021

ghost commented Jan 19, 2021 • edited by ghost Loading

lucacasonato commented Jan 19, 2021

ghost commented Jan 19, 2021 • edited by ghost Loading

tarruda commented Apr 11, 2021

lucacasonato commented Apr 11, 2021

tarruda commented Apr 11, 2021

tarruda commented Apr 11, 2021

tarruda commented Apr 11, 2021

ghost commented Apr 26, 2021

tarruda commented Apr 26, 2021

tarruda commented Apr 26, 2021

ghost commented Jan 17, 2021 •

edited by ghost

Loading

ghost commented Jan 19, 2021 •

edited by ghost

Loading

ghost commented Jan 19, 2021 •

edited by ghost

Loading