Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: encodeInto implementation is not correct #9006

Closed
lucacasonato opened this issue Jan 5, 2021 · 13 comments · Fixed by #10116
Closed

bug: encodeInto implementation is not correct #9006

lucacasonato opened this issue Jan 5, 2021 · 13 comments · Fixed by #10116
Labels
bug Something isn't working correctly ext/web related to the ext/web crate

Comments

@lucacasonato
Copy link
Member

You can enable the failing WPT tests by adding this snippet to cli/tests/wpt.jsonc:

{
   "name": "encodeInto",
   "expectFail": [
     // TODO(lucacasonato): enable once we support MessageChannel
     "encodeInto() and a detached output buffer"
   ]
},

Errors from the failing tests look like this:

encodeInto() into ArrayBuffer with �A�A¥Hi and destination length 10, offset 4, filler random
assert_equals: expected 239 but got 237
Error
    at AssertionError.get_stack (file:///tmp/wpt-bundle-J6kGw.js:3617:21)
    at new AssertionError (file:///tmp/wpt-bundle-J6kGw.js:3610:27)
    at assert (file:///tmp/wpt-bundle-J6kGw.js:3600:19)
    at assert_equals (file:///tmp/wpt-bundle-J6kGw.js:1225:9)
    at Test.<anonymous> (file:///tmp/wpt-bundle-J6kGw.js:4320:13)
    at Test.step (file:///tmp/wpt-bundle-J6kGw.js:2041:25)
    at test (file:///tmp/wpt-bundle-J6kGw.js:569:30)
    at file:///tmp/wpt-bundle-J6kGw.js:4286:7
    at Array.forEach (<anonymous>)
    at file:///tmp/wpt-bundle-J6kGw.js:4285:42
@lucacasonato lucacasonato added bug Something isn't working correctly ext/web related to the ext/web crate labels Jan 5, 2021
@ghost
Copy link

ghost commented Jan 17, 2021

I set up a quick test using a detached buffer and Deno threw an exception,

error: Uncaught TypeError: Cannot perform %TypedArray%.prototype.entries on a detached ArrayBuffer
    const iter = value.entries();
const memory = new WebAssembly.Memory({
	initial: 1,
	maximum: 1,
	shared: false
});

const array = new Uint8Array(memory.buffer, 0, 100);

memory.grow(0);

const result = new TextEncoder().encodeInto("Hello, world!", array);

console.log(array, result);

result is { read: 0, written: 0 } in Chromium.

Is this even related to the WPT test assertion?


I've updated Deno's TextEncoder#encodeInto to match Chromium's behavior, can you try the WPT tests again?

@lucacasonato
Copy link
Member Author

#9143 did not fix the error WPT errors.

@ghost
Copy link

ghost commented Jan 19, 2021

Unfortunate. Can I get more details on this WPT test in particular? i.e. where are the WPT tests sourced?

@lucacasonato
Copy link
Member Author

Can't you run it? uncomment / comment the tests you care about in //cli/tests/wpt.jsonc, then run cargo test web_platform_tests. You can find sources for all the tests in //test_util/wpt.

@ghost
Copy link

ghost commented Jan 19, 2021

Checking out the actual test, the problem isn't what it seems at all, as far as I can tell, Deno's implementation is okay.

Besides that WebAssembly.Memory#grow trick that I did there, Deno lacks any other method for buffer detachment.

Deno is missing globalThis.MessageChannel, and that is how they detach the buffer. Technicially this issue is incorrect.

If the test could've successfully detached the buffer, then it would've encountered the errors that I had, so I guess that PR did fix the implementation at least.

@ghost ghost mentioned this issue Jan 22, 2021
@tarruda
Copy link
Contributor

tarruda commented Apr 11, 2021

@lucacasonato is this issue still present? I cannot find cli/tests/wpt.jsonc, do you have up-to-date instructions on how I can run this test locally?

@lucacasonato
Copy link
Member Author

Yes the issue is still present. The expectations file is now in tools/wpt/expectations.json. Instructions for running WPT: https://deno.land/[email protected]/contributing/web_platform_tests#web-platform-test. If you have more questions feel free to also ask on https://discord.gg/deno :-)

tarruda added a commit to tarruda/deno that referenced this issue Apr 11, 2021
TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it
more efficient by lowering pressure on memory allocator.

The current implementation was doing a lot of allocations under the
hood:

- UTF8Encoder is instantiated.
- UTF8Encoder allocates arrays as the return value
- The string is essentially duplicated by the stringToCodePoints
  function and then wrapped into a Stream.

The implementation also failed some wpt tests (denoland#9006).

To fix all of these issues, reimplement UTF-8 encoding in a function
that 0 allocations and receives offset states from the caller, while
also correctly handling boundaries of the input buffer.
@tarruda
Copy link
Contributor

tarruda commented Apr 11, 2021

@lucacasonato I've managed to fix the ArrayBuffer failures in #10129. The SharedArrayBuffer tests are still failing though, and doesn't seem to be related to the UTF-8 encoding implementation.

@tarruda
Copy link
Contributor

tarruda commented Apr 11, 2021

It seems the reason SharedArrayBuffer tests are failing is because the constructor is throwing this:

WebAssembly.Memory does not support shared:true
Error: WebAssembly.Memory does not support shared:true
    at file:///tmp/wpt-bundle-d116c909.js:4250:15
    at Test.<anonymous> (file:///tmp/wpt-bundle-d116c909.js:4346:24)
    at Test.step (file:///tmp/wpt-bundle-d116c909.js:2101:25)
    at test (file:///tmp/wpt-bundle-d116c909.js:572:30)
    at file:///tmp/wpt-bundle-d116c909.js:4339:7
    at Array.forEach (<anonymous>)
    at file:///tmp/wpt-bundle-d116c909.js:4338:42
    at Array.forEach (<anonymous>)
    at file:///tmp/wpt-bundle-d116c909.js:4337:5
    at Array.forEach (<anonymous>)

@tarruda
Copy link
Contributor

tarruda commented Apr 11, 2021

The problem is in test_util/wpt/common/sab.js. It uses new WebAssembly.Memory({ shared:true, initial:0, maximum:0 }).buffer.constructor as the constructor for SharedArrayBuffer, but throws when it sees that the constructor name is not SharedArrayBuffer. Using SharedArrayBuffer constructor directly fixes the tests:

diff --git a/common/sab.js b/common/sab.js
index d40dd7cca..f1df4f93c 100644
--- a/common/sab.js
+++ b/common/sab.js
@@ -5,10 +5,7 @@ const createBuffer = (() => {
     if (type === "ArrayBuffer") {
       return new ArrayBuffer(length);
     } else if (type === "SharedArrayBuffer") {
-      if (sabConstructor.name !== "SharedArrayBuffer") {
-        throw new Error("WebAssembly.Memory does not support shared:true");
-      }
-      return new sabConstructor(length);
+      return new SharedArrayBuffer(length);
     } else {
       throw new Error("type has to be ArrayBuffer or SharedArrayBuffer");
     }

Though I assume the correct fix is in ensuring that new WebAssembly.Memory({ shared: true }) returns a SharedArrayBuffer.

tarruda added a commit to tarruda/deno that referenced this issue Apr 11, 2021
TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it
more efficient by lowering pressure on memory allocator.

The current implementation was doing a lot of allocations under the
hood:

- UTF8Encoder is instantiated.
- UTF8Encoder allocates arrays as the return value
- The string is essentially duplicated by the stringToCodePoints
  function and then wrapped into a Stream.

The implementation also failed some wpt tests (denoland#9006).

To fix all of these issues, reimplement UTF-8 encoding in a function
that 0 allocations and receives offset states from the caller, while
also correctly handling boundaries of the input buffer.
@ghost
Copy link

ghost commented Apr 26, 2021

@bartlomieju According to @tarruda, the Wasm shared memory was only one part of the problem here, should this be re-opened?

@tarruda
Copy link
Contributor

tarruda commented Apr 26, 2021

the Wasm shared memory was only one part of the problem here

Yes. the failing ArrayBuffer tests were caused by a buggy implementation of encodeInto which is fixed by #10129.

The failing SharedArrayBuffer tests should be fixed by #10116, but some should still fail until #10129 is merged.

tarruda added a commit to tarruda/deno that referenced this issue Apr 26, 2021
TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it
more efficient by lowering pressure on memory allocator.

The current implementation was doing a lot of allocations under the
hood:

- UTF8Encoder is instantiated.
- UTF8Encoder allocates arrays as the return value
- The string is essentially duplicated by the stringToCodePoints
  function and then wrapped into a Stream.

The implementation also failed some wpt tests (denoland#9006).

To fix all of these issues, reimplement UTF-8 encoding in a function
that 0 allocations and receives offset states from the caller, while
also correctly handling boundaries of the input buffer.
@tarruda
Copy link
Contributor

tarruda commented Apr 26, 2021

I've rebased #10129 to fix some merge conflicts and confirm that most encodeInto tests are fixed. There's still one failure though: " encodeInto() and a detached output buffer". This test requires MessageChannel which Deno does not implement yet: #6691

tarruda added a commit to tarruda/deno that referenced this issue May 5, 2021
TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it
more efficient by lowering pressure on memory allocator.

The current implementation was doing a lot of allocations under the
hood:

- UTF8Encoder is instantiated.
- UTF8Encoder allocates arrays as the return value
- The string is essentially duplicated by the stringToCodePoints
  function and then wrapped into a Stream.

The implementation also failed some wpt tests (denoland#9006).

To fix all of these issues, reimplement UTF-8 encoding in a function
that 0 allocations and receives offset states from the caller, while
also correctly handling boundaries of the input buffer.
tarruda added a commit to tarruda/deno that referenced this issue May 5, 2021
TextEncoder.encodeInto main advantage vs TextEncoder.encode is that it
more efficient by lowering pressure on memory allocator.

The current implementation was doing a lot of allocations under the
hood:

- UTF8Encoder is instantiated.
- UTF8Encoder allocates arrays as the return value
- The string is essentially duplicated by the stringToCodePoints
  function and then wrapped into a Stream.

The implementation also failed some wpt tests (denoland#9006).

To fix all of these issues, reimplement UTF-8 encoding in a function
that 0 allocations and receives offset states from the caller, while
also correctly handling boundaries of the input buffer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctly ext/web related to the ext/web crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants