Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

My working project for several years (port of RPython for WASM) cannot run anymore in latest emscripten, please tell me what have been changed in the last couple of emscripten versions #19457

Closed
rafi16jan opened this issue May 26, 2023 · 25 comments

Comments

@rafi16jan
Copy link

Please include the following in your bug report:

Version of emscripten/emsdk:
Please include the output emcc -v here

Unworking version:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.39-git
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Working version:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.0.1-git
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This is the version that is known to work now (I installed an old version from homebrew), I remember supporting multiple versions of emscripten that still uses EXTRA_EXPORTED_RUNTIME_METHODS instead of EXPORTED_RUNTIME_METHODS so this bug is about regression.

Failing command line in full:
The initial compilation worked successfully, only some of the program's feature is emitting unreachable keyword of wasm or abort randomly.

Full link command and output with -v appended:
Not sure how to get this, RPython is using Makefile for the compilation and what my fork does is changing the CC and some other flags

Description

Please ask for more details by replying/asking about specific topic about my fork of the project or how RPython works, I will only describe this problem briefly because I'm currently on some deadline. I even decided to use a better alternative for this deadline and my report of this bug is just purely for historical and technical reasons. If this problem did not even get resolved in a couple of months, maybe I will kill my project for good.

So in the version 3.1.39 both default wasm backend and wasm2js backend emitted unreachable code. I will show the wasm2js version for easier syntax highlighting:

function pypy_g_ll_dict_grow__v171___simple_call__function_l($0) {
  $0 = $0 | 0;
  wasm2js_trap();
}

And wasm2js_trap is basically unreachable:

function wasm2js_trap() { throw new Error('abort'); }

So this function is called when the dict (RPython's dict) is grown when there is a new entry (key-value pair), weirdly the previous function that handle the key reindexing and other things worked as usual, the only weird thing is that unreachable statement is emitted every corner on the generated code. After several hours of wasting my time I installed an old version of emscripten that still worked on Monterey and voila. It worked flawlessly. And here's the generated dict_grow function:

function pypy_g_ll_dict_grow__v157___simple_call__function_l($0) {
  $0 = $0 | 0;
  var $1 = 0, $2 = 0;
  $1 = HEAP32[(HEAP32[($0 + 20 | 0) >> 2] | 0) >> 2] | 0;
  $1 = ($1 >> 3 | 0) + $1 | 0;
  $2 = HEAP32[$0 >> 2] | 0;
  label$1 : {
   label$2 : {
    label$3 : {
     label$4 : {
      label$5 : {
       switch ((HEAP32[($0 + 16 | 0) >> 2] | 0) & 7 | 0 | 0) {
       case 0:
        if (($2 | 0) < (253 | 0)) {
         break label$4
        }
        pypy_g_RPyRaiseException(8032 | 0, 7724 | 0);
        $0 = HEAP32[(0 + 23884 | 0) >> 2] | 0;
        $1 = $0 << 3 | 0;
        HEAP32[($1 + 23892 | 0) >> 2] = 0;
        HEAP32[($1 + 23888 | 0) >> 2] = 12184;
        HEAP32[(0 + 23884 | 0) >> 2] = ($0 + 1 | 0) & 127 | 0;
        return 1 | 0;
       case 1:
        break label$5;
       default:
        break label$1;
       };
      }
      label$7 : {
       if (($2 | 0) > (65532 | 0)) {
        break label$7
       }
       if (($1 | 0) > (65525 | 0)) {
        break label$3
       }
       break label$1;
      }
      pypy_g_RPyRaiseException(8032 | 0, 7724 | 0);
      $0 = HEAP32[(0 + 23884 | 0) >> 2] | 0;
      $1 = $0 << 3 | 0;
      HEAP32[($1 + 23892 | 0) >> 2] = 0;
      HEAP32[($1 + 23888 | 0) >> 2] = 12196;
      $1 = 1;
      HEAP32[(0 + 23884 | 0) >> 2] = ($0 + 1 | 0) & 127 | 0;
      break label$2;
     }
     if (($1 | 0) < (246 | 0)) {
      break label$1
     }
    }
    pypy_g_ll_dict_remove_deleted_items__dicttablePtr($0 | 0);
    label$8 : {
     if (!(HEAP32[(0 + 23752 | 0) >> 2] | 0)) {
      break label$8
     }
     $0 = HEAP32[(0 + 23884 | 0) >> 2] | 0;
     $1 = $0 << 3 | 0;
     HEAP32[($1 + 23892 | 0) >> 2] = 0;
     HEAP32[($1 + 23888 | 0) >> 2] = 12208;
     HEAP32[(0 + 23884 | 0) >> 2] = ($0 + 1 | 0) & 127 | 0;
     return 1 | 0;
    }
    $1 = 1;
    if ((HEAP32[$0 >> 2] | 0 | 0) == (HEAP32[($0 + 4 | 0) >> 2] | 0 | 0)) {
     break label$2
    }
    pypy_g_RPyRaiseException(8032 | 0, 7724 | 0);
    $0 = HEAP32[(0 + 23884 | 0) >> 2] | 0;
    $1 = $0 << 3 | 0;
    HEAP32[($1 + 23892 | 0) >> 2] = 0;
    HEAP32[($1 + 23888 | 0) >> 2] = 12220;
    HEAP32[(0 + 23884 | 0) >> 2] = ($0 + 1 | 0) & 127 | 0;
    return 1 | 0;
   }
   return $1 | 0;
  }
  label$9 : {
   $1 = pypy_g_ll_malloc_varsize__Signed_Signed_Signed_Signed($1 + 8 | 0 | 0, 4 | 0, 8 | 0, 0 | 0) | 0;
   if ($1) {
    break label$9
   }
   $0 = HEAP32[(0 + 23884 | 0) >> 2] | 0;
   $1 = $0 << 3 | 0;
   HEAP32[($1 + 23892 | 0) >> 2] = 0;
   HEAP32[($1 + 23888 | 0) >> 2] = 12232;
   HEAP32[(0 + 23884 | 0) >> 2] = ($0 + 1 | 0) & 127 | 0;
   return 1 | 0;
  }
  $2 = HEAP32[($0 + 20 | 0) >> 2] | 0;
  pypy_g_ll_arraycopy__arrayPtr_arrayPtr_Signed_Signed_Si_1($2 | 0, $1 | 0, 0 | 0, 0 | 0, HEAP32[$2 >> 2] | 0 | 0);
  HEAP32[($0 + 20 | 0) >> 2] = $1;
  return 0 | 0;
}

Just as the founding fathers (of RPython obviously) intended. This is exactly what the C code generated. And ignore the v** in the middle of function name, that's the RPython typer in action.

@rafi16jan rafi16jan changed the title My working project for several years (port of RPython for WASM) cannot compile anymore in latest emscripten, please tell me what have been changed in the last couple of emscripten versions My working project for several years (port of RPython for WASM) cannot run anymore in latest emscripten, please tell me what have been changed in the last couple of emscripten versions May 26, 2023
@kripken
Copy link
Member

kripken commented May 26, 2023

If code was replaced by a trap, then my first guess is that a clang update added new optimizations that found undefined behavior somewhere in that code (or the new optimizations could have a bug, in theory).

please tell me what have been changed in the last couple of emscripten versions

Unfortunately there is a truly massive amount of changes in the 3.0.1-3.1.39 range (Dec 2021-May2023). But there is a good way to narrow this down, which is to bisect:

https://emscripten.org/docs/contributing/developers_guide.html#bisecting

Bisection even on such a large range takes a small number of steps (logarithmic). So once you get set up for bisection as described there, it tends to not take too long. Probably less than an hour on a reasonable network connection.

@kripken
Copy link
Member

kripken commented May 26, 2023

Another option is to look for undefined behavior directly: https://emscripten.org/docs/debugging/Sanitizers.html

@rafi16jan
Copy link
Author

I think bisecting is the way to go, but it will take sometime. Regarding the undefined behaviour, would the UBSan and ASan find the problem when the whole function block is replaced with unreachable?

@kripken
Copy link
Member

kripken commented May 26, 2023

It might, yes. An entire block of code being turned into an unreachable is a common outcome from optimizing undefined behavior.

@rafi16jan
Copy link
Author

It might, yes. An entire block of code being turned into an unreachable is a common outcome from optimizing undefined behavior.

So UBSan or ASan will show an error when compiling? Or the other way around, on runtime?

@rafi16jan
Copy link
Author

Another option is to look for undefined behavior directly: https://emscripten.org/docs/debugging/Sanitizers.html

When trying UBSan with WASM mode (non-wasm2js) it worked, and printed this error on runtime:

rpython_rtyper_lltypesystem.c:1315:3: runtime error: signed integer overflow: 1000003 * 14592 cannot be represented in type 'long'
rpython_rtyper_lltypesystem.c:5120:2: runtime error: shift exponent 63 is too large for 32-bit type 'Signed' (aka 'long')
hello from rpython

I'm totally happy with this as my program finally work even though there are integer overflow errors, but how to run this on wasm2js? This error is showing when compiling with wasm2js:

emcc: error: wasm2js is not compatible with USE_OFFSET_CONVERTER (see #14630)

@rafi16jan
Copy link
Author

I desperately needed wasm2js because default wasm backend Memory object cannot be shrinked unless re-initialized. See this.

@rafi16jan
Copy link
Author

Can undefined behavior trap instruction be turned off temporarily? Is there a flag for that?

@rafi16jan
Copy link
Author

It seems that if I turn off the error on emcc.py and polyfill customSections badly like:WebAssembly.Module.customSections = () => []; the program runs and UBSan worked as intended:

rpython_rtyper_lltypesystem.c:1314:3: runtime error: signed integer overflow: 1000003 * 14592 cannot be represented in type 'long'
rpython_rtyper_lltypesystem.c:5120:2: runtime error: shift exponent 63 is too large for 32-bit type 'Signed' (aka 'long')
hello from rpython

Maybe the offset converter will still do weird behavior but the most important thing the program worked and I can debug it, can I make a PR to turn off the wasm2js error?

@kripken
Copy link
Member

kripken commented May 30, 2023

emcc: error: wasm2js is not compatible with USE_OFFSET_CONVERTER (see #14630)

That error is for the sanitizers. It's possible sanitizers don't work with wasm2js atm. But since you only need them for debugging, you can debug without wasm2js.

I'm not sure it makes sense to remove that error, as it would make stack traces wrong and confusing. But perhaps the error message could be improved.

@rafi16jan
Copy link
Author

emcc: error: wasm2js is not compatible with USE_OFFSET_CONVERTER (see #14630)

That error is for the sanitizers. It's possible sanitizers don't work with wasm2js atm. But since you only need them for debugging, you can debug without wasm2js.

I'm not sure it makes sense to remove that error, as it would make stack traces wrong and confusing. But perhaps the error message could be improved.

Not entirely removing it, but only just turning it off with special flag or environment variable. Currently what I'm doing is telling the user to do WASM=1 instead or patching emcc.py directly .

@sbc100
Copy link
Collaborator

sbc100 commented May 31, 2023

What if we instead error on when -fsanitize=address is used with wasm2j? Would it not be OK with simply disallow santizers in -sWASM=0 builds?

@sbc100
Copy link
Collaborator

sbc100 commented May 31, 2023

Or do you really want to support sanitizers (albeit without any real stack traces) with -sWASM=0?

@sbc100
Copy link
Collaborator

sbc100 commented May 31, 2023

It seems that if I turn off the error on emcc.py and polyfill customSections badly like:WebAssembly.Module.customSections = () => []; the program runs and UBSan worked as intended:

rpython_rtyper_lltypesystem.c:1314:3: runtime error: signed integer overflow: 1000003 * 14592 cannot be represented in type 'long'
rpython_rtyper_lltypesystem.c:5120:2: runtime error: shift exponent 63 is too large for 32-bit type 'Signed' (aka 'long')
hello from rpython

Did these errors allow you track down and fix the original "unreachable" issue?

@rafi16jan
Copy link
Author

It seems that if I turn off the error on emcc.py and polyfill customSections badly like:WebAssembly.Module.customSections = () => []; the program runs and UBSan worked as intended:

rpython_rtyper_lltypesystem.c:1314:3: runtime error: signed integer overflow: 1000003 * 14592 cannot be represented in type 'long'
rpython_rtyper_lltypesystem.c:5120:2: runtime error: shift exponent 63 is too large for 32-bit type 'Signed' (aka 'long')
hello from rpython

Did these errors allow you track down and fix the original "unreachable" issue?

It is yeah, and the fix is simple, it's weird that RPython original developers is using this undefined behavior although it may have something to do with gcc as their default backend (clang is only on macos).

@rafi16jan
Copy link
Author

What if we instead error on when -fsanitize=address is used with wasm2j? Would it not be OK with simply disallow santizers in -sWASM=0 builds?

Don't do this, I am currently using this loophole

@sbc100
Copy link
Collaborator

sbc100 commented May 31, 2023

What if we instead error on when -fsanitize=address is used with wasm2j? Would it not be OK with simply disallow santizers in -sWASM=0 builds?

Don't do this, I am currently using this loophole

But just to be clear you found and fixed you issue for this particular case? .. you aren't using -fsanitize=address in your normal builds?

Or are you just saying that it would be good to have support in case you need to performs the same debugging on a different issue later?

@sbc100
Copy link
Collaborator

sbc100 commented May 31, 2023

What if we instead error on when -fsanitize=address is used with wasm2j? Would it not be OK with simply disallow santizers in -sWASM=0 builds?

Don't do this, I am currently using this loophole

I think we can instead downgrade this to a warning. e.g. "warning: santizer support in wasm2js is limited"

@rafi16jan
Copy link
Author

What if we instead error on when -fsanitize=address is used with wasm2j? Would it not be OK with simply disallow santizers in -sWASM=0 builds?

Don't do this, I am currently using this loophole

But just to be clear you found and fixed you issue for this particular case? .. you aren't using -fsanitize=address in your normal builds?

Or are you just saying that it would be good to have support in case you need to performs the same debugging on a different issue later?

The latter is correct. I can't use -fsanitize-trap=... (the default) because from what I observe on different deployments 99% of them fail with unreachable (only simple programs that doesn't use RPython's dict is working). Turning -fsanitize=... on will solve my users problem and would be easier for me to debug problems that will occur in the future. I will just make a new flag that turn off -fsanitize if the user is certain that the app is not ridden with undefined behavior. Remember that RPython literally translate small subset of Python to C, and different instruction/opcode will compile to radically different C code, and I'm sure that I will deal with hundreds of them until there will be no undefined behavior left.

And about wasm2js my project is more inclined to it because it supports shrinking memory. Pure wasm on the other hand only support zero'ing it and impossible to shrink the length. 90% of my users deployment is running on wasm2js and changing it to pure wasm will making those small serverless functions reach max heap of the node process.

@sbc100
Copy link
Collaborator

sbc100 commented May 31, 2023

I'm not sure I fully understand exactly what you are you proposing proposing, but you should know that -fsanitize.. options are not designed to be run in production, they are debugging aids to help find undefined behaviour. They don't actually stop the undefined behaviour from occurring (or at least they are not designed to do). They can also dramatically slow down your application and increase its memory usage.

@rafi16jan
Copy link
Author

I'm not sure I fully understand exactly what you are you proposing proposing, but you should know that -fsanitize.. options are not designed to be run in production, they are debugging aids to help find undefined behaviour. They don't actually stop the undefined behaviour from occurring (or at least they are not designed to do). They can also dramatically slow down your application and increase its memory usage.

I know that it doesn't stop undefined behaviour, but I bet you that UBSan runtime is much lighter than Asyncify. It is also the reason why we are more inclined to wasm2js, our software is designed to fallback to initial memory (that's why we need for the memory to shrink) after polluting it to run some operations. So the memory usage AFAIK will still be low.

@kripken
Copy link
Member

kripken commented May 31, 2023

Downgrading that error to a warning makes sense, but it would require some work as without that error it will error later on, see #14630

@rafi16jan
Copy link
Author

@kripken @sbc100 it turns out the default behavior is all restored by adding -fcommon

@rafi16jan
Copy link
Author

*the previous behavior

@sbc100
Copy link
Collaborator

sbc100 commented Jun 1, 2023

Thats odd.. emscripten/wasm doesn't support -fcommon symbols.. you would get a compiler error if you tried to use one.. strange.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants