
nullptr dereference in WASM code path #81

Open · jelmervdl opened this issue Mar 15, 2022 · 10 comments
Labels: bug (Something isn't working), wasm

@jelmervdl (Member)

Bug description

When trying to run the WASM compiled version of bergamot-translator, some models (specifically, the student.base models from browsermt/students) produce invalid output. Typically a bunch of repetitions of a single word or character per sentence. Not unlike this bug report.

In an attempt to figure out what was going on, I basically compiled the WASM code path into a native app so I could run it through lldb. This caught a nullptr dereference inside intgemm, ultimately caused by the nullptr that is passed as the `const float* input_bias` argument on this line:

int8PrepareBias((const int8_t *)b->data(), scale_a, 0.0 /*zero_point_a*/, scale_b, 0.0 /*zero_point_b*/, rows(b), cols(b), nullptr/*input_bias*/, val_->data());

This then translates/binds/magics into:

  float unquant_factor = (-1) * ((127.0f / scale_A) * (127.0f / scale_B)) / (127.0f);
  intgemm::Int8Shift::PrepareBias(
      input_B_prepared,
      width,
      cols_B,
      intgemm::callbacks::UnquantizeAndAddBiasAndWrite(unquant_factor, input_bias, output));
}

That nullptr then ends up here as config.bias_addr:

template <> class CallbackImpl<CPUType::CPU_NAME, UnquantizeAndAddBiasAndWrite> {
  ...
    auto result = kernels::unquantize(input, mult_reg);
    result = kernels::add_bias(result, config.bias_addr, info.col_idx);
    kernels::write(result, config.output_addr, info.row_idx * info.cols + info.col_idx);
  ...
};

And I'm not sure which of the implementations of kernels::add_bias it ends up at, but none of them seem to be happy with a nullptr.
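For illustration, the add_bias kernels all boil down to something like this (a minimal sketch of my own, not intgemm's actual code):

#include <immintrin.h>
#include <cstddef>

// Every add_bias variant loads a vector of bias values from
// bias_addr + col_idx and adds it to the accumulator. The load is
// unconditional, so a null bias_addr gets dereferenced on every call.
inline __m128 add_bias_sketch(__m128 input, const float* bias_addr, std::size_t col_idx) {
  __m128 bias_term = _mm_loadu_ps(bias_addr + col_idx);  // crashes when bias_addr == nullptr
  return _mm_add_ps(input, bias_term);
}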

As an experiment, I re-implemented the fallback function to handle the nullptr and call the callback without the bias term if the bias was null:

extern "C" void int8PrepareBias(const int8_t* input_B_prepared,
                                        float scale_A,
                                        float zero_point_A,
                                        float scale_B,
                                        float zero_point_B,
                                        Index width,
                                        Index cols_B,
                                        const float* input_bias,
                                        float* output) {
  float unquant_factor = (-1) * ((127.0f / scale_A) * (127.0f / scale_B)) / (127.0f);
  if (input_bias == nullptr) {
    intgemm::Int8Shift::PrepareBias(
        input_B_prepared,
        width,
        cols_B,
        intgemm::callbacks::UnquantizeAndWrite(unquant_factor, output));
  } else {
    intgemm::Int8Shift::PrepareBias(
        input_B_prepared,
        width,
        cols_B,
        intgemm::callbacks::UnquantizeAndAddBiasAndWrite(unquant_factor, input_bias, output));
  }
}

That fixes both the nullptr dereference and the broken model output in my non-WASM build of the WASM code path.

I'm reporting this as a bug as I imagine there was some reasoning behind writing a nullptr there.

I'm also surprised that what seems to be buggy code compiled to a (mostly) functioning WASM build that works for the tiny models. The base and tiny11 models don't seem to differ all that much: same layers, it seems, just larger in base?

Lastly, I'm not sure how to fix this properly. The quick hack above won't be enough on its own, since it only patches the fallback code path; something similar would need to be added to the intgemm code in Mozilla's tree.
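For completeness, the other obvious direction would be a caller-side fix: never pass a nullptr in the first place. A sketch (untested, and assuming the bias has one entry per output column, i.e. cols(b)):

std::vector<float> zero_bias(cols(b), 0.0f);  // all-zero bias instead of nullptr
int8PrepareBias((const int8_t *)b->data(), scale_a, 0.0 /*zero_point_a*/,
                scale_b, 0.0 /*zero_point_b*/, rows(b), cols(b),
                zero_bias.data() /*input_bias*/, val_->data());

That trades an extra allocation per prepared matrix for never handing any implementation a null pointer.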

Reproduce

Tested by building app/bergamot.cpp from bergamot-translator, after patching the cmake files to not pass emcc-specific flags when COMPILE_WASM is defined. I also changed wasm_intgemm_interface.h to remove the compiler attributes, and wasm_intgemm_fallback.cpp to remove all the "Fallback" bits from the function names so that those functions get called directly.

I also needed to patch the config.intgemm8bitalpha.yml to include:

max-length-break: 128
mini-batch-words: 1024

After that, this works (or, without the patch to fallback.cpp, gives a useful crash):

> lldb -- app/bergamot --model-config-paths ~/.config/translateLocally/ende.student.base-1647129297/config.intgemm8bitalpha.yml
(lldb) target create "app/bergamot"
Current executable set to '/Users/jelmer/Workspace/statmt/firefox-translations/bergamot-translator/build/app/bergamot' (x86_64).
(lldb) settings set -- target.run-args  "--model-config-paths" "/Users/jelmer/.config/translateLocally/ende.student.base-1647129297/config.intgemm8bitalpha.yml"
(lldb) run
Process 3022 launched: '/Users/jelmer/Workspace/statmt/firefox-translations/bergamot-translator/build/app/bergamot' (x86_64)
Hello world!
Hallo Welt!
Process 3022 exited with status = 0 (0x00000000)
@jelmervdl (Member, Author)

Can confirm: this caused the result in jelmervdl/translatelocally-web-ext#14. Recompiling the WASM code with the patch mentioned above and forcing the fallback path fixes translation.

@jelmervdl (Member, Author)

The nullptr was introduced at the very beginning. I'm even more confused now about how this code "worked" for so long.

@XapaJIaMnu (Collaborator)

I just took a look at this. @jelmervdl, you are right: the problem is in the WASM intgemm fallback. Whoever implemented it mapped PrepareBias and PrepareFakeBias to the same function, passing a nullptr in the fake-bias case, which is clearly not correct. The fix you wrote for the fallback is the correct one. You should PR it.
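To illustrate the mapping (hypothetical stand-ins for the real marian types and argument lists, not the actual source):

// Both node ops funnel into the same interface function; only the
// bias argument differs, and the fake-bias op passes nullptr.
void int8PrepareBias_sketch(const float* input_bias, float* output) {
  output[0] = input_bias[0];  // stands in for the unconditional read of the bias
}

void PrepareBiasForBNodeOp_sketch(const float* bias, float* out) {
  int8PrepareBias_sketch(bias, out);     // real bias: fine
}

void PrepareFakeBiasForBNodeOp_sketch(float* out) {
  int8PrepareBias_sketch(nullptr, out);  // nullptr: the bug
}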

@jelmervdl (Member, Author)

@abhi-agg could you confirm you've also seen this, and help get this fix into Firefox?

@abhi-agg commented Mar 25, 2022

> I'm reporting this as a bug as I imagine there was some reasoning behind writing a nullptr there.
>
> I'm also surprised that what seems to be buggy code compiled to a (mostly) functioning WASM build that works for the tiny models.
>
> Lastly, I'm not sure how to fix this. The quick hack above won't be enough on its own, since it only patches the fallback code path; something similar would need to be added to the intgemm code in Mozilla's tree.
>
> @abhi-agg could you confirm you've also seen this, and help get this fix into Firefox?

The goal of the wasm gemm interface was to support the fastest configuration int8shiftAlphaAll with the student tiny models. This combination must not be invoking PrepareFakeBiasForBNodeOp and that's why we never saw this issue. However, it could also have been caught during the review of the PR.

I am not sure if #82 is the right way to resolve this issue because we always expect an input bias with non-zero size.

This is more of an API question which needs to be discussed here before we land anything in Firefox:

  1. When is PrepareFakeBias invoked, and why is there a need to prepare a fake bias at all?
  2. What's the difference between the student base and tiny models in the context of the PrepareFakeBias call? It seems to me that PrepareFakeBias is being invoked for student base models but not for student tiny models.
  3. Can PrepareFakeBias ever be invoked for tiny models with int8shiftAlphaAll?
  4. Can PrepareFakeBias ever be invoked for tiny models with int8 (to support Ukrainian-centric models as per Dump of Ukranian-centric models mozilla/firefox-translations#166 (comment))?

Perhaps @XapaJIaMnu or @kpu can throw some light? cc @andrenatal @lonnen

@jelmervdl (Member, Author)

> The goal of the wasm gemm interface was to support the fastest configuration int8shiftAlphaAll with the student tiny models. This combination must not be invoking PrepareFakeBiasForBNodeOp and that's why we never saw this issue. However, it could also have been caught during the review of the PR.

Sorry, my surprise wasn't about how this bug could have made it into the running code, but about how it didn't cause issues earlier, since it looks to me like that code is called for both the base and tiny11 models I tested.

I've attached two log files, for a tiny model and its related base model, to compare the intgemm function calls. It shows that both the tiny and the base models make the call with the nullptr (e.g. line 2330), but somehow it doesn't seem to have any noticeable impact on tiny models. Maybe that's a bug of its own (a graph node whose output isn't used?).

tiny.log
base.log

@abhi-agg

> It shows that both the tiny and the base models make the call with the nullptr (e.g. line 2330), but somehow it doesn't seem to have any noticeable impact on tiny models.

That is surprising. Are you using the exact same model config for both the tiny and base models? May I ask which gemm precision you are using?

@jelmervdl (Member, Author)

gemm-precision: int8shiftAlphaAll in my native tests, and the same for the tests I did in Firefox.

Here is a script that produces such intgemm call log files: https://gist.github.com/jelmervdl/7dc651fc53889d016261eaa0b3f30db8

It should work with Node.js 17 and up. The paths right now assume you place it in the wasm/ subdirectory of the WASM build code, but that's easy to change. It downloads the tiny en-de model and then produces a file intgemm-calls.json. Then it's a bit of searching through that file for "int8_prepare_bias" calls with 0 as the second-to-last argument, but if you start at the bottom it should pop up fairly quickly.

@XapaJIaMnu (Collaborator) commented Mar 29, 2022

@abhi-agg the thing is that different models take different code paths inside marian. The tiny model (apparently) only uses the affine operation during decoding, whereas the base model also makes use of dot; dot doesn't have a bias, while affine does.

There is a reason why we implement both. And indeed maybe nobody noticed the discrepancy at the time of the review, but now we did and it should be fixed.

Edit:
PrepareFakeBias is used when we want to take the faster code path (which always requires a bias) for the dot operation, which doesn't have one.
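In other words (my reading of the shifted scheme, with the exact factors simplified): shifting A into unsigned range adds a constant multiple of each column sum of B to every output, and PrepareBias pre-subtracts that term from the bias. A conceptual sketch:

#include <cstddef>
#include <cstdint>

// prepared_bias[j] = bias[j] - correction over colsum_j(B). With dot there
// is no real bias, so an all-zero "fake" one still needs the correction.
void prepare_bias_sketch(const float* bias_or_null, const int32_t* colsums_B,
                         float correction_factor, float* out, std::size_t cols) {
  for (std::size_t j = 0; j < cols; ++j) {
    float base = bias_or_null ? bias_or_null[j] : 0.0f;  // fake bias == zeros
    out[j] = base - correction_factor * static_cast<float>(colsums_B[j]);
  }
}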

@kpu (Member) commented Mar 29, 2022

It's a bug.

> This is more of an API question which needs to be discussed here before we land anything in Firefox […]
>
> Perhaps @XapaJIaMnu or @kpu can throw some light? cc @andrenatal @lonnen

You, @abhi-agg, wrote the line that calls with nullptr:

https://github.com/browsermt/marian-dev/blame/53c4f7e4537dbf7782c583e98f50e513f0a27541/src/tensors/cpu/intgemm_interface.h#L386

You've been handed the fix on a silver platter and it is a short fix.

If arguing about the API is easier for you than putting the fix in Firefox, that's a process problem with Firefox.
