See the original repo for the whisper.cpp README.
To paraphrase Wikipedia, WebAssembly (WASM) was created to let us run code at "near-native" speed on the front end.
WASM achieves this by defining a low-level, compiled binary format that a browser can execute directly. Developers write code in a high-level language, e.g. C, then use a special compiler to compile it to WASM, which can then be served to the front end and run. (Contrast this with JavaScript, which is sent to the browser as plain text and must be parsed and interpreted there.)
Since WASM is an open standard, many compiler toolchains exist. For ScribeAR we chose emscripten, a gcc-like C/C++-to-WASM compiler, mainly because whisper.cpp uses it.
Refer to the official MDN and Emscripten documentation for more juicy info. We recommend going through this tutorial first to get a feel for how it runs.
Similar to gcc, emscripten takes in a bunch of `.c` or `.cpp` files and compiles them into a single executable `.wasm` file. However, since the `.wasm` file must be able to interact with a webpage (and the browser at large), emscripten also generates a 'glue' `.js` file that loads and supports the WASM code. Optionally, it can also generate a demo `.html` file that runs the WASM code, but we will soon see how to run the WASM code in our own webpage.
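Also similar to gcc, what exactly emscripten outputs is controlled by the `-o` flag: `-o hello.html` produces `hello.html`, `hello.js`, and `hello.wasm`, while `-o hello.js` produces only `hello.js` and `hello.wasm`.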
The demo `hello.html` file just runs the WASM code and prints its output. How does it do that? If you dig into it, you should see two `<script>` elements. The first sets up a global object called `Module` with some members, and the second one just includes the glue `hello.js` file.
This `Module` object serves as an interface between `hello.html` (and our JS code in general) and the WASM code. Recall that when an `html` file is loaded, its (non-module, non-async) scripts run in order. Thus, the first `<script>` runs, creates the `Module` object, and populates it with values and callbacks. Then the second `<script>` runs `hello.js`, which reads its arguments from `Module`, loads and runs the WASM code (specifically its `main` function) with them, and finally stores everything back in `Module` (making it a WASM instance). This is how `hello.html` can pass data to WASM, and how WASM can pass data back.
(More specifically, the `print` member of `Module` serves as a `stdout` redirect, so to speak: it is called with each line the WASM code prints to `stdout`. The `printErr` member does the same for `stderr`. See here for a full specification of `Module`.)
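As a minimal sketch of the idea, the TypeScript below builds a `Module` object whose `print` and `printErr` callbacks collect output lines instead of logging them. The `ModuleConfig` interface and the `makeCapturingModule` helper are hypothetical names used only for illustration; `arguments`, `print`, and `printErr` are real `Module` members, but only this small subset of the full specification is typed here.

```typescript
// Hypothetical, hand-written subset of the options emscripten's glue code
// reads from `Module`; see the Module specification for the full list.
interface ModuleConfig {
  arguments?: string[]; // passed to main() as argv[1..]
  print?: (text: string) => void; // called once per line written to stdout
  printErr?: (text: string) => void; // called once per line written to stderr
}

// Hypothetical helper: returns a Module config plus the arrays that its
// print/printErr callbacks append to.
function makeCapturingModule(): {
  config: ModuleConfig;
  stdout: string[];
  stderr: string[];
} {
  const stdout: string[] = [];
  const stderr: string[] = [];
  const config: ModuleConfig = {
    print: (text) => {
      stdout.push(text);
    },
    printErr: (text) => {
      stderr.push(text);
    },
  };
  return { config, stdout, stderr };
}
```

In the plain `hello.html` setup, `config` would have to be stored as the global `Module` (which is what the first `<script>` does) before `hello.js` runs; whatever `main` then prints to `stdout` ends up in the `stdout` array.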
You may realize that there are two major problems with how WASM is run so far:
- It relies on `<script>` tags executing in order, which doesn't work once we move from plain `html` files to something like React
- There is no way to directly call a C function from our JS code, or vice versa (`print` is only called implicitly when we `printf` in C)
There is also a more hidden third problem: what happens when we step it up and introduce pthreads to our C program?
We will see how all of these can be solved in the following sections on modularize, binding stuff, and web workers.
As we just said, relying on `<script>` tags limits what we can do with WASM quite a lot. Luckily, there is an emscripten option aptly named `MODULARIZE` that outputs `hello.js` as a JS module exporting a constructor for `Module`, which can be run anywhere, at any time.
(This is a good place to introduce the myriad options emscripten has, which are helpfully listed on this very hidden website. To enable an option, add `-s OPTION` to the `emcc` command; options that take a value are written `-s OPTION=value`.)
To use `MODULARIZE` without a build error, we must also set `EXPORT_NAME='name'` to rename the constructor, and build to a `.js` file. This is because emscripten's default `.html` template is not designed for modularized WASM.
We also highly recommend the `EXPORT_ES6` option, which lets you statically `import ...` from the `.js` file (it generates the `.js` file as an ES6 module rather than a UMD module).
To sum it up, we make a modularized version of the `hello.c` file:
$ emcc -s MODULARIZE -s EXPORT_NAME='makeHello' -s EXPORT_ES6 -o hello.js hello.c
This yields the `hello.js` file below:
(You may also notice that the `.js` file has been minified. We recommend using a tool such as Prettier to un-minify it before editing it by hand.)
To use this module in our code, we import `makeHello` and call it with an object containing the `Module` arguments; it returns a Promise that resolves to a fully populated WASM instance. Since the constructor is async, you need to use `.then((module) => {...})`, `await`, or something similar.
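For illustration, here is a hedged TypeScript sketch of calling a modularized factory such as `makeHello` with `await`. To keep it runnable on its own, the factory is passed in as a parameter typed by a hypothetical `ModuleFactory` type rather than imported; in a real project you would `import makeHello from "./hello.js"` (with `EXPORT_ES6`, the factory is the module's default export) and pass it in. The `instantiate` helper name is also hypothetical.

```typescript
// Hypothetical subset of the Module options accepted by the factory.
interface ModuleConfig {
  print?: (text: string) => void;
  printErr?: (text: string) => void;
}

// Shape of a MODULARIZE factory: takes the Module options and returns a
// Promise that resolves to the initialized WASM instance.
type ModuleFactory<M> = (config?: ModuleConfig) => Promise<M>;

// Instantiates a modularized WASM module and collects every line it prints
// to stdout into `output` (the array keeps filling as the module prints).
async function instantiate<M>(
  factory: ModuleFactory<M>,
): Promise<{ instance: M; output: string[] }> {
  const output: string[] = [];
  const instance = await factory({
    print: (text) => {
      output.push(text);
    },
  });
  return { instance, output };
}
```

Inside an async function, usage would look like `const { instance, output } = await instantiate(makeHello);`. Because `output` is shared by reference, lines printed after the promise resolves still land in it, so there is no need to rely on exactly when `main` runs during startup.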
For example, if we were to modify the default `hello.html` to use the modularized `hello.js`, we would get something like this:
Notice that we have to use a dynamic `import()` here, since this is a classic (non-module) script. In a proper ES6 module we could just do:
WIP, see here
Emscripten implements pthreads on top of web workers. See here for more info.
To share memory between worker threads, WASM uses `SharedArrayBuffer`, which browsers disable by default due to security risks (Spectre-style timing attacks). To enable it, a website must:
- Be in a secure context
- Be cross-origin isolated
See here for more details. To cross-origin isolate your site, you need to set the `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` response headers, or use this hack to add them on the client side.
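A small TypeScript sketch of checking these conditions at runtime, e.g. before choosing a threaded build. `isSecureContext` and `crossOriginIsolated` are standard browser globals; the `IsolationEnv` type and the `readIsolationEnv` / `canUseWasmThreads` helpers are hypothetical names. The header map shows one standard combination that makes a page cross-origin isolated; the headers themselves must come from the server or be injected client-side.

```typescript
// Response headers that make a page cross-origin isolated.
const CROSS_ORIGIN_ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical snapshot of the environment properties that matter here.
interface IsolationEnv {
  isSecureContext: boolean;
  crossOriginIsolated: boolean;
  hasSharedArrayBuffer: boolean;
}

// Reads the relevant properties from the current global scope. Outside a
// browser these globals may be missing, in which case they count as false.
function readIsolationEnv(): IsolationEnv {
  const g = globalThis as unknown as {
    isSecureContext?: boolean;
    crossOriginIsolated?: boolean;
    SharedArrayBuffer?: unknown;
  };
  return {
    isSecureContext: g.isSecureContext === true,
    crossOriginIsolated: g.crossOriginIsolated === true,
    hasSharedArrayBuffer: typeof g.SharedArrayBuffer !== "undefined",
  };
}

// A pthreads build needs all three conditions; otherwise the app should fall
// back to a single-threaded build or report an error.
function canUseWasmThreads(env: IsolationEnv): boolean {
  return env.isSecureContext && env.crossOriginIsolated && env.hasSharedArrayBuffer;
}
```

In a browser, `canUseWasmThreads(readIsolationEnv())` returning `false` usually means the isolation headers are missing from the page's response.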
Note that, in contrast to `pthread_create`, creating a web worker requires calling `new Worker()` with the URL of a separate worker JS script. As a result, compiling with the `USE_PTHREADS` option generates an additional `*.worker.js` file.
This also causes issues with webpack, since emscripten does not expect the worker and main JS files to be bundled. In particular, two function calls need to be fixed manually:
- The main `.js` file calls `new Worker(url)` to create web workers. However, Webpack 4 requires `worker-loader` instead, and Webpack 5 requires the `new Worker(new URL(...))` syntax.
- The worker calls `importScripts()` to run the main `.js` file, which webpack 5 largely breaks. If your main file is modularized, you need to import and call the constructor instead.
On a side note, the `SINGLE_FILE` option makes emscripten embed the `.wasm` file into the `.js` file as a base64-encoded string, which also helps when dealing with webpack.
You will first need to download and install emscripten. You will also need CMake.
This instance of Whisper is built from the source code in `/examples/whisper.wasm`. Go into `whisper.cpp/` and run:
mkdir build && cd build
emcmake cmake ..
make libmain
This compiles whisper into `libmain.js` and `libmain.worker.js` in `build/bin` (`libmain.worker.js` is run by the web worker). Copy them both into `src/components/api/whisper`.
(If you are following this guide for your own React project, make sure to copy them into the same folder. This matters because we will hardcode some relative paths in a moment, and those break if the files are in separate folders.)
We made a few changes to the `CMakeLists.txt` scripts so that the WASM whisper build interfaces properly with ScribeAR (or any React + Webpack 5 app in general).
Changes to `/examples/whisper.wasm/CMakeLists.txt`:
- Added `MODULARIZE`, `EXPORT_NAME='makeWhisper'`, and `EXPORT_ES6` to modularize whisper
- Added `ENVIRONMENT=web,worker` to build for a browser environment (as opposed to a backend Node.js environment)
Changes to ScribeAR:
- `coi-serviceworker.js` was modified to be TypeScript compliant, and is run by the app to give us access to `SharedArrayBuffer` for threading
- Adapter code in `index.html` was adapted into `whisperRecognizer` to instantiate and run the WASM module
To elaborate on the last point, if you want to use `libmain.js` in your own project, you need to do the following:
- Import `makeWhisper` from `libmain.js`
- The WASM module exposes the following functions to the JS code: `init` for loading a ggml model into Whisper, and `full_default` for transcribing a piece of audio. You can find their signatures in the `emscripten.cpp` file in the `whisper.wasm` folder
- To let the WASM module pass data (in particular the transcript) back to the JS code, redirect its `stderr` through the `printErr` member of the `Module` arguments (the `stderr` counterpart of `print`, described above)
- We recommend referring to `index.html` to see exactly how these functions are used to create a complete web app
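The steps above can be sketched in TypeScript as follows, under loudly stated assumptions. The `WhisperModule` interface guesses at the shapes of `init` and `full_default` (verify them against the bindings in the `whisper.wasm` folder); the model bytes are assumed to be written into emscripten's virtual file system with `FS_createDataFile`, which the build must export; and `WhisperFactory` / `startWhisper` are hypothetical names. Transcript lines are routed through `printErr`, following the `stderr` redirection described above.

```typescript
// ASSUMED shape of the instance produced by makeWhisper(); verify every
// signature against the whisper.wasm bindings before relying on it.
interface WhisperModule {
  // emscripten FS helper (must be exported by the build): writes a file
  // into the in-memory virtual file system.
  FS_createDataFile(parent: string, name: string, data: Uint8Array, canRead: boolean, canWrite: boolean): void;
  // Assumed: loads a ggml model from the virtual FS, returns a handle (0 on failure).
  init(modelPath: string): number;
  // Assumed: starts transcribing 16 kHz mono PCM samples with the given handle.
  full_default(handle: number, audio: Float32Array, lang: string, nThreads: number, translate: boolean): number;
}

interface ModuleConfig {
  print?: (text: string) => void;
  printErr?: (text: string) => void;
}

type WhisperFactory = (config?: ModuleConfig) => Promise<WhisperModule>;

// Hypothetical helper: instantiates the module, routes stderr (where the
// transcript ends up) to onTranscriptLine, writes the model into the
// virtual FS, and loads it. Resolves to the module plus the model handle.
async function startWhisper(
  makeWhisper: WhisperFactory,
  modelBytes: Uint8Array,
  onTranscriptLine: (line: string) => void,
): Promise<{ whisper: WhisperModule; handle: number }> {
  const whisper = await makeWhisper({
    print: (text) => console.log(text),
    printErr: (text) => onTranscriptLine(text),
  });
  const modelName = "whisper.bin"; // arbitrary file name inside the virtual FS
  whisper.FS_createDataFile("/", modelName, modelBytes, true, true);
  const handle = whisper.init(modelName);
  if (handle === 0) {
    throw new Error("init() could not load the model");
  }
  return { whisper, handle };
}
```

Once `startWhisper` resolves, a chunk of audio would be passed along the lines of `whisper.full_default(handle, samples, "en", 4, false)`; under these assumptions the transcript arrives through `onTranscriptLine` rather than as the return value.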