
using WebAssembly #4835

Open
ansis opened this issue Jun 14, 2017 · 27 comments

@ansis
Contributor

ansis commented Jun 14, 2017

WebAssembly or wasm is a new portable, size- and load-time-efficient format suitable for compilation to the web.
http://webassembly.org/

A couple of months ago, WebAssembly reached cross-browser consensus. This means it's shipping in the next or current versions of most major browsers. Support isn't wide enough yet, but this might be a good time to start talking about whether we want to use it, how we could use it, and what the timeline might look like.

Why?
Theoretically, we could compile a modified mapbox-gl-native to WebAssembly and have a single core codebase for all our platforms. No more porting between -js and -native and struggling to keep things in sync. There might also be performance benefits.

I don't have any answers. Just questions:

Browser support

  • How many versions back do we need to support browsers? (only the current iOS Safari? or the last two versions? three?)
  • How soon do we think there will be support for all the browsers/versions we need to support?
  • Is compilation to unoptimized asm.js a viable fallback for IE?
  • Could we maintain the current -js version and only add new features / spec changes for browsers that do support wasm?
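
The asm.js-fallback question above could be answered per-session with plain feature detection; a minimal sketch (the bundle file names are hypothetical, not real build artifacts):

```javascript
// Detect whether this browser can instantiate WebAssembly at all.
// If not (e.g. IE11), a loader could serve the asm.js build instead.
function supportsWasm() {
  try {
    if (typeof WebAssembly !== 'object') return false;
    // Smallest valid module: just the magic number and version.
    const header = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    return WebAssembly.validate(header);
  } catch (e) {
    return false;
  }
}

// Hypothetical bundle names for illustration only.
const bundle = supportsWasm() ? 'mapbox-gl.wasm.js' : 'mapbox-gl.asm.js';
console.log('would load:', bundle);
```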

Switching process

  • Would we want to switch all at once?
  • Or by gradually replacing pieces (eg label collision calculations) with wasm versions?

Developer friendliness

  • Does the necessary debug tooling exist?
  • How slow would the compile-refresh-test cycle be?
  • Does having only a C++ version of the core make the barrier to entry higher?

Architecture

  • How would we deal with the different concurrency models in browsers and native platforms?
  • How much would the code in -native need to change to make it suitable for compilation to wasm?

Performance

  • Is the runtime performance fast enough?
  • Is the compiled binary small enough?
  • Is there a significant cost to calling into wasm? Is this a concern for any of our apis?
  • What is the performance of asm.js like? What about in browsers that don't support asm.js optimization, like IE?
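
The call-cost question can be probed directly with a microbenchmark; this sketch hand-assembles the classic minimal `add` module and times JS-to-wasm calls against an equivalent JS function (absolute numbers are engine-dependent and illustrative only):

```javascript
// Minimal wasm module exporting add(i32, i32) -> i32, hand-assembled.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                   // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,             // type: (i32,i32)->i32
  0x03, 0x02, 0x01, 0x00,                                           // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,             // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b  // body: i32.add
]);
const wasmAdd = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports.add;
const jsAdd = (a, b) => (a + b) | 0; // wraps to i32, same as the wasm version

// Time a million calls through each path; the gap is mostly bridge overhead.
function time(label, fn) {
  const t0 = Date.now();
  let acc = 0;
  for (let i = 0; i < 1e6; i++) acc = fn(acc, i);
  console.log(label, Date.now() - t0, 'ms');
  return acc;
}
time('js  ', jsAdd);
time('wasm', wasmAdd);
```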

Also

  • Would the benefits even be worth the large effort this would take?

@mapbox/gl-core, is this something we should start thinking about now? Not yet?

@ChrisLoer
Contributor

Since mapbox-gl-rtl-text is already compiling native code with Emscripten, it should be relatively easy to build a wasm version. Going through that exercise, and trying to publish/support a wasm version of the plugin, might give us insight into some of these questions.

@asheemmamoowala
Contributor

I've been thinking about this as well. @kkaefer and I briefly discussed a port of wagyu to support better data processing in JS and potentially for use in gl-js.

Figma has been using Emscripten and just moved to WebAssembly. Their blog post is an informative read.

@ChrisLoer
Contributor

ChrisLoer commented Jun 16, 2017

It was pretty trivial to get mapbox-gl-rtl-text building with WASM: I just updated to the latest Emscripten and added the -s WASM=1 flag. The only challenge I ran into was that setting memory size at runtime didn't work. There's an extra file to distribute, but it cuts the cumulative plugin size by about 2/3:

| File | Current | wasmified |
| --- | --- | --- |
| mapbox-gl-rtl-text.js.min | 396 KB | 27 KB |
| mapbox-gl-rtl-text.wasm | - | 105 KB |

I tried using the "time to first tile" benchmark (a la #3758 (comment)), but didn't see much difference pre- and post-wasm-ification: basically, the cost of running the code is already pretty small, so it gets lost in the noise.

I collected performance profiles in Chrome, and I could see that initial calls to applyArabicShaping and processBidirectionalText were down to around 1ms or less to execute (using the asm.js version, the first calls can take 10-20ms, but they quickly optimize down to around 1ms).

This was using Chrome 58. Firefox 52 and Safari 10.1.1 both failed to load the wasm, which in this case failed pretty smoothly: the map just worked as if the plugin hadn't been loaded.

@femski

femski commented Jul 13, 2017

It's great news that you're considering WebAssembly.

@mourner's Supercluster.hpp (I am running a slightly modified version on an 8 MB GeoJSON) is roughly 4x faster than the JavaScript version. GeoJSON parsing is 4x faster too. Tile fetch is sub-millisecond, but still 4-10x faster. I think this is pretty representative of the performance advantage of a GL Native/wasm combo over GL JS.

Even if we assume a 20-30% performance loss going from C++ to WebAssembly, GL Native/wasm will be three times faster.

It will take 5 years for CPUs to get 3 times faster, so if a pure-JS to Native/wasm migration is done in, say, 1 year, we are still ahead by a few years in terms of performance. Plus consider the benefits of centralizing all the codebase and logic.

@jfirebaugh
Contributor

Earcut tends to show up as a significant contributor to time spent doing layout on the web worker, so I've experimented with porting it to Rust and compiling to WebAssembly using wasm-bindgen.

The port is in the rust branch of mapbox/earcut. It passes all of the original test fixtures, and the initial results seemed quite promising, with the Rust port comparable to the C++ implementation when run natively:

C++:

+----------------+--------------------+--------------------+
| Polygon        | earcut             | libtess2           |
+----------------+--------------------+--------------------+
| building       |    1,256,238 ops/s |      115,137 ops/s |
| dude           |       93,457 ops/s |       33,060 ops/s |
| water          |        1,033 ops/s |          207 ops/s |
+----------------+--------------------+--------------------+

Rust (with cargo bench):

test tests::building      ... bench:         728 ns/iter (+/- 140) (= 1,373,626 ops/s)
test tests::dude          ... bench:      10,898 ns/iter (+/- 2,305) (= 91,759 ops/s)
test tests::water         ... bench:     982,168 ns/iter (+/- 81,979) (= 1,018 ops/s)

However, the results for WebAssembly are disappointing. Here are some benchmark comparisons in various environments.

Firefox 60.0.2 (with yarn wasm-browser):

JS typical OSM building (15 vertices): x 289,571 ops/sec ±0.98% (57 runs sampled)
WASM typical OSM building (15 vertices): x 197,851 ops/sec ±0.80% (59 runs sampled)
JS dude shape (104 vertices): x 13,796 ops/sec ±1.14% (57 runs sampled)
WASM dude shape (104 vertices): x 12,106 ops/sec ±1.76% (57 runs sampled)
JS complex OSM water (2523 vertices): x 320 ops/sec ±0.68% (57 runs sampled)
WASM complex OSM water (2523 vertices): x 273 ops/sec ±1.21% (55 runs sampled)

Chrome 67.0.3396.87 (with yarn wasm-browser):

JS typical OSM building (15 vertices): x 635,057 ops/sec ±2.27% (58 runs sampled)
WASM typical OSM building (15 vertices): x 72,222 ops/sec ±1.17% (64 runs sampled)
JS dude shape (104 vertices): x 34,503 ops/sec ±0.56% (66 runs sampled)
WASM dude shape (104 vertices): x 1,490 ops/sec ±0.24% (12 runs sampled)
JS complex OSM water (2523 vertices): x 596 ops/sec ±0.72% (65 runs sampled)
WASM complex OSM water (2523 vertices): x 37.29 ops/sec ±0.59% (49 runs sampled)

Node v8.11.1 (with yarn wasm-node):

JS typical OSM building (15 vertices): x 825,015 ops/sec ±0.84% (92 runs sampled)
WASM typical OSM building (15 vertices): x 165,339 ops/sec ±3.58% (86 runs sampled)
JS dude shape (104 vertices): x 33,635 ops/sec ±4.13% (84 runs sampled)
WASM dude shape (104 vertices): x 7,850 ops/sec ±0.94% (89 runs sampled)
JS complex OSM water (2523 vertices): x 627 ops/sec ±0.68% (89 runs sampled)
WASM complex OSM water (2523 vertices): x 160 ops/sec ±0.77% (81 runs sampled)

I would expect the "typical OSM building" benchmark to be most affected by the overhead of the JS<->wasm bridge, and for Firefox we do seem to see that effect, although even with complex polygons, the wasm implementation is slower than the JS implementation.

For the V8-based environments, the wasm implementation performs poorly across the board, in the worst cases 15-25x slower than JS.

The time profiling I've done so far does not reveal anything obvious -- both Chrome and Firefox indicate the large majority of time is spent in earcut_linked, which is probably the recipient of most of the inlining. Firefox profile.json here if you want to look.

Any suggestions or advice from folks more familiar with WebAssembly is very much welcome. If you'd like to try to reproduce my results, you can check out the rust branch of mapbox/earcut and try either the wasm-node or wasm-browser scripts. For reference, I'm using:

~/Development/earcut $ rustc --version
rustc 1.28.0-nightly (a805a2a5e 2018-06-10)
~/Development/earcut $ wasm-bindgen --version
wasm-bindgen 0.2.11

@fitzgen

fitzgen commented Jun 22, 2018

Hi! Apologies for taking a while to circle back to this after talking to you on IRC the other day.

Since the native code is so much faster, I suspect maybe there is some funky stuff going on with the way data is transferred between JS and wasm memory. I hope to dig in deeper soon and do some profiling. Going to read the papers linked from the README to try and understand what this is even doing first :-p

Also: thanks for making the benchmarks easy to repeat! :)

@jfirebaugh
Contributor

Thanks for looking into it @fitzgen! If there's anything I can help with, including investigating possible hypotheses, let me know.

When you say "funky stuff going on with the way data is transferred between JS and wasm memory", are you imagining something beyond the expected overhead of the transfer itself, like an unexpectedly high per-memory-access cost? If so, the "complex OSM water" benchmark is probably the best one to look at -- it's the one where I expect the algorithmic costs due to input complexity to most strongly dominate the transfer overhead.

@ChrisLoer
Contributor

I got some initial results hacking together a version of the benchmark using embind and earcut.hpp. My steps were roughly:

  • Make an earcut.hpp wrapper that takes a vertex vector and a hole vector, turns it into a "Polygon", calls through to earcut, and then returns the result as a vector
  • Compile that to wasm plus a JS loader that includes the Emscripten runtime: emcc --bind -std=c++11 -Oz -v -o earcut.js src/earcut_wrapper.cpp ...
  • Hack the Emscripten runtime to remove the "running in node environment" logic, since that doesn't work with webpack. Also modify the "wasmBinaryFile" variable to point to a location hosted by the webpack dev server.
  • Write a wrapper script that copies JavaScript arrays to the embound std::vector types and back, and add calls to that wrapper script to index_browser.js.
  • As a rough test that the embind code was actually working, I did array equality checking on the results of the embound earcut vs the other two implementations.

Overall, it looks like my JS/Rust results are relatively similar to what @jfirebaugh saw. The "embind WASM" results also look pretty consistent across runs.

It looks like on Chrome the embound version is even worse for the "typical OSM building" shape, and better than Rust wasm but not back up to JS par for the dude and "complex water" shapes. This at least seems consistent with the idea that binding overhead is significant. On Firefox, it looks like the embound version is worse across the board. 🙁

Chrome 67.0.3396.87:

JS typical OSM building (15 vertices): x 526,696 ops/sec ±3.05% (55 runs sampled)
rust WASM typical OSM building (15 vertices): x 62,692 ops/sec ±3.06% (55 runs sampled)
embind WASM typical OSM building (15 vertices): x 44,527 ops/sec ±3.73% (53 runs sampled)

JS dude shape (104 vertices): x 29,350 ops/sec ±4.31% (53 runs sampled)
rust WASM dude shape (104 vertices): x 1,324 ops/sec ±1.91% (54 runs sampled)
embind WASM dude shape (104 vertices): x 6,250 ops/sec ±13.22% (56 runs sampled)

JS complex OSM water (2523 vertices): x 510 ops/sec ±3.10% (55 runs sampled)
rust WASM complex OSM water (2523 vertices): x 34.45 ops/sec ±1.23% (45 runs sampled)
embind WASM complex OSM water (2523 vertices): x 214 ops/sec ±19.06% (55 runs sampled)

Firefox 60.0.2:

JS typical OSM building (15 vertices): x 408,575 ops/sec ±2.30% (54 runs sampled)
rust WASM typical OSM building (15 vertices): x 210,591 ops/sec ±2.83% (58 runs sampled)
embind WASM typical OSM building (15 vertices): x 37,155 ops/sec ±7.16% (43 runs sampled)

JS dude shape (104 vertices): x 15,305 ops/sec ±2.55% (56 runs sampled)
rust WASM dude shape (104 vertices): x 13,068 ops/sec ±1.62% (58 runs sampled)
embind WASM dude shape (104 vertices): x 5,077 ops/sec ±7.98% (29 runs sampled)

JS complex OSM water (2523 vertices): x 321 ops/sec ±2.70% (54 runs sampled)
rust WASM complex OSM water (2523 vertices): x 277 ops/sec ±1.61% (56 runs sampled)
embind WASM complex OSM water (2523 vertices): x 191 ops/sec ±2.79% (51 runs sampled)

@ChrisLoer
Contributor

To kind-of-isolate the bridging costs, I made a modified version of the embind benchmark that:

  1. Copies the vertex/hole arrays into wasm memory before starting
  2. Never copies the results back to the JS heap

This starts to get the embind wasm version close to JS in Chrome, and in Firefox we can actually see significant improvement in the dude and "complex water" cases.

Chrome 67.0.3396.87:

JS typical OSM building (15 vertices): x 563,116 ops/sec ±2.18% (55 runs sampled)
rust WASM typical OSM building (15 vertices): x 65,679 ops/sec ±2.85% (55 runs sampled)
embind WASM typical OSM building (15 vertices): x 199,363 ops/sec ±16.25% (48 runs sampled)

JS dude shape (104 vertices): x 27,292 ops/sec ±5.68% (54 runs sampled)
rust WASM dude shape (104 vertices): x 1,370 ops/sec ±1.42% (59 runs sampled)
embind WASM dude shape (104 vertices): x 28,056 ops/sec ±2.19% (54 runs sampled)

JS complex OSM water (2523 vertices): x 516 ops/sec ±3.54% (55 runs sampled)
rust WASM complex OSM water (2523 vertices): x 35.07 ops/sec ±1.35% (45 runs sampled)
embind WASM complex OSM water (2523 vertices): x 493 ops/sec ±24.57% (50 runs sampled)

Firefox 60.0.2:

JS typical OSM building (15 vertices): x 432,564 ops/sec ±1.88% (56 runs sampled)
rust WASM typical OSM building (15 vertices): x 219,198 ops/sec ±2.24% (58 runs sampled)
embind WASM typical OSM building (15 vertices): x 348,351 ops/sec ±1.23% (59 runs sampled)

JS dude shape (104 vertices): x 15,521 ops/sec ±2.30% (59 runs sampled)
rust WASM dude shape (104 vertices): x 13,446 ops/sec ±1.35% (58 runs sampled)
embind WASM dude shape (104 vertices): x 42,861 ops/sec ±1.08% (60 runs sampled)

JS complex OSM water (2523 vertices): x 351 ops/sec ±1.47% (56 runs sampled)
rust WASM complex OSM water (2523 vertices): x 283 ops/sec ±1.08% (57 runs sampled)
embind WASM complex OSM water (2523 vertices): x 755 ops/sec ±0.84% (60 runs sampled)

🤷‍♂️ ? I guess my takeaways so far are:

  • No easy win, but at least for complex cases we can show a wasm version running faster than plain JS
  • Surprisingly large differences between JS engines
  • Seems like getting a win would depend on managing the bridge really carefully (e.g. in FillBucket keep the "flattened", "holeIndices" and "indices" arrays entirely on the wasm side to avoid extra copies).
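
That last idea can be illustrated with a plain WebAssembly.Memory, independent of earcut itself (the buffer layout and the earcut(ptr, len) export mentioned in the comments are hypothetical): write the flattened vertices into the wasm heap once, then pass only a pointer and length on each call:

```javascript
// Sketch: keep a flattened vertex array resident in wasm linear memory
// so repeated calls pass only (pointer, length) instead of re-copying arrays.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const heapF64 = new Float64Array(memory.buffer);       // shared view of the heap

function writeVertices(flattened, byteOffset = 0) {
  // One bulk copy into the wasm heap; subsequent calls reuse it in place.
  const view = new Float64Array(memory.buffer, byteOffset, flattened.length);
  view.set(flattened);
  return { ptr: byteOffset, len: flattened.length };
}

// A real build would hand {ptr, len} to a wasm export such as
// earcut(ptr, len, ...) and read triangle indices out of the same memory.
const { ptr, len } = writeVertices([0, 0, 10, 0, 10, 10, 0, 10]);
console.log(ptr, len, heapF64[2]); // heapF64[2] === 10
```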

@MaxGraey

MaxGraey commented Aug 18, 2018

I've also started a port to AssemblyScript. To unlock WebAssembly's full power, I also added water-huge.json. My early results:

Node.js v10.9.0 results for water-huge.json:

JS huge complex water (5667 vertices): x 50.47 ops/sec ±2.53% (65 runs sampled)
AssemblyScript WASM huge complex water (5667 vertices): x 218 ops/sec ±0.87% (84 runs sampled)
Rust WASM huge complex water (5667 vertices): x 21.26 ops/sec ±0.68% (39 runs sampled)

However, I don't yet unbox and copy the resulting triangle indices back to JS. It's still a rough draft, needs some fixes, and may contain bugs. I also don't deallocate yet. The complex OSM water version is just 5% faster, while the JS dude shape and OSM building are twice as slow. AssemblyScript also produces a much smaller binary than Rust (approximately 4 times smaller).

You can investigate AS branch.

EDIT
Found a variable-shadowing bug that caused incorrect hole elimination. After the fix, all results are the same except for the huge complex water case mentioned above. I also added conversion for the result array.

New results

JS huge complex water (5667 vertices): x 50.73 ops/sec ±2.27% (66 runs sampled)
AssemblyScript WASM huge complex water (5667 vertices): x 56.79 ops/sec ±0.77% (72 runs sampled)
Rust WASM huge complex water (5667 vertices): x 21.02 ops/sec ±1.05% (39 runs sampled)

So the improvement isn't that significant, just 5-11%, but I still see room for further improvements.

@MaxGraey

MaxGraey commented Aug 18, 2018

After some minor improvements:

UPDATED

JS typical OSM building (15 vertices): x 615,456 ops/sec ±2.00% (93 runs sampled)
AssemblyScript WASM typical OSM building (15 vertices): x 300,604 ops/sec ±5.80% (79 runs sampled)
Rust WASM typical OSM building (15 vertices): x 169,775 ops/sec ±1.39% (88 runs sampled)

JS dude shape (104 vertices): x 30,180 ops/sec ±2.96% (92 runs sampled)
AssemblyScript WASM dude shape (104 vertices): x 28,816 ops/sec ±5.38% (86 runs sampled)
Rust WASM dude shape (104 vertices): x 6,599 ops/sec ±0.91% (90 runs sampled)

JS complex OSM water (2523 vertices): x 565 ops/sec ±2.14% (88 runs sampled)
AssemblyScript WASM complex OSM water (2523 vertices): x 628 ops/sec ±2.88% (89 runs sampled)
Rust WASM complex OSM water (2523 vertices): x 135 ops/sec ±0.75% (76 runs sampled)

JS huge complex water (5667 vertices): x 49.71 ops/sec ±2.72% (65 runs sampled)
AssemblyScript WASM huge complex water (5667 vertices): x 58.73 ops/sec ±1.06% (61 runs sampled)
Rust WASM huge complex water (5667 vertices): x 21.20 ops/sec ±0.95% (39 runs sampled)

I still have a problem with deallocating memory, so launching a series of different tasks isn't possible yet.

@MaxGraey

MaxGraey commented Aug 19, 2018

@jfirebaugh By the way, I updated Rust's dependencies and added LTO and opt-level = 3, and now I get results more in line with expectations for Rust as well =) Even slightly better than AssemblyScript.
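
For reference, those release-profile settings would look something like this in Cargo.toml (my reconstruction, not copied from the branch):

```toml
# Release profile for the wasm build
[profile.release]
lto = true       # link-time optimization across crates
opt-level = 3    # optimize for speed rather than size
```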

JS typical OSM building (15 vertices): x 624,717 ops/sec ±2.31% (92 runs sampled)
AssemblyScript WASM typical OSM building (15 vertices): x 290,853 ops/sec ±6.86% (76 runs sampled)
Rust WASM typical OSM building (15 vertices): x 366,717 ops/sec ±1.29% (83 runs sampled)

JS dude shape (104 vertices): x 30,631 ops/sec ±3.08% (91 runs sampled)
AssemblyScript WASM dude shape (104 vertices): x 28,793 ops/sec ±5.37% (83 runs sampled)
Rust WASM dude shape (104 vertices): x 34,130 ops/sec ±0.76% (93 runs sampled)

JS complex OSM water (2523 vertices): x 561 ops/sec ±2.10% (87 runs sampled)
AssemblyScript WASM complex OSM water (2523 vertices): x 642 ops/sec ±2.72% (89 runs sampled)
Rust WASM complex OSM water (2523 vertices): x 673 ops/sec ±0.74% (90 runs sampled)

JS huge complex water (5667 vertices): x 50.76 ops/sec ±2.77% (66 runs sampled)
AssemblyScript WASM huge complex water (5667 vertices): x 60.30 ops/sec ±1.10% (62 runs sampled)
Rust WASM huge complex water (5667 vertices): x 64.72 ops/sec ±2.01% (66 runs sampled)

Environment

node v10.9.0
rustc 1.30.0-nightly (1fa944914 2018-08-17)

@lukewagner

In Firefox, to see if and how much the entry/exit stubs are hurting performance, you can open the Performance pane, click the gear in the top-right, check "Show Gecko Platform Data", profile the benchmark, and then look at the self time for frames that have "trampoline (in wasm)" in the name. I'd also suggest measuring in Firefox Nightly, since it has some new optimizations for this path.

@MaxGraey
Copy link

@lukewagner Nice! Thanks for the tip.

@MaxGraey

OK, with the latest improvements:

JS typical OSM building (15 vertices): x 637,368 ops/sec ±2.34% (85 runs sampled)
AssemblyScript WASM typical OSM building (15 vertices): x 251,311 ops/sec ±14.07% (59 runs sampled)
Rust WASM typical OSM building (15 vertices): x 376,918 ops/sec ±1.50% (84 runs sampled)

JS dude shape (104 vertices): x 31,497 ops/sec ±2.48% (88 runs sampled)
AssemblyScript WASM dude shape (104 vertices): x 34,940 ops/sec ±0.85% (92 runs sampled)
Rust WASM dude shape (104 vertices): x 34,569 ops/sec ±0.83% (91 runs sampled)

JS complex OSM water (2523 vertices): x 580 ops/sec ±1.46% (88 runs sampled)
AssemblyScript WASM complex OSM water (2523 vertices): x 716 ops/sec ±0.61% (92 runs sampled)
Rust WASM complex OSM water (2523 vertices): x 689 ops/sec ±0.37% (92 runs sampled)

JS huge complex water (5667 vertices): x 51.54 ops/sec ±2.02% (66 runs sampled)
AssemblyScript WASM huge complex water (5667 vertices): x 62.70 ops/sec ±0.86% (65 runs sampled)
Rust WASM huge complex water (5667 vertices): x 70.53 ops/sec ±0.94% (72 runs sampled)

@lukewagner

Do you have a link to the demo I could try locally?

@MaxGraey

@lukewagner

Thanks! Sorry if my npm-fu is just weak here, but how would I go about running the above tests in a browser?

@MaxGraey

MaxGraey commented Aug 22, 2018

I haven't tried running it in the browser yet, only on Node.js, but this should be easy; loader.js just needs a small modification. I'll do this tomorrow or later.

@femski

femski commented Aug 13, 2019

As the experience here shows, the overhead of going back and forth between WebAssembly and JS is just too much. So a better approach may be to compile GL Native to WASM, so that all memory and threads are managed inside the WASM/native layer with minimal API overhead. But we need SharedArrayBuffer, Atomics, and threads for that.

Because of the Spectre and Meltdown scare, SharedArrayBuffer, Atomics, and pthreads were all stalled.

Things are changing again, and there are signs of thawing. Chrome 70 now has threads. And Google just ported Earth to WASM, which is wicked fast, proving that WASM is the future of maps.

My own experience with a WASM build of a large library (SpatiaLite 5.0) shows WASM being at least 3-10x faster, and I think a WASM build of GL Native will be similarly fast.

Are there any takers for compiling GL native to WASM instead?

@femski

femski commented Mar 9, 2020

Dear Mapbox Denizens:

I am happy to report that I have Mapbox GL Native running in a browser via WASM!

Click this link (Chrome 80 or higher is required, Firefox coming soon; desktop only; you may have to move the mouse sometimes to see the map refresh):

https://avnav.com/mbgl/

to see a Mapbox street map displayed in the browser at 30+ frames per second. This is the entire GL Native running in WASM, compiled via Emscripten.

There were three main challenges overcome so far:

Current limitations:

  • Because of the lack of a real event loop, we are using more than the usual number of threads; this is causing performance degradation and wasted CPU. I expect that as and when Emscripten has polling support (across threads, via Fibers), we will have a fully running GL Native that is smoother than GL JS. After all, today's GLFW Native is much smoother than GL JS in a browser on the same device.
  • I have run it for hours and found no OOM (out of memory) errors or runtime exceptions. If any such exception ever appears, it is likely due to the event-loop hack above and will be fixed when we have the event loop properly working.

Future:

  • Fix the event loop; this is the top priority. Event Loop Integration: Support for Poll, Epoll or Coroutines emscripten-core/emscripten#10556
  • Add caching to allow a fair comparison with GL JS and GL Native.
  • The window size is fixed at 1600x1024 pixels (an Emscripten GLFW3 limitation); make it sync with the browser for a full-screen, resizable experience
  • No touch support; GLFW does not have touch
  • Add WEBP w/ SIMD support
  • Generate TypeScript, JavaScript (Go, C#?) bindings. What would be a good name (GL WASM?)
  • Qt WASM plugin?
  • Migrate to the new Buck build, tests, etc.

@lars-t-hansen

This is great news.

But (speaking as a Firefox wasm engineer) I'm curious about what's holding this back from working in Firefox? There seem to be a couple of things.

One, I see when I try to load the page that part of the problem is the missing COOP/COEP headers. Context for this:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer/Planned_changes
https://docs.google.com/document/d/1zDlfvfTJ_9e8Jdc8ehuV4zMEu9ySMCiTGMS9y0GU92k/edit (linked from the previous one too)

These headers are required by Firefox now to use shared memory but will also be required by Chrome in the future (Chrome intent to ship: https://mail.google.com/mail/u/0/#search/label%3Ab-blink-dev+COEP/FMfcgxwGDWwLKWrkFltRgXKbrMhpCqKq)

There's a pref in Nightly for disabling this security measure, dom.postMessage.sharedArrayBuffer.bypassCOOP_COEP.insecure.enabled. Flipping this to true, I get further: I can load the demo page, but it never shows anything except a slow-script warning, and when I kill it I get a backtrace that makes it look like the event loop problem:

Script terminated by timeout at:
_emscripten_futex_wait@https://avnav.com/mbgl/mbgl-glfw.js:1:135517
__emscripten_do_pthread_join@https://avnav.com/mbgl/mbgl-glfw.js:1:243064
_pthread_join@https://avnav.com/mbgl/mbgl-glfw.js:1:243163
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15189]:0x1e222b
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15093]:0x1ddb4a
Module.dynCall_vi@https://avnav.com/mbgl/mbgl-glfw.js:1:292545
invoke_vi@https://avnav.com/mbgl/mbgl-glfw.js:1:356618
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[4122]:0x69213
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[1551]:0x262f4
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[2229]:0x324ec
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[551]:0x14458
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[5701]:0x96bcb
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[9071]:0xfcc9b
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[21191]:0x2fd49c
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[2229]:0x324ec
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[551]:0x14458
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[4219]:0x6ae44
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[21446]:0x30553c
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[2229]:0x324ec
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[551]:0x14458
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[545]:0x13ceb
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[4395]:0x6d6c1
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15092]:0x1ddb40
Module.dynCall_vii@https://avnav.com/mbgl/mbgl-glfw.js:1:292869
invoke_vii@https://avnav.com/mbgl/mbgl-glfw.js:1:355894
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25912]:0x3e7a46
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[10072]:0x111a64
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[5996]:0x9e288
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[3050]:0x475ca
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25915]:0x3e7bc9
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[2229]:0x324ec
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[551]:0x14458
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[545]:0x13ceb
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[3796]:0x5ddb1
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[14081]:0x1beb47
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[2229]:0x324ec
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[551]:0x14458
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[10053]:0x1113a1
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25856]:0x3e5ff7
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[2229]:0x324ec
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[551]:0x14458
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[545]:0x13ceb
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[20440]:0x2dfa3d
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15013]:0x1dd216
Module.dynCall_iiii@https://avnav.com/mbgl/mbgl-glfw.js:1:321378
invoke_iiii@https://avnav.com/mbgl/mbgl-glfw.js:1:356051
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25859]:0x3e60b6
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25860]:0x3e615c
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15013]:0x1dd216
Module.dynCall_iiii@https://avnav.com/mbgl/mbgl-glfw.js:1:321378
invoke_iiii@https://avnav.com/mbgl/mbgl-glfw.js:1:356051
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25861]:0x3e61d7
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15013]:0x1dd216
Module.dynCall_iiii@https://avnav.com/mbgl/mbgl-glfw.js:1:321378
invoke_iiii@https://avnav.com/mbgl/mbgl-glfw.js:1:356051
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25862]:0x3e6267
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[10162]:0x114f32
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15091]:0x1ddb34
Module.dynCall_viii@https://avnav.com/mbgl/mbgl-glfw.js:1:293196
invoke_viii@https://avnav.com/mbgl/mbgl-glfw.js:1:356918
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[10178]:0x116f7b
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25761]:0x3e2afa
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15092]:0x1ddb40
Module.dynCall_vii@https://avnav.com/mbgl/mbgl-glfw.js:1:292869
invoke_vii@https://avnav.com/mbgl/mbgl-glfw.js:1:355894
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[22328]:0x32aef2
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15093]:0x1ddb4a
Module.dynCall_vi@https://avnav.com/mbgl/mbgl-glfw.js:1:292545
invoke_vi@https://avnav.com/mbgl/mbgl-glfw.js:1:356618
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25313]:0x3cbe06
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25292]:0x3cb385
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15093]:0x1ddb4a
Module.dynCall_vi@https://avnav.com/mbgl/mbgl-glfw.js:1:292545
_emscripten_set_interval/<@https://avnav.com/mbgl/mbgl-glfw.js:1:183577
setInterval handler*_emscripten_set_interval@https://avnav.com/mbgl/mbgl-glfw.js:1:183544
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25299]:0x3cb69a
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[15093]:0x1ddb4a
Module.dynCall_vi@https://avnav.com/mbgl/mbgl-glfw.js:1:292545
invoke_vi@https://avnav.com/mbgl/mbgl-glfw.js:1:356618
@https://avnav.com/mbgl/mbgl-glfw.wasm:wasm-function[25262]:0x3ca044
Module._main@https://avnav.com/mbgl/mbgl-glfw.js:1:278912
callMain@https://avnav.com/mbgl/mbgl-glfw.js:1:423906
doRun@https://avnav.com/mbgl/mbgl-glfw.js:1:424473
run/<@https://avnav.com/mbgl/mbgl-glfw.js:1:424620
setTimeout handler*run@https://avnav.com/mbgl/mbgl-glfw.js:1:424558
runCaller@https://avnav.com/mbgl/mbgl-glfw.js:1:423299
removeRunDependency@https://avnav.com/mbgl/mbgl-glfw.js:1:22639
receiveInstance@https://avnav.com/mbgl/mbgl-glfw.js:1:24190
receiveInstantiatedSource@https://avnav.com/mbgl/mbgl-glfw.js:1:24516
promise callback*instantiateAsync/<@https://avnav.com/mbgl/mbgl-glfw.js:1:25091
promise callback*instantiateAsync@https://avnav.com/mbgl/mbgl-glfw.js:1:24994
createWasm@https://avnav.com/mbgl/mbgl-glfw.js:1:25544
@https://avnav.com/mbgl/mbgl-glfw.js:1:278255

That was after waiting about 30s for it to complete; not sure what it's waiting for.

@femski

femski commented Mar 9, 2020

> This is great news.
>
> But (speaking as a Firefox wasm engineer) I'm curious about what's holding this back from working in Firefox? There seems to be a couple of things.

Yes, the headers are missing. I will set them today and you will see it work in Firefox too.
It's great to see Firefox taking an interest in this; I believe WASM works fastest in Firefox, and our frame rates may be even better there.

Yes, there is a hack to get the style URL from the network; it may be waiting on the main thread here, and I'm not sure why it times out. Will fix.

@femski

femski commented Mar 10, 2020

@lars-t-hansen
The headers should now be fixed. I tried Firefox Nightly and it worked right out of the box; no change to any settings was required. Firefox FPS is better than Chrome's (50 vs 30), but we have some way to go still.

@lars-t-hansen

Yeah, I can confirm that - looks really good, too :-) Thanks for doing this, it is very exciting.

@femski

femski commented Mar 23, 2020

Dear All,

Time for an Update:

The GL WASM port is doing very well! Please be sure to check out the latest update (it now runs in 90% of browsers that support WebAssembly; Firefox remains fastest; no more pthreads, meaning it runs even in iOS Safari!):

https://avnav.com/mbgl/

  • All the hangs, freezes, stutters, and lost frames are gone: a smooth, complete map that always renders correctly.
  • It's now performing within 25% of GL Native on my Linux machine in Firefox Nightly, without any tile caching and on a single thread in a browser!
  • No more event loop (I could not get Emscripten to jump threads), so no pthreads; I got single-threading to work, meaning we don't need SharedArrayBuffer. I don't know when that will land now: the coronavirus has changed the landscape completely, and pthreads may now be delayed significantly. So it's good to run without pthreads/SharedArrayBuffer for a while.
  • Most importantly, the WASM port (because it no longer uses SharedArrayBuffer/pthreads) now runs in Safari and even iOS Safari. This is huge. Think of all the battery savings: it's running on a single thread and performing so well even without a cache.
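
Choosing between a pthreads build and the single-threaded build at runtime could be as simple as a feature check; a sketch (the function and build file names are hypothetical):

```javascript
// Decide between a pthreads (SharedArrayBuffer) build and the
// single-threaded build that also works in iOS Safari.
function canUseThreads() {
  // SharedArrayBuffer is only usable when the page is cross-origin
  // isolated (COOP/COEP headers) in modern browsers; in environments
  // without the crossOriginIsolated flag (e.g. Node), assume isolated.
  const hasSAB = typeof SharedArrayBuffer !== 'undefined';
  const isolated = typeof crossOriginIsolated === 'undefined' || crossOriginIsolated;
  return hasSAB && isolated;
}

// Hypothetical build artifact names for illustration.
const build = canUseThreads() ? 'mbgl-threads.wasm' : 'mbgl-single.wasm';
console.log('selected build:', build);
```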

Please stay tuned for more!

@ShanonJackson

We're doing some extremely performance-heavy stuff in the farm environmental planning scene... many... many... layers. I know WASM is the endgame for canvas performance, so I ended up here from Google.

I don't know anything about WASM, but I'm really interested in anything you have that is consumable, @femski (in terms of bindings for JavaScript/TypeScript). Please let me know the second you've got something on that front.

Love your work, you guys are really pushing the boundaries.
