using WebAssembly #4835
Since mapbox-gl-rtl-text is already compiling native code with Emscripten, it should be relatively easy to build a wasm version. Going through that exercise, and trying to publish/support a wasm version of the plugin, might give us insight into some of these questions.
It was pretty trivial to get mapbox-gl-rtl-text building with WASM: I just updated to the latest emscripten and added the
I tried using the "time to first tile" benchmarking (a la #3758 (comment)), but didn't see much difference between pre and post wasm-ification -- basically, the cost of running the code is already pretty small, so it gets lost in the noise. I collected performance profiles in Chrome and could see the initial calls into the wasm code there. This was using Chrome 58. Firefox 52 and Safari 10.1.1 both failed to load the wasm, which in this case failed pretty smoothly: the map just worked as if the plugin hadn't been loaded.
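For context, the "failed pretty smoothly" behavior is the kind of graceful degradation you get from feature-detecting WebAssembly before loading the wasm build. A minimal sketch (the plugin URLs are placeholders, not the actual published artifacts, and this assumes mapbox-gl-js is loaded as the global `mapboxgl`):

```js
// Sketch: feature-detect WebAssembly and fall back to the plain JS build.
// The plugin URLs below are hypothetical placeholders.
const wasmSupported =
    typeof WebAssembly === 'object' &&
    typeof WebAssembly.instantiate === 'function';

const pluginURL = wasmSupported
    ? 'https://example.com/mapbox-gl-rtl-text.wasm.js'
    : 'https://example.com/mapbox-gl-rtl-text.js';

mapboxgl.setRTLTextPlugin(pluginURL, (err) => {
    // If the wasm build fails to load, the map keeps working without the plugin.
    if (err) console.warn('RTL text plugin failed to load:', err);
});
```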
It's great news that you are considering WebAssembly. @mourner's Supercluster.hpp (I am running a slightly modified version on an 8 MB GeoJSON) is roughly 4x faster than the JavaScript version. GeoJSON parsing is 4x faster too. Tile fetch is sub-millisecond, but still 4-10x faster. I think this is pretty representative of the performance advantage of a GL Native/wasm combo over GL JS. Even if we assume a 20-30% performance loss going from C++ to WebAssembly, GL Native/wasm will still be three times faster. It will take 5 years for CPUs to get 3 times faster, so if a pure-JS to Native/wasm migration is done in, say, 1 year, we are still ahead by a few years in terms of performance. Plus, consider the benefits of centralizing the codebase and logic.
Earcut tends to show up as a significant contributor to time spent doing layout on the web worker, so I've experimented with porting it to Rust and compiling to WebAssembly using wasm-bindgen. The port is in the C++:
Rust (with
However, the results for WebAssembly are disappointing. Here are some benchmark comparisons in various environments. Firefox 60.0.2 (with
Chrome 67.0.3396.87 (with
Node v8.11.1 (with
I would expect the "typical OSM building" benchmark to be most affected by the overhead of the JS<->wasm bridge, and for Firefox we do seem to see that effect, although even with complex polygons the wasm implementation is slower than the JS implementation. For the V8-based environments, the wasm implementation performs poorly across the board, in the worst cases 15-25x slower than JS. The time profiling I've done so far does not reveal anything obvious -- both Chrome and Firefox indicate the large majority of time is spent in
Any suggestions or advice from folks more familiar with WebAssembly is very much welcome. If you'd like to try to reproduce my results, you can check out the
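To make the "per-call overhead vs. algorithmic cost" distinction concrete, here is a rough sketch of the kind of harness these comparisons imply; the fixture values and the wasm wrapper module are hypothetical placeholders, not the actual benchmark code:

```js
// Sketch of a micro-benchmark comparing JS earcut against a wasm build.
// './earcut-wasm' and the fixtures are hypothetical stand-ins.
const {performance} = require('perf_hooks');
const earcutJS = require('earcut');
const earcutWasm = require('./earcut-wasm');

// Placeholder fixtures; the real benchmark uses OSM-derived polygons.
const buildingFixture = {vertices: [0, 0, 10, 0, 10, 10, 0, 10], holes: []};
const waterFixture = buildingFixture; // stand-in for a large multi-ring polygon

function bench(name, earcut, fixture, iterations) {
    const start = performance.now();
    for (let i = 0; i < iterations; i++) {
        earcut(fixture.vertices, fixture.holes);
    }
    const ms = performance.now() - start;
    console.log(`${name}: ${Math.round(iterations / ms * 1000)} ops/s`);
}

// A small input called many times is dominated by JS<->wasm bridging overhead;
// a large input called a few times is dominated by the triangulation itself.
bench('js   / typical OSM building', earcutJS, buildingFixture, 100000);
bench('wasm / typical OSM building', earcutWasm.earcut, buildingFixture, 100000);
bench('js   / complex OSM water', earcutJS, waterFixture, 200);
bench('wasm / complex OSM water', earcutWasm.earcut, waterFixture, 200);
```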
Hi! Apologies for taking a while to circle back to this after talking to you on IRC the other day. Since the native code is so much faster, I suspect maybe there is some funky stuff going on with the way data is transferred between JS and wasm memory. I hope to dig in deeper soon and do some profiling. Going to read the papers linked from the README to try and understand what this is even doing first :-p Also: thanks for making the benchmarks easy to repeat! :)
Thanks for looking into it @fitzgen! If there's anything I can help with, including investigating possible hypotheses, let me know. When you say "funky stuff going on with the way data is transferred between JS and wasm memory", are you imagining something beyond the expected overhead of the transfer itself, like an unexpectedly high per-memory-access cost? If so, the "complex OSM water" benchmark is probably the best one to look at -- it's the one where I expect the algorithmic costs due to input complexity to most strongly dominate the transfer overhead.
I got some initial results hacking together a version of the benchmark using embind and earcut.hpp. My steps were roughly:
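For context on what "using embind" looks like on the JS side, here is a rough sketch with hypothetical binding names (`VectorDouble`, `VectorUint32`, `triangulate`), not the actual bindings used in this benchmark; `Module` is the Emscripten module object:

```js
// Sketch of calling an embind-wrapped earcut.hpp from JS.
const flatCoords = [0, 0, 10, 0, 10, 10, 0, 10]; // placeholder ring
const holeIndices = [];

const vertices = new Module.VectorDouble();
for (const c of flatCoords) vertices.push_back(c);

const holes = new Module.VectorUint32();
for (const h of holeIndices) holes.push_back(h);

const result = Module.triangulate(vertices, holes);

// Copy the triangle indices out before releasing the wrappers.
const indices = new Uint32Array(result.size());
for (let i = 0; i < indices.length; i++) indices[i] = result.get(i);

// embind-wrapped C++ objects are not garbage collected; they must be
// delete()d explicitly or the wasm heap leaks.
vertices.delete();
holes.delete();
result.delete();
```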
Overall, it looks like my JS/rust results are relatively similar to what @jfirebaugh saw. The "embind WASM" results also look pretty consistent across runs. It looks like on Chrome the embound version is even worse for the "typical OSM building" shape, and better than rust wasm but not back up to JS par for the dude and "complex water" shapes. This at least seems consistent with the idea that binding overhead is significant. On Firefox, it looks like the embound version is worse across the board. 🙁 Chrome 67.0.3396.87:
Firefox 60.0.2:
To kind-of-isolate the bridging costs, I made a modified version of the embind benchmark that:
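Roughly, the idea is to pre-copy the input into wasm memory once, outside the timed section, so each timed call only passes a pointer and a length across the boundary. A sketch of that pattern, assuming an Emscripten build that exposes `Module._malloc`, `Module._free` and the `HEAPF64` view (the exported `_earcut_flat` function is hypothetical):

```js
// Sketch: copy the flat vertex array into the wasm heap up front, so the
// benchmarked call crosses the JS<->wasm boundary with only scalars.
const flatVertices = [0, 0, 10, 0, 10, 10, 0, 10]; // placeholder input

function copyToWasmHeap(Module, vertices) {
    const bytes = vertices.length * Float64Array.BYTES_PER_ELEMENT;
    const ptr = Module._malloc(bytes);
    Module.HEAPF64.set(vertices, ptr / Float64Array.BYTES_PER_ELEMENT);
    return {ptr, free: () => Module._free(ptr)};
}

const input = copyToWasmHeap(Module, flatVertices);
// Timed section: no per-call copying or embind marshalling.
const triangleCount = Module._earcut_flat(input.ptr, flatVertices.length / 2);
input.free();
```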
This starts to get the embind wasm version close to JS in Chrome, and in Firefox we can actually see significant improvement in the dude and "complex water" cases. Chrome 67.0.3396.87:
Firefox 60.0.2:
🤷♂️ ? I guess my takeaways so far are:
I also started a port to AssemblyScript. To unlock the full power of WebAssembly, I also added Node.js v10.9.0 results for
But I don't yet unbox and copy the result triangle indices back to JS. It's still a very rough draft, needs some fixes, and may contain some bugs. Also, I don't deallocate yet. Version of
You can investigate the AS branch. EDIT: New results
So the improvements are not so significant, just 5-11%. But I still see room for further improvements.
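On the "copy result triangle indices back to JS" point: a common pattern, sketched here under the assumption that the module exports its memory plus hypothetical pointer/length accessors, is to view the result in linear memory and copy it out:

```js
// Sketch: reading triangle indices out of AssemblyScript/wasm linear memory.
// `wasmInstance` is an instantiated WebAssembly.Instance; `triangulate`,
// `getIndicesPtr` and `getIndicesLength` are hypothetical exports, and
// verticesPtr/vertexCount come from an earlier copy into wasm memory.
const exports = wasmInstance.exports;

exports.triangulate(verticesPtr, vertexCount);
const ptr = exports.getIndicesPtr();
const len = exports.getIndicesLength();

// .slice() copies the data out, so it stays valid even if the wasm memory
// later grows (growth detaches views over the old buffer).
const indices = new Uint32Array(exports.memory.buffer, ptr, len).slice();
```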
After some minor improvements: UPDATED
Still have a problem with deallocating memory, so launching a series of different tasks is not possible yet.
@jfirebaugh By the way, I updated Rust's dependencies and added
Environment
In Firefox, to see if and how much the entry/exit stubs are hurting performance, you can open the Performance pane, click the gear in the top-right, check "Show Gecko Platform Data", profile the benchmark, and then look at the self time for frames that have "trampoline (in wasm)" in the name. I'd also suggest measuring in Firefox Nightly, since it has some new optimizations for this path. |
@lukewagner Nice! Thanks for the tip.
OK, with the latest improvements:
Do you have a link to the demo I could try locally?
@lukewagner
Thanks! Sorry if my npm-fu is just weak here, but how would I go about running the above tests in a browser?
I haven't tried running it in a browser yet, only on Node.js, but this should be easy. It just needs a small modification of
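For anyone wanting to try it in a browser, the standard way to load a plain wasm module there is `WebAssembly.instantiateStreaming`; a minimal sketch, with the wasm path and exports as placeholders:

```js
// Sketch: loading the wasm module in a browser instead of Node.js.
// 'earcut.wasm' and the export names are placeholders.
// Note: the server must serve .wasm with the application/wasm MIME type
// for streaming instantiation to work.
async function loadEarcut() {
    const {instance} = await WebAssembly.instantiateStreaming(
        fetch('earcut.wasm'),
        {} // import object, if the module needs one
    );
    return instance.exports;
}

loadEarcut().then((earcut) => {
    // Run the same fixtures as the Node benchmark here.
    console.log('wasm exports:', Object.keys(earcut));
});
```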
As experience here shows, the overhead of going back and forth between WebAssembly and JS is just too much. So a better approach may be to compile GL Native to WASM, so that all memory and threads are managed inside the WASM/native layer with minimal API overhead. But we need SharedArrayBuffer, Atomics, and threads for that. Because of the Spectre and Meltdown scare, SharedArrayBuffer, Atomics, and pthreads were all stalled. Things are changing again, and there are signs of thawing: Chrome 70 now has threads, and Google just ported Earth to WASM, which is wicked fast, proving that WASM is the future of maps. My own experience with a WASM build of a large library (SpatiaLite 5.0) shows WASM being at least 3-10x faster, and I think a WASM build of GL Native will be similarly fast. Are there any takers for compiling GL Native to WASM instead?
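A threaded wasm build has hard runtime requirements, so a page would need to detect them and fall back when they are missing. A sketch of that check (an assumption about how a loader might gate this, not code from any of the builds discussed here):

```js
// Sketch: detect whether a pthreads-enabled wasm build can run in this browser.
const canRunThreadedWasm =
    typeof WebAssembly === 'object' &&
    typeof SharedArrayBuffer !== 'undefined' &&
    typeof Atomics !== 'undefined' &&
    // Newer browsers also require cross-origin isolation (COOP/COEP headers)
    // before SharedArrayBuffer is usable; older ones don't expose this flag.
    (typeof crossOriginIsolated === 'undefined' || crossOriginIsolated);

if (!canRunThreadedWasm) {
    console.warn('Threads unavailable; falling back to a single-threaded build.');
}
```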
Dear Mapbox Denizens: I am happy to report I have Mapbox GL Native running in a browser via WASM! Click this link (Chrome 80 or higher is required, Firefox coming soon; desktop only; you may have to move the mouse sometimes to see the map refresh): to see a Mapbox street map displayed in the browser at 30+ frames per second. This is the entire GL Native running in WASM, compiled via Emscripten.
Main challenges overcome so far: there were three main challenges:
Current limitations:
Future:
This is great news. But (speaking as a Firefox wasm engineer) I'm curious about what's holding this back from working in Firefox? There seem to be a couple of things. One, I see that when I try to load it, part of the problem is the missing COOP/COEP headers. Context for this: these headers are required by Firefox now to use shared memory but will also be required by Chrome in the future (Chrome intent to ship: https://mail.google.com/mail/u/0/#search/label%3Ab-blink-dev+COEP/FMfcgxwGDWwLKWrkFltRgXKbrMhpCqKq). There's a pref in Nightly for disabling this security measure, dom.postMessage.sharedArrayBuffer.bypassCOOP_COEP.insecure.enabled. Flipping this to true, I get further: I can load the demo page, but it never shows anything except a slow script warning, and when I kill it I get a backtrace that makes it look like the event loop problem:
That was after waiting about 30s for it to complete; not sure what it's waiting for.
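For reference, the COOP/COEP requirement mentioned above comes down to two response headers on the page and its resources. A minimal sketch of serving a demo with them from Node, purely illustrative of the header names and values (any server or CDN configuration that sets these headers works the same way):

```js
// Sketch: a naive static server that sets the headers needed for
// cross-origin isolation (and therefore SharedArrayBuffer / threaded wasm).
const http = require('http');
const fs = require('fs');
const path = require('path');

http.createServer((req, res) => {
    res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
    res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
    const file = path.join(__dirname, req.url === '/' ? 'index.html' : req.url);
    fs.readFile(file, (err, data) => {
        if (err) { res.statusCode = 404; res.end('not found'); return; }
        res.end(data);
    });
}).listen(8080);
```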
Yes, the headers are missing. I will set them today and you will see it work in Firefox too. Yes, there is a hack to get the style URL from the network; it may be waiting on the main thread there, not sure why it times out. Will fix.
@lars-t-hansen
Yeah, I can confirm that - looks really good, too :-) Thanks for doing this, it is very exciting.
Dear All, time for an update: the GL WASM port is doing very well! Please be sure to check out the latest update (it now runs in 90% of the browsers that support WebAssembly; Firefox remains the fastest; no more pthreads, meaning it runs even in iOS Safari!):
Please stay tuned for more!
We're doing some extremely performance-heavy stuff in the farm environmental planning scene... many... many... layers. I know WASM is the endgame for canvas performance, so I ended up here from Google. I don't know anything about WASM but am really interested in anything you have that is consumable @femski (in terms of bindings for JavaScript/TypeScript). Please let me know the second you've got something on that front. Love your work, you guys are really pushing the boundaries.
A couple months ago WebAssembly reached cross-browser consensus. This means it's shipping in the next or current versions of most major browsers. The support isn't wide enough yet, but this might be a good time to start talking about whether we want to use it, how we could use it and what the timeline on that might look like.
Why?
Theoretically we could compile a modified mapbox-gl-native to WebAssembly and we could have a single core codebase for all our platforms. No more porting between -js and -native and struggling to keep things in sync. There might also be performance benefits.
I don't have any answers. Just questions:
Browser support
Switching process
Developer friendliness
Architecture
Performance
Also
@mapbox/gl-core, is this something we should start thinking about now? Not yet?