Optimizations
This page is intended as a general running discussion of known hot spots that could be optimized, as part of a deep dive into improving memory usage, load times, and general runtime performance.
All asset loading is handled by Elation Engine's asset manager. This system is responsible for fetching files from any number of storage backends (commonly HTTP, but also data URIs, File objects, or distributed filesystems like DAT and IPFS). Assets are fetched asynchronously by the assetdownloader, then handed off to loaders by type (image, model, video, audio, etc.). Each loader has its own performance characteristics which need to be considered separately.
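At a high level, the flow looks roughly like the sketch below. The function names and loader map here are placeholders for illustration - the real asset manager lives in the Elation Engine and has its own API.

```javascript
// Illustrative sketch of the fetch-then-dispatch flow. The names here are
// placeholders; the real asset manager has its own API.
const loaders = {
  image: data => createImageBitmap(new Blob([data])),
  model: data => sendToModelWorker(data), // placeholder for the worker hand-off described below
  // video, audio, etc. each get their own loader, with its own performance profile
};

async function loadAsset(url, type) {
  const response = await fetch(url);        // HTTP here; data URIs also work with fetch()
  const data = await response.arrayBuffer();
  return loaders[type](data);
}
```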
Models are loaded using a pool of workers. The number of workers is determined by looking at navigator.hardwareConcurrency, which tells us the number of cores available to us. We don't have any control over CPU affinity like we might in native environments, so we just create n-1 asset workers to leave one core available for the main thread, and hope that the browser/OS handles scheduling intelligently.
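As a rough illustration, the pool sizing and job hand-off might look like this (the names are illustrative, not the engine's actual identifiers):

```javascript
// Sketch of how the worker pool might be sized. 'modelworker.js' and the
// round-robin dispatcher are illustrative, not the engine's actual code.
const cores = navigator.hardwareConcurrency || 4; // fall back if the API is unavailable
const numWorkers = Math.max(1, cores - 1);        // leave one core for the main thread

const workers = [];
for (let i = 0; i < numWorkers; i++) {
  workers.push(new Worker('modelworker.js'));
}

// Jobs are handed out round-robin; the browser/OS decides which core runs each worker.
let next = 0;
function dispatchJob(job) {
  workers[next].postMessage(job);
  next = (next + 1) % numWorkers;
}
```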
Inside the model loader worker script, we first determine whether the content is gzipped. If it is, we inflate it and pass it on to the next step; otherwise the raw data is passed straight to a function which determines the content type. Based on this content type, we then hand it off to the model-specific loader - OBJ, glTF, FBX, DAE, etc. Each of these maps to the corresponding THREE loader, so again, each of those implementations has its own performance characteristics and memory usage patterns - some are more optimized than others.
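In broad strokes, that first step inside the worker looks something like the sketch below. The gzip check is the standard magic-byte test; the importScripts filenames and the use of pako for inflation are assumptions for this sketch, not necessarily what the worker actually ships.

```javascript
// Inside the worker: detect gzip by its magic bytes, inflate if needed, then
// hand the buffer to a format-specific THREE loader. The script names and the
// use of pako here are assumptions for this sketch.
importScripts('pako.min.js', 'three.min.js', 'GLTFLoader.js', 'OBJLoader.js');

function isGzipped(buffer) {
  const bytes = new Uint8Array(buffer, 0, 2);
  return bytes[0] === 0x1f && bytes[1] === 0x8b; // gzip magic number
}

function loadModel(buffer, type, onLoad) {
  if (isGzipped(buffer)) {
    buffer = pako.inflate(new Uint8Array(buffer)).buffer;
  }
  // Hand off to the loader that matches the detected content type
  if (type === 'gltf') {
    new THREE.GLTFLoader().parse(buffer, '', gltf => onLoad(gltf.scene));
  } else if (type === 'obj') {
    onLoad(new THREE.OBJLoader().parse(new TextDecoder().decode(buffer)));
  }
  // ...FBX, DAE, etc. each map to their corresponding THREE loader
}
```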
Once the loader has done its job, we usually end up with a THREE.Object3D containing the whole hierarchy of objects represented by the model we've just loaded. We need to transfer this back to the main thread - here's one place where we get into trouble though. The easiest way to get this whole hierarchy from our worker thread back to our main thread is by serializing it to JSON, but it turns out that's very inefficient.
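In simplified form, the current hand-off looks something like this (the message shape is illustrative):

```javascript
// Simplified view of how the loaded hierarchy currently leaves the worker:
// the whole THREE.Object3D tree is serialized with toJSON() and posted back.
// The message shape ({ id, buffer, type }) is illustrative.
self.onmessage = msg => {
  const { id, buffer, type } = msg.data;
  loadModel(buffer, type, scene => {
    const json = scene.toJSON();             // serialize the full hierarchy - this is the slow part
    self.postMessage({ id: id, scene: json });
  });
};
```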
Here we see the CPU and memory usage of a worker that's loading 8.3MB of geometry data from a .gltf/.bin pair. Let's break it into stages.
- The first block of time we see is spent evaluating the asset worker's script. During profiling, this takes about 600ms - real-world usage is a bit faster, but this could be improved by paring down what gets pulled into our assetworker's JS. We're currently including a full copy of Three.js and the engine code, when we really only need the code that's necessary for loading models.
- Once the worker's scripting environment is up and ready to start processing, our onmessage handler fires, as the main thread has passed some work off to us to process. We very quickly identify the file type by looking at the first few bytes, and pass it off to THREE.GLTFLoader. It's interesting to note that for separate .gltf/.bin files, the way GLTFLoader works actually breaks some of our asset loading optimizations, by forcing the large data to download in a way which blocks the worker thread. Ideally we would have already fetched the .bin file before passing it to the worker to be processed, so this is one possible optimization already. Other formats generally don't suffer from this problem.
- Once the .bin file loads, THREE.GLTFLoader parses it into a THREE.Scene hierarchy for us in about 90ms. Not bad.
- Now that we have a THREE.Scene object containing all the objects, geometries, and materials that make up our newly loaded model, we need to send that back to the main thread. This is where things get ugly. Right now, we're doing that by calling scene.toJSON(), which turns the whole hierarchy into a JSON object representing the sum total of all textures, images, geometry data, lights, cameras, and object relationships. In our example, this step takes a whopping 2.56 seconds.
- From here, we take the JSON object and do a little bit of postprocessing on it. We replace simple JavaScript arrays containing geometry data with TypedArrays where it makes sense - Float32Array, Int32Array, etc. (see the sketch after this list). In theory this makes it more efficient for us to transfer the data back to the main thread, but since we're doing this as an extra step at the end, it ends up costing us another 310ms.
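A minimal sketch of that postprocessing pass, assuming the geometry attributes in the toJSON() output follow THREE's BufferGeometry JSON layout (data.attributes.*.array with a type field naming the TypedArray):

```javascript
// Sketch of the post-serialization pass: replace plain JS arrays in the
// toJSON() output with TypedArrays so the structured clone is cheaper.
// Assumes THREE's BufferGeometry JSON layout (data.attributes.*.array / .type).
const typedArrayTypes = {
  Float32Array: Float32Array,
  Uint32Array: Uint32Array,
  Uint16Array: Uint16Array,
  Int32Array: Int32Array,
};

function convertGeometryArrays(json) {
  (json.geometries || []).forEach(geometry => {
    const attributes = (geometry.data && geometry.data.attributes) || {};
    Object.keys(attributes).forEach(name => {
      const attr = attributes[name];
      const TypedArray = typedArrayTypes[attr.type];
      if (TypedArray && Array.isArray(attr.array)) {
        attr.array = TypedArray.from(attr.array); // extra copy - this is the ~310ms step
      }
    });
    const index = geometry.data && geometry.data.index;
    if (index && Array.isArray(index.array)) {
      const IndexArray = typedArrayTypes[index.type] || Uint32Array;
      index.array = IndexArray.from(index.array);
    }
  });
  return json;
}
```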
So ignoring the time we spent downloading the .bin file (or at least waiting for the browser's download cache to give it to us), the total time spent in the worker to load this model is about 3 seconds of actual CPU processing time. Of this, 2.8 seconds is wasted on inefficient serialization. On top of the excess CPU usage and wall-clock time, memory balloons from about 6.7MB to about 26MB during serialization, triggering several GC passes along the way. We can clearly do better!
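One direction worth exploring - and to be clear, this is not what the code does today - is to skip toJSON() for the bulky geometry data entirely and pass the raw attribute buffers to the main thread as transferables, so postMessage moves ownership of the ArrayBuffers instead of copying them. A rough sketch:

```javascript
// Rough sketch of a zero-copy alternative (not the current implementation):
// pull the raw attribute arrays out of each BufferGeometry and list their
// underlying ArrayBuffers as transferables, so postMessage moves them instead
// of copying. Note the buffers become unusable in the worker afterwards.
function packGeometry(geometry) {
  const attributes = {};
  const transfer = [];
  for (const name in geometry.attributes) {
    const attr = geometry.attributes[name];
    attributes[name] = { array: attr.array, itemSize: attr.itemSize };
    transfer.push(attr.array.buffer);
  }
  let index = null;
  if (geometry.index) {
    index = { array: geometry.index.array };
    transfer.push(geometry.index.array.buffer);
  }
  return { payload: { attributes: attributes, index: index }, transfer: transfer };
}

// In the worker:
//   const packed = packGeometry(mesh.geometry);
//   self.postMessage({ id: id, geometry: packed.payload }, packed.transfer);
// The main thread then rebuilds a THREE.BufferGeometry from the typed arrays.
```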