Optimizations
This page is a running discussion of known hot spots that could be optimized, as part of a deep dive into memory usage, load times, and general runtime performance.
All asset loading is handled by Elation Engine's asset manager. This system is responsible for fetching files from any number of storage backends (commonly HTTP, but also data URIs, File objects, or distributed filesystems like DAT and IPFS). Assets are fetched asynchronously by the assetdownloader, then handed off to loaders by type (image, model, video, audio, etc). Each loader has its own performance characteristics, which need to be considered separately.
Models are loaded using a pool of workers. The number of workers is determined by looking at navigator.hardwareConcurrency, which tells us the number of logical cores available to us. We don't have any control over CPU affinity like we might in native environments, so we just create n-1 asset workers to leave one core available for the main thread, and hope that the browser/OS handles scheduling intelligently.
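The n-1 sizing rule can be sketched roughly as follows. The helper name `assetWorkerCount` and the fallback of 4 cores are assumptions for illustration, not Elation Engine's actual code; the fallback covers environments where `navigator.hardwareConcurrency` is unavailable.

```javascript
// Hypothetical helper: derive a worker count from the reported logical core count.
function assetWorkerCount(hardwareConcurrency, fallbackCores = 4) {
  // Use the fallback (an assumed value) when no usable core count is reported.
  const cores = Number.isInteger(hardwareConcurrency) && hardwareConcurrency > 0
    ? hardwareConcurrency
    : fallbackCores;
  // Leave one core for the main thread, but always keep at least one worker.
  return Math.max(1, cores - 1);
}

// Guarded lookup so the sketch also runs where `navigator` is not defined.
const reportedCores = typeof navigator !== 'undefined'
  ? navigator.hardwareConcurrency
  : undefined;
const workerCount = assetWorkerCount(reportedCores);
```

Clamping to a minimum of one worker matters on single-core devices, where a plain n-1 would leave the pool empty.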
Inside the model loader worker script, we first determine whether the content is gzipped. If it is, we inflate it and pass it on to the next step, otherwise the raw data is passed on to a function which determines the content type. Based on this content type, we then pass it on to the model-specific loader - OBJ, glTF, FBX, DAE, etc. Each of these loaders maps to the underlying THREE.Loader, so again, each of those implementations has its own performance characteristics and memory usage patterns - some are more optimized than others.
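As a rough illustration of the detection steps, the sketch below checks leading bytes for gzip and for a few model formats. The function names are hypothetical and the set of checks is deliberately minimal; the engine's real detection logic may differ, and text formats such as OBJ have no magic number, so they fall through to 'unknown' here.

```javascript
// Illustrative sketch: helper names are hypothetical, not Elation Engine's API.

// Read up to `length` leading bytes of a Uint8Array as an ASCII string.
function asciiPrefix(bytes, length) {
  return String.fromCharCode(...bytes.subarray(0, length));
}

// gzip streams start with the two magic bytes 0x1f 0x8b (RFC 1952).
function isGzipped(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Guess the model type from the first bytes of the (already inflated) data.
function sniffModelType(bytes) {
  // Binary glTF (.glb) begins with the ASCII magic "glTF".
  if (asciiPrefix(bytes, 4) === 'glTF') return 'glb';
  // Binary FBX files begin with "Kaydara FBX Binary".
  if (asciiPrefix(bytes, 18) === 'Kaydara FBX Binary') return 'fbx';
  const text = asciiPrefix(bytes, 256).trimStart();
  // JSON-based .gltf files start with an opening brace.
  if (text.startsWith('{')) return 'gltf';
  // Assumption: any XML document is treated as COLLADA (.dae).
  if (text.startsWith('<')) return 'dae';
  return 'unknown';
}
```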
Once the loader has done its job, we usually end up with a THREE.Object3D containing the whole hierarchy of objects represented by the model we've just loaded. We need to transfer this back to the main thread - here's one place where we get into trouble though. The easiest way to get this whole hierarchy from our worker thread back to our main thread is by serializing it to JSON, but it turns out that's very inefficient.
We profiled the CPU and memory usage of a worker that's loading 8.3MB of geometry data from a .gltf/.bin pair. Let's break it into stages.
- The first block of time we see is spent evaluating the asset worker's script. During profiling, this takes about 600ms - real-world usage is a bit faster, but this could be improved by paring down on what gets pulled into our assetworker's JS. We're currently including a full copy of Three.js and the engine code, when we could pare this down a bit to only code that's necessary for loading models.
- Once the worker's scripting environment is up and ready to start processing, our onmessage handler fires, as the main thread has passed some work off to us to process. We very quickly identify the file type by looking at the first few bytes, and pass it off to THREE.GLTFLoader. It's interesting to note that for separate .gltf/.bin files, the way GLTFLoader works actually breaks some of our asset loading optimizations, by forcing the large data to download in a way which blocks the worker thread. Ideally we would have already fetched the .bin file before passing it to the worker to be processed, so this is one possible optimization already. Other formats generally don't suffer from this problem.
- Once the .bin file loads, THREE.GLTFLoader parses it into a THREE.Scene hierarchy for us in about 90ms. Not bad.
- Now that we have a THREE.Scene object containing all the objects, geometries, and materials that make up our newly loaded model, we need to send that back to the main thread. This is where things get ugly. Right now, we're doing that by calling scene.toJSON(), which turns the whole hierarchy into a JSON object representing the sum total of all textures, images, geometry data, lights, cameras, and object relationships. In our example, this step takes a whopping 2.56 seconds.
- From here, we take the JSON object and do a little bit of postprocessing on it. We replace simple JavaScript arrays containing geometry data with TypedArrays where it makes sense - Float32Array, Int32Array, etc. In theory this makes it more efficient for us to transfer the data back to the main thread, but since we're doing this as an extra step on the end, it ends up costing us another 310ms.
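One way to approach the .bin prefetch idea mentioned in the stages above is to read the external buffer URIs out of the .gltf JSON before any work is handed to a worker. The sketch below is hypothetical: `externalBufferUrls` is not an existing engine function, and it relies only on the `buffers[].uri` field defined by the glTF 2.0 format.

```javascript
// Hypothetical helper: list the external buffer files a parsed .gltf refers to.
function externalBufferUrls(gltf, baseUrl) {
  return (gltf.buffers || [])
    .map((buffer) => buffer.uri)
    // Skip buffers without a URI and buffers embedded as data URIs.
    .filter((uri) => typeof uri === 'string' && !uri.startsWith('data:'))
    // Resolve relative URIs against the location of the .gltf file itself.
    .map((uri) => new URL(uri, baseUrl).href);
}

// Example with a made-up asset location.
const urls = externalBufferUrls(
  { buffers: [{ uri: 'model.bin', byteLength: 1024 }] },
  'https://example.com/assets/model.gltf'
);
```

The downloader could then fetch these URLs alongside the .gltf itself. How the prefetched data would be fed into GLTFLoader is left open here.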
So ignoring the time we spent downloading the .bin file (or at least waiting for the browser's download cache to give it to us), the total time spent in the worker to load this model is about 3 seconds of actual CPU processing time. Of this, 2.8 seconds is wasted in inefficient serialization. On top of the excess CPU usage and wall-clock time, memory usage balloons from about 6.7MB to about 26MB during the serialization process, triggering several GC passes along the way. We can clearly do better!
It turns out the fix for this is pretty simple, and highly effective. Looking into the toJSON() function, serializing the Object3Ds themselves is pretty quick, and almost all of the time is spent in THREE.BufferGeometry's .toJSON(), which encodes the actual geometry data. This function works by using Array.prototype.slice.call() to clone the TypedArrays used internally by THREE.BufferGeometry / WebGL in general. This gives us a regular old non-typed JavaScript array, containing a copy of all the values that make up the geometry. While this would be necessary if we were truly outputting JSON, in our case it is actually preferable to use the TypedArrays directly.
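The copying behaviour is easy to reproduce in isolation. This small sketch uses a made-up nine-value position array; it only shows what slicing a TypedArray into a plain array does, not the engine's actual data.

```javascript
// A tiny stand-in for a geometry attribute: three vertices, xyz each.
const positions = new Float32Array([0, 0, 0, 1, 0, 0, 0, 1, 0]);

// Slicing a TypedArray through Array.prototype produces a plain Array copy.
const asPlainArray = Array.prototype.slice.call(positions);

// The copy is a separate, untyped array; the original keeps its compact buffer.
const isPlainArray = Array.isArray(asPlainArray);       // true
const isTypedArray = ArrayBuffer.isView(asPlainArray);  // false
const originalBytes = positions.buffer.byteLength;      // 9 floats * 4 bytes
```

Every value is copied once here, and the plain array has no backing ArrayBuffer that could later be handed to postMessage() as a transferable.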
As an experiment, we tried overriding this function inside of our assetloader worker. JavaScript makes this easy - we clone the function from the original implementation (https://github.com/mrdoob/three.js/blob/dev/src/core/BufferGeometry.js#L952-L1074), and tweak it slightly:
THREE.BufferGeometry.prototype.toJSON = function() {
  ...
  data.data = { attributes: {} };
  var index = this.index;
  if ( index !== null ) {
    data.data.index = {
      type: index.array.constructor.name,
      // Pass the TypedArray through as-is instead of copying it into a plain Array
      array: index.array
    };
  }
  var attributes = this.attributes;
  for ( var key in attributes ) {
    var attribute = attributes[ key ];
    data.data.attributes[ key ] = {
      itemSize: attribute.itemSize,
      type: attribute.array.constructor.name,
      // Same here: keep the attribute's own TypedArray
      array: attribute.array,
      normalized: attribute.normalized
    };
  }
  ...
  return data;
};
Now that we've made this change, we can also remove the remapping step. Instead of creating new TypedArrays, we just make a list of transferable objects and pass that as the second argument to postMessage():
this.parse(modeldata, job).then(function(data) {
  var transferrables = [];
  // Build a list of ArrayBuffers that can be transferred to the main thread, to avoid memory copies
  try {
    if (data.geometries) {
      for (var i = 0; i < data.geometries.length; i++) {
        var geo = data.geometries[i];
        for (var k in geo.data.attributes) {
          transferrables.push(geo.data.attributes[k].array.buffer);
        }
      }
    }
    postMessage({message: 'finished', id: job.id, data: data}, transferrables);
  } catch (e) {
    // Serialization or transfer failed, report the error to the main thread
    postMessage({message: 'error', id: job.id, data: e.toString()});
  }
}, function(d) {
  // The parse step itself rejected
  postMessage({message: 'error', id: job.id, data: d.toString()});
});
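A possible refinement, sketched under assumptions rather than taken from the engine: collect each ArrayBuffer only once and include the index buffer as well. Listing the same buffer twice in a transfer list makes postMessage() throw, which could happen if several attributes are views into one shared buffer. The helper name `collectTransferables` and the sample data are hypothetical, and `structuredClone()` stands in for postMessage() only so the sketch runs outside a worker.

```javascript
// Hypothetical helper: gather each underlying ArrayBuffer exactly once.
function collectTransferables(data) {
  const buffers = new Set();
  for (const geo of data.geometries || []) {
    const geoData = geo.data || {};
    if (geoData.index && ArrayBuffer.isView(geoData.index.array)) {
      buffers.add(geoData.index.array.buffer);
    }
    for (const attribute of Object.values(geoData.attributes || {})) {
      if (ArrayBuffer.isView(attribute.array)) {
        buffers.add(attribute.array.buffer);
      }
    }
  }
  return [...buffers];
}

// Made-up geometry: two attributes that are views into one shared buffer.
const shared = new ArrayBuffer(60);
const sample = {
  geometries: [{
    data: {
      index: { type: 'Uint16Array', array: new Uint16Array([0, 1, 2]) },
      attributes: {
        position: { itemSize: 3, type: 'Float32Array', array: new Float32Array(shared, 0, 9) },
        uv: { itemSize: 2, type: 'Float32Array', array: new Float32Array(shared, 36, 6) }
      }
    }
  }]
};

const transferList = collectTransferables(sample);

// Stand-in for postMessage(message, transferList): same cloning and transfer rules.
const received = structuredClone(sample, { transfer: transferList });
```

After the transfer, the sender's buffers are detached (their byteLength becomes 0), so the worker must not touch the geometry data again once it has been posted.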
After making this change, we profile again. The same asset now loads in 37ms, and only uses 6.7MB of memory to do so, with only a single GC pass. Hooray! Model assets now load noticeably faster, and this change affects all model types, not just glTF. It is a big win across the board.