Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Matrices update optimization (Introducing .needsMatrixUpdate) #14138

Closed
wants to merge 6 commits into from

Conversation

takahirox
Copy link
Collaborator

@takahirox takahirox commented May 24, 2018

#14044 (comment)

This PR is for improving .updateMatrixWorld() performance.

The idea is

  • Introduce .needsMatrixUpdate.
  • Set .needsMatrixUpdate true when .position/quaternion/scale is updated.
  • Only when .needsMatrixUpdate is true, .updateMatrixWorld() calls .updateMatrix().

In short, we don't call .updateMatrix() if we don't need to.

(I haven't tried lazy update yet)

The changes are

  • Add .onChange to Vector3.
  • Set .needsMatrixUpdate true in onChange callback of Vector3/Quaternion/Euler.
  • Only when .needsMatrixUpdate is true, .updateMatrixWorld() calls .updateMatrix().

The disadvantage of this idea is the code could be a bit complex. The purpose of this PR is to see how big/small changes are and how hard/easy to maintain, and plus, to see the benefit.

UpdateMatrix benchmark Performance numbers. (.updateMatrixWorld( true ) )

On my Windows + Chrome

graph old [ops/sec] new [ops/sec] new / old
depth=100, breadth=1 108746 136512 1.26
depth=3, breadth=10 9784 11619 1.19
depth=9, breadth=3 125 141 1.13

On my Windows + FireFox Nightly

graph old [ops/sec] new [ops/sec] new / old
depth=100, breadth=1 50817 68536 1.35
depth=3, breadth=10 4633 7285 1.57
depth=9, breadth=3 124 167 1.35

UpdateMatrix benchmark Performance numbers. (.updateMatrixWorld( undefined ) )

Additional benefit is .updataMatrixWorld( false or undefined ) can be much faster if local matrix and parent's world matrix aren't updated. Simply we can skip multiplication for world matrix calculation because both .needsMatrixUpdate and .needsMatrixWorldUpdate are false.

On my Windows + Chrome

graph old [ops/sec] new [ops/sec] new / old
depth=100, breadth=1 108746 542296 4.99
depth=3, breadth=10 9784 53359 5.45
depth=9, breadth=3 125 441 3.53

On my Windows + FireFox Nightly

graph old [ops/sec] new [ops/sec] new / old
depth=100, breadth=1 50817 345471 6.80
depth=3, breadth=10 4633 48427 10.45
depth=9, breadth=3 124 528 4.26

@takahirox
Copy link
Collaborator Author

takahirox commented May 24, 2018

More precisely, even now we can do this optimization in user code like

// object creatation
var object = new THREE.Object3D();
object.matrixAutoUpdate = false;

// animation
object.position.x = newX;
object.updateMatrix();

With this PR, this optimization will be automatically done.

@mqp
Copy link
Contributor

mqp commented May 25, 2018

My team and I are super grateful that you're working on this. Thanks.

This change makes me wonder what is the intended use case of the force parameter to updateMatrixWorld. There are a number of places in the three source which pass true where I would not have thought to pass it -- for example, in Object3D.getWorldPosition and Object3D.getWorldQuaternion. Changes like this would make a bigger and bigger difference between the performance of passing true unnecessarily (which would be the ordinary speed) and not doing so (potentially very fast.)

Is there any reason for callers to pass true, unless they have manually manipulated world matrices? If not, should places like this be changed?

@Mugen87
Copy link
Collaborator

Mugen87 commented May 25, 2018

@takahirox
Copy link
Collaborator Author

takahirox commented May 25, 2018

Would you please explain a bit more?

I think introducing .needsMatrixUpdate is almost matrix/matrixWorld cache because they're calculated only when necessary.

@Mugen87
Copy link
Collaborator

Mugen87 commented May 25, 2018

I like the idea of the mentioned isSynchronized() method. So you keep information like position, rotation and scale in an internal cache and only return true if the cached data are equal with the real properties. In this case, you know that the cached matrix represents the actual transformation so you don't need a new calculation. In this way, you don't need to modify Vector3.

@takahirox
Copy link
Collaborator Author

takahirox commented May 25, 2018

Like this?

Object.prototype.isSynchronized = function () {
    return (
        this.position.x === this.cachedPosition.x &&
        this.position.y === this.cachedPosition.y &&
        this.position.z === this.cachedPosition.z &&
        this.quaternion.x === this.cachedQuaternion.x &&
        this.quaternion.y === this.cachedQuaternion.y &&
        this.quaternion.z === this.cachedQuaternion.z &&
        this.quaternion.w === this.cachedQuaternion.w &&
        this.scale.x === this.cachedScale.x &&
        this.scale.y === this.cachedScale.y &&
        this.scale.z === this.cachedScale.z
    )
}

Object.prototype.updateMatrixWorld = function () {

    if ( ! this.isSynchronized() ) this.updateMatrix();

    ....

}

@takahirox
Copy link
Collaborator Author

@mquander I'm thinking the same thing. Probably we should consider whether we really need force true for each .updateMatrixWorld() call.

@WestLangley
Copy link
Collaborator

@takahirox

It is great you are interested in this issue.

updateMatrixWorld() was written to be applied to a scene, and traverses the scene graph in an efficient manner. It is not the correct method to be used in the case of scene node.

For example,

object.parent.rotation.x += 1;
object.updateMatrixWorld();

This is a long-standing issue. See #6448 (comment).

It may be beneficial to have a look at the history of this topic.

@gkjohnson
Copy link
Collaborator

gkjohnson commented May 26, 2018

Hey @takahirox!

It's great to see that someone is working on this. I was thinking about looking into matrix update optimizations, as well! I'd be happy to help if you're looking for someone to contribute.

I'm sure you already have some of this in mind (and have a good portion of it in, already), but here's how I was thinking about the optimization:

Assumptions

  • If an objects localMatrix changes, then the worldMatrix needs updating.
  • If an objects worldMatrix changes, then all of the childrens (and childrens childrens) world matrices needs updating.
  • If an objects worldMatrix is being updated, then its parents worldMatrix needs to be made up to date.

Implementation

  • If an objects localMatrix needs updating, then the worldMatrix should be marked as needing an update.
  • If the worldMatrix is marked as needing an update, then mark all children as needing their world matrices updated.
  • If an objects worldMatrix is already marked as needing an update, then all of its children should already be marked.
  • If a call to "updateWorldMatrix" is made, then it should call its parent's "updateWorldMatrix" function to.
  • The "updateWorldMatrix" function doesn't need to run if the world matrix does not need an update.

It means that you should only have to call "updateWorldMatrix" on a child and know immediately whether or not it's up to date. This also means that the renderer will only have to call "updateWorldMatrix" on the meshes it wants to update rather than updating the full tree (or nodes that don't even need to be updated).

@Mugen87 I didn't know Babylon did that. Sounds like it would optimize for the case where you move something and then it moves back, but wouldn't you still have to check everything to see if it has needs to be updated? This approach would let you queue up only the things that need to be changed (rather than iterating over everything). I may be misunderstanding, though.

A couple other related thoughts:

  • Adding the ability to register for when something has had its transform changed or marked as changed would be nice (and would enable some nice new extensions to THREE)
  • A similar approach could be used to more optimally create and update bounding boxes on internal tree nodes, which would be nice to have, as well.

@pailhead
Copy link
Contributor

pailhead commented May 29, 2018

Could we maybe consider some kind of an API where we would have onBeforeUpdate and onAfterUpdate. I suggested this back when onBeforeRender got accepted.

I did not dive too deep into this, but it seems like it would make sense. Before would be a time where one sets myMesh.position, after would be where you have the matrices computed. These would be triggered when scene updates the graph.

Is there a pattern here that could be established?

@titansoftime
Copy link
Contributor

I've actually wrote something like this into my game engine but had to disable it as it was too buggy. I never really went back to debug it (huge pita).

So yea, SUPER happy to see this potentially being adapted into three.js!

@takahirox
Copy link
Collaborator Author

I tried a cache approach. Much less changes but less performance gain. Roughly the performance gain is half or more less from this PR. I think 10 times comparing positioin.xyz, quaternion.xyzw, scale.xyz isn't so fast.

updateMatrix: function () {

    if ( ! this.position.equals( this.positionCache ) ||
        ! this.quaternion.equals( this.quaternionCache ) ||
        ! this.scale.equals( this.scaleCache ) ) {

        this.matrix.compose( this.position, this.quaternion, this.scale );

        this.positionCache.copy( this.position );
        this.quaternionCache.copy( this.quaternion );
        this.scaleCache.copy( this.scale );

        this.matrixWorldNeedsUpdate = true;

    }

},

Which one do you prefer? I have two options in my mind. First is this PR because I wanna improve the performance of the hot-spot function as much as possible. Second one is the cache approach. If user want further optimization user can set object.matrixAutoUpdate = false; and call object.updateMatrix(); when necessary by themselves.

@Mugen87
Copy link
Collaborator

Mugen87 commented Jun 7, 2018

I vote for the cache approach.

@takahirox
Copy link
Collaborator Author

@gkjohnson

Thanks for sharing your idea. I wanna focus on the current behavior of .updateMatrixWorls() so far which is

to be applied to a scene, and traverses the scene graph in an efficient manner.

as @WestLangley explained.

I'd like to discuss in another thread for calculating world matrix of a node.

@takahirox
Copy link
Collaborator Author

I vote for the cache approach.

How about adopting the caching approach now because the change is very small, and we'll consider further optimization like this PR idea when users request further improvement?

Any thoughts? Or any better solutions?

/cc @mrdoob

@titansoftime
Copy link
Contributor

Out of curiosity, besides less changes, what is the advantage of the 2nd PR vs the 1st one?

@takahirox
Copy link
Collaborator Author

takahirox commented Jun 7, 2018

I ran the benchmark test again. I ran each test five times and took the average. (Of course we need more samples for more precise numbers.)

.updateMatrixWorld( true )

Simulating the case where only a few nodes near the top are updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.updateMatrixWorld(true);
});

Chrome

graph current [ops/sec] needsMatrixUpdate [ops/sec] cache [ops/sec]
depth=100, breadth=1 109580.8 134058.0 (1.22) 113460.8 (1.04)
depth=3, breadth=10 9609.8 11606.8 (1.21) 9476.2 (0.99)
depth=9, breadth=3 132.4 146.0 (1.10) 122.8 (0.93)

(num) indicates the performance gain/loss, where (needsMatrixUpdate or cache) / current.

FireFox Nightly

graph current [ops/sec] needsMatrixUpdate [ops/sec] cache [ops/sec]
depth=100, breadth=1 51301.8 70755.4 (1.38) 57574.6 (1.12)
depth=3, breadth=10 4778.0 6956.2 (1.46) 5234.8 (1.10)
depth=9, breadth=3 104.8 146.0 (1.39) 109.4 (1.04)

.updateMatrixWorld( undefined )

Simulating the case where no node is updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.updateMatrixWorld(true);
});

Chrome

graph current [ops/sec] needsMatrixUpdate [ops/sec] cache [ops/sec]
depth=100, breadth=1 109580.8 523200.2 (4.78) 322480.8 (2.94)
depth=3, breadth=10 9609.8 51019.6 (5.31) 27298.8 (2.84)
depth=9, breadth=3 132.4 434.6 (3.28) 238.2 (1.80)

FireFox Nightly

graph current [ops/sec] needsMatrixUpdate [ops/sec] cache [ops/sec]
depth=100, breadth=1 51301.8 349046.6 (6.80) 152019.8 (2.96)
depth=3, breadth=10 4778.0 48701.2 (10.19) 14671.8 (3.07)
depth=9, breadth=3 104.8 527.0 (5.03) 177.6 (1.69)

overhead

Simulating the case where all nodes are updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.traverse(function(obj) {
        obj.position.x += 0.1;
    });
    rootA.updateMatrixWorld(true);
});

The both two ideas have overhead on update. The .needsMatrixUpdate one calls callback when position/quaternion/scale is updated. The cache one copies position/quaternion/scale on .updateMatrix().

Chrome

graph current [ops/sec] needsMatrixUpdate [ops/sec] cache [ops/sec]
depth=100, breadth=1 88573.2 74439.8 (0.84) 85957.6 (0.97)
depth=3, breadth=10 7726.6 5821.4 (0.75) 7363.0 (0.95)
depth=9, breadth=3 93.0 66.0 (0.71) 90.2 (0.97)

FireFox Nightly

graph current [ops/sec] needsMatrixUpdate [ops/sec] cache [ops/sec]
depth=100, breadth=1 44721.6 40382.0 (0.90) 36318.2 (0.81)
depth=3, breadth=10 4344.8 3902.2 (0.90) 3445.4 (0.79)
depth=9, breadth=3 77.0 66.0 (0.86) 69.2 (0.90)

@takahirox
Copy link
Collaborator Author

takahirox commented Jun 7, 2018

@titansoftime One of the disadvantages of this PR against the cache approach is every update of .position/quaternion/scale calls callback. For example,

object.position.x = 0;
object.position.y = 0;
object.position.z = 0;
object.quaternion.x = 0;
object.quaternion.y = 0;
object.quaternion.z = 0;
object.quaternion.w = 0;
object.scale.x = 0;
object.scale.y = 0;
object.scale.z = 0;

this code invokes callback call 10 times though the second or later lines ideally don't need to because the first line sets object.needsMatrixUpdate = true;.

@Mugen87
Copy link
Collaborator

Mugen87 commented Jun 7, 2018

@takahirox Out of curiosity: Can you please explain how you determine the mentioned performance metric ([ops/sec])? A common approach for benchmarking would be useful for the further development process since the results between users and PRs are more comparable.

@takahirox
Copy link
Collaborator Author

I just ran test/benchmark/benchmarks.html.

@Mugen87
Copy link
Collaborator

Mugen87 commented Jun 7, 2018

Oh, okay 😊

@gkjohnson
Copy link
Collaborator

@takahirox This is great!

I think my vote is for adding the dirty flag and "change" callback on vectors if only because I think it opens the door to future optimizations and cleaner (external) development experience. I do see that the code is a bit more complicated, though. In my opinion it seems worth it. It seems like it would benefit all kinds of use cases with THREE. I've also had to implement this in the past and side-stepped a lot of the THREE.js matrix update logic was too heavy I don't think cache checking would have been quite enough for that case.

If you've moved a bunch of stuff in the scene, for example, and you want to know where one specific item is in world space, you'd still have to always climb the parents to ensure their world matrices have been updated before you can update the child you care about, which means a lot of matrix cache checks. It would be great if this could happen seamlessly with THREE rather than having to manually traverse the scene, which a dirty flag could quickly enable.

One of the disadvantages of this PR against the cache approach is every update of .position/quaternion/scale calls callback. For example,

That does seem like the big downside of this approach. The documentation could call out that using set(x,y,z) is better than individually setting fields when it comes to performance.

@mqp
Copy link
Contributor

mqp commented Jun 7, 2018

I'm surprised by the size of the performance hit in the Vector-change-callback case where all nodes are updated. Fundamentally, the extra work is just flipping one flag on update and checking it later. That doesn't seem like it should add up to 10%. I don't know enough about IonMonkey or V8 JIT to comment on whether another approach to the same thing (e.g. where the function call is eliminated, or where the flag is stored in the Vector3 instead of the Object3D) might be faster, though.

@takahirox
Copy link
Collaborator Author

takahirox commented Jun 8, 2018

I tried to add dirty flag to Vector3. Setting dirty flag should be faster than calling callback on update and also checking dirty flags should be faster than checking cache values. But the change is still as big as this PR.

// Vector3
set: function ( x, y, z ) {

    this._x = x;
    this._y = y;
    this._z = z;

    this.dirty = true;

    return this;

},

....

// Object3D
updateMatrix: function () {

    if ( this.position.dirty === true ||
        this.rotation.dirty === true ||
        this.scale.dirty === true ) {

        this.matrix.compose( this.position, this.quaternion, this.scale );

        this.position.dirty = false;
        this.rotation.dirty = false;
        this.scale.dirty = false;

        this.matrixWorldNeedsUpdate = true;

    }

},

I ran benchmark test and compared the results. As I expected, dirty flag is faster than the cache approach and less overhead than the callback and cache approach. I wrote only performance gain/loss (= new idea / current) the below.

.updateMatrixWorld( true )

Simulating the case where only a few nodes near the top are updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.updateMatrixWorld(true);
});

Chrome

graph needsMatrixUpdate cache dirty flag
depth=100, breadth=1 1.22 1.04 1.25
depth=3, breadth=10 1.21 0.99 1.14
depth=9, breadth=3 1.10 0.93 1.08

FireFox Nightly

graph needsMatrixUpdate cache dirty flag
depth=100, breadth=1 1.38 1.12 1.41
depth=3, breadth=10 1.46 1.10 1.46
depth=9, breadth=3 1.39 1.04 1.20

.updateMatrixWorld( undefined )

Simulating the case where no node is updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.updateMatrixWorld(true);
});

Chrome

graph needsMatrixUpdate cache dirty flag
depth=100, breadth=1 4.78 2.94 4.19
depth=3, breadth=10 5.31 2.84 3.84
depth=9, breadth=3 3.28 1.80 2.11

FireFox Nightly

graph needsMatrixUpdate cache dirty flag
depth=100, breadth=1 6.80 2.96 5.25
depth=3, breadth=10 10.19 3.07 5.72
depth=9, breadth=3 5.03 1.69 2.35

overhead

Simulating the case where all nodes are updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.traverse(function(obj) {
        obj.position.x += 0.1;
    });
    rootA.updateMatrixWorld(true);
});

Chrome

graph needsMatrixUpdate cache dirty flag
depth=100, breadth=1 0.84 0.97 1.02
depth=3, breadth=10 0.75 0.95 1.00
depth=9, breadth=3 0.71 0.97 1.04

FireFox Nightly

graph needsMatrixUpdate cache dirty flag
depth=100, breadth=1 0.90 0.81 0.96
depth=3, breadth=10 0.90 0.79 0.97
depth=9, breadth=3 0.86 0.90 0.99

@mrdoob
Copy link
Owner

mrdoob commented Dec 18, 2018

@takahirox that's good to know, thanks for sharing!

@fernandojsg
Copy link
Collaborator

SIMD.js is deprecated but it will be implemented on WASM though, but imo that should be a different problem to discuss than the current approach.

I like the idea of introducing the static=true as it's quite understandable for newcomers and it's a common practise in most engines. Even with this updated I'd still leave it just renaming it from matrixAutoUpdate, we could bypass flags or cache checks with it, it's add extra info about the object behaviour, it could be used by physics...

I also like the proposal from @Mugen87 on introducing cache/isSync similar to the babylon's approach, I think it has low impact on the code, it's easy to follow and understand and could help getting the work done and avoiding a lot of callback when an attribute is changed on the pos/sca/rot

@takahirox
Copy link
Collaborator Author

I think we should move it forward, too. I'll write and post the summary of my performance evaluation soon because the discussion here is a bit too long.

@takahirox
Copy link
Collaborator Author

takahirox commented Jan 31, 2019

I think we have three options now. Refer to #14138 (comment) for my benchmark test. Performance numbers below are of scene.updateMatrixWorld() unit test, not of entire app.


  1. Renaming to .static property

Renaming .matrixAutoUpdate to .static (.matrixAutoUpdate. = ! .static)

Pros:

  • User friendly name more than .matrixAutoUpdate
  • No performance penalty

Cons:

  • Situation doesn't change
    • User still needs to manually set flag
    • Unnecessarily updating matrix of dynamic object which doesn't move since last frame

  1. cache position/quaternion/scale ([WIP] Add position/quaternion/scale cache into Object3D #15706)
objectMatrix: function () {

    if ( ! this.positionCache.equals( this.position ) ||
        ! this.quaternionCache.equals( this.quaternion ) ||
        ! this.scaleCache.equals( this.scale ) ) {

        this.matrix.compose( this.position, this.quaternioin, this.scale );
        this.matrixWorldNeedsUpdate = true;
        this.positionCache.copy( this.position );
        this.quaternionCache.copy( this.quaternion );
        this.scaleCache.copy( this.scale );

    }

}

Pros:

  • Easy to understand, implement and maintain
  • Automatically skipping unnecessary matrix update
  • Good performance gain for static objects
    • 50-200 % gain in case all objects are static in a scene graph

Cons:

  • Performance penalty for dynamic objects
    • Up to 10 times number comparison per an object
    • 5 % (on Chrome), 20 % (FireFox) loss in case all objects are dynamic in a graph
    • No or small performance benefit even if most of objects are static, if an object near the top of graph is dynamic. (Probably 10 times comparison is as slow as update local matrix)
  • Performance gain is smaller than dirty flag approach

  1. Dirty flag ([WIP] Dirty Flag for updateMatrix optimization #14543)
objectMatrix: function () {

    if ( this.position.dirty || this.quaternion.dirty || this.scale.dirty ) {

        this.matrix.compose( this.position, this.quaternioin, this.scale );
        this.matrixWorldNeedsUpdate = true;

        this.position.dirty = false;
        this.quaternion.dirty = false;
        this.scale.dirty = false;

    }

}
// Vector3/Quaternion extension
function Vector3WithDirtyFlag( x, y, z ) {

    this._x = 0;
    this._y = 0;
    this._z = 0;

    Vector3.call( this, x, y, z );

    this.dirty = false;

}

Vector3WithDirtyFlag.prototype = Object.assign( Object.create( Vector3.prototype ), {

    constructor: Vector3WithDirtyFlag,
    isVector3WithDirtyFlag: true

} );

Object.defineProperties( Vector3WithDirtyFlag.prototype, {

    x: {

        get: function () { return this._x; },
        set: function ( value ) {

            this._x = value;
            this.dirty = true;

        }

    },

    ....

} );

function Object3D() {

    ....

    this.position = new Vector3WithDirtyFlag();

    ....

}

// or Object.defineProperties per vector3/quaternion instance.
// We don't need to make sub class

function Object3D() {

    ....

    this.position = new Vector3();
    this.position.dirty = false;

    Object.defineProperties( this.position, {

        x: {

            ....

        },

        ....

    }

    ....

}

Pros:

  • Automatically skipping unnecessary matrix update
  • Bigger performance gain for static objects than cache approach
    • 50-100% gain from cache approach in case all objects are static in a scene
  • Less performance penalty for dynamic objects than cache approach

Cons:

  • Kinda hard to maintain because of class extension and/or Object.defineProperties().
  • Performance penalty for dynamic objects (smaller than cache approach tho)
    • getter/setter is invoked when position/quaternion/scale properties are accessed

Appendix:

We can break Matrix.compose() into two parts

  1. updating matrix from quaternion and scale
  2. updating matrix from position

https://github.com/mrdoob/three.js/blob/dev/src/math/Matrix4.js#L741-L775

With cache or dirty flag, we can introduce partial local matrix update optimization. I need to test to know if it has performance benefit tho.

@takahirox
Copy link
Collaborator Author

takahirox commented Jan 31, 2019

My current thought...

  • Updating all objects' matrices in a scene each frame, even if they don't move since last frame, as default is very inefficient
  • Automatically skipping unnecessary matrices update is very useful to users so we should introduce cache or dirty flag approach
  • Probably performance gain with cache approach is still good tho it's smaller than the one with dirty flag
  • Probably some or most of objects in a scene are usually don't move so performance penalty of Cache and dirty flag approach practically looks small?
  • Going with cache approach so far because easy to maintain, and if we realize we need further optimization or performance penalty isn't small, we think of the other options; dirty flag, renaming to .static from matrixAutoUpdate, or reverting?

@takahirox
Copy link
Collaborator Author

I made position/quaternion/scale cache approach PR #15706. You can see how small the change is. And you can test the performance.

@takahirox
Copy link
Collaborator Author

takahirox commented Feb 6, 2019

I made performance test, where scene.updateMatrixWorld() can have performance impact, based on webgl_performance example to see not only unit test performance but also practical application performance.

https://raw.githack.com/takahirox/three.js/UpdateMatrixTest/examples/webgl_performance_matrices_update_cache.html

In the test a scene has num objects and they're randomly placed between +-distance / 2. (Maybe I should've removed Math.random() for the consistent performance comparison tho.) All the objects are dynamic if animation is true. Otherwise they're static.

FPS numbers gain/loss on my Windows chrome where num = 50000 and distance = 50000.

cache dirty flag
Performance gain (static) +14% +22%
Performance loss (dynamic) -10% -5%

FPS numbers gain/loss on my Windows FireFox Nightly where num = 25000 and distance = 50000. (FPS numbers are somehow kinda inconsistent a bit so hard to precisely say tho.)

cache dirty flag
Performance gain (static) +10% +26%
Performance loss (dynamic) -8% -4%

Off topics.

  • projectObject() (more precisely Frustum.intersectsObject()?) also seems CPU hot spot function. I'll look into it after we've done .updateMatrixWorld() optimization.
  • In the example, in dynamic objects case onChange of rotation also seems CPU hot spot function. Probably introducing lazy Euler update Replace bi-directional syncing with lazy Euler syncing #14910 with new API can solve.

@titansoftime
Copy link
Contributor

I implemented a variant of your dirty flag method into my build about a year ago or so. Couldn't be happier with the results.

I noticed a similar drag in projectObject, glad someone is looking into it =]

@takahirox
Copy link
Collaborator Author

What do you folks think of my comment #14138 (comment) especially

  • Going with cache approach so far because easy to maintain, and if we realize we need further optimization or performance penalty isn't small, we think of the other options; dirty flag, renaming to .static from matrixAutoUpdate, or reverting?

?

@takahirox
Copy link
Collaborator Author

@titansoftime What do you think of the cache approach? Is the performance gain not good enough or is performance penalty big in your case?

@takahirox takahirox changed the title [WIP] Introducing .needsMatrixUpdate [WIP] Matrices update optimization (Introducing .needsMatrixUpdate) Feb 6, 2019
@titansoftime
Copy link
Contributor

I honestly haven't tried it. Dirty flag worked better than I anticipated so I never looked back. I have at least 50 static and 10-20 dynamic objects on the screen at any given time, so other users may have different results.

@takahirox
Copy link
Collaborator Author

Thanks. I'd be happy if you try the cache approach #15706. I'd like to see how it looks in an actual application.

@titansoftime
Copy link
Contributor

I unfortunately cannot use the latest three build, since they killed JSONLoader (LegacyJSONLoader does not work for me as initBones is removed). Working on converting all my models to gltf, but there are some serious issues in the blender exporter. Anyway that's a whole different thing, sigh.

Do you have a build with the cache approach from pre... ehh I think r100 is where they killed the JSONLoader.

@takahirox
Copy link
Collaborator Author

takahirox commented Feb 6, 2019

I think you can locally build like this (there may be easier way).

$ git checkout r99
$ git remote add upstream https://github.com/mrdoob/three.js.git
$ git fetch upstream '+refs/pull/15706/head:refs/remotes/pr/15706'
$ git checkout -b r99withCache
$ git cherry-pick d535ef16fe49f5de20144c5906cffdddfce9461f
$ npm run build

@titansoftime
Copy link
Contributor

titansoftime commented Feb 6, 2019

I think you can locally build like this (there may be easier way

Ah, nice. Will try it out.

Note - Haven't forgotten this. My PC is still connected to my main t.v. for Resident Evil 2. Should beat that tonight and go back into dev mode this weekend =]

@takahirox
Copy link
Collaborator Author

takahirox commented Feb 19, 2019

As we discussed in #15706 (comment) #14543 (comment), both cache and dirty flag approach seem to have functionality problem,

b.updateMatrix();  // Clean
b.matrix.copy( a.matrix ); // manually update local matrix
b.updateMatrix();  // Local matrix should be updated from position/quaternion/scale
                   // but no update because position/quaternion/scale are clean

This is from our flexible API that user can directly access local matrix. So we also need to detect local matrix update. But I don't think we can do simply and performantly.

In long term, we might want to change the API, local matrix is read only or hidden from users and allowing users to update objects transform only via position/quaternion/scale. It may be clearer API.

But in short term, I don't think it's doable because it can break a lot of core/example/user code. So keep engaging .matrixAutoUpdate = false for static or rarely moving objects and manual .updateMatrix() call when their position/quaternion/scale are updated instead so far?

https://threejs.org/docs/#manual/en/introduction/Matrix-transformations

Using user defined matrices update function may be another option for users #15706 (comment)

@WestLangley
Copy link
Collaborator

So keep engaging .matrixAutoUpdate = false for static or rarely moving objects and manual .updateMatrix() call when their position/quaternion/scale are updated instead so far?

Correct.

@Mugen87
Copy link
Collaborator

Mugen87 commented Sep 17, 2019

As we discussed in #15706 (comment) #14543 (comment), both cache and dirty flag approach seem to have functionality problem,

Thus, keeping this PR open makes no sense since it can't be merged because of the current status of three.jss API. Let's reconsider matrix update optimizations iff Object3D.matrix becomes private.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants