UPBGE: Improve frustum culling performance. #701
Merged
Frustum culling is improved in two ways. First, the bounding box update is
simplified. Second, the culling tests are parallelized using a TBB range loop.
The bounding box update is simplified by adding the call to RAS_Deformer::UpdateBuckets
in KX_GameObject::UpdateBounds, after checking that the object has a
bounding box. This avoids a call to GetDeformer for every object
without a bounding box, with an unmodified bounding box, or without auto
update.
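
The ordering described above can be sketched as follows. This is only an illustration, not the engine code: `GameObject`, `Deformer`, the three boolean flags and the `deformerLookups` counter are hypothetical stand-ins for KX_GameObject, RAS_Deformer and their internal state, and the real UpdateBounds does more than this.

```cpp
#include <memory>

// Hypothetical stand-in for RAS_Deformer; only counts bucket updates.
struct Deformer {
    int updates = 0;
    void UpdateBuckets() { ++updates; }
};

// Hypothetical stand-in for KX_GameObject.
struct GameObject {
    bool hasBoundingBox = false;   // assumed flag: object owns a bounding box
    bool autoUpdateBounds = false; // assumed flag: auto update is enabled
    bool boundsModified = false;   // assumed flag: bounding box changed
    std::unique_ptr<Deformer> deformer;
    int deformerLookups = 0;       // instrumentation for this sketch only

    Deformer *GetDeformer() {
        ++deformerLookups;
        return deformer.get();
    }

    void UpdateBounds() {
        // Cheap flag checks first: most objects return here and never
        // reach the deformer lookup.
        if (!hasBoundingBox || !autoUpdateBounds || !boundsModified) {
            return;
        }
        // Only objects that really need a refresh touch the deformer.
        if (Deformer *d = GetDeformer()) {
            d->UpdateBuckets();
        }
        boundsModified = false;
    }
};
```

With this ordering, an object that has no bounding box returns from `UpdateBounds` without ever calling `GetDeformer`.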
All culling computation is moved into KX_CullingHandler. This class
builds its own object list and returns it from its Process function.
Process builds a CullTask and launches it using tbb::parallel_reduce.
Each CullTask has an operator() that tests a range of objects; any object passing
the culling test is added to a task-local object list. Once the tests have
finished, the CullTask instances merge these lists in their join function, ending
up with the list of all non-culled objects.
Reducing per-task lists this way is much faster than sharing one object
list among all tasks and locking a mutex before adding an object. The mutex
approach was always slower than running without parallelization.
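
The list-reduction pattern can be sketched without the engine as below. To keep the example dependency-free it swaps tbb::parallel_reduce for a small recursive driver built on std::async, so it shows the shape of the technique rather than the patch itself. `Object`, `Frustum`, `Reduce`, `Cull` and the `grain` parameter are hypothetical names; only the operator()/join structure of `CullTask` mirrors the description above.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical object: a bounding sphere.
struct Object { float x, y, z, radius; };

// Hypothetical frustum reduced to a slab near_z <= z <= far_z.
struct Frustum {
    float near_z, far_z;
    bool Intersects(const Object &o) const {
        return o.z + o.radius >= near_z && o.z - o.radius <= far_z;
    }
};

// Shaped like a parallel_reduce body: test a range, then join results.
struct CullTask {
    const std::vector<Object> &objects;
    const Frustum &frustum;
    std::vector<const Object *> visible; // task-local, so no mutex is needed

    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            if (frustum.Intersects(objects[i])) {
                visible.push_back(&objects[i]);
            }
        }
    }

    // Append the other task's list; called after both halves are done.
    void join(CullTask &other) {
        visible.insert(visible.end(), other.visible.begin(), other.visible.end());
    }
};

// Stand-in for the TBB scheduler: split the range in two, run the right
// half on another thread, then merge. One thread per split, so keep
// grain reasonably large.
void Reduce(CullTask &task, std::size_t begin, std::size_t end, std::size_t grain) {
    if (end - begin <= grain || end - begin < 2) {
        task(begin, end);
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    CullTask right{task.objects, task.frustum, {}};
    auto pending = std::async(std::launch::async,
                              [&] { Reduce(right, mid, end, grain); });
    Reduce(task, begin, mid, grain);
    pending.get();
    task.join(right);
}

// Returns pointers to all objects that pass the culling test.
std::vector<const Object *> Cull(const std::vector<Object> &objects,
                                 const Frustum &frustum,
                                 std::size_t grain = 1024) {
    CullTask root{objects, frustum, {}};
    Reduce(root, 0, objects.size(), grain);
    return std::move(root.visible);
}
```

Each task writes only to its own `visible` vector, and the merge happens once per split instead of once per object, which is why no lock sits on the hot path.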
This patch was tested with cube meshes:

| number of objects | previous time | new time |
| --- | --- | --- |
| 1000 | 0.06 | 0.07 |
| 8000 | 1.04 | 0.55 |
| 27000 | 3.81 | 1.90 |
| 125000 | 16.16 | 8.31 |