UPBGE: Improve frustum culling performance. #701

Merged 3 commits on Jun 15, 2018

panzergame
Contributor

Frustum culling is improved in two ways. First, the bounding box update is
simplified. Second, the tests are parallelized using a TBB range loop.

The bounding box update is simplified by adding the call to RAS_Deformer::UpdateBuckets
in KX_GameObject::UpdateBounds, after the check that the object has a
bounding box. This avoids a call to GetDeformer for every object
without a bounding box, with an unmodified bounding box, or without auto
update.
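
The guard ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: the `Deformer`, `BoundingBox` and `GameObject` types, their members, and the lookup counter are all hypothetical stand-ins, and only the names `GetDeformer`, `UpdateBuckets` and `UpdateBounds` come from the description.

```cpp
// Hypothetical stand-in for RAS_Deformer: only counts bucket updates.
struct Deformer {
    int bucketUpdates;
    Deformer() : bucketUpdates(0) {}
    void UpdateBuckets() { ++bucketUpdates; }
};

// Hypothetical stand-in for a bounding box with a "modified" flag.
struct BoundingBox {
    bool modified;
    explicit BoundingBox(bool m) : modified(m) {}
};

// Hypothetical stand-in for KX_GameObject.
class GameObject {
public:
    GameObject(BoundingBox* box, bool autoUpdate, Deformer* deformer)
        : m_box(box), m_autoUpdate(autoUpdate), m_deformer(deformer),
          m_deformerLookups(0) {}

    // Counts calls so the effect of the early return is observable.
    Deformer* GetDeformer() {
        ++m_deformerLookups;
        return m_deformer;
    }

    void UpdateBounds() {
        // Cheap checks first: no box, no auto update, or nothing changed.
        if (!m_box || !m_autoUpdate || !m_box->modified) {
            return;
        }
        // Only objects that reach this point pay for the deformer lookup.
        if (Deformer* deformer = GetDeformer()) {
            deformer->UpdateBuckets();
        }
        // The actual bounds recomputation would follow here (omitted).
    }

    int DeformerLookups() const { return m_deformerLookups; }

private:
    BoundingBox* m_box;
    bool m_autoUpdate;
    Deformer* m_deformer;
    int m_deformerLookups;
};
```

In this sketch an object with no box, with auto update disabled, or with an unmodified box returns before `GetDeformer` is ever called.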

All of the culling computation is moved into KX_CullingHandler.
This class constructs its own object list and returns it from the Process function.
Process builds a CullTask and launches it using tbb::parallel_reduce.

Each CullTask has an operator() that tests a range of objects; any object passing
the culling test is added to a task-local object list. Once the tests have
finished, the CullTasks merge these object lists in the join function to end
up with the list of all non-culled objects.
This list reduction is much better than sharing one object list
between all tasks and locking a mutex before adding an object. The mutex
method was always slower than running without parallelization.
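
The split and join structure described above can be sketched without any dependency. This is an illustration under stated assumptions, not the code from the patch: `Object`, `IsVisible`, `Range`, `Split`, `Reduce` and `CullIds` are hypothetical names, the visibility test is a one-dimensional toy, and `Reduce` is a sequential stand-in that runs the two halves one after the other where tbb::parallel_reduce would hand them to worker threads. The `CullTask` shape (a range operator(), a splitting constructor and join) mirrors what tbb::parallel_reduce expects from its body.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical object: an id plus a 1D bounding interval (center, radius).
struct Object {
    int id;
    float x;
    float radius;
};

struct Range { std::size_t begin, end; };  // stand-in for tbb::blocked_range
struct Split {};                           // stand-in for tbb::split

// Toy culling test: the object's interval overlaps [minX, maxX].
inline bool IsVisible(const Object& o, float minX, float maxX) {
    return o.x + o.radius >= minX && o.x - o.radius <= maxX;
}

class CullTask {
public:
    CullTask(const std::vector<Object>& objects, float minX, float maxX)
        : m_objects(objects), m_minX(minX), m_maxX(maxX) {}

    // Splitting constructor: same inputs, fresh empty local list.
    CullTask(const CullTask& other, Split)
        : m_objects(other.m_objects), m_minX(other.m_minX), m_maxX(other.m_maxX) {}

    // Test one range and append survivors to this task's own list (no lock).
    void operator()(const Range& r) {
        for (std::size_t i = r.begin; i < r.end; ++i) {
            if (IsVisible(m_objects[i], m_minX, m_maxX)) {
                m_visible.push_back(&m_objects[i]);
            }
        }
    }

    // Merge the list of the task that handled the range to our right.
    void join(const CullTask& rhs) {
        m_visible.insert(m_visible.end(), rhs.m_visible.begin(), rhs.m_visible.end());
    }

    const std::vector<const Object*>& Visible() const { return m_visible; }

private:
    const std::vector<Object>& m_objects;
    float m_minX, m_maxX;
    std::vector<const Object*> m_visible;
};

// Sequential stand-in for the reduce driver: split until a range is small
// enough, run the body on each half, then join right into left.
inline void Reduce(CullTask& task, Range r, std::size_t grain) {
    const std::size_t size = r.end - r.begin;
    if (size < 2 || size <= grain) {
        task(r);
        return;
    }
    const std::size_t mid = r.begin + size / 2;
    CullTask right(task, Split{});
    Reduce(task, Range{r.begin, mid}, grain);
    Reduce(right, Range{mid, r.end}, grain);
    task.join(right);
}

// Convenience wrapper returning the ids of all non-culled objects.
inline std::vector<int> CullIds(const std::vector<Object>& objects,
                                float minX, float maxX, std::size_t grain) {
    CullTask task(objects, minX, maxX);
    Reduce(task, Range{0, objects.size()}, grain);
    std::vector<int> ids;
    for (const Object* o : task.Visible()) {
        ids.push_back(o->id);
    }
    return ids;
}
```

Because each task appends only to its own list and lists are merged once per split, no mutex is taken on the per-object path, which is the property the description credits for the speedup over the shared-list approach.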

This patch was tested with cube meshes:

| Number of objects | Previous time | New time |
|---|---|---|
| 1000 | 0.06 | 0.07 |
| 8000 | 1.04 | 0.55 |
| 27000 | 3.81 | 1.90 |
| 125000 | 16.16 | 8.31 |

@panzergame panzergame requested a review from lordloki May 28, 2018 14:36
@lordloki
Member

lordloki commented Jun 1, 2018

Sorry, I'm trying to test this pull request but I'm not able to compile (tbb/tbb.h not found).
It is a weird error, as the TBB include directory and the TBB library are both set correctly. I will try to find a solution.

@panzergame
Contributor Author

If you're on Linux:

TBB_INCLUDE_DIR /usr/include
TBB_LIBRARY /usr/lib/libtbb.so

The include dir is the parent directory of tbb/tbb.h

@lordloki
Member

lordloki commented Jun 1, 2018

OK, I think I know what the issue is.
On Windows, TBB_INCLUDE_DIR and TBB_LIBRARY are only set if you configure CMake with "WITH_OPENVDB" set to ON. I will check it later and, if it works, I will upload a fix.

Member

@lordloki lordloki left a comment

It seems ok to me. Very good performance improvement.

panzergame and others added 3 commits June 13, 2018 20:40
@panzergame panzergame merged commit 20e9ff0 into master Jun 15, 2018
@panzergame panzergame deleted the ge_tbb_cull branch June 20, 2018 12:38