An introduction to CUDA programming by way of a Boids Flocking simulation

wufk/Project1-CUDA-Flocking

 
 

Latest commit: 29897f9 by Fengkai Wu, Sep 11, 2017

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

  • Fengkai Wu
  • Tested on: Windows 10, i7-6700 @ 3.40GHz 16GB, Quadro K620 4095MB (Moore 100C Lab)

1. Final Results

test img

The simulation above was run under the following conditions:

Number of Boids: 7500

Running Mode: uniform grid

Threads: (128, 1, 1)

Time step: 0.5
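
To make the thread setting above concrete, here is a minimal sketch of how a launch size could be derived from it. It assumes that (128, 1, 1) is the block size and that one thread handles one boid; `blocksForBoids` is a hypothetical helper, not a function from this repository.

```cpp
#include <cassert>

// Hypothetical helper: number of thread blocks needed so that every boid
// gets one thread, assuming a 1D launch with `blockSize` threads per block.
int blocksForBoids(int numBoids, int blockSize) {
    assert(numBoids >= 0 && blockSize > 0);
    // Round up so a partially filled last block still covers the remainder.
    return (numBoids + blockSize - 1) / blockSize;
}
```

Under these assumptions, 7500 boids with 128 threads per block need 59 blocks (7552 threads), so a kernel would have to bounds-check the 52 threads that fall past the last boid.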

2. Analysis of Performance

Algorithms

The following plot shows the performance of the different algorithms as the number of boids varies. The x-axis is the number of boids in the system and the y-axis is the simulation time per step.

img_2

The plot clearly shows that the brute-force approach takes far more time, and that its average time per step rises drastically as boids are added. Brute force applies no optimization: every boid is compared against every other boid, so the work grows quadratically with the number of boids and parallelism alone cannot hide it. With the uniform grid method, the boids examined for each boid at each step are restricted to a few nearby cells, which reduces the work per thread and therefore the average time.
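
The gap between the two approaches can be illustrated by counting neighbor checks on the CPU. This is only a sketch under stated assumptions: a cubic grid starting at `gridMin`, all positions inside the grid, and a 3x3x3 block of cells searched per boid. The names `Vec3`, `Cell`, `cellOf`, `flatten`, `bruteForceChecks` and `gridChecks` are illustrative and not taken from the project code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Cell { int x, y, z; };

// 3D cell coordinates of a position (assumes the position is inside the grid).
Cell cellOf(const Vec3& p, float gridMin, float cellWidth) {
    assert(cellWidth > 0.0f);
    return { static_cast<int>(std::floor((p.x - gridMin) / cellWidth)),
             static_cast<int>(std::floor((p.y - gridMin) / cellWidth)),
             static_cast<int>(std::floor((p.z - gridMin) / cellWidth)) };
}

// Flatten 3D cell coordinates into a 1D index.
int flatten(const Cell& c, int resolution) {
    return c.x + c.y * resolution + c.z * resolution * resolution;
}

// Brute force: every boid is checked against every other boid.
std::size_t bruteForceChecks(std::size_t n) {
    return n * (n - 1);
}

// Uniform grid: each boid is checked only against boids in its own cell
// and the adjacent cells (a 3x3x3 block, clipped at the grid boundary).
std::size_t gridChecks(const std::vector<Vec3>& boids, float gridMin,
                       float cellWidth, int resolution) {
    std::vector<std::size_t> count(
        static_cast<std::size_t>(resolution) * resolution * resolution, 0);
    for (const Vec3& b : boids) {
        ++count[flatten(cellOf(b, gridMin, cellWidth), resolution)];
    }
    std::size_t checks = 0;
    for (const Vec3& b : boids) {
        const Cell c = cellOf(b, gridMin, cellWidth);
        for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            const Cell n{c.x + dx, c.y + dy, c.z + dz};
            if (n.x < 0 || n.y < 0 || n.z < 0 ||
                n.x >= resolution || n.y >= resolution || n.z >= resolution) {
                continue;  // neighbor cell lies outside the grid
            }
            checks += count[flatten(n, resolution)];
        }
        checks -= 1;  // do not count the boid itself
    }
    return checks;
}
```

For boids spread over many cells, `gridChecks` stays close to the number of boids times the local density, while `bruteForceChecks` grows with the square of the boid count.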

As for the scattered and coherent modes, the difference between the two is small, but the coherent mode still performs better. By spending extra memory on an additional, reordered buffer array for position, it saves the time of looking up each boid's original position and velocity.
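
The difference between the two modes can be sketched on the CPU as follows. This is an assumption-laden illustration rather than the project's GPU code: `sortIndicesByCell` and `gather` are hypothetical names, and the cell index of each boid is assumed to be already computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sort boid indices by cell index. A scattered-style lookup would stop here
// and read data[sortedIdx[i]] indirectly on every access.
std::vector<int> sortIndicesByCell(const std::vector<int>& cellOfBoid) {
    std::vector<int> idx(cellOfBoid.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(),
                     [&](int a, int b) { return cellOfBoid[a] < cellOfBoid[b]; });
    return idx;
}

// Coherent-style reorder: copy the data into a second buffer so that boids
// in the same cell sit next to each other in memory.
template <typename T>
std::vector<T> gather(const std::vector<T>& data,
                      const std::vector<int>& sortedIdx) {
    assert(data.size() == sortedIdx.size());
    std::vector<T> out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        out[i] = data[sortedIdx[i]];
    }
    return out;
}
```

The gather costs one extra buffer and one extra pass per step, but afterwards element `i` of the reordered buffer can be read directly, without the `sortedIdx` indirection.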

Other factors

The results in the following table were obtained using the uniform grid.

img_3

From the table, we can see some clear results. The max speed and dT have little effect on performance because they do not change the amount of computation per step. A larger number of neighboring cells also increases the time because there is more computation per step.
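
One way the number of neighboring cells can change is through the ratio of cell width to search radius. The sketch below assumes that setup, which the table itself does not confirm; `cellsPerAxis` and `cellsToCheck` are hypothetical helpers that count the grid cells overlapped by the bounding box of a boid's search sphere.

```cpp
#include <cassert>
#include <cmath>

// Number of cells along one axis overlapped by [pos - radius, pos + radius].
int cellsPerAxis(float pos, float radius, float cellWidth) {
    assert(radius >= 0.0f && cellWidth > 0.0f);
    const int lo = static_cast<int>(std::floor((pos - radius) / cellWidth));
    const int hi = static_cast<int>(std::floor((pos + radius) / cellWidth));
    return hi - lo + 1;
}

// Total number of cells overlapped by the search sphere's bounding box.
int cellsToCheck(float x, float y, float z, float radius, float cellWidth) {
    return cellsPerAxis(x, radius, cellWidth) *
           cellsPerAxis(y, radius, cellWidth) *
           cellsPerAxis(z, radius, cellWidth);
}
```

For a boid at (7, 7, 7) with radius 5, a cell width of 10 gives 8 cells to check, while a cell width of 5 gives 27, so the loop over neighboring cells runs more iterations per boid.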
