26 changes: 20 additions & 6 deletions README.md
@@ -1,11 +1,25 @@
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**

* (TODO) YOUR NAME HERE
* (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Eric Chiu
* Tested on: Windows 10 Education, Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.60GHz 32GB, NVIDIA GeForce GTX 1070 (SIGLAB)

### (TODO: Your README)
## Result

Include screenshots, analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
![](./images/Boids.gif)

## Performance Analysis

![](./images/Boids-FPS-With-Visualization.png)

![](./images/Boids-FPS-Without-Visualization.png)

As we can see from the graphs, the frame rate generally decreases for all three implementations (naive, scattered uniform grid, and coherent uniform grid) as the number of boids increases. This is probably because a larger flock puts more boids within each boid's neighborhood distance, so more neighbors must be iterated over to compute a single boid's next position.
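
For reference, the sketch below illustrates what the naive approach does per boid; the kernel name, parameters, and the use of glm types are assumptions for illustration rather than the project's actual code. Every thread scans all N boids, so the inner loop grows linearly with the flock size.

```cuda
#include <glm/glm.hpp>

// Hypothetical naive kernel (cohesion rule only): each boid checks every
// other boid, so per-boid work is O(N) and total work is O(N^2).
__global__ void kernNaiveCohesionSketch(int N, const glm::vec3 *pos,
                                        glm::vec3 *vel, float neighborDist,
                                        float cohesionScale) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= N) return;

  glm::vec3 perceivedCenter(0.0f);
  int neighbors = 0;
  for (int j = 0; j < N; ++j) {   // every boid is a candidate neighbor
    if (j == i) continue;
    if (glm::distance(pos[i], pos[j]) < neighborDist) {
      perceivedCenter += pos[j];
      ++neighbors;
    }
  }
  if (neighbors > 0) {
    perceivedCenter /= (float)neighbors;
    vel[i] += (perceivedCenter - pos[i]) * cohesionScale;
  }
}
```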

There was a performance improvement from the scattered uniform grid to the coherent uniform grid, but only by a small margin (15% to 20%). I had expected the coherent grid to be at least twice as fast. After thinking about it further, I realized that cutting out the middleman (the extra index lookup) does make data access faster, but roughly the same number of operations is still needed to compute a single boid's next position. Given that, a 15% to 20% improvement makes sense.
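
To make the difference concrete, here is an illustrative device-side sketch; the identifiers are hypothetical, not the project's actual names. Both versions visit the same neighbors, but the scattered version pays an extra index lookup and a non-contiguous read for each one, while the loop count (and thus the arithmetic) stays the same.

```cuda
#include <glm/glm.hpp>

// Scattered uniform grid: each neighbor read goes through an index array
// that maps grid order back to the original boid order ("the middleman").
__device__ glm::vec3 perceivedCenterScattered(int cellStart, int cellEnd,
                                              const int *particleArrayIndices,
                                              const glm::vec3 *pos) {
  glm::vec3 center(0.0f);
  for (int k = cellStart; k < cellEnd; ++k) {
    int boidIdx = particleArrayIndices[k];  // extra indirection
    center += pos[boidIdx];                 // scattered, non-contiguous read
  }
  return center;
}

// Coherent uniform grid: pos was reshuffled into grid order beforehand,
// so neighbors in the same cell are read contiguously with no indirection.
__device__ glm::vec3 perceivedCenterCoherent(int cellStart, int cellEnd,
                                             const glm::vec3 *posSorted) {
  glm::vec3 center(0.0f);
  for (int k = cellStart; k < cellEnd; ++k) {
    center += posSorted[k];                 // contiguous read
  }
  return center;
}
```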

![](./images/Block-FPS-Without-Visualization.png)

When the block size increases from 16 to 32, the frame rate improves for all implementations: naive, scattered, and coherent. Increasing the block size further to 64, 128, and beyond barely affects performance. I suspect this is because the warp size is 32: a block of 16 threads leaves half of each warp idle, while blocks of 32 or more keep every warp fully populated.
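
As a rough illustration, block size only enters through the kernel launch configuration; `kernUpdateVelocity`, `dev_pos`, and `dev_vel` are placeholder names here, not the project's actual identifiers.

```cuda
// blockSize was the only knob varied in this experiment: 16, 32, 64, 128, ...
int blockSize = 128;
dim3 fullBlocksPerGrid((numBoids + blockSize - 1) / blockSize);  // ceil(N / blockSize)
kernUpdateVelocity<<<fullBlocksPerGrid, blockSize>>>(numBoids, dev_pos, dev_vel);
```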

Changing the cell width so that 27 rather than 8 neighboring cells are checked decreased performance for all implementations. I suspect this is because the more neighboring cells are checked, the more candidate boids must be tested against a boid's neighborhood distance.
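
The sketch below shows one way the cell range can be computed; the identifiers are illustrative, and it assumes the common setup where a cell width of twice the neighborhood distance yields the 8-cell check and a cell width equal to the neighborhood distance yields the 27-cell check. The search sphere then overlaps at most 2 cells per axis in the first case and up to 3 per axis in the second, so the 27-cell configuration walks more cells and therefore more candidate boids.

```cuda
#include <glm/glm.hpp>

__device__ int clampCellSketch(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

// Hypothetical helper: walks every grid cell that the neighborhood sphere
// around position p can overlap.
__device__ void visitNeighborCellsSketch(glm::vec3 p, glm::vec3 gridMin,
                                         float cellWidth, float neighborDist,
                                         int gridResolution) {
  int lo[3], hi[3];
  for (int a = 0; a < 3; ++a) {
    lo[a] = clampCellSketch((int)floorf((p[a] - neighborDist - gridMin[a]) / cellWidth),
                            0, gridResolution - 1);
    hi[a] = clampCellSketch((int)floorf((p[a] + neighborDist - gridMin[a]) / cellWidth),
                            0, gridResolution - 1);
  }
  // cellWidth == 2 * neighborDist -> at most 2 cells per axis -> up to 8 cells
  // cellWidth ==     neighborDist -> at most 3 cells per axis -> up to 27 cells
  for (int z = lo[2]; z <= hi[2]; ++z)
    for (int y = lo[1]; y <= hi[1]; ++y)
      for (int x = lo[0]; x <= hi[0]; ++x) {
        // look up the boids stored in cell (x, y, z) and test each against
        // neighborDist before applying the flocking rules
      }
}
```
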
Binary file added images/Block-FPS-Without-Visualization.png
Binary file added images/Boids-FPS-With-Visualization.png
Binary file added images/Boids-FPS-Without-Visualization.png
Binary file added images/Boids.gif
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
@@ -10,5 +10,5 @@ set(SOURCE_FILES

cuda_add_library(src
${SOURCE_FILES}
OPTIONS -arch=sm_20
OPTIONS -arch=sm_61
)