Skip to content

sarahlc888/cellular-automata-on-rpi-gpu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

75 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cellular Automata on the Raspberry Pi GPU

Team members

Avi Udash and Sarah Chen

Project description

Our project runs cellular automata simulations for Conway's Game of Life and WireWorld on the Raspberry Pi's CPU and GPU. The automata are sent to an external display via HDMI, and a hardware user interface allows users to select between types of cellular automata and various preset states (using a short button press to toggle options and a long press to select) as well as to start and stop the simulation. Users can also design a custom start state using two etch-a-sketch style knobs and four buttons that determine the current draw color.

make run begins the user interface described above, and we also include a variety of targets for make test that test specific modules.

We sped up the Game of Life by running screen refreshes on the GPU (and the QPU specifically). To do so, we wrote a QPU assembly program (src/qasm/life_driver.qasm) that handles individual screen updates by running a sliding window of size 16 over the grid. Since the QPU is a 16-way SIMD processor, this choice of vectorization was a natural step. For each refresh, we sent the assembly code to the GPU and waited for the it to finish updating the framebuffer before resuming the program. For more information about the GPU, see gpu_guide.md.

The GPU is ~5.95 (stdev = 0.39) times faster than the CPU at all resolutions from 128x128 up to 1024x1024 (see graph).

CPU vs. GPU performance graph

Our vectorization process can be adapted to WireWorld (or any other cellular automata) by updating the QPU assembly script to match the automata's rules.

Components

  • Baseline cellular automata simulation library
    • Run Game of Life and WireWorld directly in the double-buffered framebuffer
    • FAT File System support for reading/writing preset states to the SD card
    • RLE-pattern parser, so patterns from websites like "https://conwaylife.com/wiki" can easily be added.
  • Hardware control board
    • Buttons and interrupts
    • Potentiometers and SPI with MCP3008 for analog-to-digital conversion
  • Etch-a-sketch functionality to create custom preset
    • 2 potentiometers that control X and Y coordinates (with exponential smoothing implemented)
    • Switch which color (state of CA) to draw with, using 4 buttons
  • Performance optimization
    • CPU: Enabled cache, compiled with -O3, fined tuned algorithm using profiler (e.g. comparing modulo for wrapping vs. ternary conditions vs. simple if statements), unrolled update function loop
    • GPU: Vectorized Game of Life on the GPU

Member contribution

  • Avi: hardware controls, etch-a-sketch functionality, menu, and additional CA features
  • Sarah: baseline CA simulation library and performance optimizations

References

Cellular automata library

  • Directly used the system.h and randomHardware.h modules from $CS107E/src
  • Our WireWorld implementation referenced the WolframAlpha description of the cellular automaton (including the embedded GIF of OR, XOR, and AND gates).
  • We used the FAT Filesystem documentation, CS107E guide to FatFS, the CS107E ff.h module, and the example project at $CS107E/examples/sd_fatfs to write code to read and write presets.
  • We used code from Sarah's profiler extension of assignment 7 to evaluate cellular automata performance and guide algorithm adjustments
  • Presets
  • Used the provided SPI library from the $CS107E/src
    • Wrote supplemental documentation for using the SPI library with information and page references from the Broadcom Manual
  • Wrote a module to easily use the MCP3008 using the datasheet
  • Exponential smoothing implemented with information from the wikipedia

GPU

  • We built directly upon a past CS 107E GPU project, "Bare Metal C QPU Library for the Raspberry Pi" by ahconkey and JoshFrancisCodes (GitHub)
    • Before being given access to this code, we did independent research that covered many of their project's references (especially the sources whose code they leaned most heavily on).
    • Their code was an invaluable starting point because it ran on baremetal and interfaced with the CS 107E mailbox. It was particularly helpful in demonstrating:
      • How to submit a program to the GPU by directly writing to GPU registers
      • How to allocate/free/lock memory for the GPU using mailbox functions
      • How to use #include to load assembled QPU programs (and in guiding us to use the vc4asm assembler).
    • We use their mailbox_functions and qpu modules in their entirety with only small adjustments. However, we will note that their code did not compile in its given form. The mailbox_functions module also contained a nasty bug that required hours to remove. Almost all work hours involving the GPU centered around getting a simple poke program to consistently work while building up from this code.
  • Broadcom VideoCore® IV 3D Architecture Reference Guide
    • QPU Register Address Map on p. 37-38
    • Section 7: VPM and VCD (particularly the tables for QPU Registers for VPM and VCD Functions)
  • SIMD processing of AES on the Raspberry Pi’s GPU
    • P. 13-17 provide a valuable overview of the GPU, which makes reading the Broadcom Manual much more accessible
  • Our vectorized sliding window approach was inspired by Conway’s Game of Life in R: Or On the Importance of Vectorizing Your R Code, which built 8 matrices in order to vectorize neighbor calculations and handled edge-cases with zero-padding.
  • Hacking The GPU For Fun And Profit
    • Our QPU assembly code for the Game of Life was heavily based off of the accompanying GitHub repository, especially the helloworld and SHA-256 examples.
  • QPU Demo: DMA Transfers breaks down DMA transfers with useful examples. This was helpful in determining how to load/store vector data from main memory to the GPU VPM (in combination with the reference directly above).
  • We consulted code for the existing QPU libraries GPU_FFT, QPULib, and pi-gemm. The demo gpu-deadbeef demonstrates how to write to GPU registers to write from QPU registers into the VPM and then from the VPM into main memory.
  • General background/inspiration

Self-evaluation

Our team fully executed the plan outlined in our proposal, including the GPU optimization component, and we successfully gained experience with a broad spread of hardware and software elements (e.g. potentiometers/SPI, file system, cellular automata, GPU).

We had a series of breakthrough moments with the GPU-related aspects of the project: being able to send a simple peek/poke program through the mailbox, understanding the big picture of what was happening, getting the VPM load/store to work, and finally getting a vectorized version of the Game of Life to work for the first time (after hours of incremental coding, testing, and debugging).

We're most proud of the effort we put in to the project and the intense learning that happened along the way. Coming into this, we knew practically nothing about GPUs (other than the fact that they would somehow let us run our simulations faster), so the project was a true avalanche of learning in that domain. The process was intimidating but rewarding, especially because of how resources around the GPU (e.g. the manual and libraries like GPU_FFT) were somewhat inaccessible to begin with. We learned a lot about debugging in particular: how to wrangle a program that seems quite opaque, take out pieces that seemed to be causing issues, and incrementally add features back in until reaching a final product.

Photos

The breadboard and hardware controls:

Hardware components

The main menu and Game of Life submenu:

Menu start screen

Menu second screen to select preset

See the video demos in img/ for real time simulations:

  • life_random.mov: random preset for Game of Life on 1024 by 1024 screen (running on GPU)
  • ww_multiplication.mov: Golly preset for WireWorld that
  • ww_custom.mov: drawing a custom etch-a-sketch start state with a single wire loop

About

Cellular Automata Simulations on the Raspberry Pi GPU

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •