Storage Informer

Highlights and Challenges during Ghostbusters Development, Part 2

by on Jun.24, 2009, under Storage

Game Loop Parallelization in the Infernal Engine

In the old days of single-processor computers, your game loop ran every process for the game in a single step, and the results were 100% deterministic. Your game loop looked much like the following:

1. Run the tick code for every actor
2. Perform rigid body simulation
3. Process particle effects
4. Figure out what is visible
5. Render your set
6. Render your actors
7. Render your particle effects
8. Show the frame
9. Repeat
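As a rough illustration of the list above, the serial loop can be sketched as a fixed sequence of stages run on one thread. The stage names and the `World` class are hypothetical stand-ins, not Infernal Engine code; the point is only that the order of work is identical on every frame.

```python
# Hypothetical stage names mirroring steps 1-8 of the serial loop.
SERIAL_STAGES = [
    "tick_actors",
    "simulate_rigid_bodies",
    "process_particles",
    "find_visible",
    "render_set",
    "render_actors",
    "render_particles",
    "show_frame",
]


class World:
    """Stand-in for engine state; it only records which stage ran."""

    def __init__(self):
        self.log = []

    def run_stage(self, name):
        self.log.append(name)


def run_serial_loop(world, frames):
    # Step 9 ("Repeat") is the outer loop; every stage finishes
    # before the next one starts, so the order never varies.
    for _ in range(frames):
        for name in SERIAL_STAGES:
            world.run_stage(name)
```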

With the advent of multiprocessor computers, game programming has become a lot more complicated in order to take full advantage of all of the processors in the system. Given a 3GHz Core2 Quad Extreme and a fast enough video card, Ghostbusters can keep all 4 cores 100% utilized when heavy action is occurring. We'll discuss how we accomplished this feat and what you can do for your own game engine.

When we started on the next-generation systems four years ago, we took a good look at the PS3, 360, and PC platforms. The 360 had 3 general-purpose cores, the PS3 had one general-purpose core plus 6 coprocessors called SPUs, and the best PCs had two cores (we couldn't even imagine the Core i7 at this point). As a cross-platform engine, we had to come up with a model for multiprocessing that would limit the amount of specialized coding for each system.

We used the PS3 "job" model as the basis for our multithreading model on all systems. The PS3 has one general-purpose processor, which we used for our game loop and for kicking off jobs that could run on the SPUs. Since the PC and 360 do not have SPUs, we created as many extra job threads as there were CPUs in the system. Each job queue thread (whether running on an SPU on the PS3 or on a CPU on the PC) would sit in a suspended state and be woken up only when there was a job ready to process. The thread would process the job and then check for another one to grab. If another job was ready, the thread would start it; otherwise it would go back to sleep. Jobs also need the ability to queue up more jobs. I'll talk about our job queue in more depth in the future.
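The job model described above can be sketched in a few lines. This is a minimal illustration built on Python's standard `queue` and `threading` modules, not the Infernal Engine's job queue; `JobQueue`, `fan_out`, and their parameters are names made up for the example.

```python
import os
import queue
import threading


class JobQueue:
    """Worker threads sleep until a job is ready, run it, then look for more."""

    def __init__(self, num_workers=None):
        self._jobs = queue.Queue()
        # One worker per CPU by default, as in the PC/360 setup described above.
        count = num_workers or os.cpu_count() or 1
        self._workers = [
            threading.Thread(target=self._run, daemon=True) for _ in range(count)
        ]
        for worker in self._workers:
            worker.start()

    def submit(self, fn, *args):
        # Safe to call from inside a running job, so jobs can queue more jobs.
        self._jobs.put((fn, args))

    def _run(self):
        while True:
            item = self._jobs.get()  # blocks, so an idle worker is asleep
            try:
                if item is None:     # sentinel: shut this worker down
                    return
                fn, args = item
                fn(*args)
            finally:
                self._jobs.task_done()

    def wait(self):
        # Returns once every submitted job, including nested ones, has finished.
        self._jobs.join()

    def shutdown(self):
        for _ in self._workers:
            self._jobs.put(None)
        for worker in self._workers:
            worker.join()


def fan_out(jobs, results, lock, count):
    """Example parent job that queues `count` child jobs."""
    def child(n):
        with lock:
            results.append(n)
    for n in range(count):
        jobs.submit(child, n)
```

A parent job submitted with `jobs.submit(fan_out, jobs, results, lock, 5)` followed by `jobs.wait()` leaves five entries in `results`, in whatever order the workers happened to run them.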

Our new parallel game loop looks like the following:

1. Lock our physics simulation
2. Update each actor's position from the physics simulation, queue up animation jobs, run tick code on each actor
3. Unlock the physics simulation
4. Kick off physics simulation
5. Process particle effects
6. Queue up visible objects in a display list
7. Kick off display list rendering job
8. Repeat
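One way to picture the steps above is the sketch below, which uses a `ThreadPoolExecutor` as a stand-in for the job queue. Everything in it is an assumption made for illustration: the `ParallelLoop` class, the single one-dimensional "position" per actor, waiting on the in-flight physics job before taking the lock, and allowing only one render job in flight at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ParallelLoop:
    """Hypothetical frame driver; all names here are illustrative."""

    def __init__(self, pool):
        self.pool = pool
        self.physics_lock = threading.Lock()
        self.body_positions = {"actor_a": 0.0}  # owned by the physics step
        self.actor_positions = {}               # owned by the game loop
        self.rendered = []                      # display lists "drawn" so far
        self._physics_job = None
        self._render_job = None

    def _simulate(self, dt):
        with self.physics_lock:
            for name in self.body_positions:
                self.body_positions[name] += dt

    def _render(self, display_list):
        self.rendered.append(display_list)

    def run_frame(self, dt):
        # Assumption: locking the simulation waits for the in-flight step.
        if self._physics_job is not None:
            self._physics_job.result()
        with self.physics_lock:                                   # steps 1-3
            self.actor_positions = dict(self.body_positions)
            # Animation jobs and per-actor tick code would run here.
        self._physics_job = self.pool.submit(self._simulate, dt)  # step 4
        # Step 5 (particle effects) is omitted from this sketch.
        display_list = tuple(sorted(self.actor_positions.items()))  # step 6
        if self._render_job is not None:
            self._render_job.result()  # assumption: one render job in flight
        self._render_job = self.pool.submit(self._render, display_list)  # step 7

    def finish(self):
        # Drain whatever is still running after the last frame.
        for job in (self._physics_job, self._render_job):
            if job is not None:
                job.result()
```

In this arrangement the physics step for the next frame and the rendering of the current frame run concurrently, while the game loop thread moves straight on to the next iteration.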

Note that when we queue up the display list, it contains the full state of what needs to be rendered without relying on any game data. This requires copying data into the display list, such as the animation state of an actor or the instanced data for a particle effect, because actor states need to be able to change while we are rendering the previous frame's data. If there were multiple rendering passes, the display list data could be reused for those passes rather than queuing the same objects again for each pass.
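The copy-on-queue idea can be shown with a small sketch. The `Actor` and `DrawCommand` types and their fields are hypothetical; the only point is that each display list entry owns an immutable copy of the state it needs, so the game loop can keep mutating actors while a renderer reads the list.

```python
from dataclasses import dataclass


@dataclass
class Actor:
    """Mutable game-side object (hypothetical fields)."""
    name: str
    position: tuple
    bone_angles: list  # animation state that changes every tick


@dataclass(frozen=True)
class DrawCommand:
    """Immutable snapshot of what the renderer needs for one actor."""
    name: str
    position: tuple
    bone_angles: tuple


def build_display_list(actors, is_visible):
    # Copy the state out of each visible actor; nothing here refers
    # back to the live game objects.
    return tuple(
        DrawCommand(a.name, a.position, tuple(a.bone_angles))
        for a in actors
        if is_visible(a)
    )


def render_passes(display_list, passes):
    # The same display list is walked once per pass without being rebuilt.
    return [(p, cmd.name) for p in passes for cmd in display_list]
```

Changing `actor.bone_angles` after `build_display_list` returns does not affect the queued `DrawCommand`, which is the property the renderer relies on.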

The Infernal Engine also had a distinct advantage for actor simulation: each actor was physically simulated as a rigid body or a constrained system of bodies, so collision and movement happened inside the physics engine. To guarantee order of operations, especially for the AI, we still tick each actor in serial, but most of the actual work now happens as jobs.

Our physics engine, Velocity, was also rewritten to be massively parallel and run solely in the job queue. Before we parallelized it, it looked like the following:

1. Compute broad phase collision
2. Compute narrow phase collision one pair at a time
3. Divide up rigid bodies into islands
4. Solve islands one at a time

After converting Velocity to use jobs, it looked like this:

1. Compute broad phase collision (fast single threaded job)
2. Queue up jobs for each narrow phase collision (massively parallel)
3. Divide up rigid bodies into islands (fast single threaded job)
4. Queue up jobs for each physics island, or sub-island if we had too many bodies (massively parallel)
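The four stages above can be sketched as follows. This is not Velocity's code: it assumes circles in 2D as bodies, an x-axis overlap test as the broad phase, union-find for island building, a placeholder solver, and a `ThreadPoolExecutor` standing in for the job queue. Splitting large islands into sub-islands is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def broad_phase(bodies):
    """Single-threaded: keep pairs whose extents overlap on the x axis."""
    pairs = []
    for a, b in combinations(sorted(bodies), 2):
        ax, _, ar = bodies[a]
        bx, _, br = bodies[b]
        if abs(ax - bx) <= ar + br:
            pairs.append((a, b))
    return pairs


def narrow_phase(bodies, pair):
    """One job per pair: exact circle-versus-circle test."""
    a, b = pair
    ax, ay, ar = bodies[a]
    bx, by, br = bodies[b]
    touching = (ax - bx) ** 2 + (ay - by) ** 2 <= (ar + br) ** 2
    return pair if touching else None


def build_islands(body_ids, contacts):
    """Single-threaded: group bodies connected by contacts (union-find)."""
    parent = {i: i for i in body_ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in contacts:
        parent[find(a)] = find(b)
    islands = {}
    for i in body_ids:
        islands.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in islands.values())


def solve_island(island):
    # Placeholder for a constraint solver; islands share no bodies,
    # so each one can be solved independently of the others.
    return len(island)


def step(bodies, pool):
    pairs = broad_phase(bodies)                                   # stage 1
    hits = pool.map(lambda p: narrow_phase(bodies, p), pairs)     # stage 2
    contacts = [c for c in hits if c is not None]
    islands = build_islands(bodies, contacts)                     # stage 3
    solved = list(pool.map(solve_island, islands))                # stage 4
    return pairs, contacts, islands, solved
```

The two parallel stages work because their jobs are independent: a narrow phase test only reads the two bodies in its pair, and islands have no bodies in common.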

The results of having a massively parallel game engine were stunning. When we finally got rendering and simulation of the game running in parallel in the last weeks of Ghostbusters, the game became solely render bound. Jobs were totally asynchronous, and we were able to fully utilize 3 to 4 cores. When there wasn't any action in the game, the game was waiting on the vertical blank. With a lot of action, the job model allowed the heavy lifting to be spread across as many processors as the system had.
