Someone recently asked me how particles can be updated on the GPU. I’m no expert, but I know the basic concept. Basically the graphics card (GPU) has become more and more powerful, to the point that it’s now a kind of super-fast programmable parallel processor, which can do calculations for things other than blending colors together. In this post I’m going to explain one way in which the GPU can be used to update some particles.
An example of a particle system would be if you opened a treasure chest in a game, and a bunch of 2D stars sprayed out like a fountain. Each star particle is a sprite with a position and velocity, and over time the velocity of the sprite is pulled down with gravity. You get the idea…
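The per-particle math is just two additions each frame. Here's a minimal sketch in plain Python (the names, gravity value, and starting velocity are made up for illustration):

```python
# One particle: a position and a velocity, each an (X, Y, Z) triple.
# Every frame: pos = pos + vel, then vel = vel + gravity.

GRAVITY = (0.0, -9.8, 0.0)  # pulls the particle down on the Y axis

def update_particle(pos, vel):
    """One frame of the update described above."""
    new_pos = tuple(p + v for p, v in zip(pos, vel))
    new_vel = tuple(v + g for v, g in zip(vel, GRAVITY))
    return new_pos, new_vel

# A star sprayed out of the chest, moving up and to the right:
pos, vel = (0.0, 0.0, 0.0), (1.0, 5.0, 0.0)
for _ in range(3):  # three frames of simulation
    pos, vel = update_particle(pos, vel)
```

After a few frames gravity has overcome the initial upward velocity and the star starts falling, which is the fountain shape you see on screen.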
- Imagine you have your list of positions and velocities. Each frame we do "pos = pos+vel" and "vel = vel+gravity".
- Let's ignore creating and deleting particles for now, just to get the concept of running the particles on the GPU.
- Pos and vel are each vector3s - i.e. three floats - X, Y, Z. Gravity can be represented as a vector3 too, but with two of its values being 0.
- A texture is made up of pixels, each of which has an R, G, B and A. We can store a position or velocity in a pixel. R=X, G=Y, B=Z.
- We can now imagine a texture two pixels wide, by NUM_PARTICLES tall. In the left column we have the position, and in the right, the velocity. So it's a texture that's storing data rather than a picture. It would look like some random colored dots - nothing recognizable.
- OK, now we want to update our texture. We run a pixel shader which overwrites the left column. It does "read column one (which is position) and column two (velocity) into two variables, add them together, and write the output back into column one." We've just done ‘pos=pos+vel’ for each particle.
- Then we run a different shader on the right hand (velocity) column. It does "read column two, add the gravity constant, and write the output back to the right hand column." This did the ‘vel=vel+gravity’ calculation.
- So we used a texture that contained data, and ran shaders that were just concerned with doing math on that data, rather than the color blending and such that we normally think of shaders doing. And the output of the process was an updated data texture - which we don't show on the screen.
- So we've worked out where all our particles are, and that data is in a texture. How do we render it?
- Let's say that we'd normally do it using a buffer of point sprites. Each vertex in the buffer represents one sprite, and has a position, color and size.
- Let's ignore the color and size for now.
- Imagine we want to render 256 particles. We have a buffer of 256 point sprites. But instead of putting world positions into the position values, let's put in the texture coordinates of the pixel we want to read. So the first particle has 0,0, the second has 0,1, then 0,2 etc. (values given in pixels here)
- This then goes into our shader, which instead of using the vertex position value as the world position, uses it as texture coordinates for reading from our data texture. The color value it reads from the texture is then used as the world position of the sprite. (You might need to read this point a couple more times for it to sink in.)
- Extending the above, you could have the update look after a 'life' value for each particle. The life value could be sneaked into the fourth component of either the position or velocity pixels. Using shader model 3.0 you could then reset the particle if an 'if' statement said that the particle was too old.
- You could animate the color and size using the life value to index a texture which contained the color/size values over time. E.g. if life ranged from 0 to 255, then you’d read from a 1x256 texture at the coordinates 0,life to read off a color value. The texture could contain a color value in the RGB, and a size value in the alpha.
- If you were a mental giant, you could encode your world collision data into a texture, and bounce the particles off the collision mesh - all on the GPU. It has been done, and I’ve lost sleep just trying to wrap my head around how… :)
- Hey presto. We moved the particle update onto the GPU, which is fast and performs operations in parallel. A GPU can process several (e.g. 8) pixels at once, and also performs SIMD vector operations, so adding velocity to position adds all three components at once.
- The vertex buffer of point sprites that we use is set up once. The texture coordinate indices don't change from one frame to the next.
- It's technically quite challenging. We have to write three shaders, and debugging is trickier than just stepping through some C code.
- Doing general computation on GPUs is tricky. It's a logic puzzle all its own just trying to work out how to code with all the quirky restrictions.
- Reading data back from textures is super slow, so once the particles are updating on the GPU, you effectively can't 'see' them from the C code anymore. E.g. you couldn't, in C, check a particle's position and react to it (say, play a sound effect when it hits the ground).
- The coding complexity and the limitations above currently often make GPU particles (or other GPU processing) an unattractive choice in games.
- The GPU is often fully maxed out just rendering the graphics for the game, while the CPU is actually not fully occupied. When this is the case, shifting more work onto the GPU is not a speed-up.
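To make the data flow in the list above concrete, here's a CPU emulation of the whole scheme in Python. This is purely illustrative, not real shader code: on the GPU each "pass" function below would be a pixel shader rendering into the data texture, and the particle count, gravity value, and starting values are all made up.

```python
# The "data texture" is 2 pixels wide by NUM_PARTICLES tall.
# Column 0 holds positions, column 1 holds velocities, with R=X, G=Y, B=Z.
NUM_PARTICLES = 4
GRAVITY = (0.0, -1.0, 0.0)

# texture[row][col] is one "pixel": an (R, G, B) triple.
texture = [[(float(row), 10.0, 0.0),   # column 0: position
            (0.0, 1.0, 0.0)]           # column 1: velocity
           for row in range(NUM_PARTICLES)]

def add3(a, b):
    return tuple(x + y for x, y in zip(a, b))

def position_pass(tex):
    """'Shader' over column 0: pos = pos + vel, for every pixel."""
    for row in tex:
        row[0] = add3(row[0], row[1])

def velocity_pass(tex):
    """'Shader' over column 1: vel = vel + gravity, for every pixel."""
    for row in tex:
        row[1] = add3(row[1], GRAVITY)

def sample(tex, coord):
    """What the render shader does: treat a vertex's position attribute as
    texture coordinates, and the pixel it fetches as the world position."""
    x, y = coord
    return tex[y][x]

# One frame of update: two passes over the data texture.
position_pass(texture)
velocity_pass(texture)

# The point-sprite buffer holds texture coordinates and is set up once.
sprite_buffer = [(0, i) for i in range(NUM_PARTICLES)]
world_positions = [sample(texture, c) for c in sprite_buffer]
```

Note the order: the position pass reads the old velocity before the velocity pass overwrites it, matching the two-shader sequence described above.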
There’s a lot of ways to skin a cat, and above I've done the equivalent of skinning a cat using a mallet, but you hopefully get the idea. This kind of thing would be a cool little demo to write. You could make some funky looking particle effect, show it moving thousands of particles at once, and then bring up the CPU monitor and show that the CPU is pretty much idle. Sweet!
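The life-driven color/size animation from the list above could be emulated the same way. This sketch builds a hypothetical 1x256 ramp texture (the particular yellow-to-red fade is made up) and samples it at row = life, with RGB as the color and alpha doubling as the sprite size:

```python
# A 1-wide, 256-tall "ramp" texture: one (R, G, B, A) pixel per life value.
RAMP_HEIGHT = 256

def make_ramp():
    """Fade from bright yellow (young) to red (old), shrinking as we go."""
    ramp = []
    for life in range(RAMP_HEIGHT):
        t = life / (RAMP_HEIGHT - 1)   # 0.0 = just born, 1.0 = about to die
        r, g, b = 1.0, 1.0 - t, 0.0    # yellow fading to red
        size = 1.0 - t                 # the alpha channel stores the size
        ramp.append((r, g, b, size))
    return ramp

def sample_ramp(ramp, life):
    """What the shader does: read the pixel at coordinates (0, life)."""
    return ramp[life]

ramp = make_ramp()
color_and_size = sample_ramp(ramp, 128)  # halfway through the particle's life
```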
This area is called GPGPU – General-Purpose computation on Graphics Processing Units. There are some articles in 'GPU Gems 2' about GPGPU which give a good grounding.
It’s cool stuff, but it’s for the brave and insane!