kinkreet wrote: And in a pixel shader, can the use of 'else' affect performance?
I found this on a steep parallax shader:
Coord = (Coord + oldCoord) * 0.5;
if( Height < 0.0 )
{
    Coord = oldCoord;
}
Is this the optimal way?
Shaders make flow control (like if/else and loops) a bit trickier.
A GPU has a heap (thousands) of shader units. These are grouped into blocks called warps (NVIDIA's term; AMD calls them wavefronts). Each warp has a bunch of shader units but only a single program counter. This means every shader unit in the warp must run the same code at the same time; their flow can't diverge from each other.
The single program counter has a very serious effect on shader performance. Let's say you have a shader with an if/else condition. If every pixel being rendered by the warp evaluates the condition as true, then only the true code is run. If every pixel evaluates it as false, then only the false code is run. (So far, the same as CPU code.) But if some pixels are true and some are false, then EVERY pixel must step through BOTH the true and the false code. It's impossible for some pixels to run the true code while others run the false code, because the single program counter can't point to multiple lines of code at once. So the true pixels have to wait for the false code to finish, and the false pixels have to wait for the true code to finish. If both branches are big, the if/else could halve the performance. Think of a checkerboard material where half the squares are a turbulence-based marble effect and half are mirrored. If a warp tries to render both kinds of squares, then every pixel must fully calculate both marble and mirror, while only keeping the result of one of them.
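To put rough numbers on the checkerboard example, here is a toy cost model in Python. It is only a sketch of the lockstep behaviour described above, not real GPU code: the function name and the instruction counts for the marble and mirror branches are made up for illustration.

```python
# Toy cost model for one warp executing an if/else in lockstep.
# All names and instruction counts here are hypothetical.

def warp_branch_cost(conditions, true_cost, false_cost):
    """Return the instruction slots the whole warp spends on the if/else.

    conditions: one boolean per pixel in the warp.
    """
    cost = 0
    if any(conditions):       # at least one pixel needs the true branch
        cost += true_cost
    if not all(conditions):   # at least one pixel needs the false branch
        cost += false_cost
    return cost

MARBLE_COST = 200  # assumed cost of the marble branch
MIRROR_COST = 150  # assumed cost of the mirror branch

all_marble = [True] * 32
all_mirror = [False] * 32
mixed = [True] * 31 + [False]  # a single pixel disagrees

print(warp_branch_cost(all_marble, MARBLE_COST, MIRROR_COST))  # 200
print(warp_branch_cost(all_mirror, MARBLE_COST, MIRROR_COST))  # 150
print(warp_branch_cost(mixed, MARBLE_COST, MIRROR_COST))       # 350
```

Note that a single disagreeing pixel is enough to make the whole warp pay for both branches.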
A few years ago, AMD had a warp size of 32 shader units and NVIDIA had 256. (No idea what the current numbers are.) This is why AMD used to be much better at dynamic flow control than NVIDIA: a smaller warp meant less chance of the pixels in it taking divergent flow paths, while a larger warp allowed more static flow to happen in parallel (so AMD was good at flow control, NVIDIA at brute-force math).
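The effect of warp size on divergence can be sketched with a small Python experiment. It is heavily simplified: it assumes a single row of pixels, warps that cover consecutive pixels in that row (real hardware groups pixels in 2D tiles), and a made-up checkerboard square width of 48 pixels. The function name is hypothetical.

```python
# Count how many warps straddle a checkerboard edge in one row of pixels.
# Simplifying assumption: each warp covers consecutive pixels of the row.

def count_divergent_warps(row_width, square_width, warp_size):
    """Return (divergent_warps, total_warps) for one row of pixels."""
    # 0 = marble square, 1 = mirror square
    materials = [(x // square_width) % 2 for x in range(row_width)]
    warps = [materials[i:i + warp_size]
             for i in range(0, row_width, warp_size)]
    # A warp diverges if it contains pixels of both materials.
    divergent = sum(1 for warp in warps if len(set(warp)) > 1)
    return divergent, len(warps)

print(count_divergent_warps(768, 48, 32))   # (8, 24)
print(count_divergent_warps(768, 48, 256))  # (3, 3)
```

With these numbers, a third of the 32-wide warps diverge, while every 256-wide warp does.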
The same thing applies to for loops. If you have a shader that looks up a texture to get the upper limit of a for loop, all pixels in the warp need to do the same number of iterations. If one pixel reads 4 and another reads 10, then every pixel needs to loop up to 10 (but ignores the results of everything above its own number).
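The loop case can be sketched the same way. This Python snippet is a toy model, not shader code: the function name is hypothetical, and it assumes a warp of just four pixels to keep the numbers readable.

```python
# Toy model of a per-pixel loop limit inside one warp.
# Every pixel steps through as many iterations as the slowest pixel.

def warp_loop_cost(limits):
    """Return (iterations_run_by_warp, wasted_pixel_iterations).

    limits: the loop limit each pixel read from the texture.
    """
    warp_iterations = max(limits)             # the warp loops to the largest limit
    executed = warp_iterations * len(limits)  # iterations stepped through by all pixels
    useful = sum(limits)                      # iterations whose results are kept
    return warp_iterations, executed - useful

limits = [4, 10, 4, 4]  # three pixels read 4, one pixel read 10
print(warp_loop_cost(limits))  # (10, 18)
```

Here the warp runs 10 iterations, and 18 of the 40 per-pixel iterations produce results that are thrown away.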
So how an if/else affects shader performance depends on what adjacent pixels (possibly hundreds of them) are doing.