Since Nvidia doesn't publish much about how exactly their architecture works...
http://lpgpu.org/wp/wp-content/uploads/ ... es2014.pdf
The only thing missing from that chart is the latency of RSQRT, which is 25 cycles on Tesla; you have to extrapolate to Kepler and Maxwell.
so the combined latency of normalize() would be: (6+2)+2*6 for 3 MULs and two ADDs, or at least 2*6+1 for 2 FMADs and one MUL, to get the squared length of the vector, then ~25 for the RSQRT, and finally 6+2 for the 3 MULs with the rsqrt result
coming together to at least 46 cycles for a 3-component normalize
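To make the op count above concrete, here's a sketch of normalize() decomposed into the exact instructions the estimate counts (the cheaper FMAD path for the dot product). This is illustrative CPU code, not NVIDIA's actual codegen; `1.0f / std::sqrt(...)` stands in for the hardware RSQRT unit:

```cpp
#include <cmath>

struct vec3 { float x, y, z; };

// Decomposition of normalize(v) matching the latency estimate above.
vec3 normalize3(vec3 v)
{
    // Squared length: 1 MUL + 2 FMADs (the "2*6+1" path).
    float len2 = v.x * v.x;            // MUL
    len2 = std::fma(v.y, v.y, len2);   // FMAD
    len2 = std::fma(v.z, v.z, len2);   // FMAD

    // ~25 cycles on Tesla per the chart; GPUs have a dedicated RSQRT,
    // this division-by-sqrt is just a stand-in on the CPU.
    float rlen = 1.0f / std::sqrt(len2);

    // Scale by the rsqrt: 3 MULs, independent of each other.
    return { v.x * rlen, v.y * rlen, v.z * rlen };
}
```

Summing the dependent chains gives the 13 + 25 + 8 = 46 cycle floor quoted above.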
Finally you know not only which op is the most expensive, but also by how much!
Know how to write efficient shaders (Instruction Latencies)
Re: Know how to write efficient shaders (Instruction Latencies)
Interesting... but not everyone uses NVidia; some use Intel Iris or AMD GPUs.
Any data on which ops to avoid with AMD?
Re: Know how to write efficient shaders (Instruction Latencies)
So this could lead to a way to set shader parameters without using the names (and their related lookup overhead)?
I think BAW already has this sorted but I'm still digging through the code..