Since Nvidia doesn't publish much about how exactly their architecture works...
http://lpgpu.org/wp/wp-content/uploads/ ... es2014.pdf
The only thing missing from that chart is the latency of RSQRT, which is 25 cycles on Tesla; you have to extrapolate to Kepler and Maxwell.
so the combined latency of normalize() would be: (6+2)+2*6 for 3 MULs and two ADDs, or at least 2*6+1 for 2 FMADs and one MUL, to get the squared length of the vector, then ~25 for the RSQRT, and finally 6+2 for the 3 MULs with the rsqrt result
coming together to at least 46 cycles for a 3-component normalize
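To make the op count above concrete, here's a sketch of normalize() decomposed into the exact instructions the estimate counts (the cheaper FMAD path for the dot product). This is illustrative CPU code, not NVIDIA's actual codegen; `1.0f / std::sqrt(...)` stands in for the hardware RSQRT unit:

```cpp
#include <cmath>

struct vec3 { float x, y, z; };

// Decomposition of normalize(v) matching the latency estimate above.
vec3 normalize3(vec3 v)
{
    // Squared length: 1 MUL + 2 FMADs (the "2*6+1" path).
    float len2 = v.x * v.x;            // MUL
    len2 = std::fma(v.y, v.y, len2);   // FMAD
    len2 = std::fma(v.z, v.z, len2);   // FMAD

    // ~25 cycles on Tesla per the chart; GPUs have a dedicated RSQRT,
    // this division-by-sqrt is just a stand-in on the CPU.
    float rlen = 1.0f / std::sqrt(len2);

    // Scale by the rsqrt: 3 MULs, independent of each other.
    return { v.x * rlen, v.y * rlen, v.z * rlen };
}
```

Summing the dependent chains gives the 13 + 25 + 8 = 46 cycle floor quoted above.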
Finally you know not only which op is the most expensive, but also by how much!
Know how to write efficient shaders (Instruction Latencies)
Re: Know how to write efficient shaders (Instruction Latencies)
Interesting... but not everyone uses NVidia; some use Intel Iris or AMD GPUs.
Any data on which ops to avoid with AMD?
Re: Know how to write efficient shaders (Instruction Latencies)
So this could lead to a way to set shader parameters without using the names (and their related lookup overhead)?
I think BAW already has this sorted but I'm still digging through the code..