I have created a grouped list of estimated instruction costs, based on this Chinese reference.

The numbers may not be exact, but they are mostly correct in my experience.

Some intuitions are:

  • Abs, saturate are free (Why is clamp in GLSL not free? I doubt it)
  • Log, exp, sqrt are almost free! (That’s why Kernel Foveated Rendering is fast)
  • Sin, cos are super fast!
  • smoothstep is more expensive than expected.
    • I would suggest a cheap replacement for Gaussians:
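As background on why smoothstep costs more than it looks, here is a minimal Python sketch of its textbook definition: a clamp followed by a cubic polynomial. The names `saturate`, `edge0`, and `edge1` are chosen for illustration. This only restates the standard formula. It is not the Gaussian replacement mentioned above, and it does not describe how any particular GPU implements the intrinsic.

```python
def saturate(x):
    """Clamp x to [0, 1], mirroring HLSL's saturate."""
    return max(0.0, min(1.0, x))


def smoothstep(edge0, edge1, x):
    """Textbook smoothstep: normalize and clamp, then evaluate 3t^2 - 2t^3."""
    # One subtract, one divide (plus the edge1 - edge0 subtract) and a clamp.
    t = saturate((x - edge0) / (edge1 - edge0))
    # Three multiplies and one subtract.
    return t * t * (3.0 - 2.0 * t)


print(smoothstep(0.0, 1.0, 0.5))  # 0.5
```

Counting the operations in the two lines of `smoothstep` gives a rough feel for why a single call is listed well above the simple one-instruction intrinsics.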

Here is the full grouped list:

  • Cost 0 (Almost free)
    • abs(x), saturate(x)
  • Cost 1
    • floor(x), ceil(x), round(x), frac(x), exp2(x), dot(a, b), min(a, b), max(a, b), sin(x), cos(x), sincos(x), sqrt(x), rsqrt(x)
  • Cost 1.5
    • faceforward(n, i, ng)
  • Cost 2
    • clamp(x, a, b), exp(x), log(x), log10(x), cross(a, b), step(a, x), lerp(a, b, f), length(v), distance(a, b)
  • Cost 2.5
    • reflect(i, n)
  • Cost 3
    • any(x), pow(x, y), sign(x), normalize(v)
  • Cost 4
    • all(x), fmod(x, y), mul(m, pos), transpose(M)
  • Cost 5 or greater
    • 7: smoothstep(min, max, x)
    • 10: acos
    • 11: asin
    • 16: atan
    • 22: atan2
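One way to use the list is as a lookup table for back-of-the-envelope estimates. The sketch below is in Python and copies the costs from the list above. The helper `estimate_cost` and the purely additive model are assumptions made for illustration: real compilers fuse and reorder instructions, and plain arithmetic such as `+` and `*` is not in the list, so the result is only a rough relative figure.

```python
# Costs copied from the grouped list above.
COST = {
    "abs": 0, "saturate": 0,
    "floor": 1, "ceil": 1, "round": 1, "frac": 1, "exp2": 1, "dot": 1,
    "min": 1, "max": 1, "sin": 1, "cos": 1, "sincos": 1, "sqrt": 1, "rsqrt": 1,
    "faceforward": 1.5,
    "clamp": 2, "exp": 2, "log": 2, "log10": 2, "cross": 2, "step": 2,
    "lerp": 2, "length": 2, "distance": 2,
    "reflect": 2.5,
    "any": 3, "pow": 3, "sign": 3, "normalize": 3,
    "all": 4, "fmod": 4, "mul": 4, "transpose": 4,
    "smoothstep": 7, "acos": 10, "asin": 11, "atan": 16, "atan2": 22,
}


def estimate_cost(calls):
    """Sum the listed cost of each intrinsic name in `calls` (hypothetical helper)."""
    return sum(COST[name] for name in calls)


# Hypothetical specular term: pow(saturate(dot(n, normalize(l + v))), shininess)
specular = ["normalize", "dot", "saturate", "pow"]
print(estimate_cost(specular))  # 3 + 1 + 0 + 3 = 7
```

By this estimate, the four-call specular term costs about as much as a single smoothstep, and a single atan2 costs roughly three times as much.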


One of my remaining questions is:

  • How fast is texture sampling on a modern GPU?
  • One option is to measure it with NVIDIA ShaderPerf: https://developer.nvidia.com/nvidia-shaderperf
  • My guess is that the cost is 20

Any further experiments and feedback are welcome.