I have put together a grouped list of the estimated costs of shader instructions, based on this Chinese reference.
The numbers may not be exact, but they mostly match my experience.
Some intuitions are:
- abs and saturate are free. (Why would clamp in GLSL not be free? I doubt it.)
- Log, exp, sqrt are almost free! (That’s why Kernel Foveated Rendering is fast)
- Sin, cos are super fast!
- smoothstep is more expensive than expected.
- I would suggest a cheap replacement for Gaussians:
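float cubicPulse( float c, float w, float x )
{
    // Distance from the pulse center c; the pulse is zero beyond width w.
    x = fabs(x - c);
    if( x>w ) return 0.0;
    x /= w;
    // 1 - smoothstep(0, 1, x): equals 1 at the center and 0 at distance w.
    return 1.0 - x*x*(3.0-2.0*x);
}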
Here is the full grouped list:
- Cost 0 (almost free)
  - abs(x), saturate(x)
- Cost 1
  - floor(x), ceil(x), round(x), frac(x), exp2(x), dot(a, b), min(a, b), max(a, b), sin(x), cos(x), sincos(x, s, c), sqrt(x), rsqrt(x)
- Cost 1.5
  - faceforward(n, i, ng)
- Cost 2
  - clamp(x, a, b), exp(x), log(x), log10(x), cross(a, b), step(a, x), lerp(a, b, f), length(v), distance(a, b)
- Cost 2.5
  - reflect(i, n)
- Cost 3
  - any(x), pow(x, y), sign(x), normalize(v)
- Cost 4
  - all(x), fmod(x, y), mul(m, pos), transpose(M)
- Cost 5 or greater
  - 7: smoothstep(min, max, x)
  - 10: acos(x)
  - 11: asin(x)
  - 16: atan(x)
  - 22: atan2(y, x)
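One way to use the list is as a rough budget for a shader expression. The C sketch below is only an illustration: the names `lookup_cost` and `estimate_cost` are hypothetical, and it assumes the costs simply add up, ignoring plain arithmetic, vector width, and latency hiding, none of which the list covers.

```c
#include <stddef.h>
#include <string.h>

struct intrinsic_cost { const char *name; float cost; };

/* Costs copied from the grouped list above. */
static const struct intrinsic_cost COSTS[] = {
    {"abs", 0.0f}, {"saturate", 0.0f},
    {"floor", 1.0f}, {"ceil", 1.0f}, {"round", 1.0f}, {"frac", 1.0f},
    {"exp2", 1.0f}, {"dot", 1.0f}, {"min", 1.0f}, {"max", 1.0f},
    {"sin", 1.0f}, {"cos", 1.0f}, {"sincos", 1.0f},
    {"sqrt", 1.0f}, {"rsqrt", 1.0f},
    {"faceforward", 1.5f},
    {"clamp", 2.0f}, {"exp", 2.0f}, {"log", 2.0f}, {"log10", 2.0f},
    {"cross", 2.0f}, {"step", 2.0f}, {"lerp", 2.0f},
    {"length", 2.0f}, {"distance", 2.0f},
    {"reflect", 2.5f},
    {"any", 3.0f}, {"pow", 3.0f}, {"sign", 3.0f}, {"normalize", 3.0f},
    {"all", 4.0f}, {"fmod", 4.0f}, {"mul", 4.0f}, {"transpose", 4.0f},
    {"smoothstep", 7.0f}, {"acos", 10.0f}, {"asin", 11.0f},
    {"atan", 16.0f}, {"atan2", 22.0f},
};

/* Returns the listed cost, or -1 if the intrinsic is not in the list. */
static float lookup_cost(const char *name)
{
    for (size_t i = 0; i < sizeof COSTS / sizeof COSTS[0]; ++i) {
        if (strcmp(COSTS[i].name, name) == 0) return COSTS[i].cost;
    }
    return -1.0f;
}

/* Sums the listed costs of the given calls (assumption: costs add up).
   Returns -1 if any call is missing from the list. */
static float estimate_cost(const char *const *calls, size_t count)
{
    float total = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float c = lookup_cost(calls[i]);
        if (c < 0.0f) return -1.0f;
        total += c;
    }
    return total;
}
```

For example, `normalize(reflect(i, n))` comes out at 3 + 2.5 = 5.5 under this model, which is still cheaper than a single `smoothstep` at 7.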
One of my remaining questions is:
- How fast is texture sampling on a modern GPU?
  - One option is to measure it with NVIDIA ShaderPerf: https://developer.nvidia.com/nvidia-shaderperf
  - My guess is a cost of about 20.
Any further experiments and feedback are welcome.