Here are some random thoughts and summaries on the deferred rendering pipeline.

Deferred shading is generally faster than forward shading

According to Unity3D,

As a general rule, Deferred Rendering is likely to be a better choice if our game runs on higher-end hardware and uses a lot of realtime lights, shadows and reflections. Forward Rendering is likely to be more suitable if our game runs on lower-end hardware and does not use these features.

In forward shading, with M lights and N objects, we use an O(MN) loop to render them:
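
A minimal Python sketch of that loop, under simplifying assumptions: each object is just a list of fragments, and each light is a 2D point light with a cutoff radius. The names `Light`, `Fragment`, and `forward_shade` are hypothetical and exist only for illustration; a real renderer would run the inner loop in a fragment shader.

```python
from dataclasses import dataclass


@dataclass
class Light:
    x: float
    y: float
    radius: float      # assumed cutoff distance of the light
    intensity: float


@dataclass
class Fragment:
    x: float
    y: float


def forward_shade(objects, lights):
    """Shade every fragment of every object against every light.

    Returns (per-object lists of fragment colors, number of light tests).
    """
    light_tests = 0
    shaded = []
    for fragments in objects:                  # N objects
        colors = []
        for frag in fragments:
            color = 0.0
            for light in lights:               # M lights
                light_tests += 1
                dist2 = (frag.x - light.x) ** 2 + (frag.y - light.y) ** 2
                # Divergent branch: is this fragment lit by this light?
                if dist2 <= light.radius ** 2:
                    color += light.intensity
            colors.append(color)
        shaded.append(colors)
    return shaded, light_tests
```

With one fragment per object, the number of light tests is exactly M times N, even when most lights do not reach most fragments.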

There are four problems:

  1. Ineffective light culling
  2. Large memory footprint: all geometry, lights (shadow maps, environment maps), and textures must be allocated, initialized, and accessed.
  3. Shading small triangles is inefficient
  4. Divergence in the fragment shader: each fragment has to test whether it is illuminated by the current light inside the O(MN) loop


However, deferred shading mostly fixes these problems:
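
A rough Python sketch of the two deferred passes, under stated assumptions: fragments are `(x, y, depth, albedo)` tuples in integer pixel coordinates, lights are `(x, y, radius, intensity)` tuples with an integer screen-space radius, and the light volume is approximated by a square bounding box. The function names are hypothetical; this is an illustration of the idea, not a particular engine's implementation.

```python
def geometry_pass(objects, width, height):
    """Write only the nearest surface per pixel into a G-buffer (depth test)."""
    gbuffer = [[None] * width for _ in range(height)]
    for fragments in objects:
        for (x, y, depth, albedo) in fragments:
            current = gbuffer[y][x]
            if current is None or depth < current[0]:
                gbuffer[y][x] = (depth, albedo)
    return gbuffer


def lighting_pass(gbuffer, lights):
    """Accumulate each light only over pixels inside its screen-space bounds."""
    height, width = len(gbuffer), len(gbuffer[0])
    image = [[0.0] * width for _ in range(height)]
    evaluations = 0
    for (lx, ly, radius, intensity) in lights:
        for y in range(max(0, ly - radius), min(height, ly + radius + 1)):
            for x in range(max(0, lx - radius), min(width, lx + radius + 1)):
                texel = gbuffer[y][x]
                if texel is None:
                    continue                   # background pixel, nothing to shade
                evaluations += 1
                image[y][x] += intensity * texel[1]
    return image, evaluations
```

Here the lighting cost depends on how many visible pixels each light covers, not on how many objects were drawn, and hidden fragments are never shaded.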


Usually, the geometry buffer is organized as follows,

so that 1000 lights are feasible with deferred shading but not with forward shading.
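
To make the memory side of this concrete, here is a small Python sketch that sizes one possible G-buffer. The layout below (four 4-byte targets per pixel) is a hypothetical example chosen for illustration, not the layout from the referenced talk; real engines pack their targets differently.

```python
# Hypothetical G-buffer layout: (target name, bytes per pixel)
GBUFFER_LAYOUT = [
    ("albedo_rgba8", 4),
    ("normal_rgb10a2", 4),
    ("material_rgba8", 4),
    ("depth_d32f", 4),
]


def gbuffer_bytes(width, height, layout=GBUFFER_LAYOUT):
    """Total G-buffer size in bytes for the given resolution and layout."""
    bytes_per_pixel = sum(size for _, size in layout)
    return width * height * bytes_per_pixel


size = gbuffer_bytes(1920, 1080)
print(f"{size} bytes = {size / 2**20:.2f} MiB")
```

Under this assumed layout the G-buffer costs 16 bytes per pixel regardless of how many lights are in the scene, which is why the light count can grow without the geometry being re-rendered.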

Deferred shading example from W. Engel, “Light-Prepass Renderer Mark III”, SIGGRAPH 2009 Talks


As for single texture sampling, modern GPUs are really fast.

According to Xiaoxu Meng, randomly sampling a texture 100 times costs almost nothing at a resolution of 1920×1080.

| Random texture samples | Time     |
|------------------------|----------|
| 1                      | 0.25 ms  |
| 10                     | 0.28 ms  |
| 100                    | 0.45 ms  |
| 1000                   | 2.43 ms  |
| 10000                  | 20.15 ms |
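
As a quick sanity check on these numbers, the Python sketch below compares each timing against a frame budget and estimates the average cost of one additional sample. The 60 FPS target is an assumption made for this calculation; the timings are copied from the table above.

```python
# Timings copied from the table above: sample count -> milliseconds
TIMINGS_MS = {1: 0.25, 10: 0.28, 100: 0.45, 1000: 2.43, 10000: 20.15}

FRAME_BUDGET_MS = 1000.0 / 60.0   # assumption: a 60 FPS target


def budget_fraction(samples):
    """Fraction of one 60 FPS frame spent on the given number of samples."""
    return TIMINGS_MS[samples] / FRAME_BUDGET_MS


def marginal_cost_us(samples_lo, samples_hi):
    """Average extra microseconds per additional sample between two rows."""
    dt_ms = TIMINGS_MS[samples_hi] - TIMINGS_MS[samples_lo]
    return 1000.0 * dt_ms / (samples_hi - samples_lo)


for count in TIMINGS_MS:
    print(f"{count:>5} samples: {100 * budget_fraction(count):.1f}% of a frame")
```

Under that assumption, 100 samples use under 3% of the frame, while 10000 samples exceed the whole budget; the marginal cost stays at roughly 2 microseconds per extra sample across the table.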

For instance, I can run a Gaussian blur on video input at 60 FPS on ShaderToy:
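
The ShaderToy shader itself is not reproduced here. As a CPU-side illustration of the same technique, the following Python sketch implements a separable Gaussian blur on a grayscale image stored as a list of rows. The function names, the default radius and sigma, and the clamp-to-edge addressing are assumptions for this sketch, not details of the original shader.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(image, kernel, horizontal):
    """Blur along one axis, clamping sample coordinates to the image edge."""
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                offset = k - radius
                sx = min(max(x + offset, 0), width - 1) if horizontal else x
                sy = y if horizontal else min(max(y + offset, 0), height - 1)
                acc += w * image[sy][sx]       # one "texture sample"
            out[y][x] = acc
    return out


def gaussian_blur(image, radius=2, sigma=1.0):
    """Separable blur: a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel(radius, sigma)
    return blur_1d(blur_1d(image, kernel, True), kernel, False)
```

With radius 2, the two passes take 10 samples per pixel in total instead of the 25 a full 5×5 kernel would need, which is well inside the cheap range of the timing table.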

Therefore, I am inclined to believe that texture sampling is NOT the bottleneck of the modern rendering pipeline, but lighting is.

For instance, the Falcor engine has very complicated lighting; click this shader to get a sense of it: