I just started learning CUDA this summer. (Is it too late? No, it is never too late to learn!)

Computing today gets faster mostly because each chip has more transistors available for computation, so the extra performance comes from doing more work in parallel. That is why learning parallel computing is essential for GPGPU programming.

Rather than relying on one large, complex processor, a GPU uses many smaller, more power-efficient processors to increase total computing power.

A great place to learn CUDA is the Udacity course: https://classroom.udacity.com/courses/cs344/

A simple CUDA kernel launch looks something like this:

KERNEL <<< GRID OF BLOCKS, BLOCK OF THREADS >>> ( … )

Each of the two launch parameters is a dim3(x, y, z). Components that are left out default to 1, so:

dim3(w, 1, 1) == dim3(w) == w

square<<<5, 256>>>(…) == square<<<dim3(5,1,1), dim3(256,1,1)>>>(…)
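To make the launch syntax concrete, here is a minimal sketch of a complete program built around a square kernel. The kernel body, the helper run_square, and the input values are my own assumptions for illustration, not taken from the course material, and error checking is left out to keep it short. It launches the same kernel once with plain integers and once with explicit dim3 values, then compares the results.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed kernel body: each thread squares one element of the input.
__global__ void square(float *d_out, const float *d_in, int n)
{
    // Global index of this thread across the whole 1D grid.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {  // guard in case the grid is larger than the data
        d_out[idx] = d_in[idx] * d_in[idx];
    }
}

// Hypothetical helper: squares the values 0..n-1 on the GPU using the
// given launch shape (n = blocks * threads per block) and returns the
// results on the host. No error checking, for brevity.
std::vector<float> run_square(dim3 grid, dim3 block)
{
    const int n = static_cast<int>(grid.x * block.x);
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) {
        h_in[i] = static_cast<float>(i);
    }

    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // KERNEL <<< GRID OF BLOCKS, BLOCK OF THREADS >>> ( ... )
    square<<<grid, block>>>(d_out, d_in, n);

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    // Plain integers are converted to dim3(w, 1, 1) implicitly.
    std::vector<float> a = run_square(5, 256);
    // The same launch shape, spelled out in full.
    std::vector<float> b = run_square(dim3(5, 1, 1), dim3(256, 1, 1));

    printf("same results: %s\n", a == b ? "yes" : "no");
    printf("square of 3: %.1f\n", a[3]);
    return 0;
}
```

This should compile with nvcc (for example, nvcc square.cu -o square, where the file name is just a placeholder) and needs a CUDA-capable GPU to run. With 5 blocks of 256 threads there are 1280 threads in total, one per array element.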

for 128 * 128 image,