We study five ways of computing matrix-vector products:

 

  1. Row-based access of the matrix
  2. Column-based access of the matrix
  3. Row-oriented vector operations
  4. Column-oriented vector operations
  5. Direct matrix-vector product through MATLAB’s call to BLAS
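
For concreteness, the sketch below shows what each of the five methods might look like in MATLAB. The variable names are illustrative, the mapping of methods 3 and 4 onto column slices A(:,j) and row slices A(i,:) is inferred from the discussion of Figure 1 below, and the actual listings are attached in Section 4.1.

    n = 1000;  A = rand(n);  x = rand(n,1);

    % 1. Row-based access: double loop, inner loop walks across a row of A
    y1 = zeros(n,1);
    for i = 1:n
        for j = 1:n
            y1(i) = y1(i) + A(i,j)*x(j);
        end
    end

    % 2. Column-based access: double loop, inner loop walks down a column of A
    y2 = zeros(n,1);
    for j = 1:n
        for i = 1:n
            y2(i) = y2(i) + A(i,j)*x(j);
        end
    end

    % 3. Row-oriented vector operations: accumulate the scaled columns A(:,j)*x(j)
    y3 = zeros(n,1);
    for j = 1:n
        y3 = y3 + A(:,j)*x(j);
    end

    % 4. Column-oriented vector operations: inner product of each row A(i,:) with x
    y4 = zeros(n,1);
    for i = 1:n
        y4(i) = A(i,:)*x;
    end

    % 5. Direct matrix-vector product through MATLAB's call to BLAS
    y5 = A*x;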

Using the above methods, we compute matrix-vector products with random matrices of four sizes, 10^1, 10^2, 10^3, and 10^4; for each size we run 10 trials and report the average time spent. On an Intel i7 2.7 GHz processor with 6 MB of L3 cache, the results are shown in Table 1 and the log-log graph is displayed in Figure 1. The code is attached in Section 4.1.
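
A minimal sketch of the timing harness, assuming tic/toc timing and averaging over the 10 trials stated above; the function handle f and the variable names are illustrative and stand in for any one of the five methods.

    sizes   = [10 100 1000 10000];
    ntrials = 10;
    avgtime = zeros(size(sizes));
    f = @(A,x) A*x;                     % stand-in for any of the five methods
    for k = 1:numel(sizes)
        n = sizes(k);
        A = rand(n);  x = rand(n,1);
        t = 0;
        for trial = 1:ntrials
            tic;  y = f(A,x);  t = t + toc;
        end
        avgtime(k) = t / ntrials;       % average over the 10 trials
    end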

 

Table 1. Time taken by MATLAB matrix-vector products (s); columns A–E correspond to methods 1–5 above.

    Size    A        B        C        D        E
    10^1    0.0000   0.0000   0.0000   0.0004   0.0229
    10^2    0.0002   0.0002   0.0002   0.0022   0.0002
    10^3    0.0339   0.0173   0.0031   0.0331   0.0004
    10^4    4.7069   1.6411   0.1472   1.7983   0.0421

 

Figure 1. Log-log graph of the time taken for various matrix sizes for the matrix-vector products

 

From Figure 1, we can rank the methods from fastest to slowest and draw the following conclusions:

  • Rank 1: Direct matrix-vector product through the MATLAB call
    • Uses native C code and row-oriented vector operations; the for-loop inside the C code is optimized for better performance.
  • Rank 2: Row-oriented vector operations
    • Almost the same as the native MATLAB call, but a bit slower than Rank 1 since the MATLAB for-loop is not fully optimized.
    • The row-oriented vector operation takes advantage of the matrix being stored column-by-column in memory; contiguous memory can be loaded more easily and faster (see the sketch after this list).
  • Rank 3: Column-based access of the matrix
    • The same idea as Rank 2, but the additional for-loop slows it down.
  • Rank 4: Column-oriented vector operations
    • Although there is only one for-loop, the matrix is accessed by row, and a row is not a contiguous block of memory, so the strided (pointer-jumping) access is slow.
  • Rank 5: Row-based access of the matrix
    • The same idea as Rank 4, with an additional slow for-loop.
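
To make the contiguous-versus-strided point concrete, the following sketch (with illustrative names) times reading every column slice A(:,j) against every row slice A(i,:); because MATLAB stores matrices column-by-column, the column slices are contiguous in memory and should be noticeably faster.

    n = 5000;
    A = rand(n);

    tic;
    for j = 1:n
        c = A(:,j);    % contiguous: one column is a single block in memory
    end
    tcol = toc;

    tic;
    for i = 1:n
        r = A(i,:);    % strided: consecutive row elements are n*8 bytes apart
    end
    trow = toc;

    fprintf('column slices: %.3f s, row slices: %.3f s\n', tcol, trow);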

Conclusion: The native MATLAB call to the C (BLAS) routine is the fastest. Matrices are stored as column vectors (column-major order) in memory, so column access is faster than row access. Unnecessary for-loops in MATLAB are not optimized and slow the computation down.