basic

This graph shows the relative speedup of many CULA functions compared to Intel's MKL 10.2 implementation of LAPACK.

Please note: Complex, Double, and Double Complex performance will be posted soon!

benchmark

All benchmarks were obtained using:

CPU: Quad-core Intel Core i7 @ 2.8 GHz
GPU: NVIDIA Tesla C1060
OS: Windows 7 (64-bit)

CULA speed calculations include the data transfer time to and from the GPU.  
MKL speed calculations were obtained with all cores and hyper-threading active.
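
To make the accounting above concrete, here is a minimal sketch of how a relative speedup could be computed when transfer time is charged to the GPU side. It assumes that "relative speedup" means CPU wall time divided by total GPU wall time; the function name, parameter names, and the timing values in the usage line are hypothetical illustrations, not CULA APIs or measured results.

```python
def relative_speedup(cpu_seconds, gpu_compute_seconds,
                     host_to_device_seconds, device_to_host_seconds):
    """Return CPU time divided by total GPU time, transfers included.

    Assumption: speedup is defined as a ratio of wall-clock times.
    All names here are hypothetical and chosen for illustration.
    """
    # Charge both transfers to the GPU path, as the benchmark notes describe.
    gpu_total = (host_to_device_seconds
                 + gpu_compute_seconds
                 + device_to_host_seconds)
    if gpu_total <= 0:
        raise ValueError("total GPU time must be positive")
    return cpu_seconds / gpu_total


# Illustrative numbers only (not measurements): 2.0 s on the CPU versus
# 0.25 s of GPU compute plus 0.125 s for each transfer direction.
print(relative_speedup(2.0, 0.25, 0.125, 0.125))
```

Including the transfers in the denominator gives a more conservative figure than timing the GPU kernel alone, which matters most for small problem sizes where transfer time is a large share of the total.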