
This graph shows the relative speedup of many CULA functions compared to Intel's MKL 10.2 implementation of LAPACK.
Please note: Complex, Double, and Double Complex performance will be posted soon!

All benchmarks were obtained using:
CPU: Quad-core Intel Core i7 @ 2.8 GHz
GPU: NVIDIA Tesla C1060
OS: Windows 7 (64-bit)
CULA speed calculations include the data transfer time to and from the GPU.
MKL speed calculations were obtained with all cores and hyper-threading active.
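
As a rough illustration of what "relative speedup" means when transfer time is counted against the GPU, the sketch below divides the CPU time by the sum of the GPU compute time and both transfers. The function name, parameter names, and timing values are hypothetical placeholders chosen for this example; they are not CULA or MKL APIs and not measured results.

```python
def relative_speedup(cpu_seconds, gpu_compute_seconds,
                     to_gpu_seconds, from_gpu_seconds):
    """Speedup of the GPU path over the CPU path, with transfers counted as GPU time."""
    # Total GPU time = host-to-device copy + computation + device-to-host copy.
    gpu_total = to_gpu_seconds + gpu_compute_seconds + from_gpu_seconds
    if gpu_total <= 0:
        raise ValueError("total GPU time must be positive")
    return cpu_seconds / gpu_total


# Placeholder timings in seconds (assumed for illustration, NOT benchmark data).
example = relative_speedup(cpu_seconds=4.0, gpu_compute_seconds=0.5,
                           to_gpu_seconds=0.25, from_gpu_seconds=0.25)
print(example)  # 4.0
```

A value above 1 means the GPU path finished sooner even after paying for the transfers; leaving the transfer terms out would overstate the speedup.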
