We've had plenty of questions about the performance of the upcoming CULA Sparse package; hopefully the following performance plot will answer some of them!
Here, we have plotted the performance of CULA Sparse (beta-1) against another GPU library, CUSP (0.2), and an optimized CPU library, Intel MKL (10.3). As you can see, the GPU-accelerated libraries run more than an order of magnitude faster than their CPU counterpart, with CULA Sparse coming out about 10-20% faster than CUSP!
For this benchmark, we measured the throughput of the conjugate gradient (CG) iterative solver in GB/s, which normalizes execution time by the size of the matrix so that problems of different sizes can be compared directly. The CPU benchmarks were obtained using dual hex-core Intel Xeon X5560s (all 12 cores active), and the GPU benchmarks were obtained using an NVIDIA C2050. No preconditioners were used, and all solvers converged in very similar numbers of iterations.
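To make the GB/s metric a bit more concrete, here is a minimal pure-Python sketch of an unpreconditioned CG solve on a matrix stored in CSR format, plus one way to turn an iteration count and a wall-clock time into a throughput figure. The byte-count model (`cg_bytes_per_iteration`, double-precision values, 4-byte indices, 14 vector passes per iteration) is a hypothetical assumption for illustration only; it is not necessarily how the numbers in the plot above were computed.

```python
# Minimal sketch: unpreconditioned conjugate gradient (CG) on a CSR matrix,
# plus a rough GB/s estimate. The traffic model is a HYPOTHETICAL assumption,
# not the metric used by CULA Sparse, CUSP, or MKL.
import math


def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A x for A stored in compressed sparse row (CSR) form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual r = b - A x, which equals b when x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    if math.sqrt(rs_old) < tol:
        return x, 0
    iters = 0
    for iters in range(1, max_iter + 1):
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if math.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, iters


def cg_bytes_per_iteration(nnz, n, value_bytes=8, index_bytes=4):
    """HYPOTHETICAL memory-traffic model for one unpreconditioned CG iteration."""
    # SpMV: matrix values and column indices, plus the row-pointer array.
    spmv = nnz * (value_bytes + index_bytes) + (n + 1) * index_bytes
    # Vector passes: SpMV input/output (2n), two dot products (3n),
    # three AXPY-style updates of x, r and p (9n) -> 14n values.
    vectors = 14 * n * value_bytes
    return spmv + vectors


def cg_throughput_gbs(nnz, n, iters, seconds):
    """Throughput in GB/s under the hypothetical traffic model above."""
    return cg_bytes_per_iteration(nnz, n) * iters / seconds / 1e9
```

For example, the 2x2 system with A = [[4, 1], [1, 3]] and b = [1, 2] is passed as `values=[4.0, 1.0, 1.0, 3.0]`, `col_idx=[0, 1, 0, 1]`, `row_ptr=[0, 2, 4]`; CG reaches x = [1/11, 7/11] in two iterations. Because the metric divides by bytes touched rather than reporting raw seconds, a large and a small matrix can sit on the same axis, which is the point of plotting GB/s.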
Stay tuned for more performance numbers and the upcoming CULA Sparse (beta-2) release!