One of the best things about CULA is that it includes many algorithms that are not GPU-accelerated anywhere else. I want to take a moment to highlight one of them: our CUDA-accelerated Singular Value Decomposition (SVD). Others have implemented more special-purpose SVD routines, but no one else has implemented a fully LAPACK-conformant, GPU-accelerated SVD that performs as well as ours. Our SVD currently runs about 5 times faster than CPU implementations. That matters because SVD is a long-running algorithm: CPU codes can take up to 15 minutes on matrices of moderate size, and cutting that to 3 minutes opens up new possibilities for codes that rely heavily on SVD.
This is a screenshot from a demo we gave at GTC 2009. The demo used SVD to embed an invisible, hard-to-detect digital watermark in a movie, and our CUDA-accelerated SVD made it 5 times faster than a CPU-only solution.
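For readers wondering how an SVD can hide a watermark, here is a minimal sketch of one common textbook scheme. It is not necessarily what the GTC demo did. It assumes each movie frame is treated as a grayscale matrix and that the watermark is a short bit vector, one bit per singular value. Embedding nudges the singular values by a small amount (S_w = S + alpha * bits) and rebuilds the frame. Extraction recomputes the SVD and compares the result against the original singular values. The function names, the `alpha` strength, and the synthetic frame are all hypothetical, and NumPy's `numpy.linalg.svd` stands in for a GPU SVD.

```python
import numpy as np

def embed_watermark(frame, bits, alpha=0.1):
    """Hypothetical scheme: add alpha * bits to the frame's singular values.

    Returns the watermarked frame and the original singular values, which
    this non-blind sketch needs again at extraction time.
    """
    u, s, vt = np.linalg.svd(frame, full_matrices=False)
    s_marked = s + alpha * np.asarray(bits, dtype=float)
    # Rebuild the frame from the perturbed spectrum: U @ diag(S_w) @ V^T
    return (u * s_marked) @ vt, s

def extract_watermark(marked_frame, original_s, alpha=0.1):
    """Recover the bits by comparing singular values with the originals."""
    s_marked = np.linalg.svd(marked_frame, compute_uv=False)
    return np.rint((s_marked - original_s) / alpha).astype(int)

# Synthetic 8x8 "frame" with well-separated singular values 8, 7, ..., 1,
# so a 0.1 nudge cannot reorder them.
rng = np.random.default_rng(0)
q1, _ = np.linalg.qr(rng.standard_normal((8, 8)))
q2, _ = np.linalg.qr(rng.standard_normal((8, 8)))
frame = q1 @ np.diag(np.arange(8, 0, -1.0)) @ q2.T

bits = [1, 0, 1, 1, 0, 0, 1, 0]
marked, s = embed_watermark(frame, bits)
print(extract_watermark(marked, s))  # [1 0 1 1 0 0 1 0]
```

Each embed or extract step costs one full SVD per frame, so a movie with thousands of frames runs thousands of SVDs. That is why a faster SVD pays off so directly in this kind of workload. Practical schemes add robustness to compression and cropping, and some avoid needing the original singular values at all.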
Try it now, for free, in CULA Basic.