CULA Dense R14 and Sparse S2 - Now Supporting CUDA 4.1

Post by john » Tue Jan 31, 2012 2:00 pm

We're pleased to announce the release of our latest CULA Dense and Sparse versions, with full compatibility for CUDA 4.1. A major highlight of R14 is a preview of multi-GPU LAPACK routines, which we are calling the pCULA branch of CULA Dense. To be clear, this is a preview: it is designed to show potential performance, and its interface will likely continue to evolve over time. The new multi-GPU routines are:
pculaGetrf (LU decomposition)
pculaGetrs (LU solve)
pculaGesv (general system solve via LU)
pculaPotrf (Cholesky decomposition)
pculaPotrs (Cholesky solve)
pculaPosv (Hermitian/symmetric positive-definite system solve)
pculaTrsm (BLAS triangular system solve)
pculaGemm (BLAS general matrix multiply)

An upcoming blog post will contain more on the usage and expectations of these routines, but a simple example is quite easy to create:

pculaConfig config;
// some users may wish to tweak the default options here
// the default is to use all CUDA devices and to allow the routine
// to select the parameters it considers best

culaStatus status = pculaPotrf(&config, m, n, A, lda);

As always, alongside the new features, this release includes bug fixes and speed and stability improvements. The full release notes for R14 and S2 are available at the dense downloads page and the sparse downloads page, respectively.
