CULA Dense R14 and Sparse S2 – Now Supporting CUDA 4.1

by John

We're pleased to announce the release of our latest CULA Dense and Sparse versions, with full compatibility for CUDA 4.1. A major highlight of R14 is the inclusion of a preview of multi-GPU LAPACK routines, hereby called the pCULA branch of CULA Dense. Again, this is a preview designed to show potential performance as well as an interface which will likely continue to evolve over time. The new multi-GPU routines are:

pculaGetrf (LU decomposition)
pculaGetrs (LU solve)
pculaGesv (general system solve via LU)
pculaPotrf (Cholesky decomposition)
pculaPotrs (Cholesky solve)
pculaPosv (hermitian/symmetric postive-definite system solve)
pculaTrsm (BLAS triangular system solve)
pculaGemm (BLAS general matrix multiply)

An upcoming blog post will contain more on the usage and expectations of these routines, but a simple example is quite easy to create:


pculaConfig config;
// some users may wish to tweak the default options here
// the default is to use all CUDA devices and to allow the routine
// to select the parameters it feels is best

culaStatus status = pculaPotrf(&config, m, n, A, lda);

As always, in addition to new features are bug fixes and speed/stability improvements. The full release notes for both R14 and S2 are available at the dense downloads page and the sparse downloads page, respectively.

Comments (0) Trackbacks (0)

Sorry, the comment form is closed at this time.

Trackbacks are disabled.