CULA Dense R14 and Sparse S2 – Now Supporting CUDA 4.1

by John

We're pleased to announce the release of our latest CULA Dense and Sparse versions, with full compatibility for CUDA 4.1. A major highlight of R14 is a preview of multi-GPU LAPACK routines, hereafter called the pCULA branch of CULA Dense. Because this is a preview, it is designed to show potential performance, and its interface will likely continue to evolve over time. The new multi-GPU routines are:

pculaGetrf (LU decomposition)
pculaGetrs (LU solve)
pculaGesv (general system solve via LU)
pculaPotrf (Cholesky decomposition)
pculaPotrs (Cholesky solve)
pculaPosv (Hermitian/symmetric positive-definite system solve)
pculaTrsm (BLAS triangular system solve)
pculaGemm (BLAS general matrix multiply)
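To pin down what the LU routines compute, here is a minimal single-threaded C sketch of the factor-then-solve pair that getrf and getrs perform (gesv is the two combined). It is a CPU reference for checking results on small inputs, not the pCULA interface: the names ref_getrf and ref_getrs are hypothetical, storage is assumed column-major with a leading dimension as in LAPACK, and pivot indices are 0-based rather than LAPACK's 1-based.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical CPU reference, not part of CULA: in-place LU factorization
 * with partial pivoting, P*A = L*U, for an n-by-n column-major matrix with
 * leading dimension lda. On return the strictly lower part holds L (unit
 * diagonal implied), the upper part holds U, and row k was swapped with
 * row ipiv[k] (0-based). Returns 0, or k+1 if U(k,k) is exactly zero. */
int ref_getrf(int n, double *a, int lda, int *ipiv)
{
    assert(lda >= n);
    int info = 0;
    for (int k = 0; k < n; ++k) {
        /* pivot: entry of largest magnitude in column k, rows k..n-1 */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (fabs(a[i + k * lda]) > fabs(a[p + k * lda]))
                p = i;
        ipiv[k] = p;
        if (p != k)  /* swap whole rows k and p, as LAPACK does */
            for (int j = 0; j < n; ++j) {
                double t = a[k + j * lda];
                a[k + j * lda] = a[p + j * lda];
                a[p + j * lda] = t;
            }
        double piv = a[k + k * lda];
        if (piv == 0.0) {
            if (info == 0)
                info = k + 1;
            continue;
        }
        /* store multipliers in L and update the trailing submatrix */
        for (int i = k + 1; i < n; ++i) {
            a[i + k * lda] /= piv;
            for (int j = k + 1; j < n; ++j)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
        }
    }
    return info;
}

/* Solve A*x = b for one right-hand side using the output of ref_getrf;
 * b is overwritten with x. */
void ref_getrs(int n, const double *a, int lda, const int *ipiv, double *b)
{
    for (int k = 0; k < n; ++k) {  /* apply the row interchanges to b */
        double t = b[k];
        b[k] = b[ipiv[k]];
        b[ipiv[k]] = t;
    }
    for (int i = 0; i < n; ++i)  /* forward solve L*y = b (unit diagonal) */
        for (int j = 0; j < i; ++j)
            b[i] -= a[i + j * lda] * b[j];
    for (int i = n - 1; i >= 0; --i) {  /* back solve U*x = y */
        for (int j = i + 1; j < n; ++j)
            b[i] -= a[i + j * lda] * b[j];
        b[i] /= a[i + i * lda];
    }
}
```

A typical use is to factor a copy of a small test matrix with ref_getrf, solve with ref_getrs, and compare against the multi-GPU result within a tolerance suited to the matrix's conditioning.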

An upcoming blog post will say more about how to use these routines and what to expect from them, but a simple example is easy to write:


pculaConfig config;
// some users may wish to tweak the default options here;
// by default, the routine uses all CUDA devices and selects
// the parameters it considers best

culaStatus status = pculaPotrf(&config, m, n, A, lda);

As always, alongside the new features, these releases include bug fixes and speed and stability improvements. The full release notes for R14 and S2 are available on the dense downloads page and the sparse downloads page, respectively.


Introducing the CULA Sparse Demo

by Dan

We are very pleased to announce the release of a free demo for CULA Sparse. The demo is a standalone, command-line-driven program that lets you choose your options and see the performance of a particular routine. It supports all of the solvers and most of the features provided by CULA Sparse.

For example, to run the demo with the conjugate gradient (cg) solver and the Jacobi preconditioner, you can use the command below. The demo accepts matrices in the Matrix Market format (.mtx); for information on this format, see the Matrix Market resources on the NIST website.

iterativeBenchmark solver=cg preconditioner=jacobi A=myfile.mtx b=ones tolerance=1e-5

The CULA Sparse demo is powerful because it lets you easily try out several different solvers, preconditioners, and other features without writing or building any software. Once you’ve found the combination of inputs that works best for your problem, you can carry that knowledge directly into your CULA Sparse implementation.

Download the CULA Sparse demo today and see how our GPU-accelerated solvers can work for you.


CULA Sparse Available!

by John

After several months of valuable beta testing, we are pleased to announce the release and immediate availability of CULA Sparse. Our first release contains 6 solvers and 3 preconditioners, and supports double-precision real and complex data in a variety of matrix formats. Speedups of 10x or more over a fully threaded CPU solution are now available in an easy-to-use package!
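Sparse solvers store only the nonzero entries of a matrix. Two widely used layouts are coordinate (COO) form, one (row, column, value) triple per nonzero, much like a coordinate-format Matrix Market file, and compressed sparse row (CSR) form, which groups entries by row and records where each row starts. This C sketch converts COO to CSR with a counting pass and a scatter pass; coo_to_csr is a hypothetical helper, not a CULA Sparse function, and it assumes 0-based indices.

```c
#include <assert.h>

/* Hypothetical helper, not part of CULA Sparse: convert an n-row matrix
 * with nnz entries from COO form (row[k], col[k], v[k]) to CSR.
 * row_ptr needs room for n+1 entries; col_idx and val for nnz each.
 * Entries keep their input order within a row; duplicates are not summed. */
void coo_to_csr(int n, int nnz, const int *row, const int *col,
                const double *v, int *row_ptr, int *col_idx, double *val)
{
    /* count entries per row, shifted by one so a prefix sum gives offsets */
    for (int i = 0; i <= n; ++i)
        row_ptr[i] = 0;
    for (int k = 0; k < nnz; ++k) {
        assert(row[k] >= 0 && row[k] < n);
        row_ptr[row[k] + 1]++;
    }
    for (int i = 0; i < n; ++i)
        row_ptr[i + 1] += row_ptr[i];  /* row_ptr[i] = start of row i */

    /* scatter each entry into the next free slot of its row,
     * using row_ptr[r] as a moving cursor */
    for (int k = 0; k < nnz; ++k) {
        int dst = row_ptr[row[k]]++;
        col_idx[dst] = col[k];
        val[dst] = v[k];
    }

    /* each cursor now points at the start of the following row: shift back */
    for (int i = n; i > 0; --i)
        row_ptr[i] = row_ptr[i - 1];
    row_ptr[0] = 0;
}
```

Matrix Market indices are 1-based, so each row and column index read from an .mtx file would need 1 subtracted before calling this helper.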

CULA Dense R13 is a simultaneous release, also available now, and features three new routines (potri, gesdd, and geqrfp) as well as explicit compatibility with CULA Sparse.

For current users, we have changed the name of CULA Premium to CULA Dense, and CULA Basic is now CULA Dense Free Edition.