CULA Dense is a GPU-accelerated implementation of dense linear algebra routines

Providing a wide set of LAPACK and BLAS capabilities

CULA Dense provides accelerated implementations of the most popular and essential routines for dense linear algebra in a prepackaged library. If your existing code already uses LAPACK or BLAS, you can even use the library to get acceleration with no changes to your source code.

LU factorization              Cholesky factorization    Matrix-matrix multiply
QR decomposition              Orthogonal factorization  Matrix-vector multiply
Least squares                 System solve              Rank updates
Eigenvalue routines           Matrix inversion          Conjugate
Singular value decomposition  Auxiliary routines        Transpose
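
To make two entries in the list above concrete, here is a minimal CPU-only sketch of what "LU factorization" and "System solve" compute: factoring A with partial pivoting and then solving A x = b. This is not CULA's implementation or its API; the function names `lu_factor` and `lu_solve` are hypothetical, and the code uses plain Python lists purely for illustration. A GPU library would run the same mathematical steps on much larger matrices.

```python
# Illustrative reference only: hypothetical helper names, not a CULA or LAPACK API.

def lu_factor(a):
    """Factor a square matrix as P*A = L*U using partial pivoting.

    Returns (lu, piv). lu stores L below the diagonal (unit diagonal implied)
    and U on and above it. piv[k] is the row that was swapped with row k.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]  # work on a copy
    piv = list(range(n))
    for k in range(n):
        # Choose the largest-magnitude entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        # Eliminate column k below the pivot, storing the multipliers in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b given the factors returned by lu_factor."""
    n = len(lu)
    x = [float(v) for v in b]
    # Apply the recorded row swaps to the right-hand side.
    for k in range(n):
        p = piv[k]
        if p != k:
            x[k], x[p] = x[p], x[k]
    # Forward substitution with the unit lower-triangular factor L.
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Back substitution with the upper-triangular factor U.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Usage: solve a small 3x3 system.
factors, pivots = lu_factor([[2, 1, 1], [4, -6, 0], [-2, 7, 2]])
solution = lu_solve(factors, pivots, [5, -2, 9])
print(solution)  # prints [1.0, 1.0, 2.0]
```

Splitting the work into a factor step and a solve step mirrors how dense solvers are usually organized: the factorization costs the most, and the factors can then be reused for additional right-hand sides.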

And offering your software superior performance

CULA Dense runs up to an order of magnitude faster than optimized CPU-based linear algebra solvers. Using CULA lets your software simply run faster.

While working with you, not against you.

Programmers can easily call CULA Dense from their C/C++, Fortran, MATLAB, or Python code. CULA works with all GPUs supported by NVIDIA's CUDA and is built for all standard CUDA platforms, so you can be assured that your solution runs wherever you need it to.

More Information

» Review the programmer's guide
» See more performance charts
» See the full function list
» Learn about different interfaces
» Read the FAQ
» Downloads
» Visit the forums