R10 BLAS Support

by Dan

New in the R10 release is support for BLAS Level 3 routines. We added these routines so that you can use CULA as a stand-alone linear algebra package, without needing several other packages to assemble a capable development system. In many cases we have also tuned these functions to run faster than their CUBLAS counterparts.

Matrix Type  Operation                          S       C       D       Z
General      Matrix-matrix multiply             SGEMM   CGEMM   DGEMM   ZGEMM
General      Matrix-vector multiply             SGEMV   CGEMV   DGEMV   ZGEMV
Triangular   Triangular matrix-matrix multiply  STRMM   CTRMM   DTRMM   ZTRMM
Triangular   Triangular matrix solve            STRSM   CTRSM   DTRSM   ZTRSM
Symmetric    Symmetric matrix-matrix multiply   SSYMM   CSYMM   DSYMM   ZSYMM
Symmetric    Symmetric rank 2k update           SSYR2K  CSYR2K  DSYR2K  ZSYR2K
Symmetric    Symmetric rank k update            SSYRK   CSYRK   DSYRK   ZSYRK
Hermitian    Hermitian matrix-matrix multiply   -       CHEMM   -       ZHEMM
Hermitian    Hermitian rank 2k update           -       CHER2K  -       ZHER2K
Hermitian    Hermitian rank k update            -       CHERK   -       ZHERK
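
For readers new to BLAS naming: the leading letter of each routine gives the data type (S for single-precision real, C for single-precision complex, D for double-precision real, Z for double-precision complex), and GEMM computes C = alpha * op(A) * op(B) + beta * C on column-major matrices. The snippet below is a minimal pure-Python sketch of that operation, meant only to show what the GEMM row of the table computes. It is not CULA's implementation or API. The name gemm_reference is hypothetical, the argument order is assumed to follow the conventional BLAS GEMM signature, and only the 'N' and 'T' options are handled (conjugate transpose is left out).

```python
def gemm_reference(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Illustrative reference for xGEMM: C = alpha * op(A) * op(B) + beta * C.

    Hypothetical helper, not a CULA function. Matrices are flat lists stored
    in column-major order with leading dimensions lda, ldb, ldc.
    op(X) is X for 'N' and X transposed for 'T'; any other value is treated as 'N'.
    """
    def at(x, ld, trans, i, j):
        # Element (i, j) of op(X) in column-major storage.
        return x[j + i * ld] if trans == 'T' else x[i + j * ld]

    for j in range(n):          # columns of C
        for i in range(m):      # rows of C
            acc = 0.0
            for p in range(k):  # inner (shared) dimension
                acc += at(a, lda, transa, i, p) * at(b, ldb, transb, p, j)
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c


# A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], both stored column by column.
A = [1.0, 3.0, 2.0, 4.0]
B = [5.0, 7.0, 6.0, 8.0]
C = [0.0] * 4
print(gemm_reference('N', 'N', 2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2))
# prints [19.0, 43.0, 22.0, 50.0], i.e. [[19, 22], [43, 50]] in column-major order
```

A triple loop like this is useful only as a correctness check on small inputs; the point of a tuned GPU library is to replace it with a much faster implementation.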

Like our LAPACK routines, CULA's BLAS routines are available through our Standard, Device, and Fortran interfaces. They also support the bridge interface, which switches automatically between CPU and GPU execution.


CULA R10 (CUDA 3.2) Released!

by John

It is my pleasure to announce that CULA R10 is now available for download. As always, the downloads are available at http://www.culatools.com/get-cula/downloads/

This version is intended for use with CUDA 3.2. The highlights of the new version include:

Again, we have changed the way we denote new versions. Please see this post for more details: http://www.culatools.com/blog/2010/12/03/new-version-numbering-scheme/


New Version Numbering Scheme

by John

With a new version of CULA coming up, I would like to take a moment to announce that we are changing the way we number our versions. CULA version numbers have gotten confusingly close to CUDA version numbers, and we often find ourselves writing sentences such as "CULA 2.2 is compatible with CUDA 3.1." Admittedly that's a mouthful, so from now on we will use a new naming scheme in which each release of CULA is numbered with an increasing integer. As this coming release is our tenth (not counting beta/preview releases and small bug patches), it will be referred to as CULA R10. In general, we will also append the CUDA version so that it is obvious which CUDA version a release is intended to work with. The full title of the next version will therefore be CULA R10 (CUDA 3.2), and it will contain the banded solvers and speedups that we previously previewed under the name CULA 2.2.