New in the R10 release is support for BLAS Level 3 routines. We added these routines so that CULA can serve as a stand-alone linear algebra package, without requiring several other packages to make up a capable development system. In many cases we have also tuned these functions to deliver more performance than is available in CUBLAS.

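| Matrix Type | Operation                        | S      | C      | D      | Z      |
|-------------|----------------------------------|--------|--------|--------|--------|
| General     | Matrix-matrix multiply           | SGEMM  | CGEMM  | DGEMM  | ZGEMM  |
| General     | Matrix-vector multiply           | SGEMV  | CGEMV  | DGEMV  | ZGEMV  |
| Triangular  | Triangular matrix-matrix multiply | STRMM  | CTRMM  | DTRMM  | ZTRMM  |
| Triangular  | Triangular matrix solve          | STRSM  | CTRSM  | DTRSM  | ZTRSM  |
| Symmetric   | Symmetric matrix-matrix multiply | SSYMM  | CSYMM  | DSYMM  | ZSYMM  |
| Symmetric   | Symmetric rank 2k update         | SSYR2K | CSYR2K | DSYR2K | ZSYR2K |
| Symmetric   | Symmetric rank k update          | SSYRK  | CSYRK  | DSYRK  | ZSYRK  |
| Hermitian   | Hermitian matrix-matrix multiply | -      | CHEMM  | -      | ZHEMM  |
| Hermitian   | Hermitian rank 2k update         | -      | CHER2K | -      | ZHER2K |
| Hermitian   | Hermitian rank k update          | -      | CHERK  | -      | ZHERK  |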
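
For readers less familiar with the BLAS naming, here is a minimal sketch of what the single-precision general matrix-matrix multiply (SGEMM) computes: C = alpha * A * B + beta * C, with matrices stored in column-major order as in standard BLAS. The function name `ref_sgemm` is hypothetical and chosen for illustration only; this is a plain CPU reference loop with no transpose options, not CULA's or CUBLAS's implementation or calling convention.

```c
#include <stddef.h>

/* Reference (unoptimized) single-precision GEMM, column-major storage:
 *     C = alpha * A * B + beta * C
 * A is m x k with leading dimension lda,
 * B is k x n with leading dimension ldb,
 * C is m x n with leading dimension ldc.
 * Hypothetical helper for illustration; transposes are not supported. */
void ref_sgemm(int m, int n, int k, float alpha,
               const float *a, int lda,
               const float *b, int ldb,
               float beta, float *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];

            float *cij = &c[i + (size_t)j * ldc];
            /* As in standard BLAS, C is not read when beta is zero. */
            if (beta == 0.0f)
                *cij = alpha * acc;
            else
                *cij = alpha * acc + beta * (*cij);
        }
    }
}
```

The other routines in the table follow the same pattern with extra structure: the triangular, symmetric, and Hermitian variants only reference the part of the matrix that their type guarantees is meaningful.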

As with our LAPACK routines, CULA's BLAS routines are available through our Standard, Device, and Fortran interfaces. They also support the bridge interface, which automatically switches between CPU and GPU execution.

