New in the R10 release is support for BLAS Level 3 routines. We added these routines so that you can use CULA as a stand-alone linear algebra package, without needing several other packages to assemble a capable development system. In many cases we have also tuned these functions to deliver more performance than is available in CUBLAS.
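| Matrix type | Operation | Single real | Single complex | Double real | Double complex |
|---|---|---|---|---|---|
| Triangular | Triangular matrix-matrix multiply | STRMM | CTRMM | DTRMM | ZTRMM |
| Triangular | Triangular matrix solve | STRSM | CTRSM | DTRSM | ZTRSM |
| Symmetric | Symmetric matrix-matrix multiply | SSYMM | CSYMM | DSYMM | ZSYMM |
| Symmetric | Symmetric rank 2k update | SSYR2K | CSYR2K | DSYR2K | ZSYR2K |
| Symmetric | Symmetric rank k update | SSYRK | CSYRK | DSYRK | ZSYRK |
| Hermitian | Hermitian matrix-matrix multiply | | CHEMM | | ZHEMM |
| Hermitian | Hermitian rank 2k update | | CHER2K | | ZHER2K |
| Hermitian | Hermitian rank k update | | CHERK | | ZHERK |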
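To make one row of the table concrete, the C sketch below spells out the operation behind the symmetric rank k update (SSYRK) for the lower-triangle, no-transpose case: C ← αAAᵀ + βC. It is a plain CPU reference loop for illustration only, not CULA's implementation or calling convention. The function name `ref_ssyrk_lower` and its reduced argument list are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Unoptimized reference for the SSYRK operation with uplo = 'L' and
 * trans = 'N':  C <- alpha * A * A^T + beta * C.
 * A is n-by-k and C is n-by-n, both stored column-major as in BLAS.
 * Only the lower triangle of C is updated; the strict upper triangle
 * is left untouched.
 * NOTE: ref_ssyrk_lower is a hypothetical name, not a CULA or BLAS symbol. */
void ref_ssyrk_lower(int n, int k, float alpha, const float *a, int lda,
                     float beta, float *c, int ldc)
{
    assert(n >= 0 && k >= 0);
    assert(lda >= (n > 1 ? n : 1));
    assert(ldc >= (n > 1 ? n : 1));

    for (int j = 0; j < n; ++j) {
        for (int i = j; i < n; ++i) {          /* lower triangle: i >= j */
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) {
                /* (A * A^T)(i, j) = sum over p of A(i, p) * A(j, p) */
                sum += a[i + (size_t)p * lda] * a[j + (size_t)p * lda];
            }
            float *cij = &c[i + (size_t)j * ldc];
            /* As in reference BLAS, C is not read when beta is zero. */
            *cij = alpha * sum + (beta == 0.0f ? 0.0f : beta * *cij);
        }
    }
}
```

The other routines in the table follow the same pattern with a different update formula. The rank 2k variants, for example, combine two input matrices A and B instead of one.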
Like our LAPACK routines, CULA's BLAS routines are available through the Standard, Device, and Fortran interfaces, and they also support the Bridge interface for automatically switching between CPU and GPU execution.