R10 BLAS Support
New in the R10 release is support for BLAS Level 3 routines. We added these routines so that you can use CULA as a stand-alone linear algebra package, without needing several other packages to build a capable development system. In many cases, we have also tuned these functions to run faster than their CUBLAS equivalents.
| Matrix Type | Operation | S | C | D | Z |
|---|---|---|---|---|---|
| General | Matrix-matrix multiply | SGEMM | CGEMM | DGEMM | ZGEMM |
| General | Matrix-vector multiply | SGEMV | CGEMV | DGEMV | ZGEMV |
| Triangular | Triangular matrix-matrix multiply | STRMM | CTRMM | DTRMM | ZTRMM |
| Triangular | Triangular matrix solve | STRSM | CTRSM | DTRSM | ZTRSM |
| Symmetric | Symmetric matrix-matrix multiply | SSYMM | CSYMM | DSYMM | ZSYMM |
| Symmetric | Symmetric rank 2k update | SSYR2K | CSYR2K | DSYR2K | ZSYR2K |
| Symmetric | Symmetric rank k update | SSYRK | CSYRK | DSYRK | ZSYRK |
| Hermitian | Hermitian matrix-matrix multiply | | CHEMM | | ZHEMM |
| Hermitian | Hermitian rank 2k update | | CHER2K | | ZHER2K |
| Hermitian | Hermitian rank k update | | CHERK | | ZHERK |
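For readers less familiar with the BLAS naming, here is a rough sketch of the operation behind the GEMM entries in the table: C = alpha * A * B + beta * C on column-major matrices. This is a plain, unoptimized CPU reference for illustration only. It is not CULA's implementation or its calling interface, it covers only the non-transposed case, and the name `ref_sgemm_nn` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (CPU, unoptimized) version of what SGEMM computes when neither
 * input is transposed:
 *     C = alpha * A * B + beta * C
 * A is m x k, B is k x n, C is m x n, all stored column-major as in
 * Fortran BLAS. lda, ldb, ldc are the leading dimensions (column strides).
 * ref_sgemm_nn is a hypothetical name for this sketch, not a CULA function. */
static void ref_sgemm_nn(int m, int n, int k,
                         float alpha, const float *a, int lda,
                         const float *b, int ldb,
                         float beta, float *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= (m > 1 ? m : 1));
    assert(ldb >= (k > 1 ? k : 1));
    assert(ldc >= (m > 1 ? m : 1));

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            }
            float *cij = &c[i + (size_t)j * ldc];
            /* BLAS convention: when beta is zero, C is overwritten, not read. */
            *cij = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * (*cij);
        }
    }
}
```

As a quick check, with A = [1 2; 3 4] and B = [5 6; 7 8] stored column-major, alpha = 1 and beta = 0, the routine fills C with [19 22; 43 50]. The S, C, D, and Z prefixes in the table select single, single-complex, double, and double-complex precision for the same operation.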
Like our LAPACK routines, CULA's BLAS routines are available through our Standard, Device, and Fortran interfaces, and they also support the bridge interface for automatically switching between CPU and GPU execution.
CULA R10 (CUDA 3.2) Released!
It is my pleasure to announce that CULA R10 is now available for download. As always, the downloads are available at http://www.culatools.com/get-cula/downloads/
This version is intended for use with CUDA 3.2. The highlights of the new version include:
- Banded matrix factorizations (http://www.culatools.com/blog/2010/12/02/banded-solvers/)
- New BLAS interface (details to follow)
- More performance! (http://www.culatools.com/blog/2010/09/10/cula-2-2-sneak-preview/)
Again, we have changed the way we denote new versions. Please see this post for more details: http://www.culatools.com/blog/2010/12/03/new-version-numbering-scheme/
New Version Numbering Scheme
With a new version of CULA on the way, I would like to take a moment to announce that we are changing the way we number our versions. CULA version numbers have gotten confusingly close to CUDA version numbers, and we often find ourselves writing sentences such as "CULA 2.2 is compatible with CUDA 3.1." Admittedly, that's a mouthful, so from now on we will use a new naming scheme in which each release of CULA is numbered with an increasing integer. As the coming release is our tenth (not counting beta/preview releases and small bug patches), it will be referred to as CULA R10. In general, we will also append the CUDA version so that it is obvious which CUDA version a release is intended to work with. The full title of the next version is therefore CULA R10 (CUDA 3.2), and it will contain the banded solvers and speedups that we previously previewed under the name CULA 2.2.
