2 Dec 2010

New CULA Feature! Banded Solvers

by Kyle
Banded Matrix

A banded matrix has non-zero values only on the main diagonal and on a limited number of diagonals directly above and below it

In the upcoming CULA release, we are pleased to announce our first offering of GPU-accelerated banded matrix solvers! As far as we know, these are the first publicly available GPU-accelerated banded solvers. The new functions are based on the LAPACK routines xGBTRF and xPBTRF, which factorize general band matrices (LU factorization with partial pivoting) and symmetric positive definite band matrices (Cholesky factorization), respectively. Once factorized, the corresponding systems can be easily solved with xGBTRS and xPBTRS.
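For readers who have not used the LAPACK band routines before, here is a minimal CPU-side sketch of the same workflow using SciPy's LAPACK wrappers. This is not the CULA API; it only illustrates band storage and the factor-then-solve pattern. The helper `to_band` and the small test matrices are made up for this example. Note that `solve_banded` factors and solves in a single call, while the Cholesky path shows the factor and solve steps separately.

```python
import numpy as np
from scipy.linalg import cholesky_banded, cho_solve_banded, solve_banded


def to_band(a, kl, ku):
    """Pack a dense square matrix into LAPACK-style band storage.

    Row ku + i - j of the result holds a[i, j]; only the kl subdiagonals,
    the main diagonal, and the ku superdiagonals are kept.
    (Hypothetical helper, written for this example only.)
    """
    n = a.shape[0]
    ab = np.zeros((kl + ku + 1, n))
    for j in range(n):
        for i in range(max(0, j - ku), min(n, j + kl + 1)):
            ab[ku + i - j, j] = a[i, j]
    return ab


n = 8
b = np.arange(1.0, n + 1)

# General band matrix with kl = 1 subdiagonal and ku = 2 superdiagonals.
a_gen = (9.0 * np.eye(n) + 3.0 * np.eye(n, k=-1)
         + 2.0 * np.eye(n, k=1) + 1.0 * np.eye(n, k=2))
x_gen = solve_banded((1, 2), to_band(a_gen, 1, 2), b)

# Symmetric positive definite band matrix (tridiagonal here).
a_spd = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
ab_upper = to_band(a_spd, 0, 1)              # upper form: superdiagonal + diagonal
cb = cholesky_banded(ab_upper, lower=False)  # banded Cholesky factorization
x_spd = cho_solve_banded((cb, False), b)     # solve using the stored factor

# Check both solutions against the dense matrices.
print(np.allclose(a_gen @ x_gen, b), np.allclose(a_spd @ x_spd, b))
```

The band storage needs only (kl + ku + 1) x n entries instead of n x n, which is why these formats are attractive for large systems with a narrow band.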

Unlike the general matrix solvers, the performance of these banded solvers scales with the bandwidth of the matrix rather than its size.  This scaling is a result of the BLAS-based implementation, which breaks the band into large square and triangular chunks that are worked on separately.  Because of this segmentation, the performance curve looks very similar to that of the general matrix factorization, xGETRF.  You'll need a bandwidth of at least 700 before the GPU outperforms the CPU.  However, at large bandwidths over 5000, the GPU reaches speedups of more than 10x over the CPU!

Since the performance of these functions scales with bandwidth, we are calling these solvers the "large band solvers".  We also plan to add banded solvers that use different algorithms and scale with matrix size rather than bandwidth. These will be known as the "thin band solvers" and will be available in a future CULA release.

Banded matrices are common in many fields of scientific computing that require solving large coupled systems, such as computational fluid dynamics, optimization, and structural engineering.  If you find any of these solvers useful, please leave us feedback and let us know!