by John

We get many questions on our forums that are described as CULA issues but are in fact misunderstandings of the LAPACK conventions. Admittedly, the interface is difficult for a newcomer to the library! When we set out to write CULA, we made a conscious decision to follow the LAPACK interface because it's so pervasive in numerical computing. I want to go over a few of the more difficult points for new programmers.

Data is Column Major

C/C++ programmers are used to data stored in row-major format, meaning that the items in a row are stored contiguously in memory. LAPACK, being a FORTRAN package, uses column-major storage, in which the items in a column are contiguous instead. For some, this will require reworking their code to produce column-major data or transposing a matrix prior to calling the CULA routine. Any code that calls CUBLAS or CULA has this restriction, and so should be written primarily for column-major data layouts.
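
As a minimal sketch of the conversion described above, the following C helpers copy a row-major matrix into column-major storage. The names colmajor_index and row_to_col_major are made up for this illustration and are not part of CULA or LAPACK.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (row i, column j) in a column-major array whose
   columns are each ld elements long. Hypothetical helper name. */
static size_t colmajor_index(int i, int j, int ld)
{
    return (size_t)i + (size_t)j * (size_t)ld;
}

/* Copy a rows x cols row-major matrix (src) into column-major storage (dst).
   The two buffers must not overlap. Hypothetical helper name. */
static void row_to_col_major(const float *src, float *dst, int rows, int cols)
{
    assert(src != dst);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            dst[colmajor_index(i, j, rows)] = src[(size_t)i * (size_t)cols + (size_t)j];
}
```

For the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6), the row-major buffer is 1, 2, 3, 4, 5, 6 and the column-major buffer is 1, 4, 2, 5, 3, 6.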

LD* Parameters

Each matrix parameter passed to a CULA/LAPACK routine is invariably followed by an integer parameter called LDA, LDB, etc. This "leading dimension" parameter signifies the physical size of the matrix, that is, the number of rows actually allocated for each column, while parameters such as M and N describe the size of the data to be operated on. Specifying both is useful for describing submatrices or padded allocations. This figure describes the relationship of these parameters when the valid data (blue) is a region of a physically larger data allocation (green).
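
To make the M, N, and LDA relationship concrete, here is a small sketch in C. The routine scale_block is a hypothetical example written for this post, not a CULA or LAPACK function; it only mirrors the way those routines take a pointer, a logical size, and a leading dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply an m x n block of a column-major matrix by alpha.
   a   : pointer to the block's top-left element
   lda : number of rows in the underlying allocation (must be >= m)
   Hypothetical helper, for illustration only. */
static void scale_block(int m, int n, float *a, int lda, float alpha)
{
    assert(lda >= m);
    for (int j = 0; j < n; ++j)          /* walk the columns of the block */
        for (int i = 0; i < m; ++i)      /* rows within a column are contiguous */
            a[(size_t)i + (size_t)j * (size_t)lda] *= alpha;
}
```

With a 4 x 3 allocation named buf (so LDA is 4), the 2 x 2 block whose top-left corner sits at row 1, column 1 starts at &buf[1 + 1*4]. Calling scale_block(2, 2, &buf[5], 4, 2.0f) touches only elements 5, 6, 9, and 10 of the buffer and leaves the rest of the allocation alone.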

Outputs are Shared with Inputs

Simply put, in most cases the data is operated on in place. Take the LU decomposition, which reduces a matrix, A, to two triangular matrices, A=L*U: those two "output" matrices are stored quite economically in the same storage where matrix A was located on input. The bottom line is that if you want to preserve your data, you will often need to copy it to a new matrix prior to calling CULA routines.
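
The sketch below shows the copy-first pattern in C. Both functions are toy stand-ins written for this post: copy_matrix is a plain column-by-column copy, and lu_in_place_nopivot is a simplified LU without pivoting that is only meant to show L and U landing in A's storage. The real GETRF routine also pivots rows, so its results will generally differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy an m x n column-major matrix one column at a time.
   lds and ldd are the leading dimensions of source and destination. */
static void copy_matrix(int m, int n, const float *src, int lds, float *dst, int ldd)
{
    assert(lds >= m && ldd >= m);
    for (int j = 0; j < n; ++j)
        memcpy(dst + (size_t)j * (size_t)ldd,
               src + (size_t)j * (size_t)lds,
               (size_t)m * sizeof *dst);
}

/* Toy in-place LU WITHOUT pivoting (illustration only).
   On exit, the strict lower triangle of a holds L (unit diagonal implied)
   and the upper triangle holds U. The original A is gone. */
static void lu_in_place_nopivot(int n, float *a, int lda)
{
    for (int k = 0; k < n; ++k)
        for (int i = k + 1; i < n; ++i) {
            a[i + (size_t)k * lda] /= a[k + (size_t)k * lda];   /* multiplier l(i,k) */
            for (int j = k + 1; j < n; ++j)
                a[i + (size_t)j * lda] -= a[i + (size_t)k * lda] * a[k + (size_t)j * lda];
        }
}
```

Calling copy_matrix before lu_in_place_nopivot keeps the original values available in the backup buffer, while the working buffer ends up holding the factors.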


Finding the correct routine in LAPACK can be a challenge. Take, for instance, inverting a matrix. The signature is simple enough: culaGetri(N,A,LDA,IPIV), and we have already covered the conventions governing the first three parameters. The challenge is in the finer points of the documentation, where it is noted that the input matrix, A, is not an arbitrary matrix. It is instead "On entry, the factors L and U from the factorization A = P*L*U as computed by GETRF." This is a way of saying that the input matrix A should be the result of first calling the routine GETRF (LU decomposition) on the original matrix.
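
The factor-then-invert sequence can be sketched with self-contained C stand-ins. The functions demo_getrf and demo_invert_from_lu are hypothetical and written only for this illustration; they are not the CULA routines. They also differ from the real ones in two labeled ways: the pivot indices here are 0-based (LAPACK's are 1-based), and the inverse is written to a separate buffer, whereas GETRI overwrites A.

```c
#include <assert.h>
#include <stddef.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Stand-in for GETRF: in-place LU with partial pivoting on an n x n
   column-major matrix. Returns 0 on success, k+1 if pivot k is zero. */
static int demo_getrf(int n, double *a, int lda, int *ipiv)
{
    for (int k = 0; k < n; ++k) {
        int p = k;                                   /* find the pivot row */
        for (int i = k + 1; i < n; ++i)
            if (abs_d(a[i + (size_t)k * lda]) > abs_d(a[p + (size_t)k * lda])) p = i;
        ipiv[k] = p;                                 /* 0-based (assumption) */
        if (a[p + (size_t)k * lda] == 0.0) return k + 1;
        if (p != k)                                  /* swap rows k and p */
            for (int j = 0; j < n; ++j) {
                double t = a[k + (size_t)j * lda];
                a[k + (size_t)j * lda] = a[p + (size_t)j * lda];
                a[p + (size_t)j * lda] = t;
            }
        for (int i = k + 1; i < n; ++i) {
            a[i + (size_t)k * lda] /= a[k + (size_t)k * lda];
            for (int j = k + 1; j < n; ++j)
                a[i + (size_t)j * lda] -= a[i + (size_t)k * lda] * a[k + (size_t)j * lda];
        }
    }
    return 0;
}

/* Stand-in for the GETRI step: build the inverse from the LU factors and
   pivots by solving A*x = e_c for each column c of the identity. */
static void demo_invert_from_lu(int n, const double *lu, int lda,
                                const int *ipiv, double *inv, int ldinv)
{
    for (int c = 0; c < n; ++c) {
        double *x = inv + (size_t)c * ldinv;
        for (int i = 0; i < n; ++i) x[i] = (i == c) ? 1.0 : 0.0;
        for (int k = 0; k < n; ++k) {                /* apply the row swaps */
            double t = x[k]; x[k] = x[ipiv[k]]; x[ipiv[k]] = t;
        }
        for (int i = 1; i < n; ++i)                  /* forward solve, unit L */
            for (int j = 0; j < i; ++j)
                x[i] -= lu[i + (size_t)j * lda] * x[j];
        for (int i = n - 1; i >= 0; --i) {           /* back solve with U */
            for (int j = i + 1; j < n; ++j)
                x[i] -= lu[i + (size_t)j * lda] * x[j];
            x[i] /= lu[i + (size_t)i * lda];
        }
    }
}
```

The point of the sketch is the ordering: the second function only makes sense when it is handed the factors and pivots produced by the first. Passing the original, unfactored matrix to the inversion step would silently produce a wrong answer, which is the same mistake as calling GETRI without calling GETRF first.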


CULA Talk at SC10, New Orleans

by Liana

We are looking forward to this year's Supercomputing conference in New Orleans. We love the location and the food, but most of our excitement comes from the opportunities we have lined up for CULA during that week.

Meetings and networking aside, we hope to meet more users face-to-face and hear about their experience with CULA.  We get a lot of input in our forums, but never enough.  GPU computing has impacted the performance of so many applications that we can't help but wonder which ones in particular are benefiting the most from our optimized routines.

We will be at booth number 1229, which is very close to the NVIDIA booth.

In addition to our plans around CULA, feel free to ask us about the other GPU library and non-GPU-related projects we're working on.

Last, we hope you will have time to join our presentation on CULA, scheduled for Tuesday, at 11:00 am. For details, just check out the Exhibitor Forums.