We get many questions on our forums that are described as CULA issues but are in fact misunderstandings of LAPACK conventions. Admittedly, LAPACK can be difficult for a newcomer! When we set out to write CULA, we made a conscious decision to follow the LAPACK interface because it is so pervasive in numerical computing. I want to go over a few of the points that new programmers most often find difficult.
Data is Column Major
C/C++ programmers are used to data stored in row-major format, in which the elements of a row are stored contiguously in memory. LAPACK, being a FORTRAN package, uses column-major storage, in which the elements of a column are contiguous instead. For some, this will mean reworking their code to build matrices in column-major order, or transposing a matrix before calling the CULA routine. Any code that calls CUBLAS or CULA is subject to this requirement, so it is best written for column-major data layouts from the start.
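As a minimal sketch of the difference (the helper names below are hypothetical, not part of CULA), element (i, j) of an m-by-n matrix sits at offset i*n + j in row-major storage but at i + j*m in column-major storage. Converting a row-major array for a LAPACK-style call is then a simple copy loop:

```c
#include <assert.h>
#include <stddef.h>

/* 0-based offset of element (i, j) of an m-by-n matrix in each layout. */
size_t idx_row_major(size_t i, size_t j, size_t n) { return i * n + j; }
size_t idx_col_major(size_t i, size_t j, size_t m) { return i + j * m; }

/* Hypothetical helper: copy a row-major m-by-n matrix into column-major
   storage (leading dimension m), as LAPACK-style routines expect. */
void row_to_col_major(size_t m, size_t n, const double *src, double *dst)
{
    assert(src != dst); /* this simple loop cannot transpose in place */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            dst[idx_col_major(i, j, m)] = src[idx_row_major(i, j, n)];
}
```

With m = 2 and n = 3, the row-major array {1, 2, 3, 4, 5, 6} (rows [1, 2, 3] and [4, 5, 6]) becomes {1, 4, 2, 5, 3, 6}: each column is now contiguous.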
Each matrix parameter passed to a CULA/LAPACK routine is typically followed by an integer parameter called LDA, LDB, etc., the "leading dimension." This parameter gives the physical size of the matrix allocation, specifically the number of rows allocated per column (i.e., the distance in elements between the starts of consecutive columns), while parameters such as M and N describe the size of the data to be operated on. Specifying matrices this way is useful for describing submatrices or padded allocations. This figure describes the relationship of these parameters when the valid data (blue) is a region of a physically larger data allocation (green).
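The following C sketch shows how a routine uses these parameters. `scale_block` is a hypothetical LAPACK-style function, not a CULA routine: element (i, j) of the data is found at a[i + j*lda], so a submatrix starting at row r0, column c0 of a larger allocation is passed as the pointer a + r0 + c0*lda together with the allocation's original lda.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LAPACK-style helper: scale the m-by-n matrix stored
   column-major at a, whose columns start lda elements apart. */
void scale_block(int m, int n, double alpha, double *a, int lda)
{
    assert(lda >= (m > 1 ? m : 1)); /* LAPACK convention: LDA >= max(1, M) */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            a[i + (size_t)j * lda] *= alpha; /* element (i, j) */
}
```

For example, with a 4-by-4 allocation (lda = 4), `scale_block(2, 2, 3.0, a + 1 + 2*4, 4)` touches only the 2-by-2 block in rows 1 and 2, columns 2 and 3 (0-based). Likewise, data with 3 valid rows in a buffer padded to 4 rows per column is passed with m = 3 and lda = 4.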
In most cases, the data is also operated on in place. The LU decomposition, for example, factors a matrix A into two triangular matrices, A = L*U (in practice with row pivoting, A = P*L*U), and those two "output" matrices are stored economically in the same memory that held A on input. L has a unit diagonal, so its diagonal entries do not need to be stored. The bottom line is that if you want to preserve your data, you will often need to copy it to a new matrix before calling CULA routines.
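To illustrate the in-place storage, here is an unblocked LU factorization with partial pivoting written in the style of LAPACK's GETRF. It is a teaching sketch (the name `lu_inplace` is hypothetical, and this is not CULA's implementation): on return, the strictly lower triangle of `a` holds L, the upper triangle holds U, and `ipiv` records the 1-based row interchanges, so the original contents of A are gone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical GETRF-style sketch: factor the n-by-n column-major matrix a
   in place as A = P*L*U. Strictly lower part of a <- L (unit diagonal
   implied), upper part <- U; ipiv[k] is the 1-based row swapped with row
   k+1. Returns 0, or k+1 if U(k,k) is exactly zero. */
int lu_inplace(int n, double *a, int lda, int *ipiv)
{
    assert(lda >= (n > 1 ? n : 1));
    int info = 0;
    for (int k = 0; k < n; ++k) {
        /* Pivot: first row i >= k with the largest |a(i,k)|. */
        int p = k;
        double best = 0.0;
        for (int i = k; i < n; ++i) {
            double v = a[i + (size_t)k * lda];
            if (v < 0) v = -v;
            if (v > best) { best = v; p = i; }
        }
        ipiv[k] = p + 1;
        if (best == 0.0) { if (info == 0) info = k + 1; continue; }
        /* Swap entire rows k and p, as GETRF does. */
        if (p != k)
            for (int j = 0; j < n; ++j) {
                double t = a[k + (size_t)j * lda];
                a[k + (size_t)j * lda] = a[p + (size_t)j * lda];
                a[p + (size_t)j * lda] = t;
            }
        /* Store multipliers (column k of L) over A and update the rest. */
        double piv = a[k + (size_t)k * lda];
        for (int i = k + 1; i < n; ++i) {
            double l = a[i + (size_t)k * lda] / piv;
            a[i + (size_t)k * lda] = l;
            for (int j = k + 1; j < n; ++j)
                a[i + (size_t)j * lda] -= l * a[k + (size_t)j * lda];
        }
    }
    return info;
}
```

For A = [[2, 1], [4, 3]], stored column-major as {2, 4, 1, 3}, the call overwrites `a` with {4, 0.5, 3, -0.5} and sets `ipiv` to {2, 2}, i.e. L = [[1, 0], [0.5, 1]] and U = [[4, 3], [0, -0.5]]. If A is needed afterwards, copy it (e.g. with `memcpy`) before the call.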
Finding the correct routine in LAPACK can also be a challenge. Take, for instance, inverting a matrix. The signature is simple enough: culaGetri(N,A,LDA,IPIV), and we have already covered the conventions governing the first three parameters. The challenge is in the finer points of the documentation, which notes that the input matrix, A, is not an arbitrary matrix. It is instead "On entry, the factors L and U from the factorization A = P*L*U as computed by GETRF." In other words, A must be the result of first calling GETRF (LU decomposition) on the original matrix, and IPIV must be the pivot indices that GETRF returned.
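To make the dependency concrete, here is a small C sketch of a GETRI-like step. The name `inverse_from_lu` and the caller-supplied n*n `work` buffer are hypothetical, and it solves A*X = I column by column from the factors rather than using LAPACK's actual GETRI algorithm. The point is its inputs: the L and U factors and pivot array that a GETRF-style factorization leaves behind, not the original A.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical GETRI-like sketch. On entry, a holds L and U and ipiv the
   1-based pivots from a GETRF-style factorization A = P*L*U. On exit, a
   holds inv(A). work must hold n*n doubles. Returns 0, or k+1 if U(k,k)
   is zero (A singular). */
int inverse_from_lu(int n, double *a, int lda, const int *ipiv, double *work)
{
    assert(lda >= (n > 1 ? n : 1));
    for (int k = 0; k < n; ++k)
        if (a[k + (size_t)k * lda] == 0.0) return k + 1;
    /* work = I, then apply the recorded row swaps in order: work = P^T * I. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            work[i + (size_t)j * n] = (i == j) ? 1.0 : 0.0;
    for (int k = 0; k < n; ++k) {
        int p = ipiv[k] - 1;
        if (p != k)
            for (int j = 0; j < n; ++j) {
                double t = work[k + (size_t)j * n];
                work[k + (size_t)j * n] = work[p + (size_t)j * n];
                work[p + (size_t)j * n] = t;
            }
    }
    for (int j = 0; j < n; ++j) {
        double *x = work + (size_t)j * n;
        /* Forward substitution with unit lower-triangular L. */
        for (int i = 1; i < n; ++i)
            for (int k = 0; k < i; ++k)
                x[i] -= a[i + (size_t)k * lda] * x[k];
        /* Back substitution with upper-triangular U. */
        for (int i = n - 1; i >= 0; --i) {
            for (int k = i + 1; k < n; ++k)
                x[i] -= a[i + (size_t)k * lda] * x[k];
            x[i] /= a[i + (size_t)i * lda];
        }
    }
    /* Overwrite the factors with the inverse, as GETRI does in place. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + (size_t)j * lda] = work[i + (size_t)j * n];
    return 0;
}
```

Given the factors of A = [[2, 1], [4, 3]], i.e. `a` = {4, 0.5, 3, -0.5} and `ipiv` = {2, 2}, it produces {1.5, -2, -0.5, 1} in column-major order, which is inv(A) = [[1.5, -0.5], [-2, 1]]. Handing it the raw matrix {2, 4, 1, 3} instead still returns 0, but the result is the inverse of a different matrix: exactly the silent mistake the documentation warns about.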