21Jul/10Off

## Using CULA in MATLAB, Part 2

by Kyle

In part one of this three part series, we introduced a method using C++ templates to support all four major MATLAB data types. In today's part, we'll detail the specifics of how to integrate CULA's SVD algorithm into MATLAB. Finally, part three will give some tips on error checking, compilation, linking, usage, and benchmarking.

To recap, we left off from part one with a working code base that will enter a templated function dependent on the MATLAB data type.  From here, we'd like to write one templated function that can support single, double, complex single, and complex double datatypes. However, because CULA and MATLAB store complex data differently, we are going to have to create a number of helper functions to convert between the two types.

Internally, MATLAB stores it's complex data using a structure of arrays (SoA) format where the real data and complex data are stored in separate memory locations. In CULA, or any other LAPACK implementation, complex data is stored using an array of structures (AoS) format where the real and complex data are interleaved within a single large array. Therefore, in order to support complex data, we'll need to create two helper functions that convert from AoS to SoA and vice-versa. This unfortunately introduces the need for twice as much allocation when dealing with complex data.  However, MATLAB itself uses LAPACK under the hood and natively suffers from the same plight.

The following code sample illustrates, using C++ templates, how to perform this conversion. I'd recommend putting this function in a header such as 'culamex.hpp' as it will be needed for every function where you wish to support complex data.

```// culamex.hpp
// Common helper functions for integrating CULA into MATLAB

#ifndef __CULAMEX_HPP__
#define __CULAMEX_HPP__

// Used to find real type associated with complex type
template<class T> struct ToReal { typedef T type; };
template<>        struct ToReal<culaFloatComplex> { typedef float type; };
template<>        struct ToReal<culaDoubleComplex> { typedef double type; };

// Convert from MATLAB complex format to CULA complex format
template <typename FloatType>
void MatToCula(FloatType* buf, const mxArray* src)
{
typedef typename ToReal<FloatType>::type RealType;

const RealType* r0 = (const RealType*) mxGetPr(src);
const RealType* i0 = (const RealType*) mxGetPi(src);

int M = (int)mxGetM(src);
int N = (int)mxGetN(src);

for(int j = 0; j < N; ++j)
{
for(int i = 0; i < M; ++i)
{
int p  = j*M+i;
buf[p].x = r0[p];
buf[p].y = i0[p];
}
}
}

// Convert from CULA complex format to MATLAB complex format
template <typename FloatType>
void CulaToMat(mxArray* src, FloatType* buf)
{
typedef typename ToReal<FloatType>::type RealType;

RealType* r0 = (RealType*) mxGetPr(src);
RealType* i0 = (RealType*) mxGetPi(src);

int M = mxGetM(src);
int N = mxGetN(src);

for(int j = 0; j < N; ++j)
{
for(int i = 0; i < M; ++i)
{
int p = j*M+i;
r0[p] = buf[p].x;
i0[p] = buf[p].y;
}
}
}

// Do nothing for non-complex cases
template <> void MatToCula(float* buf, const mxArray* src) {}
template <> void MatToCula(double* buf, const mxArray* src) {}
template <> void CulaToMat(mxArray* src, float* buf) {}
template <> void CulaToMat(mxArray* src, double* buf) {}

#endif //__CULAMEX_HPP__
```

Now, that we have our generic helper functions, we can start writing some code that acts just like MATLAB's svd function.  Upon examining the documentation for MATLAB's svd and CULA's xGESVD function, you'll notice two major differences.

1. MATLAB returns the singular values in diagonal array whereas CULA returns a vector
2. MATLAB return V whereas CULA return V' (transposed)

In order to preserve MATLAB's interface, we'll have to do some extra house keeping to convert from the singular value vector into to a diagonal matrix and also perform a transpose (or for complex data, conjugate transpose) on VT.

```// culaSvd.cpp
// Implements CULA accelerated SVD to be called within MATLAB

#include <algorithm>

#include "mex.h"
#include "culapack.hpp"
#include "culamex.hpp"

using std::max;
using std::min;

// Complex conjugation of complex data
template<class T> void Conjugate(T* a) { /* Do nothing */ };
template<> void Conjugate(culaFloatComplex* a) { a->y = -(a->y); }
template<> void Conjugate(culaDoubleComplex* a) { a->y = -(a->y); }

template <typename T>
void mexCulaGesvd(int nlhs,           /* number of expected outputs */
mxArray* plhs[],        /* output pointer array */
int nrhs,               /* number of inputs */
const mxArray* prhs[],  /* input pointer array */
mxClassID id,
mxComplexity complexity)
{
// Initialize flags and types
typedef typename ToReal<T>::type RealType;
bool isReal = (complexity == mxREAL);
bool isComplex = (complexity == mxCOMPLEX);

// Initialize sizes
int M = (int) mxGetM(prhs[0]);
int N = (int) mxGetN(prhs[0]);
int K = min(M,N);
int L = max(M,N);

// Allocate a temporary to not destroy input data
T* A = (T*) mxMalloc( M * N * sizeof(T) );

if (isReal)
{
// Copy input data directly into temporary
memcpy( A, mxGetPr( prhs[0] ), M * N * sizeof(T) );
}
else if (isComplex)
{
// If complex, convert from MATLAB format into CULA format
MatToCula( A, prhs[0] );
}

// Create MATLAB output matrices
plhs[0] = mxCreateNumericMatrix(M, M, id, complexity);  // U (M x M)
plhs[1] = mxCreateNumericMatrix(M, N, id, 0);           // S (M x N, Real)
plhs[2] = mxCreateNumericMatrix(N, N, id, complexity);  // V (N x N)

// Allocate CULA intermediate
RealType* SVEC = (RealType*) mxMalloc( K * sizeof(RealType) );

// CULA Memory Pointers
T* U;
T* VT;

if (isReal)
{
// Get CULA memory pointers from allocated MATLAB matrices
U = (T*) mxGetPr( plhs[0] );
VT = (T*) mxGetPr( plhs[2] );
}
else if (isComplex)
{
// If complex, allocate an AoS complex buffer for CULA
U = (T*) mxMalloc( M * M * sizeof(T) );
VT = (T*) mxMalloc( N * N * sizeof(T) );
}

// Initialize CULA
culaInitialize();

// CULA SVD Factorization
culaGesvd('A', 'A', M, N, A, M, SVEC, U, M, VT, N);

// Shutdown CULA
culaShutdown();

// Get pointer to output matrix, S
RealType* S = (RealType*) mxGetPr( plhs[1] );

// Copy SVEC to diagonal of S
for (int i=0; i<K; i++)
S[i*M+i] = SVEC[i];

// Inplace transpose of VT
for (int i=0; i<N; i++)
{
for (int j=i; j<N; j++)
{
T temp = VT[j+i*N];
VT[j+i*N] = VT[i+j*N];
VT[i+j*N] = temp;
}
}

// If complex, conjugate VT
if (isComplex)
for (int i=0; i<N; i++)
for (int j=0; j<N; j++)
Conjugate( &VT[j+i*N] );

if (isComplex)
{
// If complex, convert from CULA format into MATLAB format
CulaToMat( plhs[0], U );
CulaToMat( plhs[2], VT );

// Free MATLAB buffers
mxFree(U);
mxFree(VT);
}

// Free allocate data
mxFree(A);
mxFree(SVEC);
}

// MATLAB Gateway Function
void mexFunction(int nlhs,              /* number of expected outputs */
mxArray* plhs[],       /* output pointer array */
int nrhs,              /* number of inputs */
const mxArray* prhs[]  /* input pointer array */ )
{
// See Part 1
// mexCulaGesvd(...)
}
```

When combined with the gateway function introduced in part one, you should now have enough code to compile a working CULA accelerated, MATLAB callable SVD routine that support all four major datatypes. In the next part of this series, we'll discuss how to error check, compile, and use this routine in MATLAB.

While I've tried to keep the comments in the code quite verbose, if you have any questions regarding this code please ask in the comments section or in our user forums.

CULA Programmers Guide: http://www.culatools.com/html_guide/
MATLAB MEX-file Guide: http://www.mathworks.com/support/tech-notes/1600/1605.html
C++ Templates: http://en.wikipedia.org/wiki/Template_(programming)

14Jul/10Off

## Using CULA in MATLAB, Part 1

by Kyle

It is possible to use CULA to accelerate a large number of popular linear algebra routines in MATLAB. With some relatively simple interface code, it's easy to seamlessly accelerate popular MATLAB routines such as: qr, lu, svd, eig, and mldivide.

When writing your own MATLAB code that uses an external library such as CULA, there are a few common hurdles to overcome. In the next few blog posts, we'll introduce these problems and discuss how to solve them. By the end of the series, we'll have enough code to implement a CULA accelerated Singular Value Decomposition (SVD), culasvd, that matches the functionality of MATLAB's svd routine.

Part 1 of this series will explain how to handle multiple CULA data types in MATLAB. Part 2 will detail some tips and tricks on interface wrapping. Finally, Part 3 will show how to compile, link, and benchmark CULA within MATLAB.

Excluding sparse, there are four main data types supported by MATLAB and CULA: single, double, single complex, and double complex. When writing a CULA accelerated MATLAB routine, your function should be able to support all of these types invisibly to the end user. Since both the MATLAB MEX compiler and CULA support C++ templates, we can easily implement all four datatypes through a single code path.

The following code snippet shows how to use C++ templates to handle all four MATLAB types. We use the MATLAB functions, mxGetClassID() and mxIsComplex(), to determine the precision and complexity of the input matrix and then switch into an explicitly typed CULA wrapper.

```// culaSvd.cpp
// Implements CULA accelerated SVD to be called within MATLAB

#include "mex.h"
#include "culapack.hpp"

// Templated wrapper
template <typename T>
void mexCulaGesvd(int nlhs,           /* number of expected outputs */
mxArray* plhs[],        /* output pointer array */
int nrhs,               /* number of inputs */
const mxArray* prhs[],  /* input pointer array */
mxClassID id,
mxComplexity complexity)
{
// Function core, details in "Using CULA in MATLAB, Part 2"
// culaGesvd(...)
}

// MATLAB Gateway Function
void mexFunction(int nlhs,              /* number of expected outputs */
mxArray* plhs[],       /* output pointer array */
int nrhs,              /* number of inputs */
const mxArray* prhs[]  /* input pointer array */ )
{
// We only support a full SVD in this example
if(nrhs != 1)
mexErrMsgTxt("culasvd: Must have 1 input argument [X]");
if(nlhs != 3)
mexErrMsgTxt("culasvd: Must have 3 output arguments [U,S,V]");

// Get precision (single or double)
mxClassID classID = mxGetClassID(prhs[0]);

// Get complexity (real or complex)
mxComplexity complexity = mxIsComplex(prhs[0]) ? mxCOMPLEX : mxREAL;

// Switch based on data type
if ( classID == mxSINGLE_CLASS && complexity == mxREAL)
mexCulaGesvd<culaFloat>(nlhs, plhs, nrhs, prhs, classID, complexity);
else if (classID == mxDOUBLE_CLASS && complexity == mxREAL )
mexCulaGesvd<culaDouble>(nlhs, plhs, nrhs, prhs, classID, complexity);
else if ( classID == mxSINGLE_CLASS && complexity == mxCOMPLEX )
mexCulaGesvd<culaFloatComplex>(nlhs, plhs, nrhs, prhs, classID, complexity);
else if ( classID == mxDOUBLE_CLASS && complexity == mxCOMPLEX )
mexCulaGesvd<culaDoubleComplex>(nlhs, plhs, nrhs, prhs, classID, complexity);
else
mexErrMsgTxt("culasvd: Unknown or unsupported data type");
}
```

As a quick overview, this file includes the 3 main components:

1) The CULA header for our SVD function and the MATLAB header for the MATLAB data types and API
2) A templated function that will support all four MATLAB data types
3) The MATLAB gateway function, 'mexFunction', required by all C/C++ programs called by MATLAB

These are the three main components needed by any MATLAB wrapper for CULA. As you can see, we really haven't gotten into any SVD specific code with the exception of the error checking.

Check back soon for Part 2 where we'll go into more details about how to wrap MATLAB's matrix data into CULA's SVD function. If you have any questions, feel free to ask in a comment here or on our forums.