27Jul/10Off

Using CULA in MATLAB, Part 3

by Kyle

In part one of this three part series, we introduced a method using C++ templates to support all four major MATLAB data types. In part two, we detailed the specifics of how to integrate CULA's SVD algorithm into MATLAB. Finally, in todays section we'll give some tips on error checking, compilation, linking, usage, and benchmarking.

The code posted in the previous two examples didn't include any error checking. For example, if an allocation on the device failed because your GPU doesn't have enough memory, the error will be silently ignored and MATLAB will most likely return blank answers. Similarly, if no CUDA enable GPU is found, the original code will continue with no visible problem. These potential errors can all be handled by the culaStatus variable and the MATLAB error handler, mexErrMsgIdAndTxt(). By using these two tools, we can detect a CULA error and safely return control to MATLAB with a visible error. Another option, which I won't outline here would be fall back original MATLAB built in function.

The following addition to the header provides are nice parser of culaStatus errors. If no error is found, the code returns immediately. Otherwise, we describe the error to MATLAB.

#ifndef __CULAMEX_HPP__
#define __CULAMEX_HPP__

// Header code from Part 2

void checkStatus(culaStatus status, const char* funcname)
{
    if(!status)
        return;

    culaShutdown();

    char id[128];
    sprintf(id, "CULA:%s:", funcname);

    if(status == culaArgumentError)
    {
        strcat(id, "culaArgumentError");
        mexErrMsgIdAndTxt(id, "%s: Invalid value for parameter %d\n", funcname, culaGetErrorInfo());
    }
    else if(status == culaDataError)
    {
        strcat(id, "culaDataError");
        mexErrMsgIdAndTxt(id, "%s: Data error (%d)\n", funcname, culaGetErrorInfo());
    }
    else if(status == culaBlasError)
    {
        strcat(id, "culaBlasError");
        mexErrMsgIdAndTxt(id, "%s: Blas error (%d)\n", funcname, culaGetErrorInfo());
    }
    else if(status == culaRuntimeError)
    {
        strcat(id, "culaRuntimeError");
        mexErrMsgIdAndTxt(id, "%s: Runtime error (%d)\n", funcname, culaGetErrorInfo());
    }
    else if(status == culaNotInitialized)
        strcat(id, "culaNotInitialized");
    else if(status == culaNoHardware)
        strcat(id, "culaNoHardware");
    else if(status == culaInsufficientRuntime)
        strcat(id, "culaInsufficientRuntime");
    else if(status == culaInsufficientComputeCapability)
        strcat(id, "culaInsufficientComputeCapability");
    else if(status == culaInsufficientMemory)
        strcat(id, "culaInsufficientMemory");
    else if(status == culaFeatureNotImplemented)
        strcat(id, "culaFeatureNotImplemented");
    else
        strcat(id, "unknown");

    // Message that don't have error info fall through to here
    mexErrMsgIdAndTxt(id, "%s: %s\n", funcname, culaGetStatusString(status));
}

#endif // __CULAMEX_HPP__

In the main code, simply call the checkStatus() function after any GPU call that can fail.

// Initialize CULA
culaStatus status = culaInitialize();
checkStatus(status, "culaInitialize");

// SVD Factorization
status = culaGesvd('A', 'A', M, N, A, M, SVEC, U, M, VT, N);
checkStatus(status, "culaGesvd");

Now we'll move onto some basic MATLAB compilation.  At the MATLAB command line simply type,

mex -setup

and you'll see a list of compilers available on your machine. Select your compiler of choice and continue. Please note that the default compiler included with MATLAB on Windows, lcc, does not support all of the C++ functionality needed to compile the file examples we have provided. However, Visual Studio Express 2008 and 2010 are free of charge and will get the job done.

Next, to call your newly configured compiler type,

mex( ['-I' getenv('CULA_INC_PATH')], ['-L' getenv('CULA_LIB_PATH_64')], '-lcula', 'culasvd.cpp' )

where the CULA_INC_PATH and CULA_LIB_PATH_64 environment variables are set to the location of the CULA headers and libraries. These are typically set by the CULA installer. If everything goes successfully, you've now generated a file named culasvd.mexa64, where the suffix is dependent on your system. The function will now be usable by simply calling:

[u,s,v] = culasvd(A)

If you see an error: "The specified module could not be found," a shared CULA library could not be loaded by MATLAB. The solution to this varies from platform to platform, but a surefire fix is to simply copy all of the shared libraries in your CULA bin/bin64 folder into the folder containing your newly created mex functions.

Try benchmarking your code and see what kind of results you get! We've seen upwards of 5-10x speed ups for a number of CULA functions.

N = 2048;
A = rand(N);
tic; [u,s,v] = culasvd(A); toc;
Elapsed time is 14.432616 seconds.
tic; [u,s,v] = svd(A); toc;
Elapsed time is 103.646813 seconds.

I hope this example proved useful to you. At sometime in the near future, we'll be posting information on how to use a number of other functions within MATLAB. Again, if you have any questions or comments, please visit our forums!

More Information:
CULA Programmers Guide: http://www.culatools.com/html_guide/
MATLAB MEX-file Guide: http://www.mathworks.com/support/tech-notes/1600/1605.html
C++ Templates: http://en.wikipedia.org/wiki/Template_(programming)

21Jul/10Off

Using CULA in MATLAB, Part 2

by Kyle

In part one of this three part series, we introduced a method using C++ templates to support all four major MATLAB data types. In today's part, we'll detail the specifics of how to integrate CULA's SVD algorithm into MATLAB. Finally, part three will give some tips on error checking, compilation, linking, usage, and benchmarking.

To recap, we left off from part one with a working code base that will enter a templated function dependent on the MATLAB data type.  From here, we'd like to write one templated function that can support single, double, complex single, and complex double datatypes. However, because CULA and MATLAB store complex data differently, we are going to have to create a number of helper functions to convert between the two types.

Internally, MATLAB stores it's complex data using a structure of arrays (SoA) format where the real data and complex data are stored in separate memory locations. In CULA, or any other LAPACK implementation, complex data is stored using an array of structures (AoS) format where the real and complex data are interleaved within a single large array. Therefore, in order to support complex data, we'll need to create two helper functions that convert from AoS to SoA and vice-versa. This unfortunately introduces the need for twice as much allocation when dealing with complex data.  However, MATLAB itself uses LAPACK under the hood and natively suffers from the same plight.

The following code sample illustrates, using C++ templates, how to perform this conversion. I'd recommend putting this function in a header such as 'culamex.hpp' as it will be needed for every function where you wish to support complex data.

// culamex.hpp
// Common helper functions for integrating CULA into MATLAB

#ifndef __CULAMEX_HPP__
#define __CULAMEX_HPP__

// Used to find real type associated with complex type
template<class T> struct ToReal { typedef T type; };
template<>        struct ToReal<culaFloatComplex> { typedef float type; };
template<>        struct ToReal<culaDoubleComplex> { typedef double type; };

// Convert from MATLAB complex format to CULA complex format
template <typename FloatType>
void MatToCula(FloatType* buf, const mxArray* src)
{
    typedef typename ToReal<FloatType>::type RealType;

    const RealType* r0 = (const RealType*) mxGetPr(src);
    const RealType* i0 = (const RealType*) mxGetPi(src);

    int M = (int)mxGetM(src);
    int N = (int)mxGetN(src);

    for(int j = 0; j < N; ++j)
    {
        for(int i = 0; i < M; ++i)
        {
            int p  = j*M+i;
            buf[p].x = r0[p];
            buf[p].y = i0[p];
        }
    }
}

// Convert from CULA complex format to MATLAB complex format
template <typename FloatType>
void CulaToMat(mxArray* src, FloatType* buf)
{
    typedef typename ToReal<FloatType>::type RealType;

    RealType* r0 = (RealType*) mxGetPr(src);
    RealType* i0 = (RealType*) mxGetPi(src);

    int M = mxGetM(src);
    int N = mxGetN(src);

    for(int j = 0; j < N; ++j)
    {
        for(int i = 0; i < M; ++i)
        {
            int p = j*M+i;
            r0[p] = buf[p].x;
            i0[p] = buf[p].y;
        }
    }
}

// Do nothing for non-complex cases
template <> void MatToCula(float* buf, const mxArray* src) {}
template <> void MatToCula(double* buf, const mxArray* src) {}
template <> void CulaToMat(mxArray* src, float* buf) {}
template <> void CulaToMat(mxArray* src, double* buf) {}

#endif //__CULAMEX_HPP__

Now, that we have our generic helper functions, we can start writing some code that acts just like MATLAB's svd function.  Upon examining the documentation for MATLAB's svd and CULA's xGESVD function, you'll notice two major differences.

  1. MATLAB returns the singular values in diagonal array whereas CULA returns a vector
  2. MATLAB return V whereas CULA return V' (transposed)

In order to preserve MATLAB's interface, we'll have to do some extra house keeping to convert from the singular value vector into to a diagonal matrix and also perform a transpose (or for complex data, conjugate transpose) on VT.

// culaSvd.cpp
// Implements CULA accelerated SVD to be called within MATLAB

#include <algorithm>

#include "mex.h"
#include "culapack.hpp"
#include "culamex.hpp"

using std::max;
using std::min;

// Complex conjugation of complex data
template<class T> void Conjugate(T* a) { /* Do nothing */ };
template<> void Conjugate(culaFloatComplex* a) { a->y = -(a->y); }
template<> void Conjugate(culaDoubleComplex* a) { a->y = -(a->y); }

template <typename T>
void mexCulaGesvd(int nlhs,           /* number of expected outputs */
              mxArray* plhs[],        /* output pointer array */
              int nrhs,               /* number of inputs */
              const mxArray* prhs[],  /* input pointer array */
              mxClassID id,
              mxComplexity complexity)
{
    // Initialize flags and types
    typedef typename ToReal<T>::type RealType;
    bool isReal = (complexity == mxREAL);
    bool isComplex = (complexity == mxCOMPLEX);

    // Initialize sizes
    int M = (int) mxGetM(prhs[0]);
    int N = (int) mxGetN(prhs[0]);
    int K = min(M,N);
    int L = max(M,N);

    // Allocate a temporary to not destroy input data
    T* A = (T*) mxMalloc( M * N * sizeof(T) );

    if (isReal)
    {
        // Copy input data directly into temporary
        memcpy( A, mxGetPr( prhs[0] ), M * N * sizeof(T) );
    }
    else if (isComplex)
    {
        // If complex, convert from MATLAB format into CULA format
        MatToCula( A, prhs[0] );
    }

    // Create MATLAB output matrices
    plhs[0] = mxCreateNumericMatrix(M, M, id, complexity);  // U (M x M)
    plhs[1] = mxCreateNumericMatrix(M, N, id, 0);           // S (M x N, Real)
    plhs[2] = mxCreateNumericMatrix(N, N, id, complexity);  // V (N x N)

    // Allocate CULA intermediate
    RealType* SVEC = (RealType*) mxMalloc( K * sizeof(RealType) );

    // CULA Memory Pointers
    T* U;
    T* VT;

    if (isReal)
    {
        // Get CULA memory pointers from allocated MATLAB matrices
        U = (T*) mxGetPr( plhs[0] );
        VT = (T*) mxGetPr( plhs[2] );
    }
    else if (isComplex)
    {
        // If complex, allocate an AoS complex buffer for CULA
        U = (T*) mxMalloc( M * M * sizeof(T) );
        VT = (T*) mxMalloc( N * N * sizeof(T) );
    }

    // Initialize CULA
    culaInitialize();

    // CULA SVD Factorization
    culaGesvd('A', 'A', M, N, A, M, SVEC, U, M, VT, N);

    // Shutdown CULA
    culaShutdown();

    // Get pointer to output matrix, S
    RealType* S = (RealType*) mxGetPr( plhs[1] );

    // Copy SVEC to diagonal of S
    for (int i=0; i<K; i++)
        S[i*M+i] = SVEC[i];

    // Inplace transpose of VT
    for (int i=0; i<N; i++)
    {
        for (int j=i; j<N; j++)
        {
            T temp = VT[j+i*N];
            VT[j+i*N] = VT[i+j*N];
            VT[i+j*N] = temp;
        }
    }

    // If complex, conjugate VT
    if (isComplex)
        for (int i=0; i<N; i++)
            for (int j=0; j<N; j++)
                Conjugate( &VT[j+i*N] );

    if (isComplex)
    {
        // If complex, convert from CULA format into MATLAB format
        CulaToMat( plhs[0], U );
        CulaToMat( plhs[2], VT );

        // Free MATLAB buffers
        mxFree(U);
        mxFree(VT);
    }

    // Free allocate data
    mxFree(A);
    mxFree(SVEC);
}

// MATLAB Gateway Function
void mexFunction(int nlhs,              /* number of expected outputs */
                 mxArray* plhs[],       /* output pointer array */
                 int nrhs,              /* number of inputs */
                 const mxArray* prhs[]  /* input pointer array */ )
{
    // See Part 1
    // mexCulaGesvd(...)
}

When combined with the gateway function introduced in part one, you should now have enough code to compile a working CULA accelerated, MATLAB callable SVD routine that support all four major datatypes. In the next part of this series, we'll discuss how to error check, compile, and use this routine in MATLAB.

While I've tried to keep the comments in the code quite verbose, if you have any questions regarding this code please ask in the comments section or in our user forums.

More Information:
CULA Programmers Guide: http://www.culatools.com/html_guide/
MATLAB MEX-file Guide: http://www.mathworks.com/support/tech-notes/1600/1605.html
C++ Templates: http://en.wikipedia.org/wiki/Template_(programming)

14Jul/10Off

Using CULA in MATLAB, Part 1

by Kyle

It is possible to use CULA to accelerate a large number of popular linear algebra routines in MATLAB. With some relatively simple interface code, it's easy to seamlessly accelerate popular MATLAB routines such as: qr, lu, svd, eig, and mldivide.

When writing your own MATLAB code that uses an external library such as CULA, there are a few common hurdles to overcome. In the next few blog posts, we'll introduce these problems and discuss how to solve them. By the end of the series, we'll have enough code to implement a CULA accelerated Singular Value Decomposition (SVD), culasvd, that matches the functionality of MATLAB's svd routine.

Part 1 of this series will explain how to handle multiple CULA data types in MATLAB. Part 2 will detail some tips and tricks on interface wrapping. Finally, Part 3 will show how to compile, link, and benchmark CULA within MATLAB.

Excluding sparse, there are four main data types supported by MATLAB and CULA: single, double, single complex, and double complex. When writing a CULA accelerated MATLAB routine, your function should be able to support all of these types invisibly to the end user. Since both the MATLAB MEX compiler and CULA support C++ templates, we can easily implement all four datatypes through a single code path.

The following code snippet shows how to use C++ templates to handle all four MATLAB types. We use the MATLAB functions, mxGetClassID() and mxIsComplex(), to determine the precision and complexity of the input matrix and then switch into an explicitly typed CULA wrapper.

// culaSvd.cpp
// Implements CULA accelerated SVD to be called within MATLAB

#include "mex.h"
#include "culapack.hpp"

// Templated wrapper
template <typename T>
void mexCulaGesvd(int nlhs,           /* number of expected outputs */
              mxArray* plhs[],        /* output pointer array */
              int nrhs,               /* number of inputs */
              const mxArray* prhs[],  /* input pointer array */
              mxClassID id,
              mxComplexity complexity)
{
    // Function core, details in "Using CULA in MATLAB, Part 2"
    // culaGesvd(...)
}

// MATLAB Gateway Function
void mexFunction(int nlhs,              /* number of expected outputs */
                 mxArray* plhs[],       /* output pointer array */
                 int nrhs,              /* number of inputs */
                 const mxArray* prhs[]  /* input pointer array */ )
{
    // We only support a full SVD in this example
    if(nrhs != 1)
        mexErrMsgTxt("culasvd: Must have 1 input argument [X]");
    if(nlhs != 3)
        mexErrMsgTxt("culasvd: Must have 3 output arguments [U,S,V]");

    // Get precision (single or double)
    mxClassID classID = mxGetClassID(prhs[0]);

    // Get complexity (real or complex)
    mxComplexity complexity = mxIsComplex(prhs[0]) ? mxCOMPLEX : mxREAL;

    // Switch based on data type
    if ( classID == mxSINGLE_CLASS && complexity == mxREAL)
        mexCulaGesvd<culaFloat>(nlhs, plhs, nrhs, prhs, classID, complexity);
    else if (classID == mxDOUBLE_CLASS && complexity == mxREAL )
        mexCulaGesvd<culaDouble>(nlhs, plhs, nrhs, prhs, classID, complexity);
    else if ( classID == mxSINGLE_CLASS && complexity == mxCOMPLEX )
        mexCulaGesvd<culaFloatComplex>(nlhs, plhs, nrhs, prhs, classID, complexity);
    else if ( classID == mxDOUBLE_CLASS && complexity == mxCOMPLEX )
        mexCulaGesvd<culaDoubleComplex>(nlhs, plhs, nrhs, prhs, classID, complexity);
    else
        mexErrMsgTxt("culasvd: Unknown or unsupported data type");
}

As a quick overview, this file includes the 3 main components:

1) The CULA header for our SVD function and the MATLAB header for the MATLAB data types and API
2) A templated function that will support all four MATLAB data types
3) The MATLAB gateway function, 'mexFunction', required by all C/C++ programs called by MATLAB

These are the three main components needed by any MATLAB wrapper for CULA. As you can see, we really haven't gotten into any SVD specific code with the exception of the error checking.

Check back soon for Part 2 where we'll go into more details about how to wrap MATLAB's matrix data into CULA's SVD function. If you have any questions, feel free to ask in a comment here or on our forums.

More Information:
CULA Programmers Guide: http://www.culatools.com/html_guide/
MATLAB MEX-file Guide: http://www.mathworks.com/support/tech-notes/1600/1605.html
C++ Templates: http://en.wikipedia.org/wiki/Template_(programming)