## Accelerate MATLAB with the CULA Link Interface

One of the exciting new features in CULA R12 is the link interface. In a previous blog post we introduced the features of this new tool, and today we'll demonstrate how to easily use this interface with the popular computing tool MATLAB.

MATLAB has a feature that allows you to externally specify a library for your LAPACK and BLAS calls. Typically this feature is used if your architecture does not perform well with the libraries included with MATLAB. However, you can also use this feature to load the GPU-accelerated CULA libraries and boost performance! All it takes is setting a few environment variables: there are no MEX files to compile, no clunky gpuArray objects, and no changes to MATLAB function names!

The first variables to set are LAPACK_VERSION and BLAS_VERSION. These are specific to MATLAB, and each should point to the 'cula_lapack_link.dll' file ('cula_lapack_link.so' on Linux).

The next variables to set are related to the CULA link library. A useful option is the CULA_DEBUG_LOG environment variable. When it is set, CULA writes messages to a log file that show which functions are handled by the CULA library. For 64-bit versions of MATLAB, also set the CULA_ILP64 flag, because 64-bit MATLAB uses 64-bit integers internally.

On Windows, an easy way to run CULA-accelerated MATLAB is through a batch file. Simply create a new .bat file that sets the environment variables and then launches the MATLAB executable. For convenience, we have provided a Windows batch file to do just that. Place this file in your MATLAB bin folder alongside the standard matlab.bat file. Also make sure that the CULA bin folder is on your Windows PATH so that the required libraries can be loaded.
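The following minimal C++ sketch makes the setup concrete. It collects the variables described above for a given platform and renders them as the "set" lines that a Windows batch file would contain, ahead of the line that launches MATLAB. The helper names and example paths are hypothetical and are not part of CULA or MATLAB. The value "1" for CULA_ILP64 and CULA_DEBUG_LOG is also an assumption, so check link_interface.txt for the exact semantics of those two variables.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not part of CULA or MATLAB): collects the environment
// variables the CULA link interface needs before MATLAB starts.
// culaDir is the folder holding the CULA shared libraries.
std::map<std::string, std::string> culaLinkEnvironment(const std::string& culaDir,
                                                       bool windows,
                                                       bool matlab64,
                                                       bool debugLog)
{
    assert(!culaDir.empty());
    const std::string sep = windows ? "\\" : "/";
    const std::string lib = windows ? "cula_lapack_link.dll" : "cula_lapack_link.so";
    const std::string libPath = culaDir + sep + lib;

    std::map<std::string, std::string> env;
    env["LAPACK_VERSION"] = libPath;  // MATLAB-specific: load LAPACK from CULA
    env["BLAS_VERSION"] = libPath;    // MATLAB-specific: load BLAS from the same library
    if (matlab64)
        env["CULA_ILP64"] = "1";      // 64-bit MATLAB uses 64-bit integers (value "1" is an assumption)
    if (debugLog)
        env["CULA_DEBUG_LOG"] = "1";  // assumption: a non-empty value turns logging on
    if (windows)
        env["PATH"] = culaDir + ";%PATH%";  // let Windows find the CULA DLLs
    return env;
}

// Renders the variables as the "set" lines of a Windows batch file.
// The command that launches MATLAB would follow these lines.
std::string toBatchLines(const std::map<std::string, std::string>& env)
{
    std::string out;
    for (const auto& kv : env)  // std::map iterates in key order
        out += "set " + kv.first + "=" + kv.second + "\n";
    return out;
}
```

For example, culaLinkEnvironment("C:\\CULA\\bin64", true, true, false) points both LAPACK_VERSION and BLAS_VERSION at C:\CULA\bin64\cula_lapack_link.dll, sets CULA_ILP64, and prepends the CULA folder to PATH. On Linux, the corresponding library search path (LD_LIBRARY_PATH) is left out of this sketch.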

Running the new batch file will launch MATLAB with CULA acceleration enabled. A few simple commands show that our linear algebra operations (matrix multiplication, QR decomposition, and a linear system solve) run faster:

```matlab
>> tic; C = A*A'; toc;
Elapsed time is 3.414187 seconds.
>> tic; [q,r] = qr(B); toc;
Elapsed time is 11.318329 seconds.
>> tic; x = C \ b; toc;
Elapsed time is 19.133406 seconds.
```

Contrast this with the CPU implementation, where the same operations take roughly 2x to 8x as long to complete!

```matlab
>> tic; C = A*A'; toc;
Elapsed time is 7.035089 seconds.
>> tic; [q,r] = qr(B); toc;
Elapsed time is 49.837156 seconds.
>> tic; x = C \ b; toc;
Elapsed time is 151.153907 seconds.
```
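The speedup for each operation is the ratio of the CPU time to the GPU time from the two runs above:

$$
S = \frac{t_{\text{CPU}}}{t_{\text{GPU}}}, \qquad
S_{AA'} = \frac{7.035}{3.414} \approx 2.1, \qquad
S_{\text{QR}} = \frac{49.837}{11.318} \approx 4.4, \qquad
S_{C \backslash b} = \frac{151.154}{19.133} \approx 7.9
$$

The largest gain, close to 8x, comes from the linear solve. Matrix multiplication gains about 2x.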

Many MATLAB functions call BLAS or LAPACK under the hood. Routines that will automatically be accelerated include (but are not limited to):

- matrix multiply (*)
- matrix solve (\)
- svd
- eig
- inv

More information about the link interface can be found in the link_interface.txt file contained in the doc folder of your CULA install.

If you have any questions, please ask on our forums!

Edited on January 23, 2012 to update all occurrences of cula_link to cula_lapack_link.

## GPU Computing in Matlab

One of the big announcements at GTC was MATLAB's integrated GPU computing toolbox, which generated considerable buzz. It also speaks to one of the questions we receive most often: can MATLAB get speedups from GPU computing? MATLAB is a great product in terms of usability, but the most common complaint about it is that it's too slow, so GPUs are an obvious fit. Our friends over at Accelereyes have put together a nice summary of the state of GPU computing in MATLAB, and we wanted to share it. For the advanced CULA and MATLAB users out there, it is also worth checking out our recent blog series describing how to manually integrate CULA routines into MATLAB code.

## Using CULA in MATLAB, Part 3

In part one of this three-part series, we introduced a method using C++ templates to support all four major MATLAB data types. In part two, we detailed the specifics of integrating CULA's SVD routine into MATLAB. Finally, in today's post we'll give some tips on error checking, compilation, linking, usage, and benchmarking.

The code posted in the previous two parts didn't include any error checking. For example, if a device allocation fails because your GPU doesn't have enough memory, the error is silently ignored and MATLAB will most likely return blank answers. Similarly, if no CUDA-enabled GPU is found, the original code continues with no visible problem. All of these errors can be detected through the culaStatus value returned by CULA functions and reported with MATLAB's error handler, mexErrMsgIdAndTxt(). Together, these two tools let us detect a CULA error and safely return control to MATLAB with a visible error. Another option, which I won't outline here, would be to fall back to the original built-in MATLAB function.

The following addition to the header provides a convenient parser for culaStatus errors. If no error occurred, the function returns immediately. Otherwise, it shuts CULA down and describes the error to MATLAB.

```cpp
#ifndef __CULAMEX_HPP__
#define __CULAMEX_HPP__

#include <cstdio>   // sprintf
#include <cstring>  // strcat

// Header code from Part 2

// Reports a failed CULA call back to MATLAB. mexErrMsgIdAndTxt() does not
// return; it aborts the MEX function and hands the error to the MATLAB prompt.
void checkStatus(culaStatus status, const char* funcname)
{
    if(!status)
        return; // culaNoError: nothing to report

    culaShutdown();

    // Build a MATLAB message identifier such as "CULA:culaGesvd:culaDataError"
    char id[128];
    sprintf(id, "CULA:%s:", funcname);

    if(status == culaArgumentError)
    {
        strcat(id, "culaArgumentError");
        mexErrMsgIdAndTxt(id, "%s: Invalid value for parameter %d\n", funcname, culaGetErrorInfo());
    }
    else if(status == culaDataError)
    {
        strcat(id, "culaDataError");
        mexErrMsgIdAndTxt(id, "%s: Data error (%d)\n", funcname, culaGetErrorInfo());
    }
    else if(status == culaBlasError)
    {
        strcat(id, "culaBlasError");
        mexErrMsgIdAndTxt(id, "%s: Blas error (%d)\n", funcname, culaGetErrorInfo());
    }
    else if(status == culaRuntimeError)
    {
        strcat(id, "culaRuntimeError");
        mexErrMsgIdAndTxt(id, "%s: Runtime error (%d)\n", funcname, culaGetErrorInfo());
    }
    else if(status == culaNotInitialized)
        strcat(id, "culaNotInitialized");
    else if(status == culaNoHardware)
        strcat(id, "culaNoHardware");
    else if(status == culaInsufficientRuntime)
        strcat(id, "culaInsufficientRuntime");
    else if(status == culaInsufficientComputeCapability)
        strcat(id, "culaInsufficientComputeCapability");
    else if(status == culaInsufficientMemory)
        strcat(id, "culaInsufficientMemory");
    else if(status == culaFeatureNotImplemented)
        strcat(id, "culaFeatureNotImplemented");
    else
        strcat(id, "unknown");

    // Messages that don't have error info fall through to here
    mexErrMsgIdAndTxt(id, "%s: %s\n", funcname, culaGetStatusString(status));
}

#endif // __CULAMEX_HPP__
```

In the main code, simply call the checkStatus() function after any GPU call that can fail.

```cpp
// Initialize CULA
culaStatus status = culaInitialize();
checkStatus(status, "culaInitialize");

// SVD factorization
status = culaGesvd('A', 'A', M, N, A, M, SVEC, U, M, VT, N);
checkStatus(status, "culaGesvd");
```
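The fallback option mentioned earlier is to run MATLAB's built-in function when the GPU path fails. Its control flow can be sketched in plain C++. This is a minimal, hypothetical sketch. Status is a stand-in enum whose names mirror a few CULA status codes, but it is not CULA's culaStatus. The two callables stand in for the CULA code path and the CPU path. In a real MEX file, the CPU path might call MATLAB's svd through mexCallMATLAB(). Which failures justify a fallback is a design choice. Here, only hardware and memory problems trigger one, and anything else is still treated as an error.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for culaStatus (not CULA's actual enum).
enum class Status { NoError, NoHardware, InsufficientMemory, ArgumentError };

// Fall back only when the GPU environment is the problem,
// not when the caller's input is invalid.
bool shouldFallBack(Status s)
{
    return s == Status::NoHardware || s == Status::InsufficientMemory;
}

// Tries the GPU path first. On a recoverable failure, runs the CPU path.
// Any other failure is raised as an error, playing the role of checkStatus().
// Returns true if the GPU path produced the result.
bool runWithFallback(const std::function<Status()>& gpuPath,
                     const std::function<void()>& cpuPath)
{
    assert(gpuPath && cpuPath);
    const Status s = gpuPath();
    if (s == Status::NoError)
        return true;
    if (shouldFallBack(s)) {
        cpuPath();
        return false;
    }
    throw std::runtime_error("GPU path failed with a non-recoverable status");
}
```

Keeping argument errors out of the fallback means genuine bugs in the calling code still surface, instead of being hidden by a slower CPU result.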

Now we'll move on to some basic MEX compilation. At the MATLAB command line, simply type:

```matlab
mex -setup
```

and you'll see a list of the compilers available on your machine. Select your compiler of choice and continue. Please note that lcc, the default compiler included with MATLAB on Windows, does not support the C++ features needed to compile the examples we have provided. However, Visual Studio Express 2008 and 2010 are free of charge and will get the job done.

Next, to compile with your newly configured compiler, type:

```matlab
mex( ['-I' getenv('CULA_INC_PATH')], ['-L' getenv('CULA_LIB_PATH_64')], '-lcula', 'culasvd.cpp' )
```

where the CULA_INC_PATH and CULA_LIB_PATH_64 environment variables point to the locations of the CULA headers and libraries. These are typically set by the CULA installer. If everything succeeds, you will have generated a file named culasvd.mexa64. The suffix depends on your system, and MATLAB's mexext command reports the suffix for your platform. You can now call the function just like a built-in:

```matlab
[u,s,v] = culasvd(A)
```

If you see the error "The specified module could not be found," MATLAB could not load one of the shared CULA libraries. The fix varies from platform to platform, but a surefire solution is to copy all of the shared libraries in your CULA bin (or bin64) folder into the folder containing your newly created MEX functions.

Try benchmarking your code and see what kind of results you get! We've seen speedups of 5-10x for a number of CULA functions.

```matlab
N = 2048;
A = rand(N);
tic; [u,s,v] = culasvd(A); toc;
Elapsed time is 14.432616 seconds.
tic; [u,s,v] = svd(A); toc;
Elapsed time is 103.646813 seconds.
```
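As a quick check against that range, the 2048 × 2048 SVD timings above give a speedup of:

$$
S_{\text{SVD}} = \frac{t_{\text{svd}}}{t_{\text{culasvd}}} = \frac{103.647}{14.433} \approx 7.2
$$

This falls inside the 5-10x range quoted above.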

I hope this example proved useful to you. In the near future, we'll be posting information on how to use a number of other functions within MATLAB. Again, if you have any questions or comments, please visit our forums!

**More Information:**

CULA Programmers Guide: http://www.culatools.com/html_guide/

MATLAB MEX-file Guide: http://www.mathworks.com/support/tech-notes/1600/1605.html

C++ Templates: http://en.wikipedia.org/wiki/Template_(programming)