<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CULA</title>
	<atom:link href="http://www.culatools.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.culatools.com</link>
	<description>GPU Accelerated Linear Algebra</description>
	<lastBuildDate>Tue, 07 Sep 2010 17:31:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>CULA Team Working with University of Delaware</title>
		<link>http://www.culatools.com/blog/2010/09/07/air-force-project-and-our-partnership-with-the-university-of-delaware/</link>
		<comments>http://www.culatools.com/blog/2010/09/07/air-force-project-and-our-partnership-with-the-university-of-delaware/#comments</comments>
		<pubDate>Tue, 07 Sep 2010 01:30:26 +0000</pubDate>
		<dc:creator>Liana</dc:creator>
				<category><![CDATA[GPGPU Industry]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=1092</guid>
		<description><![CDATA[Last week we announced a partnership with the Global Computing Lab at the University of Delaware for a GPU-related initiative with the Air Force.  You may have already seen the news on HPCWire or the InsideHPC blog, but we wanted to share here what makes this project exciting for us. This project entails the development of [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_1095" class="wp-caption alignright" style="width: 310px"><a href="http://www.culatools.com/wp-content/uploads/2010/08/Taufer_GPU_Algorithms_038.jpg"><img class="size-medium wp-image-1095 " src="http://www.culatools.com/wp-content/uploads/2010/08/Taufer_GPU_Algorithms_038-300x199.jpg" alt="" width="300" height="199" /></a><p class="wp-caption-text">John Humphrey of EM Photonics working with Dr. Michela Taufer&#039;s group at UD</p></div>
<p>Last week we announced a partnership with the Global Computing Lab at the University of Delaware for a GPU-related initiative with the Air Force.  You may have already seen the news on <a href="http://www.hpcwire.com/offthewire/EM-Photonics-University-of-Delaware-Team-Up-to-Develop-Advanced-Algorithms-for-Air-Force-101309774.html">HPCWire</a> or the <a href="http://insidehpc.com/2010/08/23/em-photonics-and-university-of-delaware-collaborate-on-air-force-project/">InsideHPC</a> blog, but we wanted to share here what makes this project exciting for us.</p>
<p>This project entails the development of innovative parallel algorithms for scientific computing, modeling and simulation for a multi-GPU, multi-node environment. Air Force applications to benefit from this research include electromagnetic modeling, computational fluid dynamics, structural mechanics, and radiation transport, to name a few. This work will run on the University's largest GPU machine, which features NVIDIA's Tesla-brand Fermi GPU computing technology.  The CULA library is going to be a direct beneficiary of this work!</p>
<p>As a University of Delaware alumnus, John Humphrey, who will be speaking about CULA at GTC 2010, understands the impact of such a project for university students who are seeking hands-on GPU experience while getting their degree. Speaking about this project, John said, "This is a valuable opportunity for EM Photonics. We have successfully collaborated in the past with Dr. Taufer’s Group and their familiarity with our CULA library adds great value to this project. We look forward to extending our work in dense matrix solvers on multiple GPUs, as well as researching the feasibility of multi-GPU sparse solvers."</p>
<p>The story was also featured on <a href="http://www.udel.edu/udaily/2011/aug/taufer-em-photonics082310.html">UDaily</a> and we look forward to sharing the results of this work in a few months! Stay tuned for more!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/09/07/air-force-project-and-our-partnership-with-the-university-of-delaware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDA Certification and EM Photonics Training</title>
		<link>http://www.culatools.com/blog/2010/08/20/cuda-certification-and-em-photonics-training/</link>
		<comments>http://www.culatools.com/blog/2010/08/20/cuda-certification-and-em-photonics-training/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 15:27:14 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[GPGPU Industry]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=1070</guid>
		<description><![CDATA[Last month NVIDIA unveiled their CUDA certification program. This is exciting news because many of our CULA users and training customers have been asking for an official recognition to signify their CUDA experience. Having been GPU developers for over 5 years, we're excited to see the field maturing to this level. Alongside the announcement of [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-1082" title="frontcover_100px" src="http://www.culatools.com/wp-content/uploads/2010/08/frontcover_100px.jpg" alt="" width="100" height="125" />Last month NVIDIA unveiled their <a title="CUDA certification" href="http://www.nvidia.com/object/certification.html" target="_blank">CUDA certification</a> program. This is exciting news because many of our CULA users and training customers have been asking for an official recognition to signify their CUDA experience. Having been GPU developers for over 5 years, we're excited to see the field maturing to this level.</p>
<p>Alongside the announcement of the certification program was the unveiling of the official <a title="CUDA Syllabus" href="http://www.nvidia.com/object/io_1266605227307.html" target="_blank">syllabus</a> for the CUDA certification exam. We're happy to say that our course covers all of the expected topics as well as many more! Not only do we cover the fundamentals as outlined in this syllabus, but we've created several modules that allow us to tailor our course to your specific needs. Are you doing high-performance image processing? We've got you covered. Scientific computing with a basis in linear algebra? We've got a module that covers all of the tools you'll need (especially CULA) to make your simulations run as fast as possible. Of course, if you're already experienced and you'd like to take your skills to the next level, we've got several advanced modules just for people like you.</p>
<p>If you'd like more information about our training program, <a title="EM Photonics CUDA Training" href="http://www.emphotonics.com/services/cuda-training">send us a note</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/08/20/cuda-certification-and-em-photonics-training/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summer 2010 CULA News</title>
		<link>http://www.culatools.com/blog/2010/08/13/summer-2010-cula-news/</link>
		<comments>http://www.culatools.com/blog/2010/08/13/summer-2010-cula-news/#comments</comments>
		<pubDate>Fri, 13 Aug 2010 13:30:31 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Release Notes & News]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=1039</guid>
		<description><![CDATA[Now that we're getting close to NVIDIA's GTC again, I wanted to update here with some exciting CULA news. First up is GTC itself, dated September 20-23 in San Jose.  The CULA team will be there in full force, as we were last year.  We are presenting two papers this year: one on GPU linear [...]]]></description>
			<content:encoded><![CDATA[<p>Now that we're getting close to NVIDIA's GTC again, I wanted to update here with some exciting CULA news.</p>
<p><img class="alignright size-medium wp-image-1065" title="Capture" src="http://www.culatools.com/wp-content/uploads/2010/08/Capture-300x91.png" alt="" width="300" height="91" />First up is GTC itself, held September 20-23 in San Jose.  The CULA team will be there in full force, as we were last year.  We are presenting two papers this year.  The first is on GPU linear algebra, where we will talk about the current and upcoming features of CULA.  If you are interested in sparse matrices and are curious about our plans, then you should get a seat in this session.  Remember that you must sign up for GTC talks ahead of time in order to guarantee admittance.  Last year we packed the room for our CULA talk, and this year we have much, much more to say!  Our other talk covers the rest of the GPU work that takes place here at EM Photonics.  A large portion of that work is focused on the needs of the military and government, and the session will discuss a range of these applications, including computational fluid dynamics (CFD), embedded systems (now with GPUs), and both embedded and data-center embodiments of image processing.</p>
<p>We will have a table at GTC as well, just as last year.  You might remember that last year we served cupcakes (which we called CULAcakes!).  Cupcakes aren't on this year's menu since the GTC catering kept us stuffed with delicious food, but we hope to be demoing some of our GPU software featuring some guest hardware that you probably haven't seen before.</p>
<p>We also have started planning our program for Supercomputing 2010, held this year in New Orleans.  We certainly hope to see you all there!</p>
<p>In other news, we are working hard on a chapter for GPU Computing Gems 2.  It details some of the underlying pieces of CULA that help get excellent speedups.  CPUs do so well at linear algebra that getting 7x or more is quite the feat.  That may not sound like much compared to the 100-200x speedups you see at the NVIDIA CUDA Zone, but remember that the CPU will take many years to reach our speeds. Is it too cheesy to say "tomorrow's computing today?"</p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/08/13/summer-2010-cula-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GPU Test Drive</title>
		<link>http://www.culatools.com/blog/2010/08/06/gpu-test-drive/</link>
		<comments>http://www.culatools.com/blog/2010/08/06/gpu-test-drive/#comments</comments>
		<pubDate>Fri, 06 Aug 2010 20:59:07 +0000</pubDate>
		<dc:creator>Liana</dc:creator>
				<category><![CDATA[GPGPU Industry]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=1026</guid>
		<description><![CDATA[Are you working with AMBER, NAMD or GROMACS? If yes, you may want to take a look at this limited time opportunity to simulate your molecule file using a Tesla GPU at no cost to you! We heard it from PSSC Labs, one of the first HPC vendors to become a CULA Channel Partner. This [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-1027" title="tesla_bio_workbench" src="http://www.culatools.com/wp-content/uploads/2010/08/tesla_bio_workbench-150x150.gif" alt="" width="150" height="150" />Are you working with AMBER, NAMD or GROMACS? If yes, you may want to take a look at this limited time opportunity to simulate your molecule file using a Tesla GPU at no cost to you!</p>
<p>We heard it from <a href="http://www.pssclabs.com" target="_blank">PSSC Labs</a>, one of the first HPC vendors to become a <a href="http://www.culatools.com/about-us/press-releases/">CULA Channel Partner</a>. This is a very cool initiative led by NVIDIA. As we understand it, anyone using <a href="http://www.nvidia.com/object/amber_on_tesla.html" target="_blank">AMBER</a>, <a href="http://www.nvidia.com/object/namd_on_tesla.html" target="_blank">NAMD</a> or <a href="http://www.nvidia.com/object/gromacs_on_tesla.html" target="_blank">GROMACS</a> can register and test out NVIDIA’s Tesla GPUs, which are claimed to deliver speedups of 10X or more for these molecular dynamics applications.</p>
<p>If you are interested, you can <a href="http://www.pssclabs.com/tesla_bio_workbench.asp" target="_blank">register</a> through PSSC Labs and you can also email any questions about this GPU test drive to 4sales@pssclabs.com.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/08/06/gpu-test-drive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PyCULA – Python bindings for CULA</title>
		<link>http://www.culatools.com/blog/2010/08/03/pycula-%e2%80%93-python-bindings-for-cula/</link>
		<comments>http://www.culatools.com/blog/2010/08/03/pycula-%e2%80%93-python-bindings-for-cula/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 21:51:23 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[CULA Applications]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=998</guid>
		<description><![CDATA[Louis Theran and Garrett Wright at Temple University have been working hard on some sophisticated Python bindings for CULA, called PyCULA. Our work with Python has been a simple ctypes binding to make direct calls into the CULA interface. Their PyCULA work adds a much more Python-like interface that is compatible with NumPy, which every [...]]]></description>
			<content:encoded><![CDATA[<p>Louis Theran and Garrett Wright at Temple University have been working hard on some sophisticated Python bindings for CULA, called <a href="http://math.temple.edu/research/geometry/PyCULA/">PyCULA</a>.  <a href="http://www.culatools.com/features/interfaces/">Our work</a> with Python has been a simple ctypes binding to make direct calls into the CULA interface.  Their PyCULA work adds a much more Python-like interface that is compatible with NumPy, which every Python user is likely to be familiar with.  The work is in an alpha state at this time, but it looks very promising.</p>
<p>URL: <a href="http://math.temple.edu/research/geometry/PyCULA/">http://math.temple.edu/research/geometry/PyCULA/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/08/03/pycula-%e2%80%93-python-bindings-for-cula/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using CULA in MATLAB, Part 3</title>
		<link>http://www.culatools.com/blog/2010/07/27/using-cula-in-matlab-part-3/</link>
		<comments>http://www.culatools.com/blog/2010/07/27/using-cula-in-matlab-part-3/#comments</comments>
		<pubDate>Tue, 27 Jul 2010 21:40:32 +0000</pubDate>
		<dc:creator>Kyle</dc:creator>
				<category><![CDATA[MATLAB Integration]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=1003</guid>
		<description><![CDATA[In part one of this three part series, we introduced a method using C++ templates to support all four major MATLAB data types. In part two, we detailed the specifics of how to integrate CULA's SVD algorithm into MATLAB. Finally, in today's section we'll give some tips on error checking, compilation, linking, usage, and benchmarking. The [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.culatools.com/blog/2010/07/14/using-cula-in-matlab-part-1/">part one</a> of this three part series, we introduced a method using C++ templates to support all four major MATLAB data types. In <a href="http://www.culatools.com/blog/2010/07/21/using-cula-in-matlab-part-2/">part two</a>, we detailed the specifics of how to integrate CULA's SVD algorithm into MATLAB. Finally, in today's section we'll give some tips on error checking, compilation, linking, usage, and benchmarking.</p>
<p>The code posted in the previous two examples didn't include any error checking. For example, if an allocation on the device failed because your GPU doesn't have enough memory, the error will be silently ignored and MATLAB will most likely return blank answers. Similarly, if no CUDA-enabled GPU is found, the original code will continue with no visible problem. These potential errors can all be handled with the culaStatus value returned by each CULA call and the MATLAB error handler, mexErrMsgIdAndTxt(). By using these two tools, we can detect a CULA error and safely return control to MATLAB with a visible error. Another option, which I won't outline here, would be to fall back to the original MATLAB built-in function.</p>
<p>The following addition to the header provides a nice parser of culaStatus errors.  If no error is found, the code returns immediately. Otherwise, we describe the error to MATLAB.</p>
<pre class="brush: cpp;">
#ifndef __CULAMEX_HPP__
#define __CULAMEX_HPP__

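// Header code from Part 2

#include &lt;cstdio&gt;   // sprintf
#include &lt;cstring&gt;  // strcat

// Report a CULA failure to MATLAB as an error; returns silently on success.
// Declared inline so the header can be included from several source files.
inline void checkStatus(culaStatus status, const char* funcname)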
{
    if(!status)
        return;

    culaShutdown();

    char id[128];
    sprintf(id, &quot;CULA:%s:&quot;, funcname);

    if(status == culaArgumentError)
    {
        strcat(id, &quot;culaArgumentError&quot;);
        mexErrMsgIdAndTxt(id, &quot;%s: Invalid value for parameter %d\n&quot;, funcname, culaGetErrorInfo());
    }
    else if(status == culaDataError)
    {
        strcat(id, &quot;culaDataError&quot;);
        mexErrMsgIdAndTxt(id, &quot;%s: Data error (%d)\n&quot;, funcname, culaGetErrorInfo());
    }
    else if(status == culaBlasError)
    {
        strcat(id, &quot;culaBlasError&quot;);
        mexErrMsgIdAndTxt(id, &quot;%s: Blas error (%d)\n&quot;, funcname, culaGetErrorInfo());
    }
    else if(status == culaRuntimeError)
    {
        strcat(id, &quot;culaRuntimeError&quot;);
        mexErrMsgIdAndTxt(id, &quot;%s: Runtime error (%d)\n&quot;, funcname, culaGetErrorInfo());
    }
    else if(status == culaNotInitialized)
        strcat(id, &quot;culaNotInitialized&quot;);
    else if(status == culaNoHardware)
        strcat(id, &quot;culaNoHardware&quot;);
    else if(status == culaInsufficientRuntime)
        strcat(id, &quot;culaInsufficientRuntime&quot;);
    else if(status == culaInsufficientComputeCapability)
        strcat(id, &quot;culaInsufficientComputeCapability&quot;);
    else if(status == culaInsufficientMemory)
        strcat(id, &quot;culaInsufficientMemory&quot;);
    else if(status == culaFeatureNotImplemented)
        strcat(id, &quot;culaFeatureNotImplemented&quot;);
    else
        strcat(id, &quot;unknown&quot;);

    // Messages that don't have error info fall through to here
    mexErrMsgIdAndTxt(id, &quot;%s: %s\n&quot;, funcname, culaGetStatusString(status));
}

#endif // __CULAMEX_HPP__
</pre>
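<p>The identifier assembled with sprintf and strcat above has the form CULA:funcname:statusname. As a minimal sketch of just that string-building step, the snippet below uses std::string so that no fixed-size buffer is involved. The Status enum, StatusName and MakeErrorId are hypothetical stand-ins so the snippet compiles without the CULA or MATLAB headers, and only a few of the statuses are listed.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for culaStatus; the real enum lives in the CULA headers
enum Status { NoError = 0, NotInitialized, NoHardware, InsufficientMemory };

// Map a status to the mnemonic used as the last part of the identifier.
// Anything not listed (including NoError) maps to "unknown".
inline const char* StatusName(Status s)
{
    switch (s)
    {
        case NotInitialized:     return "culaNotInitialized";
        case NoHardware:         return "culaNoHardware";
        case InsufficientMemory: return "culaInsufficientMemory";
        default:                 return "unknown";
    }
}

// Build "CULA:funcname:statusname" without a fixed-size char buffer
inline std::string MakeErrorId(const char* funcname, Status s)
{
    return std::string("CULA:") + funcname + ":" + StatusName(s);
}
```

<p>In a MEX file, the c_str() of the resulting string could be passed as the first argument to mexErrMsgIdAndTxt() in place of the id buffer.</p>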
<p>In the main code, simply call the checkStatus() function after any GPU call that can fail.</p>
<pre class="brush: cpp;">
// Initialize CULA
culaStatus status = culaInitialize();
checkStatus(status, &quot;culaInitialize&quot;);

// SVD Factorization
status = culaGesvd('A', 'A', M, N, A, M, SVEC, U, M, VT, N);
checkStatus(status, &quot;culaGesvd&quot;);
</pre>
<p>Now we'll move on to some basic MATLAB compilation.  At the MATLAB command line, simply type</p>
<pre class="brush: plain;">
mex -setup
</pre>
<p>and you'll see a list of compilers available on your machine.  Select your compiler of choice and continue.  Please note that the default compiler included with MATLAB on Windows, lcc, does not support all of the C++ functionality needed to compile the file examples we have provided. However, Visual Studio Express 2008 and 2010 are free of charge and will get the job done.</p>
<p>Next, to call your newly configured compiler, type</p>
<pre class="brush: plain;">
mex( ['-I' getenv('CULA_INC_PATH')], ['-L' getenv('CULA_LIB_PATH_64')], '-lcula', 'culasvd.cpp' )
</pre>
<p>where the CULA_INC_PATH and CULA_LIB_PATH_64 environment variables are set to the location of the CULA headers and libraries. These are typically set by the CULA installer. If everything goes successfully, you've now generated a file named culasvd.mexa64, where the suffix is dependent on your system.  The function will now be usable by simply calling:</p>
<pre class="brush: plain;">
[u,s,v] = culasvd(A)
</pre>
<p>If you see an error: "The specified module could not be found," a shared CULA library could not be loaded by MATLAB. The solution to this varies from platform to platform, but a surefire fix is to simply copy all of the shared libraries in your CULA bin/bin64 folder into the folder containing your newly created mex functions.</p>
<p>Try benchmarking your code and see what kind of results you get!  We've seen upwards of 5-10x speed ups for a number of CULA functions.</p>
<pre class="brush: plain;">
N = 2048;
A = rand(N);
tic; [u,s,v] = culasvd(A); toc;
Elapsed time is 14.432616 seconds.
tic; [u,s,v] = svd(A); toc;
Elapsed time is 103.646813 seconds.
</pre>
<p>I hope this example proved useful to you. Sometime in the near future, we'll be posting information on how to use a number of other functions within MATLAB. Again, if you have any questions or comments, please visit our forums!</p>
<p><strong>More Information:</strong><br />
CULA Programmers Guide: <a href="http://www.culatools.com/html_guide/">http://www.culatools.com/html_guide/</a><br />
MATLAB MEX-file Guide: <a href="http://www.mathworks.com/support/tech-notes/1600/1605.html">http://www.mathworks.com/support/tech-notes/1600/1605.html</a><br />
C++ Templates: <a href="http://en.wikipedia.org/wiki/Template_(programming)">http://en.wikipedia.org/wiki/Template_(programming)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/07/27/using-cula-in-matlab-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using CULA in MATLAB, Part 2</title>
		<link>http://www.culatools.com/blog/2010/07/21/using-cula-in-matlab-part-2/</link>
		<comments>http://www.culatools.com/blog/2010/07/21/using-cula-in-matlab-part-2/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 18:02:41 +0000</pubDate>
		<dc:creator>Kyle</dc:creator>
				<category><![CDATA[MATLAB Integration]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=984</guid>
		<description><![CDATA[In part one of this three part series, we introduced a method using C++ templates to support all four major MATLAB data types. In today's part, we'll detail the specifics of how to integrate CULA's SVD algorithm into MATLAB. Finally, part three will give some tips on error checking, compilation, linking, usage, and benchmarking. To [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.culatools.com/blog/2010/07/14/using-cula-in-matlab-part-1/">part one</a> of this three part series, we introduced a method using C++ templates to support all four major MATLAB data types. In today's part, we'll detail the specifics of how to integrate CULA's SVD algorithm into MATLAB. Finally, part three will give some tips on error checking, compilation, linking, usage, and benchmarking.</p>
<p>To recap, we left off from part one with a working code base that will enter a templated function dependent on the MATLAB data type.  From here, we'd like to write one templated function that can support single, double, complex single, and complex double datatypes. However, because CULA and MATLAB store complex data differently, we are going to have to create a number of helper functions to convert between the two types.</p>
<p>Internally, MATLAB stores its complex data using a structure of arrays (SoA) format where the real and imaginary parts are stored in separate memory locations. In CULA, or any other LAPACK implementation, complex data is stored using an array of structures (AoS) format where the real and imaginary parts are interleaved within a single large array. Therefore, in order to support complex data, we'll need to create two helper functions that convert from AoS to SoA and vice-versa. This unfortunately introduces the need for twice as much allocation when dealing with complex data.  However, MATLAB itself uses LAPACK under the hood and natively suffers from the same plight.</p>
<p>The following code sample illustrates, using C++ templates, how to perform this conversion. I'd recommend putting this function in a header such as 'culamex.hpp' as it will be needed for every function where you wish to support complex data.</p>
<pre class="brush: cpp;">
// culamex.hpp
// Common helper functions for integrating CULA into MATLAB

#ifndef __CULAMEX_HPP__
#define __CULAMEX_HPP__

// Used to find real type associated with complex type
template&lt;class T&gt; struct ToReal { typedef T type; };
template&lt;&gt;        struct ToReal&lt;culaFloatComplex&gt; { typedef float type; };
template&lt;&gt;        struct ToReal&lt;culaDoubleComplex&gt; { typedef double type; };

// Convert from MATLAB complex format to CULA complex format
template &lt;typename FloatType&gt;
void MatToCula(FloatType* buf, const mxArray* src)
{
    typedef typename ToReal&lt;FloatType&gt;::type RealType;

    const RealType* r0 = (const RealType*) mxGetPr(src);
    const RealType* i0 = (const RealType*) mxGetPi(src);

    int M = (int)mxGetM(src);
    int N = (int)mxGetN(src);

    for(int j = 0; j &lt; N; ++j)
    {
        for(int i = 0; i &lt; M; ++i)
        {
            int p  = j*M+i;
            buf[p].x = r0[p];
            buf[p].y = i0[p];
        }
    }
}

// Convert from CULA complex format to MATLAB complex format
template &lt;typename FloatType&gt;
void CulaToMat(mxArray* src, FloatType* buf)
{
    typedef typename ToReal&lt;FloatType&gt;::type RealType;

    RealType* r0 = (RealType*) mxGetPr(src);
    RealType* i0 = (RealType*) mxGetPi(src);

    int M = (int)mxGetM(src);
    int N = (int)mxGetN(src);

    for(int j = 0; j &lt; N; ++j)
    {
        for(int i = 0; i &lt; M; ++i)
        {
            int p = j*M+i;
            r0[p] = buf[p].x;
            i0[p] = buf[p].y;
        }
    }
}

// Do nothing for non-complex cases
// (inline so the header can be included from more than one source file)
template &lt;&gt; inline void MatToCula(float* buf, const mxArray* src) {}
template &lt;&gt; inline void MatToCula(double* buf, const mxArray* src) {}
template &lt;&gt; inline void CulaToMat(mxArray* src, float* buf) {}
template &lt;&gt; inline void CulaToMat(mxArray* src, double* buf) {}

#endif //__CULAMEX_HPP__
</pre>
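<p>To see the two layouts side by side without MATLAB or CULA installed, here is a minimal standalone sketch of the same round trip. The Complex struct is a hypothetical stand-in for culaFloatComplex and culaDoubleComplex (assumed to expose x and y members, as the helpers above use them), and Interleave and Deinterleave are illustrative names that belong to neither API.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CULA complex type (x = real, y = imaginary)
template <typename Real>
struct Complex { Real x; Real y; };

// SoA -> AoS: merge separate real/imaginary arrays into one interleaved array
template <typename Real>
std::vector<Complex<Real> > Interleave(const std::vector<Real>& re,
                                       const std::vector<Real>& im)
{
    assert(re.size() == im.size());
    std::vector<Complex<Real> > out(re.size());
    for (std::size_t p = 0; p < re.size(); ++p)
    {
        out[p].x = re[p];
        out[p].y = im[p];
    }
    return out;
}

// AoS -> SoA: split an interleaved array back into real/imaginary arrays
template <typename Real>
void Deinterleave(const std::vector<Complex<Real> >& in,
                  std::vector<Real>& re, std::vector<Real>& im)
{
    re.resize(in.size());
    im.resize(in.size());
    for (std::size_t p = 0; p < in.size(); ++p)
    {
        re[p] = in[p].x;
        im[p] = in[p].y;
    }
}
```

<p>Both directions need a second buffer of the same element count, which is the extra allocation mentioned earlier.</p>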
<p>Now that we have our generic helper functions, we can start writing some code that acts just like MATLAB's svd function.  Upon examining the documentation for <a href="http://www.mathworks.com/access/helpdesk/help/techdoc/ref/svd.html">MATLAB's svd</a> and <a href="http://www.culatools.com/html_api/#gesvd">CULA's xGESVD</a> function, you'll notice two major differences.</p>
<ol>
<li>MATLAB returns the singular values in a diagonal matrix whereas CULA returns a vector</li>
<li>MATLAB returns V whereas CULA returns V' (transposed)</li>
</ol>
<p>In order to preserve MATLAB's interface, we'll have to do some extra housekeeping to convert the singular value vector into a diagonal matrix and also perform a transpose (or for complex data, conjugate transpose) on VT.</p>
<pre class="brush: cpp;">
// culasvd.cpp
// Implements CULA accelerated SVD to be called within MATLAB

#include &lt;algorithm&gt;
#include &lt;cstring&gt;   // memcpy

#include &quot;mex.h&quot;
#include &quot;culapack.hpp&quot;
#include &quot;culamex.hpp&quot;

using std::max;
using std::min;

// Complex conjugation of complex data
template&lt;class T&gt; void Conjugate(T* a) { /* Real types: nothing to do */ }
template&lt;&gt; void Conjugate(culaFloatComplex* a) { a-&gt;y = -(a-&gt;y); }
template&lt;&gt; void Conjugate(culaDoubleComplex* a) { a-&gt;y = -(a-&gt;y); }

template &lt;typename T&gt;
void mexCulaGesvd(int nlhs,           /* number of expected outputs */
              mxArray* plhs[],        /* output pointer array */
              int nrhs,               /* number of inputs */
              const mxArray* prhs[],  /* input pointer array */
              mxClassID id,
              mxComplexity complexity)
{
    // Initialize flags and types
    typedef typename ToReal&lt;T&gt;::type RealType;
    bool isReal = (complexity == mxREAL);
    bool isComplex = (complexity == mxCOMPLEX);

    // Initialize sizes
    int M = (int) mxGetM(prhs[0]);
    int N = (int) mxGetN(prhs[0]);
    int K = min(M,N);
    int L = max(M,N);

    // Allocate a temporary to not destroy input data
    T* A = (T*) mxMalloc( M * N * sizeof(T) );

    if (isReal)
    {
        // Copy input data directly into temporary
        memcpy( A, mxGetPr( prhs[0] ), M * N * sizeof(T) );
    }
    else if (isComplex)
    {
        // If complex, convert from MATLAB format into CULA format
        MatToCula( A, prhs[0] );
    }

    // Create MATLAB output matrices
    plhs[0] = mxCreateNumericMatrix(M, M, id, complexity);  // U (M x M)
    plhs[1] = mxCreateNumericMatrix(M, N, id, mxREAL);      // S (M x N, Real)
    plhs[2] = mxCreateNumericMatrix(N, N, id, complexity);  // V (N x N)

    // Allocate CULA intermediate
    RealType* SVEC = (RealType*) mxMalloc( K * sizeof(RealType) );

    // CULA Memory Pointers
    T* U;
    T* VT;

    if (isReal)
    {
        // Get CULA memory pointers from allocated MATLAB matrices
        U = (T*) mxGetPr( plhs[0] );
        VT = (T*) mxGetPr( plhs[2] );
    }
    else if (isComplex)
    {
        // If complex, allocate an AoS complex buffer for CULA
        U = (T*) mxMalloc( M * M * sizeof(T) );
        VT = (T*) mxMalloc( N * N * sizeof(T) );
    }

    // Initialize CULA
    culaInitialize();

    // CULA SVD Factorization
    culaGesvd('A', 'A', M, N, A, M, SVEC, U, M, VT, N);

    // Shutdown CULA
    culaShutdown();

    // Get pointer to output matrix, S
    RealType* S = (RealType*) mxGetPr( plhs[1] );

    // Copy SVEC to diagonal of S
    for (int i=0; i&lt;K; i++)
        S[i*M+i] = SVEC[i];

    // Inplace transpose of VT
    for (int i=0; i&lt;N; i++)
    {
        for (int j=i; j&lt;N; j++)
        {
            T temp = VT[j+i*N];
            VT[j+i*N] = VT[i+j*N];
            VT[i+j*N] = temp;
        }
    }

    // If complex, conjugate VT
    if (isComplex)
        for (int i=0; i&lt;N; i++)
            for (int j=0; j&lt;N; j++)
                Conjugate( &amp;VT[j+i*N] );

    if (isComplex)
    {
        // If complex, convert from CULA format into MATLAB format
        CulaToMat( plhs[0], U );
        CulaToMat( plhs[2], VT );

        // Free MATLAB buffers
        mxFree(U);
        mxFree(VT);
    }

    // Free allocated data
    mxFree(A);
    mxFree(SVEC);
}

// MATLAB Gateway Function
void mexFunction(int nlhs,              /* number of expected outputs */
                 mxArray* plhs[],       /* output pointer array */
                 int nrhs,              /* number of inputs */
                 const mxArray* prhs[]  /* input pointer array */ )
{
    // See Part 1
    // mexCulaGesvd(...)
}
</pre>
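<p>The two post-processing steps can also be tried in isolation. The following is a minimal sketch under stated assumptions, not the CULA code path: it uses std::complex in place of the CULA complex types, and the names DiagFromVector and ConjTransposeInPlace are made up for illustration. Matrices are stored column-major, as in MATLAB and LAPACK.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Expand K = min(M, N) singular values into an M x N column-major matrix
// that is zero everywhere except the main diagonal
std::vector<double> DiagFromVector(const std::vector<double>& svec,
                                   std::size_t M, std::size_t N)
{
    assert(svec.size() == std::min(M, N));
    std::vector<double> S(M * N, 0.0);
    for (std::size_t i = 0; i < svec.size(); ++i)
        S[i * M + i] = svec[i];  // element (i, i) with leading dimension M
    return S;
}

// Turn an N x N column-major matrix holding V' into V, in place:
// swap each pair across the diagonal and conjugate both elements
void ConjTransposeInPlace(std::vector<std::complex<double> >& a, std::size_t N)
{
    assert(a.size() == N * N);
    for (std::size_t i = 0; i < N; ++i)
    {
        a[i + i * N] = std::conj(a[i + i * N]);  // diagonal: conjugate only
        for (std::size_t j = i + 1; j < N; ++j)
        {
            const std::complex<double> lower = std::conj(a[j + i * N]);
            a[j + i * N] = std::conj(a[i + j * N]);
            a[i + j * N] = lower;
        }
    }
}
```

<p>Folding the conjugation into the swap visits each element once, whereas the listing above transposes first and conjugates in a second pass. Both orderings produce the same matrix.</p>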
<p>When combined with the gateway function introduced in part one, you should now have enough code to compile a working CULA accelerated, MATLAB callable SVD routine that supports all four major datatypes. In the next part of this series, we'll discuss how to error check, compile, and use this routine in MATLAB.</p>
<p>While I've tried to keep the comments in the code quite verbose, if you have any questions regarding this code please ask in the comments section or in our user forums.</p>
<p><strong>More Information:</strong><br />
CULA Programmers Guide: <a href="http://www.culatools.com/html_guide/">http://www.culatools.com/html_guide/</a><br />
MATLAB MEX-file Guide: <a href="http://www.mathworks.com/support/tech-notes/1600/1605.html">http://www.mathworks.com/support/tech-notes/1600/1605.html</a><br />
C++ Templates: <a href="http://en.wikipedia.org/wiki/Template_(programming)">http://en.wikipedia.org/wiki/Template_(programming)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/07/21/using-cula-in-matlab-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using CULA in MATLAB, Part 1</title>
		<link>http://www.culatools.com/blog/2010/07/14/using-cula-in-matlab-part-1/</link>
		<comments>http://www.culatools.com/blog/2010/07/14/using-cula-in-matlab-part-1/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 16:08:22 +0000</pubDate>
		<dc:creator>Kyle</dc:creator>
				<category><![CDATA[MATLAB Integration]]></category>
		<category><![CDATA[Code Samples]]></category>
		<category><![CDATA[MATLAB]]></category>
		<category><![CDATA[Singular Value Decomposition]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=873</guid>
		<description><![CDATA[It is possible to use CULA to accelerate a large number of popular linear algebra routines in MATLAB. With some relatively simple interface code, it's easy to seamlessly accelerate popular MATLAB routines such as: qr, lu, svd, eig, and mldivide. When writing your own MATLAB code that uses an external library such as CULA, there [...]]]></description>
			<content:encoded><![CDATA[<p>It is possible to use CULA to accelerate a large number of popular linear algebra routines in MATLAB. With some relatively simple interface code, it's easy to seamlessly accelerate popular MATLAB routines such as: qr, lu, svd, eig, and mldivide.</p>
<p>When writing your own MATLAB code that uses an external library such as CULA, there are a few common hurdles to overcome. In the next few blog posts, we'll introduce these problems and discuss how to solve them. By the end of the series, we'll have enough code to implement a CULA-accelerated Singular Value Decomposition (SVD), culasvd, that matches the functionality of MATLAB's svd routine.</p>
<p>Part 1 of this series will explain how to handle multiple CULA data types in MATLAB. Part 2 will detail some tips and tricks on interface wrapping. Finally, Part 3 will show how to compile, link, and benchmark CULA within MATLAB.</p>
<p>Excluding sparse matrices, there are four main data types supported by both MATLAB and CULA: single, double, single complex, and double complex. When writing a CULA-accelerated MATLAB routine, your function should support all of these types transparently to the end user. Since both the MATLAB MEX compiler and CULA support C++ templates, we can easily implement all four data types through a single code path.</p>
<p>The following code snippet shows how to use C++ templates to handle all four MATLAB types. We use the MATLAB functions mxGetClassID() and mxIsComplex() to determine the precision and complexity of the input matrix and then switch into an explicitly typed CULA wrapper.</p>
<pre class="brush: cpp;">
// culaSvd.cpp
// Implements CULA accelerated SVD to be called within MATLAB

#include &quot;mex.h&quot;
#include &quot;culapack.hpp&quot;

// Templated wrapper
template &lt;typename T&gt;
void mexCulaGesvd(int nlhs,           /* number of expected outputs */
              mxArray* plhs[],        /* output pointer array */
              int nrhs,               /* number of inputs */
              const mxArray* prhs[],  /* input pointer array */
              mxClassID id,
              mxComplexity complexity)
{
    // Function core, details in &quot;Using CULA in MATLAB, Part 2&quot;
    // culaGesvd(...)
}

// MATLAB Gateway Function
void mexFunction(int nlhs,              /* number of expected outputs */
                 mxArray* plhs[],       /* output pointer array */
                 int nrhs,              /* number of inputs */
                 const mxArray* prhs[]  /* input pointer array */ )
{
    // We only support a full SVD in this example
    if(nrhs != 1)
        mexErrMsgTxt(&quot;culasvd: Must have 1 input argument [X]&quot;);
    if(nlhs != 3)
        mexErrMsgTxt(&quot;culasvd: Must have 3 output arguments [U,S,V]&quot;);

    // Get precision (single or double)
    mxClassID classID = mxGetClassID(prhs[0]);

    // Get complexity (real or complex)
    mxComplexity complexity = mxIsComplex(prhs[0]) ? mxCOMPLEX : mxREAL;

    // Switch based on data type
    if ( classID == mxSINGLE_CLASS &amp;&amp; complexity == mxREAL)
        mexCulaGesvd&lt;culaFloat&gt;(nlhs, plhs, nrhs, prhs, classID, complexity);
    else if (classID == mxDOUBLE_CLASS &amp;&amp; complexity == mxREAL )
        mexCulaGesvd&lt;culaDouble&gt;(nlhs, plhs, nrhs, prhs, classID, complexity);
    else if ( classID == mxSINGLE_CLASS &amp;&amp; complexity == mxCOMPLEX )
        mexCulaGesvd&lt;culaFloatComplex&gt;(nlhs, plhs, nrhs, prhs, classID, complexity);
    else if ( classID == mxDOUBLE_CLASS &amp;&amp; complexity == mxCOMPLEX )
        mexCulaGesvd&lt;culaDoubleComplex&gt;(nlhs, plhs, nrhs, prhs, classID, complexity);
    else
        mexErrMsgTxt(&quot;culasvd: Unknown or unsupported data type&quot;);
}
</pre>
<p>As a quick overview, this file includes three main components:</p>
<p>1) The CULA header for our SVD function and the MATLAB header for the MATLAB data types and API<br />
2) A templated function that will support all four MATLAB data types<br />
3) The MATLAB gateway function, 'mexFunction', required by all C/C++ programs called by MATLAB</p>
<p>These are the three main components needed by any MATLAB wrapper for CULA.  As you can see, apart from the argument checking, we haven't yet written any SVD-specific code.</p>
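<p>If you would like to experiment with the dispatch pattern outside of MATLAB, the following sketch reproduces it without mex.h or the CULA headers. Everything in it is an assumption made for illustration: the ClassId and Complexity enums stand in for mxClassID and mxComplexity, std::complex stands in for the CULA complex types, and runSvd simply reports which LAPACK-style routine name a real wrapper would end up calling.</p>

```cpp
#include <complex>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for mxClassID and mxComplexity so that the
// sketch compiles without mex.h.
enum ClassId { SingleClass, DoubleClass, OtherClass };
enum Complexity { Real, Complex };

// Map each element type to its LAPACK-style prefix letter.
template <typename T> struct Prefix;
template <> struct Prefix<float> { static char value() { return 'S'; } };
template <> struct Prefix<double> { static char value() { return 'D'; } };
template <> struct Prefix<std::complex<float> > { static char value() { return 'C'; } };
template <> struct Prefix<std::complex<double> > { static char value() { return 'Z'; } };

// Templated worker: a single code path shared by all four types.
template <typename T>
std::string runSvd()
{
    return std::string(1, Prefix<T>::value()) + "GESVD";
}

// Runtime dispatch into an explicitly typed instantiation.
std::string dispatch(ClassId id, Complexity c)
{
    if (id == SingleClass && c == Real)
        return runSvd<float>();
    if (id == DoubleClass && c == Real)
        return runSvd<double>();
    if (id == SingleClass && c == Complex)
        return runSvd<std::complex<float> >();
    if (id == DoubleClass && c == Complex)
        return runSvd<std::complex<double> >();
    throw std::invalid_argument("unknown or unsupported data type");
}
```

<p>The runtime checks happen exactly once, at the boundary; everything below dispatch is resolved at compile time, which is why the templated worker needs no type switches of its own.</p>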
<p>Check back soon for Part 2, where we'll go into more detail about how to pass MATLAB's matrix data into CULA's SVD function.  If you have any questions, feel free to ask in a comment here or on our forums.</p>
<p><strong>More Information:</strong><br />
CULA Programmers Guide: <a href="http://www.culatools.com/html_guide/">http://www.culatools.com/html_guide/</a><br />
MATLAB MEX-file Guide: <a href="http://www.mathworks.com/support/tech-notes/1600/1605.html">http://www.mathworks.com/support/tech-notes/1600/1605.html</a><br />
C++ Templates: <a href="http://en.wikipedia.org/wiki/Template_(programming)">http://en.wikipedia.org/wiki/Template_(programming)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/07/14/using-cula-in-matlab-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Generic Programming</title>
		<link>http://www.culatools.com/blog/2010/07/14/generic-programming-2/</link>
		<comments>http://www.culatools.com/blog/2010/07/14/generic-programming-2/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 16:02:42 +0000</pubDate>
		<dc:creator>CULA Dev Team</dc:creator>
				<category><![CDATA[CULA Design Principals]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Coding Tips]]></category>
		<category><![CDATA[Template]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=903</guid>
		<description><![CDATA[When we first announced the series on the engineering philosophy behind CULA, we said that code is at the heart of a software product, but that there are many other factors to consider when creating software.  Now that we've covered many of these other factors, it's time to actually talk about some code-related issues.  [...]]]></description>
			<content:encoded><![CDATA[<p>When we first announced the series on the engineering philosophy behind CULA, we said that code is at the heart of a software product, but that there are many other factors to consider when creating software.  Now that we've covered many of these other factors, it's time to actually talk about some code-related issues.  For the next few posts, we'll be talking specifically about the programming practices we follow when writing code for CULA.</p>
<p>One of the most important techniques we use is called Generic Programming.  Generic programming is a facility provided by some programming languages that allows you to avoid duplicating code.  For those of you familiar with C++, you might know this feature as 'templates'.  Rather than talking about it further, I think an example would show this concept most clearly.</p>
<p>Without generic programming techniques, you might find code that looks like this:</p>
<pre class="brush: cpp;">
float addFloat(float x, float y) { return x + y; }
double addDouble(double x, double y) { return x + y; }
int addInt(int x, int y) { return x + y; }
long addLong(long x, long y) { return x + y; }
long long addLongLong(long long x, long long y) { return x + y; }
</pre>
<p>Using generic programming, you might instead choose to program like this:</p>
<pre class="brush: cpp;">template &lt;typename T&gt;
T add(T x, T y) { return x + y; }
</pre>
<p>Which version do you prefer?  The five functions in the first listing are virtually identical; the only difference is the data type.  Although these contrived examples are short, you can imagine just how much code might be duplicated for routines that stretch into thousands of lines.</p>
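<p>To make the comparison concrete, here is a small sketch of how a single function template serves several types at once. It assumes nothing beyond standard C++; the function name add mirrors the example above.</p>

```cpp
#include <string>

// One definition covers every type that supports operator+.
// The compiler generates a separate instantiation for each T it sees.
template <typename T>
T add(T x, T y) { return x + y; }
```

<p>Calling add(2, 3) instantiates the int version and add(1.5, 2.25) the double version, while add(std::string("cu"), std::string("la")) works with no extra code because std::string also provides operator+.  When the two arguments have different types, the type can be stated explicitly, as in add&lt;double&gt;(1, 2.5).</p>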
<p>Why is duplicate code such a problem?  In short, any time you want to make a change to a function, you have to make that change in every variant of the function.  If you have four variants, that is four variants to edit.  That's four times as much work and four opportunities to introduce errors.  I can't count the number of times I've made a mistake while making these kinds of edits.  Worst of all, errors introduced in this way can be especially hard to debug because they can introduce small and hard-to-find differences in behavior between the variants.  (Hopefully you've implemented a good revision control strategy to help you if you find yourself in this situation ... maybe we'll talk revision control in a later post).</p>
<p>While the functions above are completely contrived, my example of four different variants isn't just made up; it's the actual complexity of a package like CULA.  For those of you familiar with CULA's data types, there are four to consider: single, double, single-complex, and double-complex.  That's four different variants of functions to handle (e.g. SGETRF, DGETRF, CGETRF, ZGETRF).  While there are in fact differences between these routines, they are typically very few compared to the length of the code.  For example, here is a visual diff between two variants of the GETRF function in the Netlib LAPACK package.  Although these files are actually 5000 characters each, they only differ by 50 characters, or 1%.</p>
<p style="text-align: center;"><a href="http://www.culatools.com/wp-content/uploads/2010/07/trf_diff.png"><img class="size-medium wp-image-881 aligncenter" title="trf_diff" src="http://www.culatools.com/wp-content/uploads/2010/07/trf_diff-300x280.png" alt="" width="300" height="280" /></a></p>
<p>You'd be surprised by how many people in industry still program like this.  Many of them do so because they first learned to program in a language without generic programming support and have never learned what generic programming is or how to use it.  Others are aware of generic programming but have misconceptions about its utility or its effects on compiled code.  Perhaps you have even looked through some code and found two separate linked list implementations that exist only to hold pointers to two different structs.</p>
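<p>As an illustration of the linked list case, a single templated list can hold pointers to any struct.  This is only a sketch: the List class and the Matrix and Device structs are hypothetical names invented for this example, and in practice a standard container such as std::list would usually be the better choice.</p>

```cpp
#include <cstddef>

// A minimal singly linked list, written once for any element type T.
template <typename T>
class List {
public:
    List() : head_(0), size_(0) {}
    ~List()
    {
        // Release every node; the list does not own what T points to.
        while (head_) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
    }
    void pushFront(const T& value)
    {
        head_ = new Node(value, head_);
        ++size_;
    }
    const T& front() const { return head_->value; }
    std::size_t size() const { return size_; }

private:
    struct Node {
        T value;
        Node* next;
        Node(const T& v, Node* n) : value(v), next(n) {}
    };
    // Copying is disabled to keep the sketch short.
    List(const List&);
    List& operator=(const List&);

    Node* head_;
    std::size_t size_;
};

// Two unrelated, hypothetical structs that can share the list code.
struct Matrix { int rows; int cols; };
struct Device { int id; };
```

<p>With this in place, List&lt;Matrix*&gt; and List&lt;Device*&gt; are two distinct, type-safe lists generated from one implementation.</p>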
<p>When designing CULA, we use generic programming to its full potential.  The cool thing about mathy code in different precisions is that there are rarely differences between the single and double cases, and even the real/complex differences are sometimes small or absent.  We even write our CUDA kernels this way.  Although on the surface it appears as if we implement different routines (culaSgetrf, culaDgetrf, culaCgetrf, culaZgetrf), each of our routines is actually designed in a generic way under the hood.  This allows us to better manage our code and also lets us rapidly implement each variant in parallel.</p>
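<p>The general shape of this approach can be sketched as one generic core plus four thin, explicitly named entry points.  The code below is not CULA's implementation; it is an illustrative assumption using a simple y = alpha * x + y update, std::complex in place of CULA's complex types, and hypothetical names (axpyGeneric, sAxpy, dAxpy, cAxpy, zAxpy).</p>

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Generic core, written once: y = alpha * x + y.
template <typename T>
void axpyGeneric(T alpha, const std::vector<T>& x, std::vector<T>& y)
{
    for (std::size_t i = 0; i < x.size() && i < y.size(); ++i)
        y[i] += alpha * x[i];
}

typedef std::complex<float> cfloat;
typedef std::complex<double> cdouble;

// Thin per-type entry points; each one only forwards to the generic core.
void sAxpy(float a, const std::vector<float>& x, std::vector<float>& y) { axpyGeneric(a, x, y); }
void dAxpy(double a, const std::vector<double>& x, std::vector<double>& y) { axpyGeneric(a, x, y); }
void cAxpy(cfloat a, const std::vector<cfloat>& x, std::vector<cfloat>& y) { axpyGeneric(a, x, y); }
void zAxpy(cdouble a, const std::vector<cdouble>& x, std::vector<cdouble>& y) { axpyGeneric(a, x, y); }
```

<p>A bug fix or optimization made in the generic core reaches all four entry points at once, which is exactly the maintenance benefit described above.</p>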
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/07/14/generic-programming-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CULA Tools Website Redesign</title>
		<link>http://www.culatools.com/blog/2010/06/22/cula-tools-website-redesign/</link>
		<comments>http://www.culatools.com/blog/2010/06/22/cula-tools-website-redesign/#comments</comments>
		<pubDate>Tue, 22 Jun 2010 20:56:35 +0000</pubDate>
		<dc:creator>CULA Dev Team</dc:creator>
				<category><![CDATA[Release Notes & News]]></category>

		<guid isPermaLink="false">http://www.culatools.com/?p=792</guid>
		<description><![CDATA[Greetings CULA users!  I'm sure you've noticed our recently revamped webpage to correspond with the soon-to-be-released CULA 2.0. Let me take a minute to run you through the new features: 1) New forum software. We've changed our forum software to the tried-and-tested phpBB3. These widely used forums are much more powerful than our previous offering. They [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings CULA users!  I'm sure you've noticed our recently revamped webpage to correspond with the soon-to-be-released CULA 2.0. Let me take a minute to run you through the new features:</p>
<p>1) <strong>New forum software.</strong> We've changed our forum software to the tried-and-tested phpBB3. These widely used forums are much more powerful than our previous offering. They have much better support for RSS feeds, private messaging, file attachments, user management, search engine indexing, and code snippets. We feel that they'll improve the user experience of interacting directly with the CULA team. Additionally, we have opened a new forum known as the "Private Support" forum. This area is designed as a support section for CULA Premium users. Here, your posts are only visible to you and the developers, so you can ask questions without them being visible to the entire internet.</p>
<p>2) <strong>New blog software. </strong>We've also upgraded our blog software to the incredibly versatile WordPress. This popular blogging software features a vastly improved comment system, post tags, and blog-to-blog communication. Expect many more blog updates in the near future!</p>
<p>3) <strong>New membership software.</strong> We are using new software to manage CULA Premium and Academic subscriptions. This new software should streamline the sign-up, renewal, and download process. Additionally, we've upgraded to an encrypted web server so that we can accept secure credit card payments without you leaving our webpage.</p>
<p>4) <strong>Updated content. </strong>We've begun to update a number of the content pages here as well. For example, the <a href="/features/performance/">performance page</a> now includes updated Fermi performance using CULA 2.0. We'll be adding a lot more information in the following weeks as well.</p>
<p>Thanks for reading. Please take a few minutes to browse the new webpage and let us know what you think!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.culatools.com/blog/2010/06/22/cula-tools-website-redesign/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
