CULA in GPU Computing Gems 2

by John

It's been over a year in the making, but we have submitted the final copy of our chapter for GPU Computing Gems 2. The chapter takes a deeper look at how our routines function, with an emphasis on the ever-popular LU decomposition. The book won't be out for a while yet, but we hope you'll enjoy the chapter when it is released. The best part is that we have permission to post the chapter here on our website at that time!



by Kyle

EM Photonics and a few members of the CULA team will be attending SPIE's Defense, Security, & Sensing (DSS) conference next week in Orlando, Florida. In addition to hosting a booth in the exhibit hall, we'll be presenting a number of papers, including one detailing our latest work on sparse linear algebra solvers. If you are attending the conference, please stop by our booth or visit one of our talks!

Here is the abstract for our sparse linear algebra talk:

The modern graphics processing unit (GPU) found in many standard personal computers is a highly parallel math processor capable of over 1 TFLOPS of peak computational throughput at a cost similar to a high-end CPU, with an excellent FLOPS-to-watt ratio. High-level sparse linear algebra operations are computationally intense, often requiring large numbers of parallel operations, and would seem a natural fit for the processing power of the GPU. Our work is on a GPU-accelerated implementation of sparse linear algebra routines. We present results from both direct and iterative sparse system solvers.

The GPU execution model featured by NVIDIA GPUs based on CUDA demands very strong parallelism, requiring between hundreds and thousands of simultaneous operations to achieve high performance. Some constructs from linear algebra map extremely well to the GPU and others map poorly. CPUs, on the other hand, do well at smaller-order parallelism and perform acceptably during low-parallelism code segments. Our work addresses this via a hybrid processing model, in which the CPU and GPU work simultaneously to produce results. In many cases, this is accomplished by allowing each platform to do the work it performs most naturally. For example, the CPU is responsible for the graph theory portion of the direct solvers while the GPU simultaneously performs the low-level linear algebra routines.

We'll also be presenting and demonstrating work from our image processing and fluid dynamics teams.


CUDA 4.0 RC2 Status Update

by Dan

Last week NVIDIA released their second release candidate for CUDA 4.0. As soon as they did, we got our builders up and running on the new platform. Just like RC1, this version passed all of our tests with flying colors. This means that immediately after the final version of CUDA 4.0 is released, you can look forward to a new version of CULA built to work with it.

One thing to note in the new release is that NVIDIA has dropped support for Red Hat Enterprise Linux 4.8 in 32-bit builds. As such, we will no longer be providing updates for 32-bit RHEL 4.8. If you are on a 64-bit RHEL 4.8 platform, we will continue to provide a build for it as long as NVIDIA supports it.