CULA Dense R14 and Sparse S2 – Now Supporting CUDA 4.1

by John

We're pleased to announce the release of the latest versions of CULA Dense and CULA Sparse, now fully compatible with CUDA 4.1. A major highlight of R14 is a preview of multi-GPU LAPACK routines, which we are calling the pCULA branch of CULA Dense. To be clear, this is a preview designed to show potential performance, with an interface that will likely continue to evolve over time. The new multi-GPU routines are:

pculaGetrf (LU decomposition)
pculaGetrs (LU solve)
pculaGesv (general system solve via LU)
pculaPotrf (Cholesky decomposition)
pculaPotrs (Cholesky solve)
pculaPosv (Hermitian/symmetric positive-definite system solve)
pculaTrsm (BLAS triangular system solve)
pculaGemm (BLAS general matrix multiply)

An upcoming blog post will cover the usage and expectations of these routines in more detail, but a simple example is easy to create:


pculaConfig config;
// some users may wish to tweak the default options here;
// the default is to use all CUDA devices and to let the routine
// select the parameters it considers best

culaStatus status = pculaPotrf(&config, m, n, A, lda);
if (status != culaNoError)
    printf("%s\n", culaGetStatusString(status));

As always, the new features are accompanied by bug fixes and speed and stability improvements. The full release notes for R14 and S2 are available on the dense downloads page and the sparse downloads page, respectively.


Debugging with CULA Sparse

by Dan

CULA Sparse offers a unique debugging feature. When enabled, it performs extra checks on your matrix. We recommend using debugging mode when getting started with the library or when you run into a problem. Once you have fixed any issues you encounter (if you encounter none, good for you!), you can switch debugging mode off to make sure you are running at full performance.

Currently, one of the most important things debugging mode enables is a check that your matrix is well-formed. In a previous post, I discussed sparse matrix formats. CULA Sparse, being flexible, provides an indexing parameter for you to specify whether your data is one- or zero-based. It is a very common error, however, for users to specify their index base or matrix data incorrectly when they use the library. Debugging mode helps here because it can identify a mismatch between the actual matrix data and the specified indexing.

In future revisions of CULA Sparse, there is an opportunity to add even more options, such as a check that helps steer you toward a good solver. For example, CG is intended only for symmetric matrices; if you use it on a non-symmetric matrix, it is likely to converge poorly or not at all. In a future release, we may check for this case and report when a solver is being used incorrectly.

We think that providing developer-oriented and ease-of-use features is just as important as performance, though of course we provide that in spades. If you haven't tried CULA Sparse yet, try out the demo and see how our combination of performance and ease of use works for you!


Not enough HPC programmers. How to fill the gap?

by Liana

Engineers with top-notch parallel programming experience are in high demand in the U.S. This fact was recently pointed out in stories published by the mainstream Daily Beast, as well as HPC Wire. A quote from Stan Ahalt in the Daily Beast story caught my attention: "It's not enough to keep building powerful supercomputers unless we have the brains. Think of a supercomputer as a very fast racing engine. We need more drivers to use those engines." Ahalt is the director of a supercomputing center at the University of North Carolina at Chapel Hill.

Programming supercomputers is hard work. Those involved in programming large HPC systems go through in-depth training and spend months (sometimes years) fine-tuning their algorithms until they fully leverage the massive computing power these machines offer. There is a growing number of tools and libraries for HPC programmers, but they are not necessarily suitable for all levels of computer engineers. For those who are not HPC experts, programming small- to mid-scale systems can be a challenging and time-consuming task, something we hear quite often from our customers and partners.

Where EM Photonics Can Make a Difference

Companies with recently installed small- to mid-scale supercomputing systems often need help porting their applications to their new machines. This is where we bring tremendous value. We are easy to engage with and offer in-depth understanding of parallel architectures. On top of parallel programming expertise, we bring knowledge and experience in physics-based modeling and simulation, image processing, life sciences, finance, military and defense applications. (Typically, the bigger the problem, the greater the fun!)

We encourage you to take a peek at our EM Photonics site to learn more about our consulting services, as well as current research projects and published papers. We have a team of talented engineers looking forward to tackling new challenges. Just let us know how we can help!