CULA R10 Versus MAGMA 1.0 (Part 2)

by John

In the previous post, we described the performance of the state of the art in GPU-assisted linear algebra. While performance is a huge motivating factor for adopting GPU code, there is also a lot to be said for a library's usability and capabilities. In this post, we highlight some of our favorite features.

Equally important to speed is the question "Are all my routines supported?", a critical first question when evaluating an alternative library. Counting precision variants, our current roster stands at 158 LAPACK routines and 34 BLAS routines (see here for information on our BLAS system). By comparison, MAGMA has roughly 100 routines (ignoring non-LAPACK variants), and not all of them have both CPU and GPU interfaces, whereas every CULA routine does. This is a point of pride for us: we want to provide a consistent, confusion-free experience across all platforms, all interfaces, and as many languages as possible.

Speaking of interfaces, we provide several ways into CULA so that it matches as many programming styles as possible. We have the basic C bindings that most libraries offer, as well as type-overloaded calls in the C++ headers, and both come in host-memory and device-memory variants. We also provide Fortran bindings for gfortran, Intel Fortran, and PGI Fortran. Our Bridge interface is a very low-effort way to try out CULA's host interface for all of the supported LAPACK and BLAS calls in your whole program. For MATLAB users, we have demonstrated how to call CULA functions from MATLAB MEX routines. By comparison, MAGMA supports only plain C interfaces for host and device calls, so the integration effort falls on the user.
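To illustrate what type-overloaded calls offer over precision-prefixed C names, here is a minimal C++ sketch of the general pattern. It assumes LAPACK-style naming (S for single, D for double precision). The names hypo_sgesv, hypo_dgesv, and hypo::gesv are hypothetical and are not CULA's actual API, and the naive Gaussian-elimination solver behind them only stands in for a real GPU-accelerated routine.

```cpp
#include <cmath>
#include <utility>

// Toy solver for A x = b with one right-hand side, using Gaussian
// elimination with partial pivoting (a stand-in, NOT a CULA routine).
// A is n x n in column-major order (as in LAPACK) and is overwritten;
// b is overwritten with the solution x.
// Returns 0 on success, or k+1 if a zero pivot is found at step k,
// loosely mimicking LAPACK's "info > 0 means singular" convention.
template <typename T>
static int naive_gesv(int n, T* a, T* b) {
    for (int k = 0; k < n; ++k) {
        // Choose the pivot row with the largest |a(i,k)| for stability.
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (std::abs(a[i + k * n]) > std::abs(a[p + k * n])) p = i;
        if (a[p + k * n] == T(0)) return k + 1;
        if (p != k) {
            for (int j = 0; j < n; ++j) std::swap(a[k + j * n], a[p + j * n]);
            std::swap(b[k], b[p]);
        }
        // Eliminate the entries below the pivot.
        for (int i = k + 1; i < n; ++i) {
            T m = a[i + k * n] / a[k + k * n];
            for (int j = k; j < n; ++j) a[i + j * n] -= m * a[k + j * n];
            b[i] -= m * b[k];
        }
    }
    // Back substitution on the resulting upper-triangular system.
    for (int i = n - 1; i >= 0; --i) {
        T s = b[i];
        for (int j = i + 1; j < n; ++j) s -= a[i + j * n] * b[j];
        b[i] = s / a[i + i * n];
    }
    return 0;
}

// Hypothetical C-style entry points: one name per precision, following
// the LAPACK S/D prefix convention.
extern "C" int hypo_sgesv(int n, float* a, float* b) { return naive_gesv(n, a, b); }
extern "C" int hypo_dgesv(int n, double* a, double* b) { return naive_gesv(n, a, b); }

// Hypothetical C++ layer: a single overloaded name that the compiler
// resolves to the right precision from the argument types.
namespace hypo {
inline int gesv(int n, float* a, float* b) { return hypo_sgesv(n, a, b); }
inline int gesv(int n, double* a, double* b) { return hypo_dgesv(n, a, b); }
}  // namespace hypo
```

With this layer, hypo::gesv(n, a, b) compiles whether a and b are float* or double*, so switching precision means changing only the array types rather than every call site. Complex-precision variants, or overloads that take device pointers, could be added to the same overload set in the same way.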

This isn't to say we're perfect, but if you check out our forums, you will see that we make an effort to help users with integration, and when bugs are discovered we try to fix them quickly (see, for example, this post). We feel strongly that CULA provides the best user experience, and we heartily encourage you to take it for a test drive, starting with the free CULA Basic version.

Feature | MAGMA 1.0 | CULA R10
Number of unique LAPACK routines | 100 | 158
Number of unique BLAS routines | 36 | 34
Optimized SVD solver | |
Optimized symmetric eigenvalue solver | |
Banded solvers | |
Check and report errors | |
Benchmark program | |
Has examples | |
Host memory interface | Partial | Yes
Device memory interface | Partial | Yes
Bridge interface | | Yes
Fortran support | Partial | Yes
Compiles easily | Requires edits |