1. General

What is CULA Dense?

CULA Dense is EM Photonics' GPU-accelerated numerical linear algebra library that contains a growing list of dense linear algebra functions.

Why GPUs?

Modern GPUs can process more than graphics. The massively parallel architectures of graphics processors have the capacity to run computationally intensive, general-purpose software. With proper implementation, many applications can see large speed-ups when ported to the GPU.

Which GPUs does CULA run on?

CULA requires a CUDA-enabled NVIDIA GPU. This includes the entire NVIDIA Tesla line as well as a number of the newer NVIDIA Quadro workstation graphics cards. The full list of supported devices can be found here.

What is CUDA?

CUDA is NVIDIA’s framework for using its GPUs as general-purpose computing devices. More information can be found in NVIDIA’s CUDA FAQ.

How fast is CULA?

The actual speed-up depends heavily on the algorithm, the size of your data set, and what you are benchmarking against. For detailed performance results, see this chart.

How accurate are the results?

Comparing results from linear algebra routines can be a non-trivial task. Different algorithms need to be validated in different fashions. However, when comparing various norms, residuals, and reconstructions, we typically see results that are accurate to machine precision.
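
As an illustration of the residual checks mentioned above, the following is a minimal sketch in plain C of one common measure: the relative residual of a computed solution x to A*x = b, using infinity norms. The function name relative_residual is hypothetical and is not part of the CULA API; the matrix is assumed to be square and stored in column-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Small helpers so the example does not depend on the math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }
static double max_d(double p, double q) { return p > q ? p : q; }

/*
 * Hypothetical helper (not a CULA function): returns
 *   ||A*x - b||_inf / (||A||_inf * ||x||_inf)
 * for an n-by-n column-major matrix A with leading dimension lda.
 */
double relative_residual(int n, const double *a, int lda,
                         const double *x, const double *b)
{
    assert(n > 0 && lda >= n);

    double r_norm = 0.0;  /* largest |(A*x - b)_i|          */
    double a_norm = 0.0;  /* largest absolute row sum of A  */
    double x_norm = 0.0;  /* largest |x_i|                  */

    for (int i = 0; i < n; ++i) {
        double ri = -b[i];
        double row_sum = 0.0;
        for (int j = 0; j < n; ++j) {
            double aij = a[(size_t)i + (size_t)j * (size_t)lda];
            ri += aij * x[j];
            row_sum += abs_d(aij);
        }
        r_norm = max_d(r_norm, abs_d(ri));
        a_norm = max_d(a_norm, row_sum);
        x_norm = max_d(x_norm, abs_d(x[i]));
    }

    double denom = a_norm * x_norm;
    /* Fall back to the absolute residual if A or x is all zeros. */
    return denom > 0.0 ? r_norm / denom : r_norm;
}
```

For a solution that is accurate to machine precision, this value is usually a small multiple of the unit roundoff (DBL_EPSILON from float.h for double precision), growing modestly with the matrix dimension. Other routines, such as factorizations and eigenvalue solvers, call for different checks, for example reconstructing the original matrix from its factors.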

I don’t know anything about GPU programming. Can CULA help me?

The CULA interface was designed with simplicity in mind. There is no need for the user to manage GPU memory or know any specifics about GPU programming. Simply call a CULA function and the framework will do all the GPU management for you. The allocations, workspace creations, and memory transfers are all taken care of under the hood.

Who developed CULA?

CULA was developed by EM Photonics in a partnership with NVIDIA. More information about the CULA Team can be found on the about us page.

3. Technical

Which operating systems are supported?

We have installers available for 32-bit and 64-bit versions of Windows and Linux. Our development team has tested CULA on 32-bit and 64-bit versions of the following operating systems:

  • Windows XP
  • Windows Vista
  • Windows 7
  • Ubuntu 10.10
  • Red Hat Enterprise Linux 5.3
  • Fedora 11
  • Mac OS X

Is CULA's source code available?

No. Our installers only provide pre-compiled libraries and the headers needed to develop a CULA application.

Where can I find out more about how to program with CULA?

A programmer's guide is included in the CULA distribution. For an online version of this guide, see the CULA Programmer's Guide.

Where can I find a function reference?

A full reference manual is included in the CULA distribution. For an online version of this manual, see the CULA Reference Manual.

What tools were used to create CULA?

CULA was developed using components from NVIDIA's CUDA and CUBLAS libraries and elements from Intel's MKL. It is based on algorithms from the original Netlib LAPACK implementation.

What is LAPACK?

LAPACK stands for Linear Algebra PACKage. It is an industry-standard computational library that has been in development for over 15 years and provides a large number of routines for matrix factorizations and decompositions, linear system solvers, and eigenvalue problems.

Is double precision supported?

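Yes, on GPUs that support it. NVIDIA devices of compute capability 1.3 or higher, beginning with the GT200 generation (for example, the GeForce GTX 200 series), support double precision. Previous architecture generations, including the GeForce 8 and 9 series, only support single precision. The full list of GPUs that support double precision can be found in the CUDA Programming Guide.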

Are complex and double-complex data types supported?

Yes; however, only GPUs that support double precision can work with double-precision complex data.

How does CULA store data?

In order to maintain compatibility with the original FORTRAN interface of LAPACK, all CULA functions use column-major data. For more information about this, see the matrix storage section of the CULA Programmer's Guide.
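
To make the column-major convention concrete, here is a minimal sketch in plain C. The helper names colmajor_index and rowmajor_to_colmajor are hypothetical and are not part of the CULA API. The first computes where element (row, col) lives in a column-major array with leading dimension lda; the second copies an ordinary row-major C array into column-major order.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a CULA function): zero-based offset of
 * element (row, col) in a column-major matrix whose columns are
 * spaced lda elements apart.
 */
size_t colmajor_index(int row, int col, int lda)
{
    assert(row >= 0 && col >= 0 && lda > row);
    return (size_t)row + (size_t)col * (size_t)lda;
}

/*
 * Hypothetical helper: copy a rows-by-cols row-major array (the
 * layout of a C two-dimensional array) into column-major storage.
 * dst must have room for rows * cols elements.
 */
void rowmajor_to_colmajor(int rows, int cols,
                          const double *src, double *dst)
{
    assert(rows > 0 && cols > 0);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            dst[colmajor_index(i, j, rows)] = src[(size_t)i * (size_t)cols + (size_t)j];
        }
    }
}
```

For the 2-by-3 matrix with rows (1, 2, 3) and (4, 5, 6), the row-major array is 1, 2, 3, 4, 5, 6, while the column-major array is 1, 4, 2, 5, 3, 6. Passing a row-major array unchanged to a column-major routine is equivalent to passing the transpose of the intended matrix.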

Why do CULA functions take pointers to host memory instead of GPU memory?

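For the vast majority of CULA functions, memory management accounts for an insignificant portion of total processing time. Typical dense routines such as factorizations perform on the order of n^3 operations on n^2 data, so the cost of moving a matrix to and from the GPU shrinks relative to the computation as the problem grows. Our CULA framework also has an optimized memory management system that attempts to maximize memory bandwidth by properly aligning memory boundaries. More information can be found in the "Compiling with CULA" section of the CULA Programmer's Guide.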

Why is the CULA interface different from other CPU-based implementations?

LAPACK, the primary library upon which CULA Dense is based, was originally designed for systems running FORTRAN over 15 years ago. Since then, many programming techniques have drastically changed and bottlenecks have shifted. For these reasons, we have made some slight changes to simplify the interface of LAPACK while adding functionality. For more details about these changes, see the "Differences Between CULA and LAPACK" section of the CULA Programmer's Guide.

How much GPU memory do I need?

The exact memory size you need depends on your algorithm, but you will typically need enough memory to hold the data set and a few extra megabytes for various workspaces. For example, a GeForce™ card with 1 GB of memory can store up to an 11k by 11k double-precision matrix or a 16k by 16k single-precision matrix.
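
The arithmetic behind these figures can be sketched in plain C as follows. The function names matrix_bytes and max_square_dim are hypothetical and are not part of the CULA API. The sketch counts only the matrix itself; workspaces and any memory already in use on the device are not included, so treat the result as an upper bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed to store an n-by-n matrix. */
unsigned long long matrix_bytes(unsigned long long n, size_t elem_size)
{
    assert(elem_size > 0);
    return n * n * (unsigned long long)elem_size;
}

/*
 * Hypothetical helper: largest n such that an n-by-n matrix of
 * elem_size-byte elements fits in budget_bytes. Uses an integer
 * binary search; the result is capped at 2^31 so that n * n cannot
 * overflow a 64-bit integer.
 */
unsigned long long max_square_dim(unsigned long long budget_bytes,
                                  size_t elem_size)
{
    assert(elem_size > 0);
    unsigned long long elems = budget_bytes / (unsigned long long)elem_size;
    unsigned long long lo = 0;
    unsigned long long hi = 1ULL << 31;
    while (lo < hi) {
        unsigned long long mid = lo + (hi - lo + 1) / 2;
        if (mid * mid <= elems) {
            lo = mid;       /* mid-by-mid still fits */
        } else {
            hi = mid - 1;   /* mid-by-mid is too large */
        }
    }
    return lo;
}
```

With 8-byte double-precision elements, matrix_bytes(11000, 8) is 968,000,000 bytes, and with 4-byte single-precision elements, matrix_bytes(16000, 4) is 1,024,000,000 bytes. Both are below 1 GB taken as 2^30 = 1,073,741,824 bytes, which is consistent with the figures quoted above. Complex types double the element size, so the largest matrix dimension shrinks by a factor of about 1.4.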

4. Troubleshooting

I’m having problems with my GPU and/or video drivers. Can you help?

Problems specific to GPU devices and their drivers should be handled by NVIDIA. The latest drivers can always be obtained directly from NVIDIA on their download page.

A routine I need is missing. What should I do?

CULA Dense is constantly growing as new functions are added. If there is a function you would like to see added, contact us on our forums and tell us what the developers should work on next.

I’m having problems. Where can I get help?

We offer community-based support on our forums to all of our users. Paid users can submit issues to the private support forums to receive personalized help.

I think I found a bug. What should I do?

First, be sure that you are experiencing a software bug and not an issue related to faulty hardware or drivers. If you are still reasonably sure you have encountered a bug, please post on our forums with a description of the problem, a snippet of code demonstrating the error, and information about the system where the bug occurred.

5. Other Routines

I’d like to accelerate something not contained in CULA Dense to run on GPUs. Can you help me?

The CULA Team has experience in optimizing and implementing a wide spectrum of algorithms using GPU-acceleration. Visit our consulting page to see how we can help you.