15 Sep 2010

Performance Over Time

by Dan

One of the challenges of creating a performance-sensitive library is ensuring that as you make changes, the performance, correctness, and accuracy of your library do not suffer. In large and complex systems with many dependencies, it is especially challenging to ensure that a change that helps one module does not hurt another.

To solve this problem, we have integrated a system into our build process that captures the performance of our solvers over time. Whenever we make a change, we record the performance of every test and store the results in a database. Alongside each test we record problem size, job codes and parameters, source code revision, tool versions, machine name, functions that failed, and many other criteria.
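As an illustrative sketch only, the core of such a recording step can be as simple as inserting one row per test into an SQLite database. The table layout, field set, and record_result helper below are hypothetical stand-ins, not our actual internal schema:

```c
/* Hypothetical sketch: record one benchmark result in an SQLite
 * database. Table and field names are illustrative only. */
#include <sqlite3.h>

typedef struct {
    const char* test_name;    /* e.g. "DGETRF" */
    int         problem_size; /* e.g. matrix dimension N */
    const char* revision;     /* source code revision */
    const char* machine;      /* machine the test ran on */
    double      gflops;       /* measured performance */
} TestRecord;

int record_result(sqlite3* db, const TestRecord* r)
{
    sqlite3_stmt* stmt;
    const char* sql =
        "INSERT INTO results (test, size, revision, machine, gflops) "
        "VALUES (?, ?, ?, ?, ?);";

    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK)
        return -1;

    sqlite3_bind_text  (stmt, 1, r->test_name, -1, SQLITE_STATIC);
    sqlite3_bind_int   (stmt, 2, r->problem_size);
    sqlite3_bind_text  (stmt, 3, r->revision, -1, SQLITE_STATIC);
    sqlite3_bind_text  (stmt, 4, r->machine, -1, SQLITE_STATIC);
    sqlite3_bind_double(stmt, 5, r->gflops);

    int rc = sqlite3_step(stmt);  /* execute the insert */
    sqlite3_finalize(stmt);
    return (rc == SQLITE_DONE) ? 0 : -1;
}
```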

We then make all of this information searchable and sortable with a simple web app; with a quick query we can recall the exact performance of any test we've ever run.

The final piece of the puzzle is that our performance tracking system compares the results of each test run with the previous run and warns us when performance has changed significantly. With this tool we ensure that our changes do not lead to unexpected performance regressions.
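A minimal sketch of that comparison step, assuming the current and previous GFLOPS numbers have already been pulled from the database; the check_regression helper and its 10% threshold are illustrative choices, not our actual tolerance:

```c
/* Hypothetical sketch: flag a significant performance change between
 * the previous and current run of a test. The 10% threshold is an
 * illustrative choice. */
#include <math.h>
#include <stdio.h>

int check_regression(const char* test, double prev_gflops, double curr_gflops)
{
    const double tolerance = 0.10;  /* warn on a >10% swing either way */
    double change = (curr_gflops - prev_gflops) / prev_gflops;

    if (fabs(change) > tolerance) {
        fprintf(stderr, "WARNING: %s changed %+.1f%% (%.1f -> %.1f GFLOPS)\n",
                test, change * 100.0, prev_gflops, curr_gflops);
        return 1;  /* significant change; investigate before merging */
    }
    return 0;
}
```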

10 Sep 2010

CULA 2.2 Sneak Preview

by John

We will have a lot more to say about CULA 2.2 at GTC and immediately thereafter, but for the moment we wanted to share some of the latest performance on the very important DGETRF (that is: double precision LU decomposition) routine from CULA Premium.

[Figure: CULA 2.2 DGETRF Preview]

In this graph, the dashed gray line represents the performance of an Intel Core i7 920 chip with Intel MKL 10.2, which performs fairly consistently across the problem sizes. The blue line is the same machine, equipped with a Tesla (Fermi) C2050 card, running the new CULA 2.2 Preview. These numbers far exceed anything seen before, including from prior versions of CULA!
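For reference, here is a minimal host-code sketch of invoking this routine through CULA's C interface. We are assuming the standard CULA 2.x header and error-handling calls here, and the random matrix fill is only a placeholder for real data:

```c
/* Minimal sketch: double-precision LU factorization via CULA's
 * DGETRF. Assumes the CULA 2.x host interface; the random fill is a
 * placeholder for a real matrix. */
#include <stdio.h>
#include <stdlib.h>
#include <cula.h>

int main(void)
{
    int n = 4096;  /* square problem size */
    culaDouble* a = (culaDouble*)malloc((size_t)n * n * sizeof(culaDouble));
    culaInt* ipiv = (culaInt*)malloc((size_t)n * sizeof(culaInt));

    for (int i = 0; i < n * n; ++i)  /* placeholder data, column-major */
        a[i] = (culaDouble)rand() / RAND_MAX;

    culaStatus status = culaInitialize();
    if (status == culaNoError)
        status = culaDgetrf(n, n, a, n, ipiv);  /* LU with partial pivoting */

    if (status != culaNoError)
        printf("CULA error: %s\n", culaGetStatusString(status));

    culaShutdown();
    free(a);
    free(ipiv);
    return status == culaNoError ? 0 : 1;
}
```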

Expect many more speedups of this nature from CULA 2.2, which will be released in the weeks following GTC 2010. For those coming to GTC, please be sure to check out our session, number 2153.

10 Sep 2010

CULA & PGI Fortran

by John

In case you are not on their mailing list, we wanted to call attention to the excellent PGInsider newsletter. For those not familiar, The Portland Group puts out a quarterly newsletter with highly technical articles related to their software. The software most relevant to the CULA effort is the PGI CUDA Fortran Compiler, the official compiler that supports writing and compiling NVIDIA CUDA kernels directly in Fortran code.

In the September issue of the PGInsider newsletter, Kyle Spagnoli and I authored an article about using CULA in PGI Fortran programs. This feature is supported in CULA version 2.1 - available now! - and greater. Coding and compilation are simple and results are immediate; try PGI with both CULA Basic and CULA Premium today!