Distribution is Engineering Too

by Dan

In the last post I announced a series on CULA's engineering philosophy. And the first post in our series on engineering philosophy is... distribution? That doesn't seem very engineering-oriented. Why would we start here?

We chose to lead off our series with distribution for a variety of reasons. For starters, without distribution, there would be no product. Well, there might be a product, but you wouldn't be able to get it, which I'd say is as good as no product at all. Secondly, although the details of our distribution policies are guided by business decisions, the implementation of these policies turns out to be a very technical task. Recently, these technical tasks came front and center when we had a problem with the distribution of one of CULA 1.3's packages. We wanted to use this time to explain how this problem came about and what we did to fix it, not because it's necessarily relevant to CULA or GPUs or linear algebra, but just because we found it interesting.

So here we are, starting off with a discussion on distribution. When we released CULA, we split our capability set into Basic and Premium versions, and we needed a distribution mechanism that could handle these differences. The distribution mechanism required many different parts, including a user database, purchase tracking system, and download manager, all of which needed to interoperate in a manageable way. We used many different off-the-shelf components when building our system, but wrote custom code to tie these components together. Which brings us back to our CULA 1.3 release, where a problem in the way that these different parts were interoperating prevented users from downloading our 64-bit Linux package.

We had never had a download problem before, and this one seemed to come out of nowhere. As we began to unravel the long chain of tools between the download manager and the user, including our CMS, PHP, Apache, and our webhost's servers/configuration, we became increasingly convinced that this problem was going to be very difficult to diagnose and fix. After many hours of debugging and several calls to our hosting provider's tech support, the bottom line was that between the download manager and our CMS, a set of incompatible HTTP headers was issued that caused some security component (in Apache ... we think) to rewrite the download length to zero, preventing a user from successfully downloading.
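For readers who want to watch for this kind of failure themselves, here is a minimal sketch in Python of the check we are describing: compare the length a response declares against the size the file should have. The function names and the idea of passing headers in as a plain mapping are assumptions made for illustration; this is not code from our site, which runs on PHP.

```python
from typing import Mapping, Optional


def declared_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length a response declares, or None if absent or malformed.

    Header names are matched case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "content-length":
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def length_mismatch(headers: Mapping[str, str], expected_size: int) -> bool:
    """True when the response declares a length that differs from the file's real size.

    A missing header is not treated as a mismatch, since a server may
    legitimately omit it (for example with chunked transfer encoding).
    """
    length = declared_length(headers)
    return length is not None and length != expected_size
```

A response for a 20 MB package that arrives with a declared length of zero would be flagged by `length_mismatch`, which is the symptom our users were seeing.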

Oddly, this error only happened if a file's size was greater than 16 MB. We found that this problem was unique to the download manager we had chosen, but unfortunately also found that the problem wasn't fixable in the code for that module. Given this, our options were to integrate a different download manager, to make edits in our CMS (making it hard to update with new security fixes later), or to roll our own. We chose to roll our own, and in doing so we cut out several of the extra layers of complexity, fixed the download problem, and ended up with a design that we think is better than the one we started with.
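To give a flavor of what "rolling our own" can look like, here is a minimal sketch in Python of the core of a download handler: it sets one explicit, consistent set of headers and streams the file in fixed-size chunks. This is an illustration under stated assumptions, not our production code: the function name, the chunk size, and the choice of headers are all hypothetical, and a real handler would also need the user and purchase checks described earlier.

```python
import os
from typing import Dict, Iterator, Tuple

CHUNK_SIZE = 64 * 1024  # bytes per read; an arbitrary choice for this sketch


def build_download_response(path: str) -> Tuple[Dict[str, str], Iterator[bytes]]:
    """Return (headers, body iterator) for serving the file at `path` as a download.

    The headers are computed once, from the file itself, so nothing
    downstream has to guess the length. The filename is assumed to
    contain no quote characters.
    """
    size = os.path.getsize(path)
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Length": str(size),
        "Content-Disposition": 'attachment; filename="%s"' % os.path.basename(path),
    }

    def body() -> Iterator[bytes]:
        # Stream in chunks so large packages never sit in memory whole.
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                yield chunk

    return headers, body()
```

Keeping header generation in one small function is the point of the design: with fewer layers between the file and the user, there are fewer places where conflicting headers can be introduced.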

So, I guess that leads to one more technical aspect of distribution. If you're in a small but growing company, like us, you might find that despite your mastery of computer architecture and GPU programming, you'll be doing web programming from time to time. We each wear many hats, but we always bring a technical perspective to whatever we do, including this blog.


A Series on CULA’s Engineering Philosophy

by Dan

Over the past few weeks we've had a number of exciting announcements.  Among them have been the release of CULA 1.3, the first published performance results of Fermi, and the unveiling of our CUDA training program.  We'd like to continue along this path by announcing a series of blog posts on the engineering philosophy behind CULA.

There is much more to a software product than just code.  Although the code is at the heart of most software projects, a software developer must also consider build systems, revision control, testing, documentation, quality assurance, support, and distribution mechanisms. Typically, a user only sees a small portion of this overall effort, but the lack of strong processes will greatly impact the end product that a user does see.

With CULA, our goal is to create the best performing linear algebra library that is also easy to use and available on as many platforms and systems as possible.  It is for these reasons that we apply a rigorous software engineering philosophy to our CULA library. Over the next few weeks we'll be talking about several of the development practices and systems we have in place, with specific examples of how these processes impact the development of CULA.  Check back soon for the first entry in this exciting series.


Initial Fermi Performance

by CULA Dev Team

Hot on the heels of a 1.3a service release, we've got some brand new information on the future directions of CULA.  Today we'll be talking about Fermi, NVIDIA's next-generation GPU architecture that was announced in September at the GPU Technology Conference.  At that time, we shared our thoughts on the new and exciting performance we hoped Fermi would bring.  After 6 months of anticipation, we're very proud today to debut the first performance results for CULA running on Fermi.  To our knowledge, these results are the first published double-precision performance results for Fermi running real-world code.

As NVIDIA discussed at Fermi's unveiling, their next-generation part brings an increase in double-precision performance.  When we received our Fermi-based Tesla C2050, we didn't hesitate to port CULA to the new platform.  All that was required to get CULA up and running on Fermi was to set a few compiler flags for the SM 2.0 model, upgrade our graphics driver, and make a few small code changes for the new architecture (more on this in a later post).  Once that was done, we ran through our publicly available benchmark suite to bring you the numbers you see below:

As you can see, Fermi is no slouch!  We're reporting performance gains for doubles up to 3x over the previous generation of Tesla GPUs.  It's also very important to note that these gains are achieved with no Fermi-specific optimizations added -- these are practically plug-and-play performance enhancements.  We have every expectation that with a little time and effort we can improve significantly upon these already impressive numbers.

Well, there you have it.  Fermi is here and NVIDIA has delivered considerable double-precision gains.  We'll be releasing a Fermi-enabled version of CULA very soon so check back often for the latest and greatest in GPU computation.  Until then, enjoy these graphs and get your systems prepared for CULA 2.0 and this must-have hardware upgrade.  As an aside, for those of you wondering why we haven't released a Fermi-supporting version of CULA just yet, it is important to note that there is much more to a release than just code or compiler flags, including: upgrading all of our builders to CUDA 3.0, updating packaging scripts, testing across all operating systems, etc.