Debugging a CUDA Memory Leak

by Kyle

This week, I'd like to share a CUDA debugging story that some might find helpful or insightful.

Recently, while testing the latest nightly CULA build on our build farm, we noticed that we were running into a "CUDA out of GPU memory" error after a large number of tests on certain GPUs. This was new, and it was happening on test sizes that should have easily fit within the memory of the GPU. Our first thought was that we had inadvertently introduced a memory leak into the system. However, when we called the cudaMemGetInfo() routine, it reported that well over 98% of our GPU memory was still available. Clearly, more investigation was needed...

After some more testing, we quickly discovered that NVIDIA's CUBLAS helper routine cublasDestroy_v2() was leaking a few KB of memory with every call. We promptly submitted a bug report to NVIDIA, which was confirmed as an issue and is slated for a fix in the next CUDA release - phew. However, that didn't explain why we were still getting an "out of memory" error, since it would take literally millions of calls to the leaky NVIDIA routine to exhaust the memory on a Tesla C2070 with 6 GB.

After some more debugging, we determined we were running into the somewhat rare problem known as memory fragmentation, where the GPU cannot find a contiguous region of memory large enough to satisfy a requested allocation despite having a large amount of free memory overall.

Due to the way one of our tests was structured, we'd create a context, allocate a large chunk of memory, create another context, and then allocate another large chunk of memory. When we freed these resources, the NVIDIA leak left the contexts floating around - one of which was now stranded in the middle of our memory space. After repeating this process a number of times, we ended up with a handful of small zombie contexts scattered throughout the memory space. Because of this, it eventually became impossible to allocate a large chunk of contiguous memory.

As a workaround, we were able to alleviate this problem by allocating the contexts when there was nothing else in the memory space. This caused the contexts to leak at the edge of memory rather than scattered through the middle.

This just goes to show that memory leaks of any size can be incredibly dangerous! I'd also like to note that CULA users should be unaffected by this NVIDIA bug if they are using the interface properly. In the rare event that you are running for multiple days and initializing the library millions of times, a call to cudaDeviceReset() will free the memory leaked by the bug.
