Does CULA allow only one thread per GPU at any instant?

PostPosted: Mon Sep 05, 2011 3:34 am
by ahskipton
This is my first foray into GPU programming via CULA, so I apologise in advance if the answer to this is obvious...

I have a multi-threaded application that uses all 8 logical cores (total CPU usage is typically 95-100%) of a hyper-threaded quad-core i7. Most of its many threads call Math Kernel Library (MKL) BLAS/LAPACK functions, so there can be (and frequently are) many threads executing BLAS/LAPACK routines in parallel. I am interested in using the GPU to accelerate this application, but my first attempt at acceleration using CULA has failed: there is a bottleneck when multiple application threads attempt to use the GPU-accelerated versions of the BLAS/LAPACK routines in parallel. So, I have two questions:

1) With CULA, is it true that only one application thread can be active on a given GPU at any one time?
2) If the answer to (1) is 'yes', is this a restriction imposed by CULA, or is it a basic architectural restriction associated with CUDA?

Re: Does CULA allow only one thread per GPU at any instant?

PostPosted: Tue Sep 06, 2011 6:18 am
by john
Your architecture doesn't really sound like it is set up for CUDA. CUDA exposes finer-grained parallelism than CPU threads do. Multiple host threads can share a GPU, but the kernels launched from those threads are largely serialized; i.e. you can't allocate a fixed portion of the CUDA cores to each thread. I have to assume that these limitations will be relaxed as time goes on, especially because CUDA 4.0 made it much easier to write multithreaded CUDA programs.