Multi-GPU CULA

Postby wgomez » Thu Jul 08, 2010 11:52 am

We've been trying to get a multi-GPU program working but have been running into inconsistent errors with CULA. We run into problems when we ask the program to create 6 or 10 threads. It consistently completes when asked to create 5 threads, but crashes on the fifth or sixth thread when asked to create 6. For some reason it also completes fine with 9 threads, but crashes with 10. The error is a culaRuntimeError with error code 17.

We are using CULA 1.2 and calling syev on the device. We've tried moving data around, switching to the host interface, and moving our calls to culaInitialize() and culaShutdown(). Our machine is using 4 Tesla C1060s. We also checked that the input data is correct.

Since we got the most stability when calling the initialization and shutdown immediately before and after the culaDeviceDsyev function call, we were wondering what the best practices for using the initialization and shutdown functions are. When and where should they be used? We found that if a thread called the shutdown function during another thread's run, the thread would return a culaNotInitialized error. Shouldn't the calls be thread specific?

Any help would be appreciated. Really frustrated at this point.
wgomez
 
Posts: 3
Joined: Thu Jul 08, 2010 11:35 am

Re: Multi-GPU CULA

Postby kyle » Thu Jul 08, 2010 1:13 pm

Have you tried the multi-gpu example provided with CULA? By default it's configured to launch one thread per device. You can easily configure it to run multiple threads per device though; just change line 99 to have a scalar like 1.5 or 2 threads per GPU.

Here are the results of running 5 threads on 2 devices.

Found 2 devices, will launch 5 threads

Thread 0 - Launched
Thread 0 - Binding to device 0
Thread 1 - Launched
Thread 1 - Binding to device 1
Thread 2 - Launched
Thread 2 - Binding to device 0
Thread 4 - Launched
Thread 4 - Binding to device 0
Thread 3 - Launched
Thread 3 - Binding to device 1
Thread 0 - Allocating matrices
Thread 0 - Initializing CULA
Thread 2 - Allocating matrices
Thread 2 - Initializing CULA
Thread 1 - Allocating matrices
Thread 1 - Initializing CULA
Thread 3 - Allocating matrices
Thread 4 - Allocating matrices
Thread 3 - Initializing CULA
Thread 4 - Initializing CULA
Thread 4 - Calling culaSgeqrf
Thread 2 - Calling culaSgeqrf
Thread 1 - Calling culaSgeqrf
Thread 3 - Calling culaSgeqrf
Thread 0 - Calling culaSgeqrf
Thread 3 - Shutting down CULA
Thread 1 - Shutting down CULA
Thread 2 - Shutting down CULA
Thread 4 - Shutting down CULA
Thread 0 - Shutting down CULA
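
The bindings in this log (threads 0, 2, 4 on device 0; threads 1, 3 on device 1) are consistent with a round-robin assignment. A minimal sketch of that idea follows. The function names are hypothetical, and both the rounding-up of the thread count and the `i % device_count` rule are assumptions inferred from the log, not taken from the shipped example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Number of worker threads for a given device count and a
// threads-per-GPU scalar (e.g. 1.0, 1.5, 2.0). Rounding up is an
// assumption; the shipped example may round differently.
int thread_count(int device_count, double threads_per_gpu) {
    assert(device_count > 0 && threads_per_gpu > 0.0);
    return static_cast<int>(std::ceil(device_count * threads_per_gpu));
}

// Round-robin assignment: thread i is bound to device i % device_count.
std::vector<int> assign_devices(int threads, int device_count) {
    assert(threads >= 0 && device_count > 0);
    std::vector<int> device_of_thread(threads);
    for (int i = 0; i < threads; ++i) {
        device_of_thread[i] = i % device_count;
    }
    return device_of_thread;
}
```

With 2 devices and a scalar of 2.5, `thread_count` gives 5 threads, and `assign_devices(5, 2)` maps them to devices 0, 1, 0, 1, 0, which matches the bindings printed above.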


Your assumption about the calls being thread specific is correct; you have to "bind --> initialize --> call --> shutdown" from every thread, in the order shown in the log above.
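
The per-thread sequence can be sketched as below, following the order the log shows (bind, initialize, call, shut down). `FakeSession` and its methods are hypothetical stand-ins that only record the call order so the sketch runs without a GPU; they are not CULA's API, and a real program would call the library's own functions at the marked points.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical status codes, not CULA's actual enum.
enum Status { kNoError = 0, kNotInitialized = 1 };

// Hypothetical per-thread stand-in for the library state.
struct FakeSession {
    bool initialized = false;
    std::vector<std::string> trace;  // records the order of calls

    Status select_device(int device) {
        trace.push_back("bind " + std::to_string(device));
        return kNoError;
    }
    Status initialize() {
        trace.push_back("initialize");
        initialized = true;
        return kNoError;
    }
    Status factorize() {  // placeholder for the actual LAPACK-style routine
        if (!initialized) return kNotInitialized;
        trace.push_back("call");
        return kNoError;
    }
    void shutdown() {
        trace.push_back("shutdown");
        initialized = false;
    }
};

// Body of one worker thread: bind -> initialize -> call -> shutdown.
// Each thread owns its own session; nothing is shared between threads.
Status worker(int thread_id, int device_count, FakeSession& session) {
    assert(device_count > 0);
    Status s = session.select_device(thread_id % device_count);
    if (s != kNoError) return s;
    s = session.initialize();
    if (s != kNoError) return s;
    s = session.factorize();
    session.shutdown();  // shut down whether or not the call succeeded
    return s;
}
```

In a real program each `worker` would run on its own thread with its own state; calling the routine without initializing first returns the not-initialized status, which mirrors the error described in the first post.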

We are trying to replicate your "culaNotInitialized" error at this point. I'll let you know if we find out any more information.
kyle
Administrator
 
Posts: 301
Joined: Fri Jun 12, 2009 7:47 pm

Re: Multi-GPU CULA

Postby kyle » Thu Jul 08, 2010 3:09 pm

We have narrowed down the error to a context management bug in culaShutdown(). A fix is in the works, but in the meantime you can most likely skip the culaShutdown() call without problems.
kyle
Administrator
 
Posts: 301
Joined: Fri Jun 12, 2009 7:47 pm

Re: Multi-GPU CULA

Postby john » Fri Jul 09, 2010 9:04 am

kyle wrote: We have narrowed down the error to a context management bug in culaShutdown(). A fix is in the works, but in the meantime you can most likely skip the culaShutdown() call without problems.

Indeed, you'll leak a small amount of memory this way, but unless the program is running continuously (and constantly launching / killing CULA threads) this shouldn't be an issue for now. The next release will correct culaShutdown().
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Re: Multi-GPU CULA

Postby wgomez » Fri Jul 09, 2010 12:23 pm

Thanks for the replies.

We got our code to work without errors for multiple runs. We ended up adding a mutex that made sure that only one thread was using CULA at a time, and it hasn't crashed since. Every thread that is about to call a CULA function locks the mutex, initializes CULA, calls CULA, shuts down CULA, and then frees the mutex. Not a big fan of the solution, but for now it works.
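
A minimal sketch of this kind of workaround, assuming a single process-wide mutex around the whole initialize/call/shutdown sequence. `Serializer`, `guarded_work`, and `run_workers` are hypothetical names, and the library calls are replaced by a comment so the sketch runs without a GPU; the counters exist only to show that no two threads are ever inside the guarded section at once.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: one mutex guarding every use of the library.
struct Serializer {
    std::mutex mutex;
    int active = 0;      // threads currently inside the guarded section
    int max_active = 0;  // highest value `active` ever reached
    int completed = 0;   // number of finished guarded sections

    void guarded_work() {
        std::lock_guard<std::mutex> lock(mutex);  // unlocked on scope exit
        ++active;
        if (active > max_active) max_active = active;
        // ... initialize the library, call the routine, shut it down ...
        --active;
        ++completed;
    }
};

// Launch n threads that all pass through the guarded section, then join them.
void run_workers(Serializer& s, int n) {
    assert(n >= 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&s] { s.guarded_work(); });
    }
    for (std::thread& t : threads) t.join();
}
```

The cost of this design is that the GPU work is fully serialized, so extra devices bring no speedup while the lock is held, which is presumably why it is only a stopgap.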

We tried taking out the culaShutdown call at one point, but that didn't fix our particular problem. Unfortunately, we are planning on consistently creating and shutting down threads that will use CULA, so just ignoring the shutdown call won't work.

One more question: is there a reason that a call to culaDeviceDsyev() would spawn several extra threads? It appears to spawn 4 threads the first time it is called (by whichever thread gets there first), and then 3 threads the first time each subsequent thread calls it. I don't think it's causing a problem, but I was wondering why those threads are appearing.
wgomez
 
Posts: 3
Joined: Thu Jul 08, 2010 11:35 am

Re: Multi-GPU CULA

Postby john » Mon Jul 12, 2010 8:46 am

Thank you for the reply and thank you also for the input. It has been quite valuable. We believe that we have fixed the issue sufficiently and that in the next CULA version you will not need to apply such workarounds. In future releases, each thread should call culaInitialize/culaShutdown as appropriate and you will not see CUBLAS errors. Please keep in mind that CULA uses CUBLAS internally, so cublasShutdown should not be called if you have any upcoming CULA calls.

For the DSYEV question, please keep in mind that CULA is a hybrid CPU/GPU library and as such we use both the CPU and GPU for certain portions of the code. In the case of a multicore CPU, we will also attempt to use as many cores as necessary. The extra threads you have observed are likely for one-time bookkeeping and allocation.
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Re: Multi-GPU CULA

Postby wgomez » Mon Jul 12, 2010 10:15 am

We're glad we could help improve CULA.

Just to be complete, our solution to the problem still throws a cudaError sometime during each culaInitialize() call. The call itself returns culaNoError and our program functions correctly, though, so we are moving forward.

Thanks for the reply about our DSYEV question. We expected it to have to do with having a multicore CPU, but we weren't sure if they were meant to stick around until the entire program completes.
wgomez
 
Posts: 3
Joined: Thu Jul 08, 2010 11:35 am

Re: Multi-GPU CULA

Postby dan » Tue Jul 13, 2010 7:40 am

Hi wgomez,

With regard to the exception being thrown, this is perfectly fine because no exception will ever propagate beyond the API boundary. Microsoft Visual Studio lists every exception that it sees (even if it's not in your code) and this is likely what you're observing. This is expected behavior -- it's the CUDA runtime that is throwing the exception, not us.
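
The general pattern of keeping exceptions behind an API boundary can be sketched as follows. This is an illustration only: `api_call`, `internal_step`, and the status codes are hypothetical and do not reflect how CULA or the CUDA runtime is actually implemented.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical status codes, not CULA's actual enum.
enum ApiStatus { kApiOk = 0, kApiRuntimeError = 1 };

// Stand-in for an internal routine that may throw.
void internal_step(bool fail) {
    if (fail) throw std::runtime_error("internal failure");
}

// Public entry point: every exception is caught here and converted into a
// status code, so nothing propagates to the caller. A debugger that reports
// exceptions at the point they are thrown would still log the throw.
extern "C" ApiStatus api_call(int fail) noexcept {
    try {
        internal_step(fail != 0);
        return kApiOk;
    } catch (...) {
        return kApiRuntimeError;
    }
}
```

From the caller's side only the return value is visible, so a debugger message about an exception inside the library does not by itself indicate a failed call.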

Thanks for your input,

Dan
dan
Administrator
 
Posts: 61
Joined: Thu Jul 23, 2009 2:29 pm

Re: Multi-GPU CULA

Postby john » Tue Jul 13, 2010 8:31 am

Just a small comment to add to Dan's post: we once looked into this exception because we noticed it just as you did. It seems that it's thrown and handled several times by CUDA during normal operation. I'd only worry if one went unhandled, but that won't happen with CULA.
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

