Multi-GPU CULA
9 posts
• Page 1 of 1
Multi-GPU CULA
We've been trying to get a multi-GPU program working but keep running into inconsistent errors with CULA. The problems appear when we ask the program to create 6 or 10 threads. It consistently completes with 5 threads, but crashes on the fifth or sixth thread when asked to create 6. For some reason it also completes fine with 9 threads, but crashes with 10. The error is a culaRuntimeError with error code 17.
We are using CULA 1.2 and calling syev on the device. We've tried moving data around, switching to the host interface, and moving our calls to culaInitialize() and culaShutdown(). Our machine is using 4 Tesla C1060s. We also checked that the input data is correct.
Since we got the most stability when calling the initialization and shutdown immediately before and after the culaDeviceDsyev function call, we were wondering what the best practices for using the initialization and shutdown functions are. When and where should they be used? We found that if a thread called the shutdown function during another thread's run, the thread would return a culaNotInitialized error. Shouldn't the calls be thread specific?
Any help would be appreciated. Really frustrated at this point.
- wgomez
- Posts: 3
- Joined: Thu Jul 08, 2010 11:35 am
Re: Multi-GPU CULA
Have you tried the multi-gpu example provided with CULA? By default it's configured to launch one thread per device. You can easily configure it to run multiple threads per device though; just change line 99 to have a scalar like 1.5 or 2 threads per GPU.
Here are the results of running 5 threads on 2 devices:
- Code:
Found 2 devices, will launch 5 threads
Thread 0 - Launched
Thread 0 - Binding to device 0
Thread 1 - Launched
Thread 1 - Binding to device 1
Thread 2 - Launched
Thread 2 - Binding to device 0
Thread 4 - Launched
Thread 4 - Binding to device 0
Thread 3 - Launched
Thread 3 - Binding to device 1
Thread 0 - Allocating matrices
Thread 0 - Initializing CULA
Thread 2 - Allocating matrices
Thread 2 - Initializing CULA
Thread 1 - Allocating matrices
Thread 1 - Initializing CULA
Thread 3 - Allocating matrices
Thread 4 - Allocating matrices
Thread 3 - Initializing CULA
Thread 4 - Initializing CULA
Thread 4 - Calling culaSgeqrf
Thread 2 - Calling culaSgeqrf
Thread 1 - Calling culaSgeqrf
Thread 3 - Calling culaSgeqrf
Thread 0 - Calling culaSgeqrf
Thread 3 - Shutting down CULA
Thread 1 - Shutting down CULA
Thread 2 - Shutting down CULA
Thread 4 - Shutting down CULA
Thread 0 - Shutting down CULA
Your assumptions about the calls being thread specific are correct; you should have to "initialize --> bind --> call --> shutdown" from every thread.
We are trying to replicate your "culaNotInitialized" error at this point. I'll let you know if we find out any more information.
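The per-thread lifecycle in the log above can be sketched as follows. This is a minimal illustration, not the CULA example itself: the trace entries are hypothetical stand-ins for the device-binding call, culaInitialize(), the factorization call, and culaShutdown(), and the round-robin mapping is inferred from the log (threads 0, 2, 4 on device 0; threads 1, 3 on device 1). The step order follows the log (bind, then initialize).

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical trace recorder; it replaces the real library calls so the
// sketch runs without a GPU or CULA installed.
struct Trace {
    std::mutex m;
    std::vector<std::string> events;
    void add(const std::string& e) {
        std::lock_guard<std::mutex> lock(m);
        events.push_back(e);
    }
};

// Round-robin thread-to-device mapping, as inferred from the log output.
int device_for_thread(int thread_id, int device_count) {
    return thread_id % device_count;
}

// Every thread performs its own bind / initialize / call / shutdown sequence.
void worker(int thread_id, int device_count, Trace& trace) {
    const int dev = device_for_thread(thread_id, device_count);
    const std::string tag = "T" + std::to_string(thread_id);
    trace.add(tag + " bind " + std::to_string(dev));  // device binding would go here
    trace.add(tag + " init");                         // culaInitialize() would go here
    trace.add(tag + " call");                         // e.g. culaSgeqrf would go here
    trace.add(tag + " shutdown");                     // culaShutdown() would go here
}

// Launch n_threads workers, wait for all of them, and return the recorded events.
std::vector<std::string> run_all(int device_count, int n_threads) {
    Trace trace;
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t)
        pool.emplace_back(worker, t, device_count, std::ref(trace));
    for (auto& th : pool) th.join();
    return trace.events;
}
```

Calling `run_all(2, 5)` mirrors the 5-threads-on-2-devices run: each thread records four events, and the interleaving between threads varies from run to run, just as in the log.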
- kyle
- Administrator
- Posts: 301
- Joined: Fri Jun 12, 2009 7:47 pm
Re: Multi-GPU CULA
We have narrowed down the error to a context management bug in culaShutdown(). A fix is in the works, but in the meantime you can most likely skip the culaShutdown() call without problems.
- kyle
- Administrator
- Posts: 301
- Joined: Fri Jun 12, 2009 7:47 pm
Re: Multi-GPU CULA
kyle wrote: We have narrowed down the error to a context management bug in culaShutdown(). A fix is in the works, but in the meantime you can most likely skip the culaShutdown() call without problems.
Indeed, you'll leak a small amount of memory this way, but unless the program is running continuously (and constantly launching / killing CULA threads) this shouldn't be an issue for now. The next release will correct culaShutdown().
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: Multi-GPU CULA
Thanks for the replies.
We got our code to work without errors for multiple runs. We ended up adding a mutex that made sure that only one thread was using CULA at a time, and it hasn't crashed since. Every thread that is about to call a CULA function locks the mutex, initializes CULA, calls CULA, shuts down CULA, and then frees the mutex. Not a big fan of the solution, but for now it works.
We tried taking out the culaShutdown call at one point, but that didn't fix our particular problem. Unfortunately, we are planning on consistently creating and shutting down threads that will use CULA, so just ignoring the shutdown call won't work.
One more question: is there a reason that a call to culaDeviceDsyev() would spawn several extra threads? It appears to spawn 4 threads the first time it is called (by whichever thread gets there first), and then 3 threads the first time each subsequent thread calls it. I don't think it's causing a problem, but I was wondering why those threads are appearing.
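The mutex workaround described in this post can be sketched roughly as below. This is an illustration under assumptions, not our actual code: the guarded section only contains a placeholder where culaInitialize(), culaDeviceDsyev(), and culaShutdown() would be called, and the counters (all hypothetical names) exist only to show that at most one thread is ever inside the section.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::mutex g_cula_mutex;           // guards the whole init / call / shutdown sequence
std::atomic<int> g_inside{0};      // threads currently inside the guarded section
std::atomic<int> g_max_inside{0};  // highest concurrency ever observed
std::atomic<int> g_completed{0};   // number of finished guarded sections

// One worker: lock, do the library work, unlock (on scope exit).
void guarded_solve() {
    std::lock_guard<std::mutex> lock(g_cula_mutex);
    const int now = ++g_inside;
    if (now > g_max_inside.load()) g_max_inside.store(now);  // safe: we hold the lock
    // Placeholder: culaInitialize(); culaDeviceDsyev(...); culaShutdown();
    std::this_thread::yield();
    --g_inside;
    ++g_completed;
}

// Launch n_threads workers and report the maximum observed concurrency.
int run_guarded(int n_threads) {
    std::vector<std::thread> pool;
    for (int i = 0; i < n_threads; ++i) pool.emplace_back(guarded_solve);
    for (auto& t : pool) t.join();
    return g_max_inside.load();
}
```

The cost of this approach is that the guarded calls run one at a time, so the GPUs cannot work on them in parallel, which is why it is only a stopgap.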
- wgomez
- Posts: 3
- Joined: Thu Jul 08, 2010 11:35 am
Re: Multi-GPU CULA
Thank you for the reply and thank you also for the input. It has been quite valuable. We believe that we have fixed the issue sufficiently and that in the next CULA version you will not need to apply such workarounds. In future releases, each thread should call culaInitialize/culaShutdown as appropriate and you will not see CUBLAS errors. Please keep in mind that CULA uses CUBLAS internally, so cublasShutdown should not be called if you have any upcoming CULA calls.
For the DSYEV question, please keep in mind that CULA is a hybrid CPU/GPU library and as such we use both the CPU and GPU for certain portions of the code. In the case of a multicore CPU, we will also attempt to use as many cores as necessary. That extra thread you have observed is likely to be for one-time bookkeeping and allocation.
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: Multi-GPU CULA
We're glad we could help improve CULA.
Just to be complete, our solution to the problem still throws a cudaError sometime during each culaInitialize() call. The call itself returns culaNoError and our program functions correctly, though, so we are moving forward.
Thanks for the reply about our DSYEV question. We expected it to have to do with having a multicore CPU, but we weren't sure if they were meant to stick around until the entire program completes.
- wgomez
- Posts: 3
- Joined: Thu Jul 08, 2010 11:35 am
Re: Multi-GPU CULA
Hi wgomez,
With regard to the exception being thrown, this is perfectly fine because no exception will ever propagate beyond the API boundary. Microsoft Visual Studio lists every exception that it sees (even if it's not in your code), and this is likely what you're observing. This is expected behavior -- it's the CUDA runtime that is throwing the exception, not us.
Thanks for your input,
Dan
- dan
- Administrator
- Posts: 61
- Joined: Thu Jul 23, 2009 2:29 pm
Re: Multi-GPU CULA
Just a small comment to add to Dan's post: we once looked into this exception because we noticed it just as you did. It seems that it's thrown and handled several times by CUDA during normal operation. I'd only worry if one went unhandled, but that won't occur from CULA.
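For readers unfamiliar with the pattern, here is a generic sketch of how a library can throw and handle exceptions internally while its public API only ever returns status codes. All names are hypothetical and this is not CULA or CUDA source; it only illustrates why a debugger can report an exception even though the call returns success.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical status codes returned across the API boundary.
enum Status { kNoError = 0, kRuntimeError = 1 };

// Counts exceptions that were thrown and handled entirely inside the library.
static int g_internal_throws = 0;

// Internal step: throws and recovers on its own, and may also fail for real.
void internal_step(bool fail_hard) {
    try {
        throw std::runtime_error("internal probe failed");
    } catch (const std::runtime_error&) {
        ++g_internal_throws;  // handled here; a debugger may still log it
    }
    if (fail_hard) throw std::runtime_error("unrecoverable");
}

// Public entry point: nothing escapes, callers only see a status code.
int api_initialize(int fail_hard) noexcept {
    try {
        internal_step(fail_hard != 0);
        return kNoError;
    } catch (...) {
        return kRuntimeError;
    }
}
```

With this structure, `api_initialize(0)` returns `kNoError` even though one exception was thrown and caught along the way, which matches the behavior described in the posts above.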
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm