Instead of exception handling, my computer reboots

Post by erikacule » Thu Dec 06, 2012 4:16 am

Hi

I am using the functions culaSgesvd and culaSgemm. I have not tried to find out if other culatools functions cause the same problem.

When I call these functions with data that are too big for the GPU, instead of handling the exception and returning an error, my computer reboots without warning (!).

If I run the same code under cuda-memcheck, this problem does not occur. Either the code runs as it should, even for data that otherwise cause the computer to reboot, or, for even bigger data, the exception is handled appropriately.

I tried initialising all of my data to zero and calloc-ing the memory for arrays of culaFloat, but this has not fixed the problem.

I would like the exception to be handled correctly so that I can direct my code to do something else if the data are too big for the GPU to handle.

Do you have any suggestions as to how I could debug this problem? I am preparing an MWE, which I can send if you think this would help.

Ubuntu 11.10, CUDA 5.0, CULA 16, NVIDIA GeForce GTX 580

Thanks in advance

Erika
erikacule
 
Posts: 4
Joined: Tue Jun 26, 2012 8:05 am

Re: Instead of exception handling, my computer reboots

Post by john » Thu Dec 06, 2012 10:42 am

Please post the reproducer once you have completed it - it is impossible to give any guidance absent that.
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Re: Instead of exception handling, my computer reboots

Post by erikacule » Fri Dec 21, 2012 4:06 am

Hi John

I attach my makefile and my MWE, test.c.

The code is run as

Code:
./test n p

where n = number of rows and p = number of columns of the matrix to be passed to culaSgesvd.

I did a bit more digging and found that the reboot seems to occur only for some input matrix dimensions.

In this example, on my computer, n = 20,000 with 1217 <= p <= 6914 (or vice versa) causes the computer to reboot. For p < 1217 it runs fine; for p > 6915 I get the appropriate error message. Different dimension limits seem to apply for culaSgemm.

If you can suggest how I could debug this or where else I am going wrong, I would be very grateful.

Many thanks

Erika
Attachments
makefile.txt (645 Bytes): I couldn't upload the file without a file extension.
test.c (2.91 KiB)

Re: Instead of exception handling, my computer reboots

Post by john » Fri Dec 21, 2012 9:34 am

Are you running this GPU as a display device as well as for compute? I believe you are timing out the driver, causing it to somehow abort or kill the kernel. (The cuda-memcheck invocations seem to throttle the GPU enough to prevent this.) I ran this on an unattached K20 card and it works correctly:

Code:
[emp@centos5 svd]$ time LD_LIBRARY_PATH=/usr/local/cula/lib64 ./test 20000 6000

0.840188 0.394383 0.783099 0.798440 0.911647 0.197551 0.335223 0.768230 0.277775 0.553970
0.559327 0.465008 0.716441 0.934091 0.926551 0.192557 0.042576 0.060467 0.229079 0.062010
0.282955 0.669488 0.131806 0.513085 0.859476 0.338414 0.683484 0.467529 0.906918 0.042755
0.336203 0.664952 0.028596 0.721787 0.991453 0.091814 0.373052 0.085898 0.438245 0.855890
0.643493 0.277524 0.415689 0.529127 0.606891 0.909866 0.532046 0.648782 0.755835 0.544879
0.101862 0.107618 0.516674 0.976793 0.986778 0.615902 0.441280 0.750425 0.298174 0.406181
0.049530 0.116873 0.934094 0.627657 0.485945 0.590409 0.205888 0.586771 0.151884 0.414813
0.913154 0.364376 0.042659 0.454397 0.222347 0.346332 0.776218 0.999615 0.114817 0.764082
0.777715 0.542264 0.061665 0.111275 0.705298 0.243828 0.961570 0.073948 0.054152 0.378876
0.064273 0.582906 0.710440 0.105491 0.253948 0.336130 0.516837 0.877946 0.918046 0.515769
min dim is 6000
allocating output matrices
done
Calling culaSgesvd
Done

real    1m20.095s
user    2m23.211s
sys     0m12.534s

Re: Instead of exception handling, my computer reboots

Post by erikacule » Fri Dec 21, 2012 9:39 am

No, the GPU is not the display device. I updated the Linux kernel when I moved to CUDA 5.0.

Thanks for taking a look though.

Re: Instead of exception handling, my computer reboots

Post by john » Fri Dec 21, 2012 9:53 am

culaSgemm is a wrapper to cublasSgemm, so if you are seeing the same problem there, it suggests either a problem with the NVIDIA libs, or somehow a timeout is being triggered even though your display is unattached. I'm afraid I don't have a lot to go on besides that for you.

Re: Instead of exception handling, my computer reboots

Post by erikacule » Fri Dec 21, 2012 9:57 am

OK, I will take a look at that. I searched for GPU kernel timeouts and I see why you asked whether the GPU is the display device, so I will also look at the driver settings and see if there is something there.

Thanks for taking a look, I appreciate it.

