Instead of exception handling, my computer reboots
Hi
I am using the functions culaSgesvd and culaSgemm. I have not checked whether other CULA functions cause the same problem.
When I call these functions with data that are too big for the GPU, they do not return an error. Instead, my computer reboots without warning (!).
If I run the same code under cuda-memcheck, the problem does not occur. Either the code runs correctly, even for data sizes that otherwise reboot the computer, or, for even bigger data, the error is handled appropriately.
I tried initialising all of my data to zero and calloc-ing the memory for arrays of culaFloat, but this has not fixed the problem.
I would like the exception to be handled correctly so that I can direct my code to do something else if the data are too big for the GPU to handle.
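A minimal sketch of the control flow being asked for, assuming the library reports failure through a status return value (CULA functions return a `culaStatus`). The enum `svd_status` and the functions `gpu_svd_stub`, `cpu_svd_stub` and `svd_with_fallback` are hypothetical names; the GPU stub only simulates an out-of-memory result so the branching can be shown without a GPU. This pattern only helps when the call actually returns; it cannot recover from a machine reboot.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the library's own status type. */
typedef enum {
    SVD_OK = 0,
    SVD_INSUFFICIENT_MEMORY,
    SVD_OTHER_ERROR
} svd_status;

/* Hypothetical stand-in for a GPU SVD call: it only simulates running out
   of device memory when the matrix exceeds a fixed element budget. */
static svd_status gpu_svd_stub(int m, int n, size_t max_elems)
{
    size_t elems = (size_t)m * (size_t)n;
    return elems > max_elems ? SVD_INSUFFICIENT_MEMORY : SVD_OK;
}

/* Hypothetical CPU fallback; a real program would call a host LAPACK routine. */
static svd_status cpu_svd_stub(int m, int n)
{
    (void)m;
    (void)n;
    return SVD_OK;
}

/* Try the GPU first and fall back to the CPU only on an out-of-memory status.
   Returns 1 if the GPU path succeeded, 0 if the CPU path was used, -1 on error. */
static int svd_with_fallback(int m, int n, size_t gpu_budget_elems)
{
    svd_status st = gpu_svd_stub(m, n, gpu_budget_elems);
    if (st == SVD_OK)
        return 1;
    if (st == SVD_INSUFFICIENT_MEMORY) {
        fprintf(stderr, "GPU out of memory for %d x %d, using CPU\n", m, n);
        return cpu_svd_stub(m, n) == SVD_OK ? 0 : -1;
    }
    return -1;
}
```

The key design choice is to branch on the specific out-of-memory status rather than on any failure, so that genuine errors (bad arguments, driver faults) are not silently rerouted to the CPU.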
Do you have any suggestions as to how I could debug this problem? I am preparing a minimal working example (MWE), which I can send if you think it would help.
Ubuntu 11.10, CUDA 5.0, CULA 16, NVIDIA GeForce GTX 580
Thanks in advance
Erika
- erikacule
- Posts: 4
- Joined: Tue Jun 26, 2012 8:05 am
Re: Instead of exception handling, my computer reboots
Please post the reproducer once you have completed it - it is impossible to give any guidance absent that.
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: Instead of exception handling, my computer reboots
Hi John
I attach my makefile and my MWE, test.c.
The program is run as
./test n p
where n is the number of rows and p the number of columns of the matrix passed to culaSgesvd.
I did some more digging and found that the reboot only seems to occur for certain input matrix dimensions.
In this example, on my computer, n = 20,000 with 1217 <= p <= 6914 (or the same dimensions with n and p swapped) causes the computer to reboot. For p < 1217 the code runs fine, and for p > 6915 I get the appropriate error message. Different dimension limits seem to apply to culaSgemm.
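To put these sizes in perspective, a rough lower bound on the memory an SVD needs can be computed before calling into the GPU. The sketch assumes single precision (4-byte floats) and thin-SVD outputs with k = min(m, n), and it ignores the library's internal workspace, so the true requirement is higher. `svd_min_bytes` and `worth_trying_gpu` are hypothetical helpers; in a real program the free-memory figure could come from the CUDA runtime's `cudaMemGetInfo`.

```c
#include <stdint.h>

/* Rough lower bound, in bytes, on the memory needed to hold the inputs and
   outputs of a single-precision SVD of an m x n matrix.
   Assumptions: 4-byte floats, thin-SVD outputs, k = min(m, n).
   The library's internal workspace is NOT included. */
static uint64_t svd_min_bytes(int m, int n, int want_u, int want_vt)
{
    uint64_t M = (uint64_t)m, N = (uint64_t)n;
    uint64_t k = M < N ? M : N;
    uint64_t floats = M * N + k;   /* A (m x n) + singular values S (k) */
    if (want_u)
        floats += M * k;           /* U: m x k */
    if (want_vt)
        floats += k * N;           /* VT: k x n */
    return floats * 4u;            /* 4 bytes per single-precision float */
}

/* Decide whether to attempt the GPU path, reserving margin_pct percent
   (0..100) of free memory for workspace the estimate does not cover. */
static int worth_trying_gpu(uint64_t need, uint64_t free_bytes, unsigned margin_pct)
{
    uint64_t usable = free_bytes / 100u * (100u - margin_pct);
    return need <= usable;
}
```

For example, with m = 20,000, n = 1217 and both U and VT requested, this bound is 200,649,224 bytes (about 191 MiB).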
If you can suggest how I could debug this or where else I am going wrong, I would be very grateful.
Many thanks
Erika
- Attachments
- makefile.txt (645 Bytes): I couldn't upload the file without a file extension.
- test.c (2.91 KiB)
- erikacule
- Posts: 4
- Joined: Tue Jun 26, 2012 8:05 am
Re: Instead of exception handling, my computer reboots
Are you using this GPU to drive a display as well as for compute? I believe you are triggering a driver timeout, which somehow aborts or kills the kernel. (The cuda-memcheck runs seem to throttle the GPU enough to prevent this.) I ran this on a K20 card with no display attached, and it works correctly:
[emp@centos5 svd]$ time LD_LIBRARY_PATH=/usr/local/cula/lib64 ./test 20000 6000
0.840188 0.394383 0.783099 0.798440 0.911647 0.197551 0.335223 0.768230 0.277775 0.553970
0.559327 0.465008 0.716441 0.934091 0.926551 0.192557 0.042576 0.060467 0.229079 0.062010
0.282955 0.669488 0.131806 0.513085 0.859476 0.338414 0.683484 0.467529 0.906918 0.042755
0.336203 0.664952 0.028596 0.721787 0.991453 0.091814 0.373052 0.085898 0.438245 0.855890
0.643493 0.277524 0.415689 0.529127 0.606891 0.909866 0.532046 0.648782 0.755835 0.544879
0.101862 0.107618 0.516674 0.976793 0.986778 0.615902 0.441280 0.750425 0.298174 0.406181
0.049530 0.116873 0.934094 0.627657 0.485945 0.590409 0.205888 0.586771 0.151884 0.414813
0.913154 0.364376 0.042659 0.454397 0.222347 0.346332 0.776218 0.999615 0.114817 0.764082
0.777715 0.542264 0.061665 0.111275 0.705298 0.243828 0.961570 0.073948 0.054152 0.378876
0.064273 0.582906 0.710440 0.105491 0.253948 0.336130 0.516837 0.877946 0.918046 0.515769
min dim is 6000
allocating output matrices
done
Calling culaSgesvd
Done
real 1m20.095s
user 2m23.211s
sys 0m12.534s
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: Instead of exception handling, my computer reboots
No, the GPU is not the display device. I updated the Linux kernel when I moved to CUDA 5.0.
Thanks for taking a look though.
- erikacule
- Posts: 4
- Joined: Tue Jun 26, 2012 8:05 am
Re: Instead of exception handling, my computer reboots
culaSgemm is a wrapper to cublasSgemm, so if you are seeing the same problem there, it suggests either a problem with the NVIDIA libs, or somehow a timeout is being triggered even though your display is unattached. I'm afraid I don't have a lot to go on besides that for you.
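If a timeout is involved, one common mitigation is to split one large GEMM into several smaller calls so that no single call runs for long. A minimal sketch, assuming column-major storage as in BLAS-style GEMMs such as culaSgemm and cublasSgemm; `naive_sgemm` is a hypothetical CPU stand-in for the library GEMM so the sketch runs anywhere, and `sgemm_in_panels` and its `panel` width are also hypothetical. Whether this avoids the reboot on this system is untested.

```c
#include <stddef.h>

/* Naive column-major SGEMM: C = alpha*A*B + beta*C, with A m x k, B k x n,
   C m x n. CPU stand-in for a library GEMM. When beta == 0, C is not read. */
static void naive_sgemm(int m, int n, int k, float alpha,
                        const float *A, int lda, const float *B, int ldb,
                        float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            float c_old = (beta == 0.0f) ? 0.0f : C[i + (size_t)j * ldc];
            C[i + (size_t)j * ldc] = alpha * acc + beta * c_old;
        }
}

/* Same product computed in column panels of at most `panel` columns.
   In column-major storage, columns j0.. of B and C start at B + j0*ldb and
   C + j0*ldc, so each panel is an ordinary, smaller GEMM call. With a GPU
   library, each panel would become a separate, shorter call. */
static void sgemm_in_panels(int m, int n, int k, float alpha,
                            const float *A, int lda, const float *B, int ldb,
                            float beta, float *C, int ldc, int panel)
{
    if (panel <= 0)
        panel = n;                 /* degenerate width: do it in one call */
    for (int j0 = 0; j0 < n; j0 += panel) {
        int w = (n - j0 < panel) ? (n - j0) : panel;
        naive_sgemm(m, w, k, alpha, A, lda, B + (size_t)j0 * ldb, ldb,
                    beta, C + (size_t)j0 * ldc, ldc);
    }
}
```

Because the column panels of C are disjoint, beta is applied exactly once per column and the panelled result matches the single call.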
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: Instead of exception handling, my computer reboots
OK, I will look into that. I did a Google search on GPU kernel timeouts and now see why you asked whether the GPU is the display device, so I will also check the driver settings to see if anything there is relevant.
Thanks for taking a look, I appreciate it.
- erikacule
- Posts: 4
- Joined: Tue Jun 26, 2012 8:05 am