CULA ... Runtime error (4)

Postby psillymathhead » Tue Apr 06, 2010 12:24 pm

After great success with a GT 220 card and CUDA 3.0 (in part thanks to your admins!), I acquired a GTX 295 card for further prototyping. After installing it and confirming that the CUDA SDK examples run correctly, I am having problems with CULA.

Specifically, I get CULA "Runtime error (4)". All of the CULA examples give me this error, and my previous code no longer works (it returns zero vectors as answers, etc.).

I could not find an adequate description of this error in the documentation. Everything was working perfectly with the GT 220 card.

What is this error, and where should I look to fix it?

Thanks,
G
psillymathhead
 
Posts: 29
Joined: Wed Mar 31, 2010 6:14 pm

Re:CULA ... Runtime error (4)

Postby psillymathhead » Tue Apr 06, 2010 7:41 pm

I am concerned that CULA 1.2 Premium is not working with NVIDIA's top-of-the-line GTX 295 graphics card...

What exactly is "Runtime Error 4"?

Re:CULA ... Runtime error (4)

Postby psillymathhead » Wed Apr 07, 2010 7:57 am

Same problem after reverting to the CUDA 2.3 drivers and toolkit.

Re:CULA ... Runtime error (4)

Postby dan » Wed Apr 07, 2010 9:10 am

"Runtime Error 4" is a launch failure (cudaErrorLaunchFailure). The numeric runtime errors correspond to CUDA's cudaError status codes.

When your GT 220 was working, were you using CUDA 3.0?

Now that you've reverted to the 2.3 SDK, can you try running one of the examples from the SDK and report whether it succeeds?

Make sure that your environment is set up correctly for the examples to link against the correct cuda libs.

Dan
dan
Administrator
 
Posts: 61
Joined: Thu Jul 23, 2009 2:29 pm

Re:CULA ... Runtime error (4)

Postby psillymathhead » Wed Apr 07, 2010 9:34 am

The SDK examples I have tried are working: deviceQuery, nbody, simpleCUBLAS, etc. are OK. I will look into whether it is a CUDA problem, but I find that unlikely, since I can only reproduce the error with this card and CULA; there are no problems with my old GT 220 or my advisor's GTX 280 with the same configuration. I believe a ticket is being submitted, but I will gladly supply any information or tests that would help resolve this.

I ran a third-party GPU memtest from the web on the card, and it did not report any memory failures.

cuda@alienbox:/usr/local/NVIDIA_GPU_Computing_SDK/C/bin/linux/release$ ./MonteCarloMultiGPU
main(): generating input data...
main(): starting 2 host threads...
main(): waiting for GPU results...
main(): GPU statistics
GPU #0
Options : 128
Simulation paths: 262144
Time (ms.) : 2.359000
Options per sec.: 54260.280525
GPU #1
Options : 128
Simulation paths: 262144
Time (ms.) : 2.961000
Options per sec.: 43228.639475
main(): comparing Monte Carlo and Black-Scholes results...
L1 norm : 3.064244E-06
Average reserve: 371.047315
PASSED
Shutting down...

Press ENTER to exit...

I appreciate your response, and will do some research into possible CUDA problems and post this evening when I get home.

Thanks

Re:CULA ... Runtime error (4)

Postby psillymathhead » Wed Apr 07, 2010 9:39 am

Yes, the GT 220 works with both CUDA 3.0 and CUDA 2.3. The SDK is OK with both versions of CUDA as far as I could test. I have tried linking against both CULA's and NVIDIA's cudart, with no luck.

I will look into any CUDA problems with this specific card and respond this evening.

Thanks,
G

Re:CULA ... Runtime error (4)

Postby psillymathhead » Wed Apr 07, 2010 3:47 pm

I am now running the older CUDA 2.3 toolkit and 190-series drivers, as these should be supported by CULA. The SDK examples seem fine:

CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA

Device 0: "GeForce GTX 295"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 939261952 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.24 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: "GeForce GTX 295"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 938803200 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.24 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit...

If I call culaSelectDevice(1), my geev code seems to succeed... but this uses only half the card. I have read that this runtime error may be something like a GPU-side segfault?

Re:CULA ... Runtime error (4)

Postby psillymathhead » Thu Apr 08, 2010 3:11 pm

My advisor and I believe this to be a hardware problem, after memtest detected some problems with device 1 of the GTX 295. Hopefully a new unit will play nice with CULA. I will post success or failure once the new unit is installed and tested.

Re:CULA ... Runtime error (4)

Postby kyle » Thu Apr 08, 2010 3:39 pm

psillymathhead wrote:
I am now running old CUDA2.3 and 190 series drivers as these should be
If I use culaSelectDevice(1), I seem to have success with my geev code.... But this is only utilizing half the card. I am reading this Runtime Error may be something of a gpu side Seg Fault?


Just as an FYI, the GTX 295 is treated by CULA (and CUDA) as two separate graphics cards. It's essentially the same thing as having two GTX 260s in your computer. As with other multi-GPU setups, CULA will not automatically parallelize across these two devices.
kyle
Administrator
 
Posts: 301
Joined: Fri Jun 12, 2009 7:47 pm

Re:CULA ... Runtime error (4)

Postby psillymathhead » Fri Apr 09, 2010 7:59 am

Yes, I am aware. Perhaps I was not specific enough: CULA fails if we simply call a CULA function, or if we manually select a particular half of the card and then call CULA functions...

My guess is that, by default, it was using the half of the card that is broken. Honestly, I am just hoping that everything plays nice when we get another unit.

We don't need to parallelize CULA functions across multiple devices; more likely, we will use culaSelectDevice to have the various devices process different matrices simultaneously.

Just out of curiosity, since we have this hardware and like speed testing: can geev be run in parallel across the two devices?

In any case, I expect CULA to work simultaneously on both halves of NVIDIA's top-of-the-line graphics card for two separate matrices. Right now I am chalking it up to hardware failure, and if that's the case I will gladly post a success report in due time. In general we are greatly enjoying the CULA tools, but I wish the documentation for troubleshooting and for the device/error routines were a little better.

Thanks much :)

Re:CULA ... Runtime error (4)

Postby kyle » Fri Apr 09, 2010 9:49 am

Yes, you could run two independent GEEV calls simultaneously, one on each "device".

It's also worth noting that, despite the large number, the GTX 295 is not the "top of the line" NVIDIA card. It's two mid-range GPUs on one PCB. In the world of graphics this is a nice solution, because SLI rendering is a seamless parallelization of raster graphics. For compute work, the GTX 285 offers the most GFLOPs on a single GPU (excluding the new 400 series).

Either way, let us know how your parallelization efforts go!


Return to Linux Support

Who is online

Users browsing this forum: No registered users and 1 guest

cron