
Problem with using CULA together with my own CUDA code

Posted: Sun Sep 04, 2011 5:18 am
by slawomirkaczmarek
Hi, I am developing a C++ application that needs to use CULA functions together with my own simple CUDA functions. I am using the latest CULA Basic library on Linux with the 64-bit nvcc and g++ compilers.

My code looks roughly like this:
loop
    some calculations and calls to my own CUDA functions;
    culaInitialize();
    calls to CULA and CUBLAS functions;
    culaShutdown();
end loop

Each of my CUDA functions is rather simple and looks like this:
cudaMalloc();
cudaMemcpy(); // host to device
call to kernel;
cudaThreadSynchronize();
cudaMemcpy(); // device to host
cudaFree();
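For comparison, here is a sketch of one such helper written out in full with every return code checked. The kernel `scaleKernel` and the helpers `scaleOnDevice` and `checkCuda` are hypothetical stand-ins for the poster's own functions; only the CUDA runtime calls are real API. It uses `cudaDeviceSynchronize()`, which replaced the deprecated `cudaThreadSynchronize()` in CUDA 4.0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA runtime call, return true on success.
static bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical example kernel: multiplies each element by `factor`.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // the last block may be only partly used
        data[i] *= factor;
}

// Hypothetical helper following the malloc/copy/launch/sync/copy/free pattern.
static bool scaleOnDevice(float* host, int n, float factor)
{
    if (n <= 0)
        return true;                // nothing to do; avoids a zero-sized grid
    const size_t bytes = n * sizeof(float);
    float* dev = NULL;
    if (!checkCuda(cudaMalloc(&dev, bytes), "cudaMalloc"))
        return false;

    bool ok = checkCuda(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice),
                        "cudaMemcpy (host to device)");
    if (ok) {
        const int block = 256;
        const int grid = (n + block - 1) / block;
        scaleKernel<<<grid, block>>>(dev, n, factor);
        // Launch-configuration errors are reported right away...
        ok = checkCuda(cudaGetLastError(), "scaleKernel launch");
    }
    // ...but faults during execution only show up after synchronizing.
    if (ok)
        ok = checkCuda(cudaDeviceSynchronize(), "scaleKernel execution");
    if (ok)
        ok = checkCuda(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost),
                       "cudaMemcpy (device to host)");

    checkCuda(cudaFree(dev), "cudaFree");
    return ok;
}
```

Checking the launch and the synchronization separately makes it clear whether a failure came from this helper's own kernel or was already pending from earlier GPU work.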

The problem is that my CUDA functions stop working once CULA has been used: the first iteration of the loop works, but on the second iteration the first call to cudaMalloc() returns "unspecified launch failure".
I've already tried calling culaInitialize() only once, before entering the loop, but then even the first iteration failed with an error. The only thing that partly worked was calling cudaDeviceReset() after culaShutdown(): my functions started to work, but the second call to culaInitialize() returned "unspecified launch failure"...

Neither my functions nor the CULA functions operate on large amounts of data, so I don't think it is an out-of-memory problem.

I'm rather new to CUDA programming, so it is likely that I'm missing something simple...

Re: Problem with using CULA together with my own CUDA code

Posted: Tue Sep 06, 2011 6:05 am
by john
You'll need to post some code showing usage. Have you checked the return codes from the CULA functions? I'm guessing that you are passing bad arguments to the CULA functions, which leads to unspecified launch failures. These basically corrupt the CUDA context on your card, so it can be tough to launch further kernels after one. Because kernel launches are asynchronous, such a failure is usually reported by a later, unrelated call, such as the first cudaMalloc() of your next iteration.
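To find out which call actually triggers the failure, one debugging option is to synchronize and check the error state right after each library call, so that an asynchronous fault is attributed to the call that caused it rather than to a later one. The sketch below assumes nothing about CULA itself; `syncAndCheck` is a hypothetical helper built only on CUDA runtime calls, and the library calls in the usage comment are placeholders.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical debugging aid (not part of CULA or CUBLAS): wait until all
// queued GPU work has finished, then report any pending error under `label`.
static cudaError_t syncAndCheck(const char* label)
{
    cudaError_t err = cudaDeviceSynchronize();   // surfaces execution faults
    if (err == cudaSuccess)
        err = cudaGetLastError();                // and any other pending error
    if (err != cudaSuccess)
        std::fprintf(stderr, "[%s] CUDA error: %s\n", label,
                     cudaGetErrorString(err));
    return err;
}

// Usage while debugging (placeholders, not real calls):
//   <CULA call>;   syncAndCheck("after CULA call");
//   <CUBLAS call>; syncAndCheck("after CUBLAS call");
```

Synchronizing after every call serializes the GPU work, so this is meant for debugging builds rather than production code.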

Re: Problem with using CULA together with my own CUDA code

Posted: Tue Sep 06, 2011 1:18 pm
by slawomirkaczmarek
I was already checking the return codes from the CULA functions, but I was not checking those from CUBLAS! It turned out that I was passing the wrong matrix to one of the functions. Thanks for pointing me in the right direction :D
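Since the fix came down to checking CUBLAS return codes, here is a sketch of a checking macro for the `cublas_v2.h` interface, where each routine returns a `cublasStatus_t`. The names `CUBLAS_CHECK` and `cublasStatusName` are hypothetical, not part of CUBLAS, and the macro only reports a failure rather than aborting. With the legacy `cublas.h` interface the routines return void, and the status has to be fetched with `cublasGetError()` after each call instead.

```cuda
#include <cstdio>
#include <cublas_v2.h>

// Hypothetical helper: map the CUBLAS status codes to printable names.
static const char* cublasStatusName(cublasStatus_t s)
{
    switch (s) {
    case CUBLAS_STATUS_SUCCESS:          return "CUBLAS_STATUS_SUCCESS";
    case CUBLAS_STATUS_NOT_INITIALIZED:  return "CUBLAS_STATUS_NOT_INITIALIZED";
    case CUBLAS_STATUS_ALLOC_FAILED:     return "CUBLAS_STATUS_ALLOC_FAILED";
    case CUBLAS_STATUS_INVALID_VALUE:    return "CUBLAS_STATUS_INVALID_VALUE";
    case CUBLAS_STATUS_ARCH_MISMATCH:    return "CUBLAS_STATUS_ARCH_MISMATCH";
    case CUBLAS_STATUS_MAPPING_ERROR:    return "CUBLAS_STATUS_MAPPING_ERROR";
    case CUBLAS_STATUS_EXECUTION_FAILED: return "CUBLAS_STATUS_EXECUTION_FAILED";
    case CUBLAS_STATUS_INTERNAL_ERROR:   return "CUBLAS_STATUS_INTERNAL_ERROR";
    default:                             return "other CUBLAS status";
    }
}

// Hypothetical macro: evaluate a CUBLAS call once and report any
// non-success status together with the source location and call text.
#define CUBLAS_CHECK(call)                                              \
    do {                                                                \
        cublasStatus_t st_ = (call);                                    \
        if (st_ != CUBLAS_STATUS_SUCCESS)                               \
            std::fprintf(stderr, "%s:%d: %s returned %s\n", __FILE__,  \
                         __LINE__, #call, cublasStatusName(st_));       \
    } while (0)
```

Wrapped around each routine, for example `CUBLAS_CHECK(cublasSgemm(handle, ...));`, it reports invalid arguments such as a negative dimension as `CUBLAS_STATUS_INVALID_VALUE` at the call that received them.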