Runtime Error (17) culaDeviceSgesv with cudaHostAl

Posted: Tue Mar 23, 2010 6:40 pm
by konod
Dear all,
I downloaded CULA 1.2 and compiled the example geqrf_device.c code.
When I run it, there is no problem.

Then I changed the geqrf_device.c code to use cudaHostAllocMapped memory.
When I run it, I get Runtime Error (17) when it makes the call to culaDeviceSgeqrf.

I am running Visual Studio 2005, CULA 1.2, CUDA 2.3.

CULA and CUDA seem to initialize fine.

Thanks for any help.

Kohei

Re:Runtime Error (17) culaDeviceSgesv with cudaHostAl

Posted: Tue Mar 23, 2010 8:28 pm
by dan
konod wrote: Then I changed the geqrf_device.c code to use cudaHostAllocMapped memory.
When I run it, I get Runtime Error (17) when it makes the call to culaDeviceSgeqrf.

What GPU are you using? cudaHostAllocMapped requires specific support from the GPU (most likely an integrated part). NVIDIA has a section in their guide about using this type of memory.

If you do have support, make sure you are calling cudaHostGetDevicePointer to get a valid GPU device pointer with which to work. My guess is that you haven't done that because error 17 refers to "cudaErrorInvalidDevicePointer".
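The two checks above can be sketched as a small standalone program. This is only an illustration under stated assumptions, not code from the CULA examples: `check_cuda` is a hypothetical helper, device 0 and the buffer size are arbitrary choices, and the program only queries the device and obtains the mapped pointer without launching any work.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime_api.h>

/* Hypothetical helper: report the CUDA error string and exit on failure. */
static void check_cuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s (code %d)\n",
                what, cudaGetErrorString(err), (int)err);
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    const int device = 0;                       /* assumption: default device */
    const size_t bytes = 1024 * sizeof(float);  /* arbitrary buffer size */
    struct cudaDeviceProp prop;
    float *host = NULL;  /* pointer usable on the CPU */
    float *dev = NULL;   /* alias of the same buffer, usable on the GPU */

    /* Step 1: ask the device whether it can map host memory at all. */
    check_cuda(cudaGetDeviceProperties(&prop, device), "cudaGetDeviceProperties");
    printf("%s: canMapHostMemory=%d, integrated=%d\n",
           prop.name, prop.canMapHostMemory, prop.integrated);
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "Device cannot map host memory.\n");
        return EXIT_FAILURE;
    }

    /* Mapping must be enabled before any allocation is made. */
    check_cuda(cudaSetDeviceFlags(cudaDeviceMapHost), "cudaSetDeviceFlags");

    /* Step 2: allocate mapped pinned memory and fetch the device pointer. */
    check_cuda(cudaHostAlloc((void **)&host, bytes, cudaHostAllocMapped),
               "cudaHostAlloc");
    check_cuda(cudaHostGetDevicePointer((void **)&dev, host, 0),
               "cudaHostGetDevicePointer");
    printf("host pointer %p maps to device pointer %p\n",
           (void *)host, (void *)dev);

    check_cuda(cudaFreeHost(host), "cudaFreeHost");
    return EXIT_SUCCESS;
}
```

Only the pointer returned by cudaHostGetDevicePointer should be handed to device routines; the host pointer itself is what would typically produce an invalid device pointer error.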

Dan

Re:Runtime Error (17) culaDeviceSgesv with cudaHostAl

Posted: Tue Mar 23, 2010 9:42 pm
by konod
Hi, Dan

I use a GTX 280.
When I run the CUDA sample simpleZeroCopy.exe, there is no problem.

Also, when I call cudaHostGetDevicePointer in my code, no error is returned.
And when I comment out the culaDeviceSgeqrf call, my code runs to completion.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// point to host memory
float* A = NULL;
float* TAU = NULL;

// point to device memory
float* Ad = NULL;
float* TAUd = NULL;

//CUDA Setting
err = cudaSetDeviceFlags(cudaDeviceMapHost);
checkCudaError(err);

printf("Allocating Matrices\n");
err = cudaHostAlloc((void**)&A, M*N*sizeof(float), cudaHostAllocMapped);
checkCudaError(err);
err = cudaHostAlloc((void**)&TAU, N*sizeof(float), cudaHostAllocMapped);
checkCudaError(err);

if(!A || !TAU)
    exit(EXIT_FAILURE);

err = cudaHostGetDevicePointer((void**)&Ad, (void*)A, 0);
checkCudaError(err);

err = cudaHostGetDevicePointer((void**)&TAUd, (void*)TAU, 0);
checkCudaError(err);

printf("Initializing CULA\n");
status = culaInitialize();
checkStatus(status);

memset(A, 0, M*N*sizeof(float));

printf("Calling culaDeviceSgeqrf\n");
status = culaDeviceSgeqrf(M, N, Ad, M, TAUd);
checkStatus(status);

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sincerely yours

Kohei

Re:Runtime Error (17) culaDeviceSgesv with cudaHostAl

Posted: Wed Mar 24, 2010 3:34 pm
by dan
We've tried your code on a few of our machines and found it works. Can you report a little more on your machine by running the sysinfo.bat script (in the examples folder) and attaching its output here?

Beyond that, I'd like to note that using cudaHostAllocMapped memory is a little unusual for this application. What is the problem you're hoping to solve by using this type of memory?

Dan

Re:Runtime Error (17) culaDeviceSgesv with cudaHostAl

Posted: Sun Mar 28, 2010 4:57 pm
by konod
Hi, Dan

I have attached the output of the sysinfo.bat script.

I thought that using cudaHostAllocMapped memory would speed up the program. Doesn't it?

Sincerely yours

Kohei

Attached file: sysinforeport.zip (size 218541)
http://www.culatools.com/images/fbfiles/files/sysinforeport.zip

Re:Runtime Error (17) culaDeviceSgesv with cudaHostAl

Posted: Tue Mar 30, 2010 10:10 am
by kyle
cudaHostAllocMapped is designed for integrated GPUs that can utilize system memory. In these cases you don't need to explicitly copy memory to the GPU, just create an alias. Hence the term "zero copy".

This feature is typically only found in mid-to-high range laptops. Using this type of memory on a dedicated card, like your GTX 280, doesn't make sense because there is no memory shared between the host and device. You'll probably even see a large slowdown because CUDA will automatically synchronize the two memory locations with every GPU operation!

Re:Runtime Error (17) culaDeviceSgesv with cudaHostAl

Posted: Sun Apr 04, 2010 7:10 pm
by konod
Hi, Dan and kyle

I understand your suggestions.

However, I would still like to know why my code does not work on our machine.

When I run CUDADeviceQuery.exe, one of the CUDA samples,
"Support host page-locked memory mapping" is "Yes".
So I think our GPU can use cudaHostAllocMapped memory.
And my code works on your machines.

Why do I get Runtime Error (17) when I run my code on our machine?


Sincerely yours

Kohei

Re:Runtime Error (17) culaDeviceSgesv with cudaHostAl

Posted: Mon Apr 05, 2010 7:42 am
by kyle
Upon further investigation, your GT200 based GPU should support page-locked memory mapping on the host. However, there is no performance gain in using this memory. According to the NVIDIA Pinned Memory FAQ:

For discrete GPUs, mapped pinned memory is only a performance win in certain cases. Since the memory is not cached by the GPU, it should be read or written exactly once; and the global loads and stores that read or write the memory must be coalesced to avoid a 2x performance penalty.

Also,
...the only way to ensure that the buffer is ready to be read is through explicit synchronization.


CULA does not follow these design patterns. Because of this, host-mapped pinned memory is unsupported by CULA.

However, unmapped pinned memory is certainly supported and will provide a minor performance gain in some instances.
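As a rough sketch of the unmapped pinned alternative, the host buffer can be allocated with cudaHostAlloc and cudaHostAllocDefault, copied into ordinary device memory, and only the device pointers passed to culaDeviceSgeqrf. The header name `culapackdevice.h`, the matrix sizes, and the `check_cuda` helper are assumptions made for illustration; check them against your CULA installation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime_api.h>
#include <culapackdevice.h>  /* assumption: CULA 1.x device-interface header */

#define M 512  /* arbitrary example sizes */
#define N 256

/* Hypothetical helper: report the CUDA error string and exit on failure. */
static void check_cuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    const size_t bytesA = (size_t)M * N * sizeof(float);
    const size_t bytesTau = (size_t)N * sizeof(float);  /* min(M, N) == N here */
    float *A = NULL;     /* pinned (page-locked) host memory, not mapped */
    float *Ad = NULL;    /* ordinary device memory */
    float *TAUd = NULL;  /* ordinary device memory */
    culaStatus status;

    check_cuda(cudaHostAlloc((void **)&A, bytesA, cudaHostAllocDefault),
               "cudaHostAlloc");
    memset(A, 0, bytesA);

    check_cuda(cudaMalloc((void **)&Ad, bytesA), "cudaMalloc(Ad)");
    check_cuda(cudaMalloc((void **)&TAUd, bytesTau), "cudaMalloc(TAUd)");

    status = culaInitialize();
    if (status != culaNoError) {
        fprintf(stderr, "culaInitialize failed\n");
        return EXIT_FAILURE;
    }

    /* Explicit copy: pinned host memory can make this transfer faster. */
    check_cuda(cudaMemcpy(Ad, A, bytesA, cudaMemcpyHostToDevice),
               "cudaMemcpy H2D");

    status = culaDeviceSgeqrf(M, N, Ad, M, TAUd);
    if (status != culaNoError)
        fprintf(stderr, "culaDeviceSgeqrf failed (status %d)\n", (int)status);

    /* Copy the factored matrix back to the pinned host buffer. */
    check_cuda(cudaMemcpy(A, Ad, bytesA, cudaMemcpyDeviceToHost),
               "cudaMemcpy D2H");

    culaShutdown();
    cudaFree(Ad);
    cudaFree(TAUd);
    cudaFreeHost(A);
    return status == culaNoError ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Compared with the mapped version earlier in the thread, the only changes are the allocation flag, the two cudaMalloc calls, and the explicit cudaMemcpy transfers around the factorization.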

If you have questions about gaining more performance, please let us know.

-Kyle