CULA compile problem with gpu function

Posted: Sat Oct 02, 2010 10:12 am
by steve90370
Hello all:

I find that CULA is a very powerful tool for solving matrix problems, and I am trying to implement a GPU-based LU solver on the Mac platform. I want to combine a GPU matrix transpose function (a GPU kernel) with CULA's culaDeviceSgetrs, but I get an error when compiling.

Below is my makefile:
CC= nvcc

LIBS= -lcula -lcublas -lcudart

@echo "To build this example, type one of:"
@echo ""
@echo " make build32"
@echo " make build64"
@echo ""
@echo "where '32' and '64' represent the platform you wish to build for"
@echo ""
@echo "Note: this example requires the CUDA toolkit to compile"
${CC} -m32 -v -o getrf $(CFLAGS) $(INCLUDES) $(LIBPATH32) $(LIBS)
sh ../
${CC} -m64 -o geqrf_device geqrf_device.c $(CFLAGS) $(INCLUDES) $(LIBPATH64) $(LIBS)

rm -f getrf

Here are the error message and compile log:

nvcc -m32 -v -o getrf -DNDEBUG -O3 -I/usr/local/cula/include -I/usr/local/cuda/bin: -L/usr/local/cula/lib -lcula -lcublas -lcudart
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda/bin
#$ _THERE_=/usr/local/cuda/bin
#$ TOP=/usr/local/cuda/bin/..
#$ PATH=/usr/local/cuda/bin/../open64/bin:/usr/local/cuda/bin:/usr/local/cuda/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
#$ INCLUDES="-I/usr/local/cuda/bin/../include"
#$ LIBRARIES= "-L/usr/local/cuda/bin/../lib" -lcudart
#$ gcc -D__CUDA_ARCH__=100 -E -x c++ -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS "-I/usr/local/cuda/bin/../include" -I. -D__CUDACC__ -C -O3 -I"/usr/local/cula/include" -I"/usr/local/cuda/bin:" -D"NDEBUG" -include "cuda_runtime.h" -m32 -malign-double -o "/tmp/tmpxft_00002f97_00000000-4_getrf.cpp1.ii" ""
#$ cudafe --m32 --gnu_version=40201 -tused --no_remove_unneeded_entities --gen_c_file_name "/tmp/tmpxft_00002f97_00000000-1_getrf.cudafe1.c" --stub_file_name "/tmp/tmpxft_00002f97_00000000-1_getrf.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_00002f97_00000000-1_getrf.cudafe1.gpu" --include_file_name "/tmp/tmpxft_00002f97_00000000-3_getrf.fatbin.c" "/tmp/tmpxft_00002f97_00000000-4_getrf.cpp1.ii"
error: identifier "culaDeviceSgetrf" is undefined
warning: variable "t_A" was declared but never referenced

1 error detected in the compilation of "/tmp/tmpxft_00002f97_00000000-4_getrf.cpp1.ii".
# --error 0x2 --
make: *** [build32] Error 2

Please help! Thank you very much!

Re: CULA compile problem with gpu function

Posted: Tue Oct 05, 2010 6:49 am
by john
My guess is that you are missing #include "cula.h", because your compiler reported the error: identifier "culaDeviceSgetrf" is undefined. In other words, it can't find the declaration, which indicates a missing header.

Re: CULA compile problem with gpu function

Posted: Wed Oct 06, 2010 11:40 pm
by steve90370
Thank you so much! My problem is solved!
By the way, I have some questions about CULA:
Do all functions in the standard CULA interface automatically allocate space in device memory and run on the GPU?

And are all input matrices stored in column-major order?

If so, do I have to transpose my matrix before calling a CULA function?

I also have a question about precision:

I use culaDeviceSgesv to compute the solution of Ax = b. Afterwards I compute the difference between b and A*x, and the precision does not look very good... Some differences are close to 1.0f! Is this because of the limited precision of CULA Basic?


Re: CULA compile problem with gpu function

Posted: Fri Oct 08, 2010 5:53 am
by john
CULA matrices use column-major storage, so you might need to transpose. Fortunately, a transpose is very simple (see the CUDA SDK).
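
To illustrate what column-major storage means for indexing, here is a minimal host-side sketch in plain C. It is not taken from the CUDA SDK or from CULA; the helper name row_to_col_major is made up for this example, and a GPU version would do the same index mapping inside a kernel.

```c
#include <stddef.h>

/* Hypothetical helper (not part of CULA or the CUDA SDK).
 * Copies a rows x cols matrix from row-major storage (C convention)
 * into column-major storage (LAPACK/Fortran convention).
 * Element (i, j) is src[i * cols + j] in row-major order and
 * dst[j * rows + i] in column-major order (leading dimension = rows). */
void row_to_col_major(const float *src, float *dst, int rows, int cols)
{
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            dst[(size_t)j * rows + i] = src[(size_t)i * cols + j];
        }
    }
}
```

For a square matrix this is exactly a transpose of the stored data, which is why a matrix filled row by row in C usually has to be transposed before it is handed to a column-major routine.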

Did you compare your results to a software LAPACK? Our results will match theirs very closely, even in single precision. You didn't specify how you measured your error, so it is hard to describe further. If you are seeing large errors, then you might be calling the routine incorrectly as well.
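
Since the way the error is measured matters, here is a minimal sketch of one common measure: the residual b - A*x scaled by the largest entry of b, so that the result does not depend on the magnitude of the data. It assumes a square n x n matrix in column-major storage on the host; the name relative_residual is made up for this example and is not a CULA or LAPACK routine. The sums are accumulated in double so that the check itself adds little rounding error.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not a CULA or LAPACK routine).
 * Returns max_i |b[i] - (A*x)[i]| / max_i |b[i]| for an n x n matrix A
 * stored in column-major order with leading dimension lda, i.e. element
 * (i, j) is A[j * lda + i]. If b is all zeros, the unscaled maximum
 * residual is returned instead. */
double relative_residual(const float *A, int lda, const float *x,
                         const float *b, int n)
{
    double max_r = 0.0;
    double max_b = 0.0;
    for (int i = 0; i < n; ++i) {
        double ax = 0.0; /* row i of A times x, accumulated in double */
        for (int j = 0; j < n; ++j) {
            ax += (double)A[(size_t)j * lda + i] * (double)x[j];
        }
        double r = fabs((double)b[i] - ax);
        double bi = fabs((double)b[i]);
        if (r > max_r) max_r = r;
        if (bi > max_b) max_b = bi;
    }
    return (max_b > 0.0) ? max_r / max_b : max_r;
}
```

Note that A and b must be the original inputs: LAPACK-style gesv routines overwrite A with its LU factors and b with the solution, so copies have to be kept for the check. As a rough guide, for a well-conditioned system solved in single precision one would expect this value to be around n times 1.2e-7 (single-precision machine epsilon) or smaller; a value near 1 more likely points to a problem with the call, such as passing a row-major matrix.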