CULA R15, CUDA 5 RC: which libraries to use?

Hi, I'm having a problem that I haven't been able to find documented. I'm on a brand-new MacBook Pro with Retina display (NVIDIA GeForce GT 650M), and my CULA codes abort in culaInitialize(). The examples shipped with CULA show the same symptom: abort() in culaInitialize(). The NVIDIA SDK examples all run fine. It seems that I have to use the CUDA 5 RC libraries in place of the CUDA 4.2 libraries that ship with CULA R15. That makes sense, but I wanted to confirm that it is correct. When I do this, the crash goes away, but I get "Insufficient memory" errors instead.
I'm using GCC 4.2.1 from Xcode, the current CUDA 5 release candidate (V0.2.1221) with driver 5.0.24, CULA R15, and OS X 10.8.1.
Backtrace from the debugger gives:
Program received signal SIGABRT, Aborted.
0x00007fff86addd46 in __kill ()
(gdb) bt
#0 0x00007fff86addd46 in __kill ()
#1 0x00007fff88cb4eec in __abort ()
#2 0x00007fff88cb5d43 in __stack_chk_fail ()
#3 0x00000001001dd1d2 in cudalib::GetDeviceCount ()
#4 0x00000001001f4828 in culaInitialize ()
#5 0x0000000100000e4d in main ()
It seems to be picking up the CUDA 4.2 runtime and cuBLAS that ship with CULA:
% otool -L gesvd
gesvd:
libcula_core.dylib (compatibility version 0.0.0, current version 0.0.0)
libcula_lapack.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libcublas.dylib (compatibility version 1.1.0, current version 4.2.0)
@rpath/libcudart.dylib (compatibility version 1.1.0, current version 4.2.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
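For anyone wanting to repeat this check on several binaries, here is a minimal sketch of a filter for the otool output above. The function name cuda_lib_versions is hypothetical (not part of CULA or CUDA), and it assumes the `otool -L` line format shown in this post, where the last field is the "current version" followed by a closing parenthesis.

```shell
# Hypothetical helper: reads `otool -L` output on stdin and prints the
# file name and "current version" of each CUDA library it finds
# (libcublas, libcudart). Assumes the line format shown in this post.
cuda_lib_versions() {
    awk '/libcublas|libcudart/ {
        n = split($1, parts, "/")        # drop any @rpath/ or directory prefix
        ver = $NF                        # last field looks like "4.2.0)"
        sub(/\)$/, "", ver)              # strip the trailing parenthesis
        print parts[n], ver
    }'
}

# Usage on OS X (otool comes with the Xcode command line tools):
#   otool -L gesvd | cuda_lib_versions
```

If the two libraries report different versions, or a version that does not match the installed CUDA toolkit, that is a hint the binary is resolving a different copy than intended.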
Recompiling with a pointer to the NVIDIA libraries:
% gcc -m64 -o gesvd gesvd.c -DNDEBUG -O3 -I/usr/local/cula/include -L/usr/local/cuda/lib -L/usr/local/cula/lib64 -lcula_core -lcula_lapack -lcublas -lcudart -pthread
yields the following:
% otool -L gesvd
gesvd:
libcula_core.dylib (compatibility version 0.0.0, current version 0.0.0)
libcula_lapack.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libcublas.dylib (compatibility version 1.1.0, current version 5.0.0)
@rpath/libcudart.dylib (compatibility version 1.1.0, current version 5.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
% ./gesvd
--------------------------------------------------------------------------------
This example demonstrates using CULA to implement an image compression
algorithm. Two images will be generated:
Full-fidelity: image_original.bmp
Reduced: image_reduced.bmp
--------------------------------------------------------------------------------
Generating image ... done.
Performing singular value decomposition using CULA ... Insufficient memory to complete this operation
So, no crash, but I don't know if the failure is indicative of another problem. The same happens with the CULA "systemSolve" example:
% otool -L systemSolve
systemSolve:
libcula_core.dylib (compatibility version 0.0.0, current version 0.0.0)
libcula_lapack.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libcublas.dylib (compatibility version 1.1.0, current version 5.0.0)
@rpath/libcudart.dylib (compatibility version 1.1.0, current version 5.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
% ./systemSolve
-------------------
SGESV
-------------------
Allocating Matrices
Initializing CULA
Calling culaSgesv
Insufficient memory to complete this operation
Any help is appreciated.
Thanks,
Chris
Proof of system details:
% which gcc
/usr/bin/gcc
% gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
% which nvcc
/usr/local/cuda/bin/nvcc
% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Aug__3_17:12:53_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221
% env | grep CULA
CULA_ROOT=/usr/local/cula
CULA_INC_PATH=/usr/local/cula/include
CULA_LIB_PATH_32=/usr/local/cula/lib
CULA_LIB_PATH_64=/usr/local/cula/lib64
% pwd
/usr/local/cula/examples/imageCompressionSVD
% make build64
sh ../checkenvironment.sh
gcc -m64 -o gesvd gesvd.c -DNDEBUG -O3 -I/usr/local/cula/include -L/usr/local/cula/lib64 -lcula_core -lcula_lapack -lcublas -lcudart -pthread
% ./gesvd
--------------------------------------------------------------------------------
This example demonstrates using CULA to implement an image compression
algorithm. Two images will be generated:
Full-fidelity: image_original.bmp
Reduced: image_reduced.bmp
--------------------------------------------------------------------------------
Generating image ... done.
Abort
Device info:
NVIDIA GeForce GT 650M:
Chipset Model: NVIDIA GeForce GT 650M
Type: GPU
Bus: PCIe
PCIe Lane Width: x8
VRAM (Total): 1024 MB
Vendor: NVIDIA (0x10de)
Device ID: 0x0fd5
Revision ID: 0x00a2
ROM Revision: 3688
gMux Version: 3.2.19 [3.2.8]
CUDA Driver Version: 5.0.24
GPU Driver Version: 8.0.51 295.30.00f01