geqrf basic question

PostPosted: Mon Jul 19, 2010 10:15 am
by jinyan
Hi, I'm new to CULA. Can someone explain the parameters of Sgeqrf to me?

m - # of rows in A
n - # of columns in A
a - pointer to matrix A
lda - leading dimension of A, so lda = m?
tau - scalar factors of elementary reflectors??

So if I call culaSgeqrf(...), where do the output Q and R matrices get stored?

Another question, what's the difference between culaSgeqrf and culaDeviceSgeqrf? If I use culaDeviceSgeqrf, do I have to allocate memory space/copy memory to GPU, like CUDA?

Sorry for the silly questions; the CULA programming guide wasn't very beginner-friendly.

Thanks in advance!

Re: geqrf basic question

PostPosted: Mon Jul 19, 2010 10:40 am
by kyle
CULA's QR decomposition is implemented using Householder reflections. Check out the Wikipedia page for some decent information if you are unfamiliar with the algorithm.

jinyan wrote:m - # of rows in A
n - # of columns in A
a - pointer to matrix A
lda - leading dimension of A, so lda = m?
tau - scalar factors of elementary reflectors??

LDA is typically M, but it doesn't have to be; it just needs to be at least M (it's the stride in memory between consecutive columns of A). TAU holds the scalar factors of the elementary reflectors and is used to construct (or multiply by) the Q matrix after xGEQRF.
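
A rough sketch of a host-interface call, in case it helps (the exact prototypes, the culaFloat typedef, and the culaInitialize/culaShutdown calls should be double-checked against the Reference Manual; error checking is omitted):

Code:
#include <stdlib.h>
#include <cula.h>

/* Factor a column-major M x N matrix in place: A = Q*R. On exit the upper
   triangle of A holds R; the part below the diagonal, together with tau,
   encodes the Householder reflectors that define Q. */
int main(void)
{
    const int M = 4, N = 3;
    culaFloat* A   = (culaFloat*)malloc(M * N * sizeof(culaFloat));
    culaFloat* tau = (culaFloat*)malloc(N * sizeof(culaFloat));  /* min(M,N) entries */

    /* ... fill A in column-major order: element (i,j) is A[i + j*M] ... */

    culaInitialize();
    culaSgeqrf(M, N, A, M, tau);   /* lda = M since A is stored contiguously */
    culaShutdown();

    free(A);
    free(tau);
    return 0;
}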

jinyan wrote:So if I call culaSgeqrf(...), where do the output Q and R matrices get stored?

After xGEQRF, R is stored in the upper triangular portion of A. Q can then be generated explicitly using xORGQR with A and TAU as the inputs. Alternatively, Q can be multiplied directly with another matrix using xORMQR with A, TAU, and C as the inputs. This two-step method is typical and will be seen in any package based on the LAPACK interface; when you call QR in MATLAB, for example, it calls both of these functions behind the scenes. Also, it's worth noting that xORGQR and xORMQR are only available in CULA Premium.
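
Roughly like this, continuing from the culaSgeqrf call (again just a sketch; the exact culaSorgqr prototype should be checked against the Reference Manual):

Code:
#include <cula.h>

/* After culaSgeqrf(M, N, A, M, tau), with M >= N:
   - copy the N x N upper-triangular R out of A,
   - then overwrite A with the first N columns of Q (CULA Premium only). */
void form_q_and_r(int M, int N, culaFloat* A, culaFloat* tau, culaFloat* R)
{
    int i, j;

    for (j = 0; j < N; ++j)              /* R lives in the upper triangle of A */
        for (i = 0; i < N; ++i)
            R[i + j * N] = (i <= j) ? A[i + j * M] : 0.0f;

    culaSorgqr(M, N, N, A, M, tau);      /* generate Q from the reflectors */
}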

jinyan wrote:Another question, what's the difference between culaSgeqrf and culaDeviceSgeqrf? If I use culaDeviceSgeqrf, do I have to allocate memory space/copy memory to GPU, like CUDA?

culaSgeqrf expects host memory and culaDeviceSgeqrf expects device memory. We always recommend the standard host interface because it's simpler and uses special allocation methods that maximize memory transfer throughput.
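
For comparison, the device interface looks roughly like this; you manage the allocations and copies yourself, just as in plain CUDA (sketch only, error checking omitted, prototypes from memory):

Code:
#include <cuda_runtime.h>
#include <cula.h>

/* Same factorization, but A and tau live in GPU memory.
   culaInitialize() must already have been called. */
void device_geqrf(const float* hostA, int M, int N)
{
    culaDeviceFloat *dA, *dTau;
    cudaMalloc((void**)&dA,   M * N * sizeof(culaDeviceFloat));
    cudaMalloc((void**)&dTau, N * sizeof(culaDeviceFloat));
    cudaMemcpy(dA, hostA, M * N * sizeof(culaDeviceFloat), cudaMemcpyHostToDevice);

    culaDeviceSgeqrf(M, N, dA, M, dTau);   /* operates directly on device pointers */

    /* ... copy dA / dTau back to the host if you need R or the reflectors ... */
    cudaFree(dA);
    cudaFree(dTau);
}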

jinyan wrote:Sorry for the silly questions; the CULA programming guide wasn't very beginner-friendly.

Not a problem! The guide is aimed more at developers who are already familiar with LAPACK notation but new to GPU computing. Let us know if you have any other questions.

Re: geqrf basic question

PostPosted: Tue Jul 20, 2010 8:41 am
by jinyan
Thanks for the help, Kyle!

One more question, I'm getting this error while trying to run the executable

error while loading shared libraries: libcula.so: cannot open shared object file: No such file or directory

I saw your reply on the other post ("make sure to include...in LD_LIBRARY_PATH"), and I tried looking around on Google but still couldn't figure out how to change LD_LIBRARY_PATH. Can you help? I'm fairly new to Linux too :cry:

Re: geqrf basic question

PostPosted: Tue Jul 20, 2010 8:50 am
by kyle
To add CULA to your library path, try the following command.

Code:
export LD_LIBRARY_PATH=/usr/local/cula/lib64:$LD_LIBRARY_PATH

Change 'lib64' to 'lib' if you are on a 32-bit system.

Re: geqrf basic question

PostPosted: Tue Jul 20, 2010 10:37 am
by john
You can add that to your .bashrc if you want to make it permanent.

Re: geqrf basic question

PostPosted: Tue Jul 20, 2010 11:06 am
by jinyan
Woohoo! Okay, the program is working, though only for square matrices right now. The code I wrote for the CPU calculation is row-major, so the column-major storage CULA expects is causing some errors with rectangular matrices. I'll fix that later.

Another question: when I try to include cutil.h to use the cut...Timer functions, it says cutil.h cannot be found. How can I tell the compiler to look for the header in the CUDA SDK folder?

Re: geqrf basic question

PostPosted: Tue Jul 20, 2010 12:26 pm
by kyle
Check your compiler's documentation for how to add extra include directories to the search path.

In GCC it's:

Code:
-I dir


If you are using cutil.h, remember you'll have to link against the cutil library as well.

Re: geqrf basic question

PostPosted: Tue Jul 20, 2010 1:29 pm
by john
If you would prefer to avoid cutil, you'll find some simple timing code in the CULA examples/benchmark folder.

Re: geqrf basic question

PostPosted: Wed Jul 21, 2010 8:47 am
by jinyan
Thanks again. I ended up using clock(); it seems to be the more precise of the two (clock() vs. time()).
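
In case anyone else wants it, my timing looks roughly like this (simplified):

Code:
#include <time.h>
#include <cula.h>

/* Time one factorization with clock(). Note that clock() measures CPU time
   rather than wall time, so it's coarse; fine for comparing large matrices. */
double timed_geqrf(int M, int N, culaFloat* A, culaFloat* tau)
{
    clock_t t0 = clock();
    culaSgeqrf(M, N, A, M, tau);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}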

Is there a function in CULA that will let me efficiently transpose a matrix? I couldn't find one in the manual.

Re: geqrf basic question

PostPosted: Wed Jul 21, 2010 2:16 pm
by john
I'm afraid there isn't. If your matrix is square, or if you are willing to do an out-of-place transpose, the CUDA code is pretty simple and very fast. The NVIDIA GPU Computing SDK has a decent implementation.
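
A naive out-of-place version is only a few lines; the SDK sample adds shared-memory tiling so the global memory accesses are coalesced, which is where the real speed comes from. Roughly (untested, just to show the idea):

Code:
// Naive out-of-place transpose of a column-major m x n matrix,
// one thread per element. 'out' must have room for n x m entries.
__global__ void transposeNaive(const float* in, float* out, int m, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m && col < n)
        out[col + row * n] = in[row + col * m];   // out(col,row) = in(row,col)
}

// Launch with e.g. dim3 block(16, 16) and a grid large enough to cover n x m threads.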