About the performance of gels

Postby stzpz » Fri Nov 04, 2011 12:48 pm

I am trying to use gels to solve the linear system Ax = b, where A is a 2000x38 matrix and the data are already on the device.

In my first attempt, I did not pad the matrix A and used culaDeviceSgels to solve. Running it about 700 times with different data (of similar size) takes about 2.4s. However, if I transfer the data back to the host and use MKL to solve it, it only takes about 1.2s, and the data transfer time is about 0.2s. It seems that CULA is much slower than MKL.

Then I was told that I could get better performance if the data were correctly padded. So I bought the premium version of CULA, which provides the culaGetOptimalPitch() function for calculating the best pitch for the data. I used it to calculate the best pitch for the matrix A (in row-major order), so that it becomes a pitch x 38 matrix. Then I transposed it and used culaDeviceSgels() to solve. However, this takes about 2.5s, which is a bit slower than the unpadded version.
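For readers unsure what padding to a pitch means for a column-major (LAPACK-style) matrix, here is a minimal NumPy sketch of one way to read the layout described above. It does not call CULA. The pitch value 2048 is a hypothetical stand-in, not a value returned by culaGetOptimalPitch(), and the variable names are illustrative only.

```python
import numpy as np

m, n = 2000, 38          # matrix size from the post
pitch = 2048             # hypothetical padded leading dimension (assumption)

rng = np.random.default_rng(0)
A = rng.standard_normal((m, n)).astype(np.float32)

# Flat buffer holding n columns, each column occupying `pitch` floats.
buf = np.zeros(pitch * n, dtype=np.float32)

# View the buffer as a column-major pitch x n array; this shares memory with buf.
padded = buf.reshape((pitch, n), order="F")

# Copy A into the top m rows; rows m..pitch-1 of each column are padding.
padded[:m, :] = A

# The logical m x n matrix is a strided view with leading dimension `pitch`.
A_view = padded[:m, :]

# Element (i, j) lives at flat offset i + j * pitch.
i, j = 123, 7
print(buf[i + j * pitch] == A[i, j])
print(A_view.strides)    # (bytes per row step, bytes per column step)
```

In this reading, the solver would be given m = 2000 rows and a leading dimension of `pitch`, so the padding rows never enter the computation.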

Does anybody know why it is slow, and why the padded version is even slower? Did I use the functions incorrectly?

Thanks!
stzpz
 
Posts: 2
Joined: Fri Oct 21, 2011 2:35 pm

Re: About the performance of gels

Postby john » Wed Dec 07, 2011 2:09 pm

I'm just now seeing this thread; please pardon the delay in responding.

The sizes you mention are not an ideal use case for the GPU, because the number of operations required is relatively small: on the order of a couple million operations per matrix. Essentially, by the time the commands to initiate the matrix operation reach the GPU, the CPU has completed one or more solutions. Even without the communication delay, the parallelism in a matrix of this size is fairly limited, so the GPU will perform far below its peak level. I'm sorry that I don't have a much better answer for you.
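As a rough back-of-the-envelope check of the operation count, the sketch below uses the standard Householder-QR estimate of 2mn^2 - (2/3)n^3 flops for an m x n least-squares problem. Whether gels uses exactly this algorithm internally is an assumption here, and the estimate ignores the lower-order cost of applying the factorization to b. The timing figures are the ones quoted in this thread.

```python
def qr_least_squares_flops(m, n):
    """Approximate flops for a Householder-QR least-squares solve (m >= n)."""
    return 2 * m * n**2 - (2 * n**3) / 3

m, n = 2000, 38        # matrix size from the post
solves = 700           # number of solves reported
cpu_seconds = 1.2      # MKL time reported for all solves

per_solve = qr_least_squares_flops(m, n)
ms_per_solve = cpu_seconds / solves * 1e3
gflops = per_solve * solves / cpu_seconds / 1e9

print(f"flops per solve : {per_solve:.3g}")
print(f"time per solve  : {ms_per_solve:.2f} ms")
print(f"sustained rate  : {gflops:.2f} GFLOP/s")
```

Each solve is only a few million flops and finishes in under two milliseconds on the CPU, which leaves very little work over which to amortize the fixed cost of issuing work to the GPU.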

Could you describe your use case for me? What is your problem and time budget such that 1.2s/700 is your performance bottleneck?
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

