## About the performance of gels

### About the performance of gels

I am trying to use gels to solve the linear system Ax=b, where A is a 2000x38 matrix and the data are already on the device.

In my first attempt, I did not pad the matrix A and used culaDeviceSgels to solve the system. Running it about 700 times with different data (of similar size) takes about 2.4s in total. However, if I transfer the data back to the host and use MKL to solve it, it takes only about 1.2s, and the data transfer time is about 0.2s. It seems that CULA is much slower than MKL.

Then I was told that I could get better performance if the data were correctly padded. So I bought the premium version of CULA, which provides the culaGetOptimalPitch() function for calculating the best pitch for the data. I used it to calculate the best pitch for the matrix A (in row-major), so it becomes a pitch x 38 matrix. Then I transposed it and used culaDeviceSgels() to solve it. However, this takes about 2.5s, which is a bit slower than the unpadded version.
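For readers unfamiliar with padding, here is a minimal sketch of what a padded (pitched) column-major layout looks like. LAPACK-style routines address element (i, j) as `a[i + j*lda]`, where the leading dimension `lda` may be larger than the row count `m`. The helper names `round_up` and `pad_column_major` and the alignment of 32 elements are assumptions made for illustration only; they are not the CULA API and not necessarily what culaGetOptimalPitch() returns.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple (hypothetical stand-in for a pitch query)."""
    return ((value + multiple - 1) // multiple) * multiple


def pad_column_major(a, m: int, n: int, lda: int):
    """Copy an m x n column-major matrix (flat list) into lda x n storage.

    Element (i, j) of the result lives at index i + j * lda.
    The extra rows m..lda-1 of each column are filled with zeros.
    """
    if lda < m:
        raise ValueError("leading dimension must be at least m")
    padded = [0.0] * (lda * n)
    for j in range(n):
        # Column j occupies a[j*m : (j+1)*m] in the tight layout.
        padded[j * lda : j * lda + m] = a[j * m : (j + 1) * m]
    return padded


# Sizes from the post; the alignment of 32 elements is an assumed value.
m, n = 2000, 38
lda = round_up(m, 32)

# Small worked example: a 3 x 2 matrix padded to a leading dimension of 4.
small = pad_column_major([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3, 2, round_up(3, 4))
```

With such a layout, the solver would still be given m = 2000 rows, and only the leading dimension would change to the padded value. Whether that matches the padded call described above cannot be told from the post.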

Does anybody know why it is slow? And why is the padded version even slower? Did I use them incorrectly?

Thanks!

- stzpz
**Posts:** 2 **Joined:** Fri Oct 21, 2011 2:35 pm

### Re: About the performance of gels

I'm just now seeing this thread; please pardon the delay in the response.

The sizes you mention are not an ideal use case for the GPU, because the number of operations required is relatively small: on the order of a couple million operations per matrix. Essentially, by the time the commands to initiate the matrix operation reach the GPU, the CPU has completed one or more solutions. Even without the communication delay, the parallelism in a matrix of this size is fairly limited, so the GPU will perform far below its peak level. I'm sorry that I don't have a much better answer to this for you.
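As a rough check on the operation count, the sketch below uses the standard Householder QR estimate of 2mn² − (2/3)n³ flops for an m x n least-squares problem. It assumes that gels factors A by QR, as LAPACK's sgels does for m ≥ n; CULA's internals may differ. The timings are the totals reported in the first post, and the function name `gels_flops` is made up for this example.

```python
def gels_flops(m: int, n: int) -> float:
    """Approximate flops for a Householder QR factorization of an m x n matrix (m >= n).

    Assumption: the solver is QR-based; the lower-order cost of applying
    Q^T to b and the triangular back-substitution is ignored.
    """
    return 2.0 * m * n * n - (2.0 / 3.0) * n ** 3


m, n = 2000, 38        # matrix size from the first post
solves = 700           # number of systems solved in the timed run
cula_total_s = 2.4     # reported total time with culaDeviceSgels
mkl_total_s = 1.2      # reported total time with MKL

flops = gels_flops(m, n)
cula_per_solve_ms = cula_total_s / solves * 1e3
mkl_per_solve_ms = mkl_total_s / solves * 1e3
# Sustained rate implied by the MKL timing, in GFLOP/s.
mkl_gflops = flops / (mkl_total_s / solves) / 1e9

print(f"{flops / 1e6:.2f} Mflop per solve")
print(f"CULA: {cula_per_solve_ms:.2f} ms/solve, MKL: {mkl_per_solve_ms:.2f} ms/solve")
print(f"MKL implied rate: {mkl_gflops:.2f} GFLOP/s")
```

Under these assumptions each solve costs a few million flops and finishes in about 1.7 ms on the CPU, so fixed per-call costs can take up a large share of each GPU solve.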

Could you describe your use case for me? What is your problem and time budget such that 1.2s/700 is your performance bottleneck?

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm
