## sgesv in 1.1 is slow...

### Re:sgesv in 1.1 is slow...

Boxed Cylon wrote:

So the question is not so much why sgesv is slow, as why is it slow when the 2nd dimension of B is large. In my own application, this dimension is about 1000. This result seems odd - presumably all the computing is in setting up the inverse; I would have thought the timing would be rather independent of the 2nd dimension of B.

Thanks for this analysis. Here are my results for large right-hand sides:

```
>> A = rand(2048,2048, 'single');
>> B = rand(2048,64, 'single');
>> tic; A\B; toc;
Elapsed time is 0.468459 seconds.
>> tic; [x y z] = culaGesv(A,B); toc;
Elapsed time is 0.210827 seconds.

>> A = rand(2048,2048, 'single');
>> B = rand(2048,5000, 'single');
>> tic; A\B; toc;
Elapsed time is 1.570242 seconds.
>> tic; [x y z] = culaGesv(A,B); toc;
Elapsed time is 4.637038 seconds.
```

As you can see, my results are similar to yours. We've finally got some common ground! =)

It appears that this issue is unrelated to the other slowdowns I've discovered in Matlab. As you said, it appears to be slow when the RHS is large. I'll do some more digging into this specific case and let you know what we find.
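For a rough sense of how much the second dimension of B should matter, here is a small operation-count sketch. It assumes the usual leading-order estimates for real arithmetic, about (2/3)n^3 flops for the LU factorization and about 2·n^2·nrhs flops for the two triangular solves. The function names are illustrative only and are not part of CULA or MATLAB.

```python
# Leading-order flop estimates for solving A X = B via LU factorization
# (real arithmetic, lower-order terms ignored). Names are illustrative only.

def getrf_flops(n):
    """Approximate flops to LU-factorize an n x n matrix: (2/3) n^3."""
    return 2.0 * n**3 / 3.0

def getrs_flops(n, nrhs):
    """Approximate flops for forward plus back substitution on nrhs columns: 2 n^2 nrhs."""
    return 2.0 * n**2 * nrhs

def solve_share(n, nrhs):
    """Fraction of the estimated total work spent in the triangular solves."""
    solve = getrs_flops(n, nrhs)
    return solve / (getrf_flops(n) + solve)

if __name__ == "__main__":
    n = 2048
    for nrhs in (64, 2048, 5000):
        print(f"nrhs={nrhs:5d}  factor={getrf_flops(n):.2e}  "
              f"solve={getrs_flops(n, nrhs):.2e}  "
              f"solve share={solve_share(n, nrhs):.0%}")
```

By this estimate the triangular solves are under a tenth of the work at nrhs = 64 and three quarters of it at nrhs = n, so some growth in run time with nrhs is expected on any platform. The estimate says nothing about why one implementation slows down more than another.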

Dan

- dan
- Administrator
**Posts:** 61 **Joined:** Thu Jul 23, 2009 2:29 pm

### Re:sgesv in 1.1 is slow...

Hi:

Boxed Cylon and Dan... I am happy to hear that there is common ground on the problem. I will test on my machine in the next few days to see whether a large B is the problem there too.

Thank you very much to all of you.

By the way, if the problem is that B is large, then (as I understand it) the culprit routines must be:

- swapping the rows of B (using ipiv)

- solving the triangular systems.

Anyway, I ran several tests using sgetrf and it was also slow. Could you please check whether you have problems using sgetrf? I suppose that CULA sgesv must be:

sgesv = sgetrf (computing LU) + sgetrs (swapping B, solving triangular systems)
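The split above can be sketched in plain Python to show where the right-hand sides enter. This is only an illustration of the LAPACK-style decomposition, assuming partial pivoting and matrices stored as lists of rows. It says nothing about how CULA actually implements sgesv, and the function names merely mirror the LAPACK routine names.

```python
# Pure-Python sketch of gesv = getrf + getrs (illustrative only, not CULA code).

def getrf(a):
    """LU-factorize a square matrix (list of rows) with partial pivoting.

    Returns (lu, ipiv): L (unit diagonal) and U packed into one matrix, and
    ipiv[k] = index of the row swapped with row k at step k (0-based).
    """
    n = len(a)
    lu = [row[:] for row in a]
    ipiv = []
    for k in range(n):
        # Pick the largest entry in column k at or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        ipiv.append(p)
        lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, ipiv

def getrs(lu, ipiv, b):
    """Solve A X = B from the factors: swap rows of B, then two triangular solves."""
    n = len(lu)
    x = [row[:] for row in b]
    nrhs = len(x[0])
    for k, p in enumerate(ipiv):          # 1) apply the recorded row swaps to B
        x[k], x[p] = x[p], x[k]
    for c in range(nrhs):                 # the work below repeats for every column
        for i in range(n):                # 2) forward substitution with unit-lower L
            for j in range(i):
                x[i][c] -= lu[i][j] * x[j][c]
        for i in reversed(range(n)):      # 3) back substitution with upper U
            for j in range(i + 1, n):
                x[i][c] -= lu[i][j] * x[j][c]
            x[i][c] /= lu[i][i]
    return x

def gesv(a, b):
    lu, ipiv = getrf(a)           # cost does not depend on the columns of B
    return getrs(lu, ipiv, b)     # cost grows linearly with the columns of B

if __name__ == "__main__":
    a = [[2.0, 1.0], [4.0, 3.0]]
    b = [[3.0, 1.0], [7.0, 3.0]]  # two right-hand sides, stored as columns
    print(gesv(a, b))
```

In this sketch only `getrs` touches B, which is why timing sgetrf on its own is a useful way to separate the two candidate causes.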

jpeinado

- jpeinado
**Posts:** 37 **Joined:** Mon Sep 14, 2009 10:48 am

### Re:sgesv in 1.1 is slow...

Here are the results from my machine, working in single-precision complex:

```
>> A = rand(2048,2048,'single')+1i*rand(2048,2048,'single');
>> B = rand(2048,64,'single')+1i*rand(2048,64,'single');
>> tic; A\B; toc
Elapsed time is 1.594184 seconds.
>> tic; x = culasv(A,B ); toc;
Elapsed time is 0.338316 seconds.

>> B = rand(2048,2048,'single')+1i*rand(2048,2048,'single');
>> tic; A\B; toc;
Elapsed time is 2.029847 seconds.
>> tic; x = culasv(A,B ); toc;
Elapsed time is 0.770844 seconds.

>> B = rand(2048,5000,'single')+1i*rand(2048,5000,'single');
>> tic; A\B; toc;
Elapsed time is 3.651626 seconds.
>> tic; x = culasv(A,B ); toc;
Elapsed time is 1.451895 seconds.
```

Note: culasv is "Cula Solver" using "Cgesv".

A speedup is obtained even when B is as large as A or larger.

BR/CJ

- cjest
**Posts:** 12 **Joined:** Wed Feb 10, 2010 3:01 pm

### Re:sgesv in 1.1 is slow...

My results:

Dan's Routine:

```
A=rand(2048,2048,'single');
B=rand(2048,64,'single');
tic;A\B;toc
Elapsed time is 0.629523 seconds.
>> tic;A\B;toc
>> tic;[X]=culaGesv2(A,B) ;toc
Initializing CULA...
$$$$$$$$$$ 0.144 s
X-top = -9.435948e-02 3.882708e-01 -6.582417e-02
X-bottom = 4.102392e-01 4.056736e-01 -2.996187e-01
Elapsed time is 1.398374 seconds.
```

Changing the Matrix B

```
>> B=rand(2048,2048,'single');
>> tic;A\B;toc
Elapsed time is 1.220912 seconds.
>> tic;[X]=culaGesv2(A,B) ;toc
Initializing CULA...
$$$$$$$$$$ 1.848 s
X-top = 8.216147e-02 -1.769290e-01 3.114136e-01
X-bottom = 8.668031e-02 2.901745e-01 3.632181e-01
Elapsed time is 1.906207 seconds.
```

Using my routine called culaDeviceSgesv (similar to Boxed Cylon's routine):

```
A=rand(2048,2048,'single');
B=rand(2048,64,'single');
tic;A\B;toc
Elapsed time is 0.579672 seconds.
tic;[X]=culaDeviceSgesv(A,B) ;toc
Elapsed time is 0.170306 seconds.
```

Changing the Matrix B

```
>> B=rand(2048,2048,'single');
>> tic;A\B;toc
Elapsed time is 1.322053 seconds.
>> tic;[X]=culaDeviceSgesv(A,B);toc
Elapsed time is 1.893691 seconds.
```

My results are very similar to yours.

I used a new machine with a Core 2 Duo processor and a GeForce GTX 280.

It would be very important to test whether culaSgetrf works correctly. I looked through my tests and I have not run this one yet. If culaSgetrf works correctly, then the problem could be in swapping B or in solving the triangular systems.

jpeinado

- jpeinado
**Posts:** 37 **Joined:** Mon Sep 14, 2009 10:48 am

### Re:sgesv in 1.1 is slow...

jpeinado wrote: My results:

Dan's Routine:

```
A=rand(2048,2048,'single');
B=rand(2048,64,'single');
tic;A\B;toc
Elapsed time is 0.629523 seconds.
>> tic;A\B;toc
>> tic;[X]=culaGesv2(A,B) ;toc
Initializing CULA...
$$$$$$$$$$ 0.144 s
X-top = -9.435948e-02 3.882708e-01 -6.582417e-02
X-bottom = 4.102392e-01 4.056736e-01 -2.996187e-01
Elapsed time is 1.398374 seconds.
```

Changing the Matrix B

```
>> B=rand(2048,2048,'single');
>> tic;A\B;toc
Elapsed time is 1.220912 seconds.
>> tic;[X]=culaGesv2(A,B) ;toc
Initializing CULA...
$$$$$$$$$$ 1.848 s
X-top = 8.216147e-02 -1.769290e-01 3.114136e-01
X-bottom = 8.668031e-02 2.901745e-01 3.632181e-01
Elapsed time is 1.906207 seconds.
```

jpeinado

It's important to run the CUDA test at least twice. The second time, the device is already initialized, which makes it the better test. In the first test above, the time "$$$$$$$$$$ 0.144 s" is the more accurate measure, rather than the "tic;toc" time of 1.398374 seconds. (I suspect you know this already... :) )
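The same idea as a small timing harness, for anyone who wants to automate it. This is a minimal sketch, assuming the first call pays a one-time setup cost. `time_after_warmup` and `make_fake_solver` are hypothetical names, and the fake solver only simulates initialization with a sleep; it does not call CUDA or CULA.

```python
import time

def time_after_warmup(fn, *args, repeats=3):
    """Time fn(*args) once cold, then take the best of `repeats` warm calls.

    Returns (first_call_seconds, best_warm_seconds). The first call may
    include one-time setup, so the warm figure is the one to compare.
    """
    start = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - start

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return first, best

def make_fake_solver(init_seconds=0.2):
    """Hypothetical stand-in for a routine with expensive one-time initialization."""
    state = {"initialized": False}

    def solver(n):
        if not state["initialized"]:
            time.sleep(init_seconds)      # simulated device initialization
            state["initialized"] = True
        return sum(i * i for i in range(n))

    return solver

if __name__ == "__main__":
    first, warm = time_after_warmup(make_fake_solver(), 10_000)
    print(f"first call: {first:.3f} s, best warm call: {warm:.6f} s")
```

Reporting both numbers makes it obvious when a measurement is dominated by initialization rather than by the solve itself.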

- Boxed Cylon
**Posts:** 48 **Joined:** Fri Oct 16, 2009 8:57 pm

### Re:sgesv in 1.1 is slow...

Boxed Cylon wrote:

It's important to run the CUDA test at least twice. The second time, the device is already initialized, which makes it the better test. In the first test above, the time "$$$$$$$$$$ 0.144 s" is the more accurate measure, rather than the "tic;toc" time of 1.398374 seconds. (I suspect you know this already... :) )

Yes. In fact, I did another run beforehand, to avoid including the init time in the first test :)

jpeinado

- jpeinado
**Posts:** 37 **Joined:** Mon Sep 14, 2009 10:48 am

### Re:sgesv in 1.1 is slow...

Just wanted to update here before the weekend with some good news - we found the problem that was causing the slowdown for the large NRHS. Currently I am solving the 2048 problem (with B sized at 2048x2048) in 0.19 seconds, and I think I still have room to make it go a bit quicker.

Also good news for everyone: gesv is seeing significantly increased speeds across the board (all sizes, all precisions). And more good news is that we should see some related improvements in a few other routines like gels and posv.

You can expect a service release on this one as early as next week. Thank you all for the very detailed feedback - it was very helpful in finding this.

John

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re:sgesv in 1.1 is slow...

Ah ha! :)

The improvements won't be a game changer for me, but it will be nice to have my application run a little faster. And all is right with the universe again...

- Boxed Cylon
**Posts:** 48 **Joined:** Fri Oct 16, 2009 8:57 pm

### Re:sgesv in 1.1 is slow...

john wrote: Just wanted to update here before the weekend with some good news - we found the problem that was causing the slowdown for the large NRHS.

VERY GOOD NEWS!!! John!!! :) :)

For me, it is very important in my algorithms to have a good dgesv routine :) :)

Now, I have the UJI - CULAPACK sgetrf, but I think the CULA routines could get better results. In my algorithms I have AX=B, where A and B are the same size, so B is as large as A.

Please keep us posted on your progress.

Thank you very much to all the CULA people... and especially to Boxed Cylon.

Gracias !!!

jpeinado

- jpeinado
**Posts:** 37 **Joined:** Mon Sep 14, 2009 10:48 am

### Re:sgesv in 1.1 is slow...

Hi,

Is it feasible to put Gesv in a kernel? If yes, would that be a good way to speed up the "\" operation for smaller matrices?

BR

- cjest
**Posts:** 12 **Joined:** Wed Feb 10, 2010 3:01 pm

### Re:sgesv in 1.1 is slow...

cjest wrote: Hi,

Is it feasible to put Gesv in a kernel? If yes, would that be a good way to speed up the "\" operation for smaller matrices?

BR

Hi:

gesv (CULA) is a kernel. If you want to avoid sending the matrices from the CPU to the GPU in each iteration, you can do that with the culaDeviceSgesv call.

I don't think it is possible to speed up \ for smaller matrices by doing what you describe.

If you want to speed up your computation, for example a large loop of \ solves, each \ needs to be independent of the others. If you are running an iterative method, for example, this is probably not possible.
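One case where the solves in a loop are independent is when they all share the same A and every right-hand side is known up front. The vectors can then be stacked as columns of B and solved in a single call, which is the large-NRHS situation discussed in this thread. The sketch below only illustrates that equivalence under those assumptions, using exact rational arithmetic on a tiny system. The function names are hypothetical, and it does not apply when each solve depends on the previous result.

```python
from fractions import Fraction

def solve_multi(a, b):
    """Solve A X = B for all columns of B at once by Gauss-Jordan elimination.

    a: n x n list of rows, b: n x nrhs list of rows. Exact rational arithmetic
    keeps the comparison between the two strategies free of rounding noise.
    """
    n = len(a)
    # Augmented matrix [A | B]: every column of B rides along with the same row operations.
    m = [[Fraction(v) for v in a[i]] + [Fraction(v) for v in b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))   # partial pivoting
        if m[p][k] == 0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]
        for i in range(n):
            if i != k and m[i][k] != 0:
                factor = m[i][k]
                m[i] = [vi - factor * vk for vi, vk in zip(m[i], m[k])]
    return [row[n:] for row in m]

def solve_one_by_one(a, rhs_list):
    """The loop version: one full solve per right-hand-side vector."""
    return [[row[0] for row in solve_multi(a, [[v] for v in rhs])] for rhs in rhs_list]

def solve_batched(a, rhs_list):
    """Stack the vectors as columns of B, solve once, then unstack."""
    n = len(a)
    b = [[rhs[i] for rhs in rhs_list] for i in range(n)]
    x = solve_multi(a, b)
    return [[x[i][c] for i in range(n)] for c in range(len(rhs_list))]

if __name__ == "__main__":
    a = [[2, 1], [4, 3]]
    rhs_list = [[3, 7], [1, 3]]
    print(solve_batched(a, rhs_list) == solve_one_by_one(a, rhs_list))
```

Both versions return the same solutions. The batched form eliminates A only once, whereas the loop repeats that work for every vector.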

jpeinado

- jpeinado
**Posts:** 37 **Joined:** Mon Sep 14, 2009 10:48 am

### Re:sgesv in 1.1 is slow...

john wrote: Just wanted to update here before the weekend with some good news - we found the problem that was causing the slowdown for the large NRHS. Currently I am solving the 2048 problem (with B sized at 2048x2048) in 0.19 seconds, and I think I still have room to make it go a bit quicker.

Also good news for everyone: gesv is seeing significantly increased speeds across the board (all sizes, all precisions). And more good news is that we should see some related improvements in a few other routines like gels and posv.

You can expect a service release on this one as early as next week. Thank you all for the very detailed feedback - it was very helpful in finding this.

John

Is there any news about this?

Thanks

jpeinado

- jpeinado
**Posts:** 37 **Joined:** Mon Sep 14, 2009 10:48 am

### Re:sgesv in 1.1 is slow...

Should be very soon. We have expanded the scope of work to have some far-reaching speedups across many of the CULA routines including gels, getrf, posv, and more.

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re:sgesv in 1.1 is slow...

Thank you very much

Jesus

- jpeinado
**Posts:** 37 **Joined:** Mon Sep 14, 2009 10:48 am

### Re:sgesv in 1.1 is slow...

1.3 is released. I'm anxious to see the new results from this thread!

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm
