sgesv in 1.1 is slow...

Re:sgesv in 1.1 is slow...

Postby dan » Thu Mar 04, 2010 1:40 pm

Boxed Cylon wrote:
So the question is not so much why sgesv is slow, as why is it slow when the 2nd dimension of B is large. In my own application, this dimension is about 1000. This result seems odd - presumably all the computing is in setting up the inverse; I would have thought the timing would be rather independent of the 2nd dimension of B.

Thanks for this analysis. Here are my results for large right-hand sides:

Code: Select all
>> A = rand(2048,2048, 'single');
>> B = rand(2048,64, 'single');
>> tic; A\B; toc;
Elapsed time is 0.468459 seconds.
>> tic; [x y z] = culaGesv(A,B); toc;
Elapsed time is 0.210827 seconds.
>> A = rand(2048,2048, 'single');
>> B = rand(2048,5000, 'single');
>> tic; A\B; toc;
Elapsed time is 1.570242 seconds.
>> tic; [x y z] = culaGesv(A,B); toc;
Elapsed time is 4.637038 seconds.


As you can see, my results are similar to yours. We've finally got some common ground! =)

It appears that this issue is unrelated to the other slowdowns I've discovered in Matlab. As you said, the slowdown seems to appear when the RHS is large. I'll do some more digging into this specific case and let you know what we find.
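
For a rough sense of why a wide B matters, here is a back-of-the-envelope operation count in Python. It assumes the usual textbook counts for an LU-based solve: about (2/3)n^3 flops for factoring A and about 2n^2 flops per column of B for the two triangular solves. It says nothing about how CULA implements gesv internally, and the function name `gesv_flops` is made up for this sketch.

```python
def gesv_flops(n, nrhs):
    """Approximate flop counts for solving A X = B via LU (A is n x n, B is n x nrhs)."""
    factor = (2.0 / 3.0) * n ** 3   # LU factorization of A, independent of nrhs
    solve = 2.0 * n ** 2 * nrhs     # forward + back substitution for every column of B
    return factor, solve


for nrhs in (64, 2048, 5000):
    factor, solve = gesv_flops(2048, nrhs)
    print(f"nrhs={nrhs:5d}  factor={factor:.2e}  solve={solve:.2e}  "
          f"solve/factor={solve / factor:.2f}")
```

Under these assumptions the ratio works out to 3*nrhs/n, so for n = 2048 the triangular solves cost roughly 3 times the factorization at nrhs = 2048 and roughly 7 times at nrhs = 5000. Some growth with the second dimension of B is therefore expected, although the count by itself does not explain the GPU falling behind the CPU.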

Dan
dan
Administrator
 

Postby jpeinado » Thu Mar 04, 2010 2:05 pm

Hi:


Boxed Cylon and Dan: I am happy to hear that we have found common ground on this problem. I will test on my machine in the next few days to see whether the problem there is also a large B.


Thank you very much to all of you


By the way, if the problem is that B is large, then (as I understand it) the culprit routines must be:

- swapping the rows of B (using ipiv)

- solving the triangular systems



Anyway, I ran several tests using sgetrf and it was also slow. Could you please check whether you have problems using sgetrf? I suppose that CULA sgesv must be:

sgesv = sgetrf (computing LU) + sgetrs (swapping B, solving triangular systems)
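
The split above can be illustrated with a small pure-Python sketch. This is the textbook LAPACK-style decomposition, not CULA's actual implementation (which this thread does not show). The Python functions `getrf`, `getrs`, and `gesv` below only borrow the LAPACK names for readability and are assumptions of this example.

```python
def getrf(a):
    """LU-factor a square matrix (list of rows) with partial pivoting.

    Returns (lu, ipiv): lu stores L below the diagonal (unit diagonal implied)
    and U on and above it; ipiv[k] is the row swapped with row k at step k.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy, leave the input intact
    ipiv = []
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        ipiv.append(p)
        lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]        # multiplier, stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, ipiv


def getrs(lu, ipiv, b):
    """Solve A X = B from the factors: swap rows of B, then two triangular solves."""
    n = len(lu)
    x = [row[:] for row in b]
    nrhs = len(x[0])
    for k, p in enumerate(ipiv):        # apply the row swaps recorded in ipiv
        x[k], x[p] = x[p], x[k]
    for c in range(nrhs):               # the work here grows with the width of B
        for i in range(n):              # forward substitution with unit-lower L
            for j in range(i):
                x[i][c] -= lu[i][j] * x[j][c]
        for i in reversed(range(n)):    # back substitution with upper U
            for j in range(i + 1, n):
                x[i][c] -= lu[i][j] * x[j][c]
            x[i][c] /= lu[i][i]
    return x


def gesv(a, b):
    """gesv = getrf (LU) + getrs (row swaps and triangular solves)."""
    lu, ipiv = getrf(a)
    return getrs(lu, ipiv, b)


a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [[4.0, 1.0], [10.0, 2.0], [24.0, 3.0]]
x = gesv(a, b)
print(x)
```

Only `getrs` touches B, so if timing `getrf` alone shows no slowdown, the row swaps or the triangular solves become the natural suspects.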


jpeinado

Postby cjest » Fri Mar 05, 2010 1:31 am

Here are the results from my machine, using single-precision complex data:

>> A = rand(2048,2048,'single')+1i*rand(2048,2048,'single');
>> B = rand(2048,64,'single')+1i*rand(2048,64,'single');
>> tic; A\B; toc
Elapsed time is 1.594184 seconds.
>> tic; x = culasv(A,B ); toc;
Elapsed time is 0.338316 seconds.
>> B = rand(2048,2048,'single')+1i*rand(2048,2048,'single');
>> tic; A\B; toc;
Elapsed time is 2.029847 seconds.
>> tic; x = culasv(A,B ); toc;
Elapsed time is 0.770844 seconds.
>> B = rand(2048,5000,'single')+1i*rand(2048,5000,'single');
>> tic; A\B; toc;
Elapsed time is 3.651626 seconds.
>> tic; x = culasv(A,B ); toc;
Elapsed time is 1.451895 seconds.

Note: culasv is my "CULA Solver" wrapper, which uses "Cgesv".
A speedup is obtained even when B is at least as large as A.


BR/CJ

Postby jpeinado » Fri Mar 05, 2010 3:20 am

My results:

Dan's Routine:

Code: Select all

A=rand(2048,2048,'single');
B=rand(2048,64,'single');

tic;A\B;toc
Elapsed time is 0.629523 seconds.
>> tic;A\B;toc

>> tic;[X]=culaGesv2(A,B) ;toc
Initializing CULA...
$$$$$$$$$$  0.144 s
X-top = -9.435948e-02 3.882708e-01 -6.582417e-02
X-bottom = 4.102392e-01 4.056736e-01 -2.996187e-01
Elapsed time is 1.398374 seconds.



Changing the Matrix B

Code: Select all
>> B=rand(2048,2048,'single');
>> tic;A\B;toc
Elapsed time is 1.220912 seconds.
>> tic;[X]=culaGesv2(A,B) ;toc
Initializing CULA...
$$$$$$$$$$  1.848 s
X-top = 8.216147e-02 -1.769290e-01 3.114136e-01
X-bottom = 8.668031e-02 2.901745e-01 3.632181e-01
Elapsed time is 1.906207 seconds.




Using my routine called culaDeviceSgesv (similar to Boxed Cylon's routine)

Code: Select all
A=rand(2048,2048,'single');
B=rand(2048,64,'single');

tic;A\B;toc
Elapsed time is 0.579672 seconds.

tic;[X]=culaDeviceSgesv(A,B) ;toc
Elapsed time is 0.170306 seconds.


Changing the Matrix B

Code: Select all
>> B=rand(2048,2048,'single');
>> tic;A\B;toc
Elapsed time is 1.322053 seconds.
>> tic;[X]=culaDeviceSgesv(A,B);toc
Elapsed time is 1.893691 seconds.


My results are very similar to yours.
I used a new machine with a Core 2 Duo processor and a GeForce GTX 280.


It would be very important to test whether culaSgetrf works correctly. I looked through my tests and I have not run this one yet. If culaSgetrf works correctly, then the problem could be in swapping B or in solving the triangular systems.



jpeinado

Postby Boxed Cylon » Fri Mar 05, 2010 8:23 am

It's important to run the CUDA test at least twice. The second time, the device is already initialized, which makes it the better test. In the first test above, the time "$$$$$$$$$$ 0.144 s" is the more accurate measure, rather than the "tic;toc" time of 1.398374 seconds. (I suspect you know this already... :) )
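
The same discipline can be written as a small timing helper. The sketch below is in Python rather than MATLAB, and everything in it is hypothetical: `time_after_warmup` is a made-up helper, and `fake_solver` only simulates a routine with a one-time initialization cost. It does not call CULA or CUDA.

```python
import time


def time_after_warmup(fn, warmup=1, repeats=3):
    """Call fn() `warmup` times untimed, then return the best of `repeats` timed calls."""
    for _ in range(warmup):
        fn()                          # pays one-time costs such as device initialization
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload with a simulated one-time setup cost on its first call.
state = {"initialized": False}


def fake_solver():
    if not state["initialized"]:
        time.sleep(0.2)               # simulated initialization, first call only
        state["initialized"] = True
    sum(i * i for i in range(10000))  # the "real" work being measured


elapsed = time_after_warmup(fake_solver)
print(f"steady-state time: {elapsed:.6f} s")
```

Taking the best of several timed runs is one common choice; a mean or median would also work, as long as the first, initialization-heavy call is kept out of the measurement.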

Postby jpeinado » Fri Mar 05, 2010 8:42 am

Boxed Cylon wrote:

It's important to run the CUDA test at least twice. The second time, the device is already initialized, which makes it the better test. In the first test above, the time "$$$$$$$$$$ 0.144 s" is the more accurate measure, rather than the "tic;toc" time of 1.398374 seconds. (I suspect you know this already... :) )



Yes. In fact, I did another run beforehand, so that the reported test time would not include the initialization time :)


jpeinado

Postby john » Fri Mar 05, 2010 4:04 pm

Just wanted to update here before the weekend with some good news - we found the problem that was causing the slowdown for the large NRHS. Currently I am solving the 2048 problem (with B sized at 2048x2048) in 0.19 seconds, and I think I still have room to make it go a bit quicker.

Also good news for everyone: gesv is seeing significantly increased speeds across the board (all sizes, all precisions). And more good news is that we should see some related improvements in a few other routines like gels and posv.

You can expect a service release on this one as early as next week. Thank you all for the very detailed feedback - it was very helpful in finding this.

John
john
Administrator
 

Postby Boxed Cylon » Fri Mar 05, 2010 5:40 pm

Ah ha! :)

The improvements won't be a game changer for me, but it will be nice to have my application run a little faster. And all is right with the universe again...

Postby jpeinado » Sat Mar 06, 2010 7:48 am

john wrote:Just wanted to update here before the weekend with some good news - we found the problem that was causing the slowdown for the large NRHS




VERY GOOD NEWS!!! John!!! :) :)


For me, it is very important in my algorithms to have a good dgesv routine :) :)


Right now I have the UJI - CULAPACK sgetrf, but I think the CULA routines could get better results. In my algorithms I have AX=B, where A and B are the same size, so B is as large as A.


Please keep us posted on your progress.


Thank you very much to all the CULA people... and very specially to Boxed Cylon


Gracias !!!


jpeinado

Postby cjest » Wed Mar 10, 2010 5:46 am

Hi,
is it feasible to put gesv in a kernel? If yes, would that be a good way to speed up the "\" operation for smaller matrices?

BR

Postby jpeinado » Thu Mar 11, 2010 10:43 am

cjest wrote:
Hi,
is it feasible to put gesv in a kernel? If yes, would that be a good way to speed up the "\" operation for smaller matrices?

BR


Hi:

gesv (CULA) is a kernel. If you want to avoid sending the matrices from the CPU to the GPU in each iteration, you can do so by using the culaDeviceSgesv call.

I do not think it is possible to speed up \ for smaller matrices by doing what you describe.

If you want to speed up your computation, for example when you have a large loop of \ solves, each \ must be independent of the others. If you are running an iterative method, for example, this is probably not possible.

jpeinado

Postby jpeinado » Mon Mar 22, 2010 3:30 pm

john wrote:Just wanted to update here before the weekend with some good news - we found the problem that was causing the slowdown for the large NRHS. Currently I am solving the 2048 problem (with B sized at 2048x2048) in 0.19 seconds, and I think I still have room to make it go a bit quicker.

Also good news for everyone: gesv is seeing significantly increased speeds across the board (all sizes, all precisions). And more good news is that we should see some related improvements in a few other routines like gels and posv.

You can expect a service release on this one as early as next week. Thank you all for the very detailed feedback - it was very helpful in finding this.

John


Is there any news about this?

Thanks

jpeinado

Postby john » Tue Mar 30, 2010 7:07 am

Should be very soon. We have expanded the scope of work to have some far-reaching speedups across many of the CULA routines including gels, getrf, posv, and more.

Postby jpeinado » Sun Apr 04, 2010 5:03 am

Thank you very much


Jesus

Postby john » Thu Apr 08, 2010 2:06 pm

1.3 is released. I'm anxious to see the new results from this thread!