sgesv in 1.1 is slow...


Re:sgesv in 1.1 is slow...

Post by Boxed Cylon » Thu Apr 08, 2010 9:25 pm

As you can see from the figure below, the speed of sgesv in CULA 1.3 in my little test routine is much improved. In this benchmark Nrhs=5000. With CULA 1.2 the speed ratio Tcpu/Tgpu was around 0.6, that is, the CPU (AMD Phenom II 965, single core) was faster than the GPU. With CULA 1.3 this ratio now approaches 2, roughly a 3-fold improvement in computation speed over CULA 1.2. I am using the new CUDA 3.0.

My own application now completes in about 1 hour, rather than the roughly 1.5 hours it took before, which is a very nice improvement. CULA 1.2 was a division laggard, to be sure... now fixed!

Thanks to all you CULA people for sorting this out and working out a fix!
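The numbers quoted above can be related with a little arithmetic. The Python sketch below is only an illustration: the function names are made up for this example, the inputs are the approximate figures from the post, and the second function assumes a simple Amdahl's-law model in which only the sgesv calls got faster. The benchmark used Nrhs=5000, which may not match the application, so the resulting fraction is a rough estimate at best.

```python
def kernel_improvement(ratio_old, ratio_new):
    """Speedup of the GPU solve between two library versions.

    Both ratios are Tcpu/Tgpu measured against the same CPU time,
    so their quotient equals Tgpu_old / Tgpu_new.
    """
    return ratio_new / ratio_old


def solve_fraction(app_speedup, kernel_speedup):
    """Fraction of the original run time spent in the solver.

    Assumption (not stated in the post): Amdahl's law,
    1/app_speedup = (1 - f) + f/kernel_speedup, solved for f.
    """
    return (1.0 - 1.0 / app_speedup) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    k = kernel_improvement(0.6, 2.0)   # approximate ratios for CULA 1.2 and 1.3
    app = 1.5 / 1.0                    # about 1.5 hours before, about 1 hour now
    print(f"sgesv improvement: {k:.2f}x")
    print(f"application speedup: {app:.2f}x")
    print(f"estimated share of old run time in sgesv: {solve_fraction(app, k):.0%}")
```

Under these assumptions a roughly 3-fold faster solve and a 1.5-fold faster application are consistent if a bit under half of the old run time was spent in sgesv.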

[Figure: sgesv benchmark, speed ratio Tcpu/Tgpu with CULA 1.3]

Re:sgesv in 1.1 is slow...

Post by john » Fri Apr 09, 2010 8:13 am

Good to see, thanks Boxed. And thanks for bearing with us there; I'm glad we were able to get that issue taken care of.

Re:sgesv in 1.1 is slow...

Post by jpeinado » Wed Apr 14, 2010 6:17 am

Hi John, Boxed Cylon, etc.:

I am working with the new version 1.3. I must say that the first results are impressive... :woohoo:

At the moment I am using the free version, while I wait for the system administrator of my machine to install the complete version.

Tomorrow I will post results; I must do more tests.

I am now working with two different machines, with different CPUs and GPUs.

At first sight, I think it works very well.

jpeinado

Re:sgesv in 1.1 is slow...

Post by jpeinado » Thu Apr 15, 2010 12:38 am

Here are my results from MATLAB on two different machines, with CULA 1.2 and 1.3.

UJI-CULAPACK is a GPU implementation of some LAPACK routines, developed at Universitat Jaume I of Castellon (Spain). It was the fastest version compatible with MATLAB... but now...


Code:
X=A\B

A and B are matrices of the same size.
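For readers who want to relate a GFLOPS figure to a solve time, here is a minimal Python sketch. It assumes the conventional operation count for an LU-based solve (about 2n^3/3 for the factorization plus 2n^2 per right-hand side for the triangular solves). The post does not say which count was actually used, so treat the formula and the helper names `gesv_flops` and `gflops` as assumptions for illustration only.

```python
def gesv_flops(n, nrhs):
    """Conventional flop count for solving an n x n system with nrhs right-hand sides."""
    lu = 2 * n**3 / 3            # LU factorization with partial pivoting
    solves = 2 * n**2 * nrhs     # forward and back substitution for every right-hand side
    return lu + solves


def gflops(n, nrhs, seconds):
    """Achieved rate in GFLOPS for a solve that took `seconds` of wall-clock time."""
    return gesv_flops(n, nrhs) / seconds / 1e9


if __name__ == "__main__":
    n = 1024
    # A and B have the same size, so the number of right-hand sides equals n.
    elapsed = 0.25  # hypothetical timing in seconds, not a measured value
    print(f"n={n}: {gesv_flops(n, n):.3e} flops, {gflops(n, n, elapsed):.2f} GFLOPS")
```

With B the same size as A, the triangular solves account for three quarters of the total work under this count, so the benchmark exercises more than just the LU factorization.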


Machine: Dual-core E7300 + GTX280


CULA 1.2 culaDeviceSgesv

 Size      SpeedUp    GflopsCPU    GflopsGPU
-----------------------------------------------------------
  128       0.16        5.69         0.92
  256       0.28       11.55         3.24
  512       0.50       14.28         7.12
 1024       0.70       15.29        10.68
 2048       0.62       19.43        12.11
 4096       0.50       25.37        12.77
 8192       0.44       29.40        13.03



CULA 1.3 culaDeviceSgesv

 Size      SpeedUp    GflopsCPU    GflopsGPU
-----------------------------------------------------------
  128       0.33        3.82         1.27
  256       0.52       11.54         6.00
  512       1.44       15.02        21.56
 1024       4.54       14.86        67.46
 2048       6.35       19.43       123.38
 4096       7.81       25.39       198.32
 8192       8.72       29.25       254.88
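The SpeedUp column appears to be the ratio GflopsGPU / GflopsCPU. The post does not say so explicitly, so this is an inference, and the short Python sketch below checks it against the CULA 1.3 rows for the E7300 + GTX280 machine. The helper names are invented for this example; only the numbers come from the table.

```python
# Rows copied from the CULA 1.3 table above:
# (size, reported SpeedUp, GflopsCPU, GflopsGPU)
ROWS = [
    (128, 0.33, 3.82, 1.27),
    (256, 0.52, 11.54, 6.00),
    (512, 1.44, 15.02, 21.56),
    (1024, 4.54, 14.86, 67.46),
    (2048, 6.35, 19.43, 123.38),
    (4096, 7.81, 25.39, 198.32),
    (8192, 8.72, 29.25, 254.88),
]


def recomputed_speedup(gflops_cpu, gflops_gpu):
    """Speedup implied by the two throughput columns."""
    return gflops_gpu / gflops_cpu


def max_speedup_discrepancy(rows):
    """Largest absolute gap between the reported and the recomputed speedup."""
    return max(abs(recomputed_speedup(cpu, gpu) - reported)
               for _, reported, cpu, gpu in rows)


if __name__ == "__main__":
    for size, reported, cpu, gpu in ROWS:
        print(f"{size:5d}  reported {reported:5.2f}  recomputed {recomputed_speedup(cpu, gpu):5.2f}")
    print(f"max discrepancy: {max_speedup_discrepancy(ROWS):.4f}")
```

The recomputed values match the reported ones to within 0.01, which fits the throughput columns having been rounded to two decimals before posting.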


UJI-CULAPACK (LU + CUBLAS triangular systems)

 Size      SpeedUp    GflopsCPU    GflopsGPU
-----------------------------------------------------------
  128       0.11        3.80         0.42
  256       0.13       11.62         1.55
  512       0.19       15.17         2.87
 1024       0.68       15.03        10.19
 2048       1.57       20.10        31.53
 4096       2.82       25.58        72.02
 8192       3.63       29.22       105.96

Machine: Quad-core E5430 + Quadro FX5800

CULA 1.2 culaDeviceSgesv

 Size      SpeedUp    GflopsCPU    GflopsGPU
-----------------------------------------------------------
  128       0.10        5.05         0.53
  256       0.23       10.91         2.50
  512       0.35       18.90         6.57
 1024       0.44       23.12        10.15
 2048       0.44       26.97        11.85
 4096       0.32       37.99        12.32
 8192       0.25       48.87        12.46




CULA 1.3 culaDeviceSgesv

 Size      SpeedUp    GflopsCPU    GflopsGPU
-----------------------------------------------------------
  128       0.12        5.21         0.62
  256       0.38       10.75         4.05
  512       0.97       18.82        18.34
 1024       2.67       23.16        61.81
 2048       4.44       28.31       125.79
 4096       5.32       37.59       200.05
 8192       5.35       48.31       258.50



UJI-CULAPACK (LU + CUBLAS triangular systems)

 Size      SpeedUp    GflopsCPU    GflopsGPU
-----------------------------------------------------------
  128       0.07        5.24         0.38
  256       0.12       11.89         1.42
  512       0.26       19.04         5.02
 1024       0.71       22.55        15.97
 2048       1.47       28.48        41.93
 4096       2.25       38.06        85.72
 8192       2.50       48.85       122.35






The results with CULA 1.3 are impressive.

Congratulations to the CULA team!

By the way, John, one question: are you using hybrid algorithms?

jpeinado
