Poor cula performance

Support for issues specific to the Linux operating systems.

Postby cbv3b » Thu Jun 23, 2011 10:28 am

I have two servers running Ubuntu 10.04.2 LTS and CUDA 3.2: one with a 4X Intel(R) Core(TM)2 Extreme CPU X9770 @ 3.20GHz and a GTX 285, and one with a 4X Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz and both a GTX 480 and a C2050. When I run the benchmark (results below), I see considerably worse performance on the server with the Fermi cards. Also concerning is that the server with the Fermi cards takes 40x longer to run SGESVD using MKL. Any ideas on tracking this down?
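To put a number on the SGESVD gap, here is a minimal sketch that divides the size-4096 timings of the two runs. The figures are copied from the benchmark output in this post; the variable and function names are only illustrative. With these inputs it comes out at roughly 29x for CULA and roughly 36x for MKL.

```python
# SGESVD timings in seconds at size 4096, copied from the two benchmark runs in this post.
gtx285_server = {"CULA": 22.49, "MKL": 116.07}
fermi_server = {"CULA": 644.57, "MKL": 4155.89}


def slowdown(slow_seconds, fast_seconds):
    """Return how many times longer the slow run took than the fast run."""
    return slow_seconds / fast_seconds


# One ratio per library: Fermi-server time divided by GTX 285-server time.
ratios = {lib: slowdown(fermi_server[lib], gtx285_server[lib]) for lib in gtx285_server}

for lib, ratio in ratios.items():
    print(f"SGESVD via {lib}: {ratio:.1f}x longer on the Fermi server")
```

Since MKL runs on the CPU, a large MKL ratio would suggest the slowdown is not limited to the GPU path.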

As a side note, I tried to post this in the private support forum, but I couldn't start a post.

Server with GTX 285
Initializing CULA...
Initializing MKL...

Benchmarking the following functions:
-------------------------------------
             SGEQRF
             SGETRF
             SGELS
             SGGLSE
             SGESV
             SGESVD
-------------------------------------


     -- SGEQRF Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       0.65       1.21    1.8576
5120       1.17       2.23    1.9055
6144       1.86       3.82    2.0579
7168       2.87       6.05    2.1079
8192       4.13      11.97    2.8967

     -- SGETRF Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       0.30       0.78    2.6007
5120       0.53       1.39    2.6198
6144       0.65       2.42    3.6913
7168       1.34       3.73    2.7868
8192       1.97       6.14    3.1113

     -- SGELS Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       0.87       1.63    1.8841
5120       1.48       3.06    2.0755
6144       2.31       5.41    2.3451
7168       3.42       8.95    2.6173
8192       4.81      13.89    2.8897

     -- SGGLSE Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       0.91       4.63    5.1123
5120       1.55       7.47    4.8292
6144       2.41      11.41    4.7292
7168       3.59      16.68    4.6429
8192       5.05      24.53    4.8588

     -- SGESV Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       0.44       0.81    1.8369
5120       0.74       1.41    1.8999
6144       0.95       2.31    2.4439
7168       1.73       3.58    2.0754
8192       2.47       5.88    2.3818

     -- SGESVD Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096      22.49     116.07    5.1620
5120      37.54     206.11    5.4901
6144      51.61     338.14    6.5514
7168      85.09     557.83    6.5558
8192     120.41     776.76    6.4508


Server with GTX 480

Initializing CULA...
Initializing MKL...

Benchmarking the following functions:
-------------------------------------
             SGEQRF
             SGETRF
             SGELS
             SGGLSE
             SGESV
             SGESVD
-------------------------------------


     -- SGEQRF Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       6.18       3.62    0.5868
5120       7.76       5.44    0.7017
6144       9.26       8.03    0.8669
7168       5.79      10.80    1.8631
8192      12.54      21.74    1.7331

     -- SGETRF Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       1.47       1.57    1.0689
5120       2.90       3.08    1.0616
6144       3.39       5.18    1.5268
7168       4.14       7.62    1.8390
8192       5.22      11.07    2.1181

     -- SGELS Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       8.03       3.47    0.4319
5120      14.38       5.99    0.4166
6144      17.32       9.45    0.5457
7168      19.75      12.76    0.6465
8192      22.79      18.95    0.8317

     -- SGGLSE Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096      11.62      11.47    0.9865
5120      14.66      16.75    1.1427
6144      17.59      24.75    1.4071
7168      20.76      33.80    1.6283
8192      24.22      43.62    1.8011

     -- SGESV Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096       2.31       2.19    0.9465
5120       3.20       2.83    0.8836
6144       3.77       5.25    1.3941
7168       2.22       7.85    3.5339
8192       5.41      12.27    2.2703

     -- SGESVD Benchmark  --

Size   CULA (s)    MKL (s)   Speedup
------ ---------- ---------- ---------
4096     644.57    4155.89    6.4475
cbv3b
 
Posts: 3
Joined: Wed Nov 10, 2010 2:05 pm

Re: Poor cula performance

Postby kyle » Thu Jun 23, 2011 12:23 pm

Private support is for non-academic purchases, only.

With regard to your speed, are you sure your benchmarks are running on the correct device? Are your servers under heavy load when you are benchmarking? Those numbers seem very low for both the CPU and the GPU.
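One way to check the CPU side on its own is to compare only the MKL columns of the two runs, since those do not involve the GPU. The sketch below uses the size-4096 MKL timings posted above; comparing against the ratio of nominal clock speeds is a crude assumption (it ignores cache size, memory, and threading), so treat the result as a hint rather than a diagnosis.

```python
# MKL (CPU-only) timings in seconds at size 4096, copied from the two posted runs.
mkl_gtx285_server = {"SGEQRF": 1.21, "SGETRF": 0.78, "SGELS": 1.63,
                     "SGGLSE": 4.63, "SGESV": 0.81, "SGESVD": 116.07}
mkl_fermi_server = {"SGEQRF": 3.62, "SGETRF": 1.57, "SGELS": 3.47,
                    "SGGLSE": 11.47, "SGESV": 2.19, "SGESVD": 4155.89}

# Assumption: nominal clock speeds (3.20 GHz vs 2.66 GHz) as a rough yardstick
# for how much slower the Q9400 machine could reasonably be.
clock_ratio = 3.20 / 2.66

# Per-routine slowdown of the Fermi server's CPU results relative to the other server.
cpu_slowdown = {name: mkl_fermi_server[name] / mkl_gtx285_server[name]
                for name in mkl_gtx285_server}

for name, ratio in cpu_slowdown.items():
    verdict = "more than" if ratio > clock_ratio else "within"
    print(f"{name}: {ratio:.2f}x slower ({verdict} the {clock_ratio:.2f}x clock difference)")
```

With the posted numbers, every routine is slower by more than the clock difference alone, which is why the CPU results look suspicious as well as the GPU ones.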
kyle
Administrator
 
Posts: 301
Joined: Fri Jun 12, 2009 7:47 pm

Re: Poor cula performance

Postby cbv3b » Thu Jun 23, 2011 1:25 pm

Yes, the benchmarks are running on the correct device, and top showed that only a few root threads were running. What concerned me was that the Fermi devices were so much slower than the Tesla device...
cbv3b
 
Posts: 3
Joined: Wed Nov 10, 2010 2:05 pm

Re: Poor cula performance

Postby kyle » Thu Jun 23, 2011 1:27 pm

For single precision, there should be about a 20% performance increase between generations. For double precision, that increase is upwards of 100%.
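As a rough illustration of what that would mean for the posted numbers, the sketch below takes the size-4096 CULA timings from the GTX 285 run, scales them by an assumed 1.2x single-precision gain, and compares the result with what the Fermi server actually reported. Reading "20% performance increase" as a flat 1.2x speedup is an assumption made only for this example.

```python
# CULA timings in seconds at size 4096, copied from the two posted runs.
cula_gtx285 = {"SGEQRF": 0.65, "SGETRF": 0.30, "SGELS": 0.87,
               "SGGLSE": 0.91, "SGESV": 0.44, "SGESVD": 22.49}
cula_fermi = {"SGEQRF": 6.18, "SGETRF": 1.47, "SGELS": 8.03,
              "SGGLSE": 11.62, "SGESV": 2.31, "SGESVD": 644.57}

# Assumption: a "20% performance increase" is treated as a flat 1.2x speedup.
ASSUMED_GAIN = 1.20

# Time the newer card would need if it were simply 1.2x faster than the GTX 285.
expected = {name: seconds / ASSUMED_GAIN for name, seconds in cula_gtx285.items()}

# How far the observed Fermi timing is from that expectation (1.0 would be on target).
shortfall = {name: cula_fermi[name] / expected[name] for name in expected}

for name in expected:
    print(f"{name}: expected about {expected[name]:.2f} s, "
          f"observed {cula_fermi[name]:.2f} s ({shortfall[name]:.1f}x off)")
```

Under that assumption, every routine on the Fermi server is several times slower than expected, which points at the machine setup rather than at the card generation.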

I'd suggest running other CUDA SDK examples to further benchmark your performance.
kyle
Administrator
 
Posts: 301
Joined: Fri Jun 12, 2009 7:47 pm

Re: Poor cula performance

Postby cbv3b » Fri Jun 24, 2011 7:56 am

Looks like I'm having problems with some of the SDK routines too. I should have checked this first. Thanks :)
cbv3b
 
Posts: 3
Joined: Wed Nov 10, 2010 2:05 pm

