Slow TRTRI on C2050

Post by katayama » Tue Jan 03, 2012 5:39 pm

Dear Experts,

We've been testing a few functions on the C2050 and the GTX580 and noticed that TRTRI is very slow on the C2050: it runs at about half the speed it reaches on the GTX580. All the other functions we tested are OK. For example, POTRF is twice as fast on the C2050 as on the GTX580.

See the attached files.

We have three C2050s and three GTX580s, and all cards of the same model give the same results.

Could anyone help?

Thanks,

Nobu
Attachments:
C2050TRTRIdata-2e.pdf (14.21 KiB)
spec.pdf (23.55 KiB)

Re: Slow TRTRI on C2050

Post by kyle » Tue Jan 03, 2012 7:51 pm

Single (STRTRI) or double precision (DTRTRI)?

Re: Slow TRTRI on C2050

Post by katayama » Sun Jan 08, 2012 1:45 am

Sorry. It's double precision (DTRTRI).

Nobu

Re: Slow TRTRI on C2050

Post by kyle » Sun Jan 08, 2012 5:15 pm

The Tesla line (e.g. the C2050) has approximately 2x the number of double precision units as the GeForce line (e.g. the GTX580). This translates to a 2x speedup for some double precision routines when using the Tesla line of GPUs.

Re: Slow TRTRI on C2050

Post by katayama » Tue Jan 10, 2012 4:21 am

Yes, that I know. The problem is that DTRTRI runs at half the speed on the C2050 that it does on the GTX580, which is the opposite of what I expect. See the attached files. The numbers listed are measured Gflop/s, not elapsed times.

This is why I am asking for help.

Best,

Re: Slow TRTRI on C2050

Post by john » Tue Jan 10, 2012 8:41 am

Hello, thank you for your report. We have identified the reason for the performance drop and made the correction. My timing for DTRTRI at n = 6144 on the C2050 is now 0.36 seconds (214 GFLOPS). The fix will be available in CULA R14, which will be released a day or two after CUDA 4.1 goes final.
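
For readers who want to check how the quoted time and throughput relate, here is a minimal sketch in Python. It assumes the usual leading-order operation count of about n^3/3 floating-point operations for inverting an n x n triangular matrix; the thread does not state which count was used, and the function names are illustrative only and are not part of CULA.

```python
def trtri_flops(n: int) -> float:
    """Approximate flop count for inverting an n x n triangular matrix.

    Assumption: leading-order count n^3 / 3 (lower-order terms ignored).
    """
    return n ** 3 / 3.0


def gflops(n: int, seconds: float) -> float:
    """Convert a problem size and an elapsed time into GFLOPS."""
    return trtri_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # Figures quoted in the post above: n = 6144, 0.36 seconds.
    n, seconds = 6144, 0.36
    print(f"n={n}, t={seconds} s -> {gflops(n, seconds):.1f} GFLOPS")
```

Under that assumption, n = 6144 and 0.36 seconds work out to roughly 214.7 GFLOPS, in line with the 214 GFLOPS reported above.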

Re: Slow TRTRI on C2050

Post by katayama » Tue Jan 10, 2012 3:17 pm

Hi John and Kyle,

Thanks for the investigation and the quick fix. Would it be possible to send us a binary with the fix for R13? My student is writing his graduation (B.S.) report using CULA, and it would be nice to have him benchmark the new version.

Best

nobu

Re: Slow TRTRI on C2050

Post by john » Tue Jan 17, 2012 9:55 am

The new version will be available very soon with the updated code.