Poor CULA dtrsm performance

Post by jhogg » Fri Dec 14, 2012 8:22 am

Hi,

I'm just benchmarking CULA dtrsm with a single right-hand-side against CUBLAS, MAGMA, Host MKL and my own code to solve Lx=b. CULA seems to be coming off really badly, as evidenced by the numbers below.

I'm using the following code:

culaInitialize();
cudaThreadSynchronize();
clock_gettime(CLOCK_REALTIME, &tp1);
culaDeviceDtrsm('L', 'L', 'N', 'U', n, 1, double(1.0), a_gpu, lda, x_gpu, n);
cudaThreadSynchronize();
clock_gettime(CLOCK_REALTIME, &tp2);
culaShutdown();
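For reference, here is a minimal host-side sketch of what that call is being asked to compute: with side 'L', uplo 'L', trans 'N', diag 'U' and a single right-hand side, it is forward substitution against a unit lower triangular matrix stored column-major. The function name `unit_lower_solve` is hypothetical and is not part of CULA, CUBLAS, or MKL; it is only meant as a correctness check for small n, not as a tuned implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (an assumption for illustration only).
// Solves L * x = b in place, where L is the unit lower triangle of the
// column-major n-by-n array `a` with leading dimension `lda`.
// On entry x holds b; on exit x holds the solution.
void unit_lower_solve(int n, const std::vector<double>& a, int lda,
                      std::vector<double>& x) {
    assert(n >= 0 && lda >= n);
    assert(a.size() >= static_cast<std::size_t>(lda) * n);
    assert(x.size() >= static_cast<std::size_t>(n));
    for (int j = 0; j < n; ++j) {
        // The diagonal is treated as 1 (diag = 'U'), so x[j] is already
        // final here and a(j,j) is never read.
        const double xj = x[j];
        // Eliminate x[j] from every remaining row below the diagonal.
        for (int i = j + 1; i < n; ++i) {
            x[i] -= a[i + static_cast<std::size_t>(j) * lda] * xj;
        }
    }
}
```

Running the same small system through this and through the GPU routine, then comparing the two x vectors, separates "slow" from "wrong" before looking at timings.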

Am I doing something wrong?

Thanks,

Jonathan.

========================
Results:

n=100
CPU BLAS took 0.000045
CUBLAS BLAS took 0.000114
CULA Dense BLAS took 0.000267
My BLAS took 0.000052

n=1000
CPU BLAS took 0.000596
CUBLAS BLAS took 0.001731
CULA Dense BLAS took 0.001925
My BLAS took 0.000854

n=10000
CPU BLAS took 0.027538
CUBLAS BLAS took 0.028302
CULA Dense BLAS took 0.079307
My BLAS took 0.004455

n=16000
CPU BLAS took 0.067040
CUBLAS BLAS took 0.049763
CULA Dense BLAS took 0.183970
My BLAS took 0.009597

Re: Poor CULA dtrsm performance

Post by john » Fri Dec 14, 2012 3:44 pm

For starters, you don't need a cudaThreadSynchronize after CULA routines; we synchronize inside our routines. We'll continue to look into the rest. As of R17, our routines will all fall through to CUBLAS, so there will no longer be custom CULA code there.

Re: Poor CULA dtrsm performance

Post by jhogg » Mon Dec 17, 2012 3:02 am

The cudaThreadSynchronize was just there to keep the timing fair with the other routines, which do need it: they don't automatically cause a host-GPU sync, so their timings would otherwise be inaccurate. Noted re R17. NVIDIA are looking at adopting my code for CUBLAS, but I need to compare against other implementations where they are available.
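The timing pattern described here can be sketched as a small host-side helper: synchronize, sample the clock, run the call, synchronize again, sample the clock. The names `elapsed_seconds` and `time_call` are hypothetical, and the sketch swaps in CLOCK_MONOTONIC rather than the CLOCK_REALTIME used in the original snippet, on the assumption that a clock immune to wall-time adjustments is preferable for intervals.

```cpp
#include <time.h>

// Elapsed wall time in seconds between two clock_gettime samples.
double elapsed_seconds(const timespec& start, const timespec& stop) {
    return static_cast<double>(stop.tv_sec - start.tv_sec) +
           1e-9 * static_cast<double>(stop.tv_nsec - start.tv_nsec);
}

// Hypothetical helper (an assumption for illustration only).
// Times a single invocation of `work`. `sync` is called once before the
// first clock sample and once after `work`, so that asynchronous work is
// drained on both sides of the measured interval.
template <typename Work, typename Sync>
double time_call(Work work, Sync sync) {
    timespec t0, t1;
    sync();                               // finish anything already queued
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work();                               // the routine being measured
    sync();                               // wait for it to actually complete
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(t0, t1);
}
```

For a GPU routine, `sync` would be a lambda wrapping the CUDA synchronize call (cudaDeviceSynchronize is the current name; cudaThreadSynchronize is deprecated), and for a host routine it can be an empty lambda. Applying the same wrapper to every library means the extra sync costs the same for each, including routines that already synchronize internally.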

Regards,

Jonathan.
