Is this a terrible bug in CULA?

General CULA Dense (LAPACK & BLAS) support and troubleshooting. Use this forum if you are having a general problem or have encountered a bug.

Is this a terrible bug in CULA?

Post by xhsh » Sun Dec 22, 2013 12:47 am

I am calling the "zgesv" routine in CULA, namely cula_zgesv, from an MPI program. With one MPI process using one K20c card, it takes 0.7435620 seconds for a 3000x3000 matrix. However, with two MPI processes using two K20c cards, it takes 7.932089 seconds, roughly ten times longer. For a 2000x2000 matrix, the times are 0.28 seconds and 5.6 seconds respectively. Why is there such a big difference? Is this a terrible bug, or have I done something wrong? My code (a very simple test) is below:

Code:

PROGRAM cula_test


use cudafor
use cula_status
use cula_lapack
use cula_lapack_device_pgfortran

IMPLICIT NONE
include 'mpif.h'
INTEGER :: n
complex*16, allocatable :: A(:,:), U(:,:)
integer, allocatable :: ipiv(:)
integer I, J, info, MPIerror, node, Nnodes
real*8 c, d
real*4 t1, t2

external cula_initialize
external cula_shutdown
external cudasetdevice

call MPI_Init( MPIerror )
call MPI_Comm_Rank( MPI_Comm_World, node, MPIerror )
call MPI_Comm_Size( MPI_Comm_World, Nnodes, MPIerror )

! bind each MPI rank to its own K20c
if(node.eq.0) info = cudasetdevice(0)
if(node.eq.1) info = cudasetdevice(1)

info = cula_initialize()
n = 3000

ALLOCATE(A(n,n), U(n,n), ipiv(n))

! fill A with random complex entries
do I = 1, N
   do J = 1, N
      call random_number(c)
      call random_number(d)
      A(I,J) = dcmplx(c,d)
   enddo
enddo

! U is the identity, so the solve returns inv(A)
U(:,:) = (0.d0,0.d0)
do I = 1, N
   U(I,I) = (1.d0,0.d0)
enddo

! time the LU factorization and solve on the GPU
call cpu_time(t1)
info = cula_zgesv(n, n, A, n, ipiv, U, n)
call cpu_time(t2)
print *,'GPU: ', U(1,1), t2-t1

deallocate(A, U, ipiv)

call cula_shutdown()
call MPI_FINALIZE(MPIerror)

END
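
One note on the timing in the listing above: cpu_time reports processor time, not wall-clock time. A per-rank wall-clock measurement would look roughly like the sketch below (it reuses the variables already declared, except that t1 and t2 would need to be real*8 for MPI_Wtime):

Code:

! Sketch only: wall-clock timing of the same solve, assuming the program
! above with t1, t2 declared as real*8.
call MPI_Barrier(MPI_Comm_World, MPIerror)   ! start both ranks together
t1 = MPI_Wtime()
info = cula_zgesv(n, n, A, n, ipiv, U, n)
t2 = MPI_Wtime()
print *, 'rank', node, 'wall time:', t2 - t1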


Since there is no communication between the MPI processes, I think the time should be approximately the same whether ONE or TWO processes are used. In fact, the time is the same whether I call the "zgemm" subroutine in CUBLAS with one or two MPI processes (a sketch of that comparison is at the end of this post).

So, could anybody tell me why I see this problem with CULA but not with CUBLAS, and how to deal with it? I have been puzzled by it for several months.
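
For reference, the CUBLAS comparison mentioned above was roughly the following. This is a sketch reconstructed from memory; the cublas module and the cublasZgemm / cudaDeviceSynchronize calls are how I remember the PGI CUDA Fortran interfaces, so the details may differ from what I actually ran:

Code:

PROGRAM cublas_test
! Sketch: time cublasZgemm per MPI rank, one GPU per rank, for comparison
! with the cula_zgesv test above.

use cudafor
use cublas

IMPLICIT NONE
include 'mpif.h'
INTEGER :: n
complex*16, allocatable :: A(:,:), B(:,:)
complex*16, device, allocatable :: A_d(:,:), B_d(:,:), C_d(:,:)
complex*16 alpha, beta
integer I, J, info, MPIerror, node, Nnodes
real*8 c, d, t1, t2

call MPI_Init( MPIerror )
call MPI_Comm_Rank( MPI_Comm_World, node, MPIerror )
call MPI_Comm_Size( MPI_Comm_World, Nnodes, MPIerror )

! bind each MPI rank to its own GPU
info = cudasetdevice(node)

n = 3000
ALLOCATE(A(n,n), B(n,n))

! fill A and B with random complex entries
do I = 1, N
   do J = 1, N
      call random_number(c)
      call random_number(d)
      A(I,J) = dcmplx(c,d)
      B(I,J) = dcmplx(d,c)
   enddo
enddo

! copy the operands to the device
allocate(A_d(n,n), B_d(n,n), C_d(n,n))
A_d = A
B_d = B
alpha = (1.d0,0.d0)
beta  = (0.d0,0.d0)

! time C = A*B on the GPU, per rank
call MPI_Barrier(MPI_Comm_World, MPIerror)
t1 = MPI_Wtime()
call cublasZgemm('N', 'N', n, n, n, alpha, A_d, n, B_d, n, beta, C_d, n)
info = cudaDeviceSynchronize()   ! the zgemm launch is asynchronous
t2 = MPI_Wtime()
print *, 'rank', node, 'zgemm wall time:', t2 - t1

deallocate(A, B)
deallocate(A_d, B_d, C_d)
call MPI_FINALIZE(MPIerror)

END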
Last edited by xhsh on Sun Dec 22, 2013 12:55 am, edited 2 times in total.
xhsh
 
Posts: 8
Joined: Wed Feb 23, 2011 5:42 pm

Re: Is this a terrible bug in CULA?

Post by xhsh » Sun Dec 22, 2013 12:53 am

Following up on the previous post: with one MPI process and one K20c card, the output is:

Code:
GPU:   (-8.6050432536450713E-002,-0.1393513431401034)   0.7435620


With two MPI processes and two K20c cards, the output is:

Code:
GPU:   (-8.6050432536450713E-002,-0.1393513431401034)    7.932089
GPU:   (-8.6050432536450713E-002,-0.1393513431401034)    8.396360
xhsh
 
Posts: 8
Joined: Wed Feb 23, 2011 5:42 pm

