Runtime error, help

Support for issues specific to the Linux operating systems.

Postby tsuitung » Tue Jun 14, 2011 7:45 pm

I'm using the PGI compiler (Workstation 10.3) on CentOS 5.5 x64, and I program in CUDA Fortran. When I combine it with MPI and CULA, I get a runtime error.
Here is my source code, cluster.cuf:
Code:
program cluster
   use cudafor
   include 'mpif.h'
   external cula_initialize
   external cula_cgesvd
   external cula_shutdown
   integer cula_initialize,cula_device_cgesvd !cula
   integer :: ierr,cpuid,numprocs,namelen !mpi
   character* (mpi_max_processor_name) processor_name
   integer :: gpuid,numdevices !gpu
   integer :: info
   type(cudadeviceprop) :: prop
   complex :: u(3,3),vt(4,4),a(3,4)
   real :: s(3)
   real :: start,finish
   complex,allocatable,device :: ad(:,:)
   integer :: pitch_ad
   complex,device :: ud(3,3),vtd(4,4)
   real,device :: sd(3)

   info=cudaGetDeviceCount(numdevices)
   m=3
   n=4
   lda=3
   ldu=3
   ldvt=4
   a=reshape((/(5.91,-5.69),(-3.15,-4.08),(-4.89,4.20),(7.09,2.72),(-1.89,3.27),(4.10,-6.70),(7.78,-4.06),(4.57,-2.07),(3.28,-3.84),(-0.79,-7.21),(-3.88,-3.30),(3.84,1.19)/),(/3,4/))
   info=cudamallocpitch(ad,pitch_ad,n,m)
   info=cudamemcpy2d(ad,pitch_ad,a,n*4,n*4,m,cudamemcpyhosttodevice)
   call mpi_init(ierr)
   call mpi_comm_rank(mpi_comm_world,cpuid,ierr)
   call mpi_comm_size(mpi_comm_world,numprocs,ierr)
   call mpi_get_processor_name(processor_name,namelen,ierr)
   gpuid=mod(cpuid,numdevices)
   info=cudasetdevice(gpuid)
   info=cudagetdeviceProperties(prop,gpuid)
   write(*,"(a9,i2,a12)") "There are",numdevices,"GPU device!"
   write (*,"(a21,i2,a4,i1,a4,a30)"), "Hello world! process ",cpuid," of ",numprocs," on ",processor_name
   write (*,"(a6,i2)") "GPU id",gpuid
   write (*,"(a12,a20)") "Device name ",prop%name
   !Initialize CULA
   info=cula_initialize()
   call check_status(info)
   call cpu_time(start)
   info=cula_device_cgesvd('a','a', M, N, ad, LDA, sd,ud, LDU,vtd, LDVT)
   !call check_status(info)
   call cpu_time(finish)
   info=cudamemcpy(s,sd,3,cudamemcpydevicetohost)
   write(*,*) s
   write(*,*) "GPU time=",finish-start,"s"
   call cula_shutdown()
   info=cudafree(ad)
   info=cudafree(sd)

   call mpi_finalize(ierr)
end

subroutine check_status(culastatus)
   integer culastatus
   integer info
   integer cula_geterrorinfo

   info = cula_geterrorinfo()
   if (culastatus .ne. 0) then
      if (culastatus .eq. 7) then
         !culaargumenterror
         write(*,*) 'invalid value for parameter ', info
      else if (culastatus .eq. 8) then
         !culadataerror
         write(*,*) 'data error (', info ,')'
      else if (culastatus .eq. 9) then
         !culablaserror
         write(*,*) 'blas error (', info ,')'
      else if (culastatus .eq. 10) then
         !cularuntimeerror
         write(*,*) 'runtime error (', info ,')'
      else
         !others
         call cula_getstatusstring(culastatus)
      endif
      stop 1
   end if
end subroutine check_status

My makefile:
Code:
FC=pgfortran

#Change to -Mmpi2 for MPICH2
MPI=-Mmpi
#add cuf
CUDA=-ta=nvidia -Mcuda
#lib
LIB=-L/usr/local/cula/lib64 -lcula_pgfortran -llapack -lblas
mpihello:
   $(FC) $(MPI) $(CUDA) $(LIB) -o cluster cluster.cuf

Results:
Code:
[root@localhosts1 cluster]# mpirun -np 1 cluster
There are 3 GPU device!
Hello world! process  0 of 1 on localhosts1                   
GPU id 0
Device name Tesla C1060         
runtime error (           36 )
    1

Look forward to your help!
Thanks!
Last edited by tsuitung on Fri Jun 17, 2011 5:53 am, edited 1 time in total.
tsuitung
 
Posts: 8
Joined: Mon Nov 15, 2010 7:08 pm

Re: MPI leads to runtime error

Postby tsuitung » Fri Jun 17, 2011 3:54 am

Is there anybody who can help me?

Re: Runtime error, help

Postby kyle » Fri Jun 17, 2011 6:15 am

Can you run the program using other CUDA libraries like CUFFT or CUBLAS?
kyle
Administrator
 
Posts: 301
Joined: Fri Jun 12, 2009 7:47 pm

Re: Runtime error, help

Postby tsuitung » Fri Jun 17, 2011 10:54 pm

kyle wrote: Can you run the program using other CUDA libraries like CUFFT or CUBLAS?

I didn't test other libraries, but I have found that the problem is that info=cudasetdevice(gpuid) conflicts with info=cula_initialize(). The program runs if either call is removed. Of course, if info=cula_initialize() is removed, we can't get the right answer.
So, is there any solution?
Waiting for your answer.
Last edited by tsuitung on Tue Jun 21, 2011 7:33 pm, edited 1 time in total.

Re: Runtime error, help

Postby cding » Sun Jun 19, 2011 4:24 pm

I'm trying to use CUDA Fortran and MPI together with CULA. I saw that you compile the MPI and CUDA Fortran code together with pgfortran -Mmpi -Mcuda. Do you think it's OK to compile them separately? For example, compile the .cuf file with pgfortran, and compile the MPI code that calls the subroutine in the .cuf file with the PGI mpif90.

Thank you very much.
cding
 
Posts: 15
Joined: Tue Sep 14, 2010 8:25 pm

Re: Runtime error, help

Postby tsuitung » Sun Jun 19, 2011 7:34 pm

cding wrote: I'm trying to use CUDA Fortran and MPI together with CULA. I saw that you compile the MPI and CUDA Fortran code together with pgfortran -Mmpi -Mcuda. Do you think it's OK to compile them separately? For example, compile the .cuf file with pgfortran, and compile the MPI code that calls the subroutine in the .cuf file with the PGI mpif90.

Thank you very much.

The PGI installation includes MPICH1, so if you add the "-Mmpi" flag, the compiler adds MPI automatically. See the PGI reference for more details.

Re: Runtime error, help

Postby cding » Mon Jun 20, 2011 9:26 am

tsuitung wrote:
cding wrote: I'm trying to use CUDA Fortran and MPI together with CULA. I saw that you compile the MPI and CUDA Fortran code together with pgfortran -Mmpi -Mcuda. Do you think it's OK to compile them separately? For example, compile the .cuf file with pgfortran, and compile the MPI code that calls the subroutine in the .cuf file with the PGI mpif90.

Thank you very much.

The PGI installation includes MPICH1, so if you add the "-Mmpi" flag, the compiler adds MPI automatically. See the PGI reference for more details.


Thank you so much.

Re: Runtime error, help

Postby john » Fri Jun 24, 2011 1:35 pm

tsuitung wrote:
kyle wrote: Can you run the program using other CUDA libraries like CUFFT or CUBLAS?

I didn't test other libraries, but I have found that the problem is that info=cudasetdevice(gpuid) conflicts with info=cula_initialize(). The program runs if either call is removed. Of course, if info=cula_initialize() is removed, we can't get the right answer.
So, is there any solution?
Waiting for your answer.

CULA is programmed such that calling culaInitialize after cudaSetDevice will result in CULA respecting the bound device. I would have thought it impossible for culaInitialize to emit error 36 (cudaErrorSetOnActiveProcess). We'll look into that for you.
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm
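
One possible reading of error 36, offered as a sketch and not a confirmed diagnosis: in the CUDA releases of that period, code 36 is cudaErrorSetOnActiveProcess, which cudaSetDevice returns when the process has already started working on a device. In the posted program, cudamallocpitch and cudamemcpy2d run before cudasetdevice, and the statically declared device arrays (ud, vtd, sd) may also cause device memory to be set up at program start. The program name select_device_first and the array shapes below are illustrative assumptions; the sketch only shows the ordering, with the device bound before any allocation or CULA call.

```fortran
program select_device_first
   use cudafor
   implicit none
   include 'mpif.h'
   integer, external :: cula_initialize
   integer :: ierr, cpuid, numdevices, gpuid, info
   ! Allocatable device arrays: nothing is placed on a GPU until allocate()
   complex, allocatable, device :: ad(:,:)
   real, allocatable, device :: sd(:)

   call mpi_init(ierr)
   call mpi_comm_rank(mpi_comm_world, cpuid, ierr)

   ! Querying the device count does not bind the process to a device
   info = cudaGetDeviceCount(numdevices)
   gpuid = mod(cpuid, numdevices)

   ! Bind the device before any allocation, copy, or CULA call
   info = cudaSetDevice(gpuid)
   if (info /= cudaSuccess) write(*,*) 'cudaSetDevice returned ', info

   ! Device memory is requested only after the device is bound
   allocate(ad(3,4), sd(3))
   info = cula_initialize()
   ! ... cula_device_cgesvd and the copies back to the host would go here ...
   call cula_shutdown()
   deallocate(ad, sd)
   call mpi_finalize(ierr)
end program select_device_first
```

If this ordering removes the error, it would also fit john's statement that culaInitialize respects a device already bound with cudaSetDevice.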

Re: Runtime error, help

Postby cding » Thu Jun 30, 2011 10:30 am

tsuitung wrote:
kyle wrote: Can you run the program using other CUDA libraries like CUFFT or CUBLAS?

I didn't test other libraries, but I have found that the problem is that info=cudasetdevice(gpuid) conflicts with info=cula_initialize(). The program runs if either call is removed. Of course, if info=cula_initialize() is removed, we can't get the right answer.
So, is there any solution?
Waiting for your answer.


I use the same code as yours. The only change is that I force it to choose the device with ID 2.
Code:
info=cudasetdevice(2)

When I tried it, it failed to select that device and still used device 0. I checked the value of "info", which is the return value of the cudasetdevice() call. The value is 36, the same code as in "runtime error (36)".

Did you get the same error, or did you check the value of info after calling cudasetdevice()? Unlike the CULA functions, that call is not followed by a status check with "check_status(info)".

Could you try it if you have time? I have been stuck on MPI + CULA + CUDA Fortran with the PGI compiler for a very long time.

Also, did you try it on more than one CPU? I use "mpirun -np 4 cluster", but it still executes on one CPU.
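
The missing status check on cudasetdevice() could be filled in with a small helper, sketched here under the assumption that the cudafor module provides cudaSuccess and cudaGetErrorString as in current CUDA Fortran releases. The name check_cuda is hypothetical; it simply mirrors the check_status subroutine used for the CULA calls.

```fortran
! Hypothetical helper, analogous to check_status for the CULA calls
subroutine check_cuda(istat, label)
   use cudafor
   implicit none
   integer, intent(in) :: istat            ! return value of a CUDA runtime call
   character(len=*), intent(in) :: label   ! name of the call, for the message

   if (istat /= cudaSuccess) then
      ! Print the numeric code and the runtime's own description of it
      write(*,*) trim(label), ' failed with code ', istat, ': ', &
                 trim(cudaGetErrorString(istat))
      stop 1
   end if
end subroutine check_cuda
```

It would be called right after the runtime call, for example info=cudasetdevice(2) followed by call check_cuda(info, 'cudasetdevice'), so that a failed device selection stops the run instead of silently falling back to device 0.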

