Problem with calling device interface in Fortran


Post by Tomek » Mon Aug 08, 2011 12:12 pm

I'm currently using CULA R12 Premium with the NVIDIA CUDA 4.0 drivers. When I call the CULA_DEVICE_SGESV routine on Win7 (64-bit), the program hangs (no errors are reported during compilation). The CULA_SGESV routine, however, runs well. I prefer the device interface because it reduces the I/O time. I'm running Visual Studio 2008 with a Fermi-class card (GTX 580) and the PGI Visual Fortran compiler, version 11.6.
Here's the source code I used to test the CULA library interfaces (host and device).
       module cula_test
            use cudafor
           
            contains       
! --------------------------------------------------                             
            ! gpu error reporting routine
            subroutine check_status(status)
           
                integer status
                integer info
                integer cula_geterrorinfo
                info = cula_geterrorinfo()
                if (status .ne. 0) then
                    if (status .eq. 7) then
                        write(*,*) 'invalid value for parameter ', info
                    else if (status .eq. 8) then
                        write(*,*) 'data error (', info ,')'
                    else if (status .eq. 9) then
                        write(*,*) 'blas error (', info ,')'
                    else if (status .eq. 10) then
                        write(*,*) 'runtime error (', info ,')'
                    else
                        call cula_getstatusstring(status)
                    endif
                    stop 1
                end if
               
            end subroutine 
! --------------------------------------------------           
            ! cpu test (baseline)
            subroutine do_cpu_test(n,nrhs,ain,bin)
               
                ! input
                real,dimension(:,:) :: ain,bin
               
                ! allocations
                real,dimension(:,:),allocatable :: a,b,ans
                integer,dimension(:),allocatable :: ipiv
                integer n,nrhs
                integer c1,c2,cr,cm
                real norm
               
                ! back up input for reconstruction test
                allocate( a(n,n), b(n,nrhs), ipiv(n), ans(n,nrhs) )
                a = ain
                b = bin               
               
                ! start test
                call system_clock( c1, cr, cm )
                print *, 'starting cpu test...'
                               
                ! call lapack solver
                call sgesv(n,nrhs,a,n,ipiv,b,n,info)
               
                ! stop test
                call system_clock( count=c2 )
                print *, '  runtime:', 1.e3*real(c2-c1) / real(cr), 'ms'
                print *, '  gflops:', (0.66*n**3.) / (real(c2-c1) / real(cr)) / (1.e9)
               
                ! check answer
                ans = bin
                call sgemm('n','n',n,nrhs,n,1.0,ain,n,b,n,-1.0,ans,n)
                norm = slange('1',n,nrhs,ans,n,work) / real(n)
                print *, '  error:', norm
                print *, ''
               
                ! cleanup
                deallocate(a,b,ipiv,ans)
               
            end subroutine do_cpu_test   
! --------------------------------------------------             
            ! cula test (host interface)
            subroutine do_cula_host_test(n,nrhs,ain,bin)

                external cula_sgesv
                integer :: cula_sgesv
               
                ! input
                real,dimension(:,:) :: ain,bin
               
                ! allocations (all on host)
                real,dimension(:,:),allocatable :: a,b,ans
                integer,dimension(:),allocatable :: ipiv
                integer :: n,nrhs,c1,c2,cr,cm,status
                real :: norm
               
                ! back up input for reconstruction test
                allocate( a(n,n), b(n,nrhs), ipiv(n), ans(n,nrhs) )
                a = ain
                b = bin               
               
                ! start test
                call system_clock( c1,cr,cm )
                print *, 'starting cula (host interface) test...'
                               
                ! call cula solver (host interface)
                status = cula_sgesv(n,nrhs,a,n,ipiv,b,n)
                call check_status(status)
               
                ! stop test
                call system_clock( count=c2 )
                print *, '  runtime:', 1.e3*real(c2-c1) / real(cr), 'ms'
                print *, '  gflops:', (0.66*n**3.) / (real(c2-c1) / real(cr)) / (1.e9)
               
                ! check answer
                ans = bin
                call sgemm('n','n',n,nrhs,n,1.0,ain,n,b,n,-1.0,ans,n)
                norm = slange('1',n,nrhs,ans,n,work) / real(n)
                print *, '  error:', norm
                print *, ''
               
                ! cleanup
                deallocate(a,b,ipiv,ans)
               
            end subroutine do_cula_host_test
! --------------------------------------------------         
            ! cula test (device interface)
            subroutine do_cula_device_test(n,nrhs,ain,bin)

                external cula_device_sgesv
                integer :: cula_device_sgesv
           
                ! input
                real,dimension(:,:) :: ain,bin
               
                ! allocations (all on host)
                real,dimension(:,:),allocatable :: a,b,ans
                integer :: n,nrhs,c1,c2,cr,cm,status
                real :: norm
               
                ! gpu memory
                real,device,dimension(:,:),allocatable :: a_dev,b_dev
                integer,device,dimension(:),allocatable :: ipiv_dev
               
                ! back up input for reconstruction test
                allocate( a(n,n), b(n,nrhs), ans(n,nrhs) )
                a(1:n,1:n) = ain
                b(1:n,1:nrhs) = bin               
               
                ! allocate gpu memory
                allocate( a_dev(n,n), b_dev(n,nrhs), ipiv_dev(n) )
               
                ! start test
                call system_clock( c1,cr,cm )
                print *, 'starting cula (device interface) test...'
               
                ! copy memory to gpu
                a_dev = a
                b_dev = b
               
                ! call cula solver (device interface)
                status = cula_device_sgesv(n,nrhs,a_dev,n,ipiv_dev,b_dev,n)
                call check_status(status)
               
                ! copy answer to host
                b = b_dev
               
                ! stop test
                call system_clock( count=c2 )
                print *, '  runtime:', 1.e3*real(c2-c1) / real(cr), 'ms'
                print *, '  gflops:', (0.66*n**3.) / (real(c2-c1) / real(cr)) / (1.e9)
               
                ! check answer
                ans(1:n,1:nrhs) = bin
                call sgemm('n','n',n,nrhs,n,1.,ain,n,b,n,-1.,ans,n)
                norm = slange('1',n,nrhs,ans,n,work) / real(n)
                print *, '  error:', norm
                print *, ''
               
                ! cleanup
                deallocate(a,b,ans)
                deallocate(a_dev,b_dev,ipiv_dev)
               
            end subroutine do_cula_device_test             
! --------------------------------------------------                           
        end module cula_test   
   
! ###########################################################   
   
   !main program

        program cula
         use cula_test

         implicit none
         
         external cula_initialize
         external cula_shutdown
         
         integer :: cula_initialize
         integer :: cula_shutdown
         Integer :: status
         
         real :: error,eps
         ! Host memory
         real,dimension(:,:),allocatable :: a, b
         integer :: n, nrhs, info, i, j
         
         n = 5000
         nrhs = 1
         
         print *,'cula + pgfortran test (matrix solve)'
         print *,'  array size: ', n, ' by ', n
         print *,'  right hand sides: ', nrhs
         print *,''
         allocate( a(n,n), b(n,nrhs) )
         
         ! initialize a and b
         call random_number(a)
         call random_number(b)
         
         ! Make sure a() isn't singular
         do i=1,n
           a(i,i) = 10. * a(i,i) + 10.
         end do
         
         ! initialize cula
         status = cula_initialize()
         call check_status(status)
         
         ! do cpu test (baseline)
         call do_cpu_test(n,nrhs,a,b)
         
         ! do gpu test (host interface)
         call do_cula_host_test(n,nrhs,a,b)
         
         ! do gpu test (device interface)
         call do_cula_device_test(n,nrhs,a,b)

         ! shutdown cula
         status = cula_shutdown()
         call check_status(status)
         
         deallocate(a,b)
         
        end program cula

I also tested the CUBLAS Fortran bindings, and they didn't work either.
Any idea what's wrong? I hope CULA R12 Premium supports the device interface when called from Fortran.

Thank you in advance!
Tomek

Re: Problem with calling device interface in Fortran

Post by kyle » Mon Aug 08, 2011 3:23 pm

We posted an article in PGI's "Insider" newsletter a few months ago with some details regarding PGI + CULA: http://www.pgroup.com/lit/articles/insider/v2n3a5.htm

At first glance, I'm not seeing anything wrong in the file, but you might have a linker problem.

In the article, we include a link to a PGI example source file; try it and see whether that program works with the suggested compiler input.

To answer your question, though: any device function should work with PGI's CUDA Fortran extension.
kyle

Re: Problem with calling device interface in Fortran

Post by john » Tue Aug 09, 2011 5:46 am

Could you please try a test for me? Compile for the Win32 target rather than x64 and report whether the problem remains or disappears. You can keep all the other software the same when you do this; it should be as easy as flipping the switch in MSVC.
john

Re: Problem with calling device interface in Fortran

Post by Tomek » Tue Aug 09, 2011 7:52 am

I completely removed the old NVIDIA drivers from my computer and did a fresh installation of the 4.0 release instead of just upgrading, and now everything works fine (on Win7 64-bit). I also installed version 11.7 of the PGI Visual Fortran compiler. Sorry to bother you, but I haven't been able to get my mind off that problem all week.

Anyway, thanks for your help.

I will try the Win32 test and let you know whether everything runs well.
Tomek

