cula_device_dsyev - Argument 1 is invalid

Posted: Wed Dec 05, 2012 5:44 am
by perrineedham
Hi there,

I'm trying to use the Fortran device interface in my code but am not having much luck. The CULA function I am trying to call is cula_device_dsyev.

Operating system: RHEL 5.5

Brief description of the problem: The program compiles fine but exits with the error status "Argument 1 is invalid (see the Reference Manual for more information)" when the CULA_CHECK_STATUS(STATUS) routine is called after the call to the cula_device_dsyev function.

Detailed description of the problem: I am using the pgfortran v12.8 compiler with the cula_lapack_device_pgfortran module (from the CULA include directory) to interface with the CULA function I am trying to call.

Here is a cut-down version of the code I'm trying to run:

Code:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!     Calling code
      Program main()

      use cula_status
      EXTERNAL CULA_INITIALIZE
      integer :: status

!     Initialize CULA
      STATUS = CULA_INITIALIZE()
      CALL CULA_CHECK_STATUS(STATUS)

      call formd(ftot,dtot,lowt,eigvec,eigval,scr1,scr2,
     &   scr9,scr10,nocc,iuhf,nbasis)

      end program

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!     Host code
      subroutine formd(ftot,dtot,lowt,eigvec,eigval,scr1,scr2,
     &   scr9,scr10,nocc,iuhf,nbasis)

      use cudafor
      use cula_lapack_device_pgfortran
      use cula_status

!     Declare device arrays
      double precision,allocatable,device,dimension(:,:)::scr1_d
      double precision,allocatable,device,dimension(:)::eigval_d
      integer::status,nbasis
      character :: job,lo
     
      job='V'
      lo='L'

!     Allocate device arrays
      allocate(scr1_d(nbasis,nbasis),eigval_d(nbasis))

c
c     Copy Fock to Work (for diagonaliser)
c
      ij=0
      do i=1,nbasis
         do j=1,i
            ij=ij+1
            scr1(i,j)=ftot(ij)
         enddo
      enddo

!     Copy host data to device
      scr1_d=scr1
      eigval_d=eigval

!     Check allocate and copy status
         CALL CULA_CHECK_STATUS(STATUS)

!     Call CULA function
      status=cula_device_dsyev(job,lo,nbasis,scr1_d,nbasis,eigval_d)

!     Check CULA function status
      CALL CULA_CHECK_STATUS(STATUS)

!     Copy device data back to the host
      eigval=eigval_d
      scr1=scr1_d

      end subroutine
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


I cannot see why argument 1 would be invalid, as I am declaring job as a character and giving it the value 'V'.

The interface being used is straight out of the include file but I've included it below for completeness:

Code:
 
module cula_lapack_device_pgfortran
use ISO_C_BINDING

! culaStatus culaDeviceDsyev(char jobz, char uplo, int n, culaDeviceDouble* a, int lda, culaDeviceDouble* w);
    interface
        integer(C_INT) function cula_device_dsyev(jobz,uplo,n,a,lda,w) &
        BIND(C,NAME="culaDeviceDsyev")
            use ISO_C_BINDING
            character(C_CHAR), value :: jobz
            character(C_CHAR), value :: uplo
            integer(C_INT), value :: n
            real(C_DOUBLE), device, dimension(:,:) :: a
            integer(C_INT), value :: lda
            real(C_DOUBLE), device, dimension(:) :: w
        end function
    end interface

end module


CUDA version installed: CUDA 5.0

GPU model: Fermi C2050

Any help would be much appreciated,

Perri

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Thu Dec 06, 2012 4:34 am
by perrineedham
Additional information: The file containing the call to the CULA library has the suffix .F, not .cuf. Could this affect execution?

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Thu Dec 06, 2012 6:42 am
by john
We're looking into the former (the first post). As for the latter (the second post), that might be a question to pose to PGI. The convention, at least, seems to be that CUDA Fortran code typically goes into .cuf files.

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Thu Dec 06, 2012 6:50 am
by perrineedham
Yes, but using -Mcuda with *.F files works the same. I guess it's a possibility if you can't reproduce the error on your side with a .cuf file.

Thanks for the help, it's much appreciated.

Cheers,
Perri

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Thu Dec 06, 2012 11:13 am
by john
We are still attempting to reproduce from your example.

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Fri Dec 07, 2012 5:40 am
by perrineedham
Okay, I changed the file from .F to .CUF and I'm still getting the same runtime error. So that doesn't appear to be the problem.

Cheers,
Perri

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Fri Dec 07, 2012 7:15 am
by perrineedham
I found the problem... waheyyyy!!!

I was using the compiler flags -O3 -fast. When I removed them from my Makefile it worked perfectly.

cheers,
Perri

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Wed Dec 12, 2012 1:01 pm
by john
Hello,
We've actually found situations (without -O3 -fast) that trigger a bug using pgfortran. The PGI folks have acknowledged to us that there is an issue with their handling of C_CHAR arguments passed by value in their ISO_C_BINDING support. Basically, CULA ends up receiving the wrong character value, and this is beyond our control. We have no further details at this time, but will update this thread if we learn any more from PGI.

We did find that a potential fix is to edit the module that ships with CULA and to change the declarations from function to subroutine. This seemed to fix the problem, but at the cost of hiding the error codes.
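
As a rough sketch of that workaround, assuming the shipped interface quoted earlier in this thread, the edited declaration for cula_device_dsyev might look like the following. The module name cula_dsyev_subroutine_sketch is made up for illustration, the argument declarations are copied unchanged from the shipped interface, and the block needs a CUDA Fortran compiler because of the device attribute. It has not been checked against any particular CULA or PGI release.

```fortran
! Sketch only: the cula_device_dsyev interface declared as a subroutine
! instead of a function. The module name is hypothetical; the argument
! declarations mirror the interface shipped in the CULA include directory.
module cula_dsyev_subroutine_sketch
    use ISO_C_BINDING

    interface
        ! Binds to the same C symbol as before. Because this is declared
        ! as a subroutine, the culaStatus return value is discarded.
        subroutine cula_device_dsyev(jobz,uplo,n,a,lda,w) &
        BIND(C,NAME="culaDeviceDsyev")
            use ISO_C_BINDING
            character(C_CHAR), value :: jobz
            character(C_CHAR), value :: uplo
            integer(C_INT), value :: n
            real(C_DOUBLE), device, dimension(:,:) :: a
            integer(C_INT), value :: lda
            real(C_DOUBLE), device, dimension(:) :: w
        end subroutine
    end interface

end module

! The call site then changes from an assignment to a call statement:
!     call cula_device_dsyev(job,lo,nbasis,scr1_d,nbasis,eigval_d)
```

With this form no status value comes back from the call, so CULA_CHECK_STATUS can no longer be applied to it. That is the cost mentioned above.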

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Wed Jan 23, 2013 5:03 am
by perrineedham
Hi there,

I'd like to know a little more about the compiler bugs you're seeing with PGI's ISO_C_BINDING support, please, as I'm getting another error that I can't seem to suss out.

The error I'm getting is:


This status can mean one of the following:
CULA Lapack: Data error at pos 5941 (see the Reference Manual for guidance)
CULA Sparse: A data problem was found; inspect result.flag to discriminate
Warning: ieee_invalid is signaling
Warning: ieee_divide_by_zero is signaling
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
1

On further reading, this appears to mean that I have a zero on the diagonal or a singular matrix. I'm trying to use the CULA_DEVICE_DSYEV function. What's unusual is that it runs fine for a 4000x4000 matrix, but when I try a 6000x6000 matrix I get this error.

I guess I'm just trying to figure out whether or not it is a problem with my code or something wrong with the library.

Any help would be much appreciated,
Perri

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Wed Jan 23, 2013 9:50 am
by john
For DSYEV, info>0 means it's a convergence error. It most often means that the algorithm hit the limits of machine precision while trying the computation. Either your matrix has been loaded incorrectly to the device, or it might be numerically unstable.
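
One cheap way to rule out bad input data is to scan the host matrix for NaN or Inf entries before copying it to the device. The sketch below is plain Fortran with no CULA or CUDA calls. count_bad_lower is a hypothetical helper, and the 3x3 matrix is made-up test data. It only inspects the lower triangle, on the assumption that uplo='L' is used as in the code earlier in this thread. It does not detect a transfer problem by itself, only non-finite input.

```fortran
! Sketch only: host-side sanity check on the matrix before the device copy.
! count_bad_lower is a hypothetical helper, not part of CULA.
module matrix_sanity
    use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
    implicit none
contains
    ! Count NaN/Inf entries in the lower triangle of a (including the
    ! diagonal), i.e. the part referenced when uplo='L'.
    integer function count_bad_lower(a, n) result(nbad)
        integer, intent(in) :: n
        double precision, intent(in) :: a(n,n)
        integer :: i, j
        nbad = 0
        do j = 1, n
            do i = j, n
                if (.not. ieee_is_finite(a(i,j))) nbad = nbad + 1
            end do
        end do
    end function
end module

program check_matrix
    use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
    use matrix_sanity
    implicit none
    integer, parameter :: n = 3
    double precision :: a(n,n)

    ! Small made-up symmetric matrix, lower triangle filled.
    a = 0.0d0
    a(1,1) = 2.0d0
    a(2,1) = 1.0d0
    a(2,2) = 2.0d0
    a(3,3) = 1.0d0
    print *, 'non-finite entries, clean matrix: ', count_bad_lower(a, n)

    ! Inject a NaN into the lower triangle to show the check firing.
    a(3,2) = ieee_value(a(3,2), ieee_quiet_nan)
    print *, 'non-finite entries, after NaN:    ', count_bad_lower(a, n)
end program
```

Running the same count on the matrix after copying it back from the device, before the DSYEV call, would be one way to compare what the host sent with what the device holds.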

Re: cula_device_dsyev - Argument 1 is invalid

Posted: Wed Jan 23, 2013 9:55 am
by perrineedham
Thanks for the quick reply.

Well, seeing as all the smaller datasets worked fine, would you say it's safe to assume that the array is numerically unstable, or could this still be a data transfer issue that only reveals itself with larger datasets?

Excuse my lack of knowledge here, but what does it mean for an array to be numerically unstable in this sense?

Cheers,
Perri