2 Mar 2012

CUDA and Fortran

by John

Let's start by saying that CUDA and Fortran aren't the best of friends out of the box. CUDA is a C library without true Fortran support, and Fortran isn't naturally attuned to C's value semantics. Since our users want to use our CULA Device interface routines to avoid transfers between the host and the GPU, they also need to be able to allocate device memory. The best and easiest way, in our findings, is to use the Portland Group's Fortran compiler with the CUDA Fortran language extensions. This makes CUDA a first-class citizen, so calling CULA's Device interface is quite simple. Keep an eye on upcoming issues of the Portland Group's newsletter, because we will be revising our old article about CULA + PGI integration there.
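As a quick taste, here is a minimal sketch of what that style looks like with the CUDA Fortran extensions. The problem size here is made up, and it assumes the CULA device module and routine shown later in this article accept DEVICE arrays directly:

      PROGRAM QR_ON_GPU
          USE CULA_LAPACK_DEVICE
          INTEGER, PARAMETER :: M = 1024, N = 1024          ! assumed sizes
          REAL, ALLOCATABLE :: A(:,:)                       ! host matrix
          REAL, DEVICE, ALLOCATABLE :: A_D(:,:), TAU_D(:)   ! device arrays
          INTEGER :: STATUS
          ! (error checking and CULA initialization omitted for brevity)
          ALLOCATE(A(M,N), A_D(M,N), TAU_D(MIN(M,N)))
          ! ... fill A on the host ...
          A_D = A                                           ! implicit host-to-device copy
          STATUS = CULA_DEVICE_SGEQRF(M, N, A_D, M, TAU_D)
      END PROGRAM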

Now for those without the PGI compiler, the answer is the ISO_C_BINDING method for module writing, which allows Fortran to call into C code using C pointer types and value semantics. Most newer Fortran compilers support this, and as of CULA R15 a cula_lapack_device module that takes advantage of this approach will be available. That said, CUDA does not publish a formal module for ISO_C_BINDING integration, so you will need to write your own. Here are some sample definitions that can easily be adapted to cover the CUDA routines you need.

      MODULE CUDA_CONSTANTS
          USE ISO_C_BINDING
          ENUM, BIND(C)
              ENUMERATOR :: cudaMemcpyHostToHost=0, &
              cudaMemcpyHostToDevice, &
              cudaMemcpyDeviceToHost, &
              cudaMemcpyDeviceToDevice, &
              cudaNotUsedInFortran
          END ENUM
      END MODULE
      MODULE CUDA_MEMORY_MANAGEMENT
          IMPLICIT NONE
          INTERFACE
              INTEGER(C_INT) FUNCTION CUDA_MALLOC(BUFFER, SZ) &
              BIND(C,NAME="cudaMalloc")
                  USE ISO_C_BINDING
                  TYPE (C_PTR) :: BUFFER
                  INTEGER (C_SIZE_T), VALUE :: SZ
              END FUNCTION
          END INTERFACE
          INTERFACE
              FUNCTION CUDA_MEMCPY(DST,SRC,CO,KI) RESULT(R) &
              BIND(C,NAME="cudaMemcpy")
                  USE CUDA_CONSTANTS
                  INTEGER (C_INT) :: R
                  TYPE (C_PTR), VALUE :: DST
                  TYPE (C_PTR), VALUE :: SRC
                  INTEGER (C_SIZE_T), VALUE :: CO
                  INTEGER (C_INT), VALUE :: KI
              END FUNCTION
          END INTERFACE
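          ! Following the same pattern, other runtime calls can be bound as
          ! needed. For example, a sketch of a cudaFree binding (cudaFree
          ! takes the device pointer itself, so it is passed by value):
          INTERFACE
              INTEGER(C_INT) FUNCTION CUDA_FREE(BUFFER) &
              BIND(C,NAME="cudaFree")
                  USE ISO_C_BINDING
                  TYPE (C_PTR), VALUE :: BUFFER
              END FUNCTION
          END INTERFACE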
      END MODULE

Using the module to allocate GPU memory, transfer data to that memory, and then run a CULA routine is as simple as:

        USE CULA_LAPACK_DEVICE
        USE CUDA_MEMORY_MANAGEMENT
        USE CUDA_CONSTANTS
        USE ISO_C_BINDING
        TYPE(C_PTR) :: A_DEVICE, TAU_DEVICE
        REAL, ALLOCATABLE, DIMENSION(:,:), TARGET :: A
        REAL, ALLOCATABLE, DIMENSION(:), TARGET :: TAU
        INTEGER(C_SIZE_T) :: SIZE_A, SIZE_TAU
        INTEGER :: STATUS, M, N
        ! ... set M and N, allocate and fill A(M,N) and TAU(MIN(M,N)) on the host ...
        SIZE_A = M*N*SIZEOF(A(1,1))
        SIZE_TAU = MIN(M,N)*SIZEOF(TAU(1))
        STATUS = CUDA_MALLOC(A_DEVICE,SIZE_A)
        STATUS = CUDA_MALLOC(TAU_DEVICE,SIZE_TAU)
        STATUS = CUDA_MEMCPY(A_DEVICE,C_LOC(A),&
                             SIZE_A,cudaMemcpyHostToDevice)
        STATUS = CULA_DEVICE_SGEQRF(M, N, A_DEVICE, M, TAU_DEVICE)
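
Afterwards, the same bindings bring the results back to the host and release the device buffers. A minimal sketch, assuming the CUDA_FREE binding sketched in the module above and the declarations from the previous snippet:

        STATUS = CUDA_MEMCPY(C_LOC(A),A_DEVICE,&
                             SIZE_A,cudaMemcpyDeviceToHost)
        STATUS = CUDA_MEMCPY(C_LOC(TAU),TAU_DEVICE,&
                             SIZE_TAU,cudaMemcpyDeviceToHost)
        STATUS = CUDA_FREE(A_DEVICE)
        STATUS = CUDA_FREE(TAU_DEVICE)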

With these examples, you can start integrating your CUDA and Fortran codes much more easily. PGI is still our preferred method, but this approach works well with the Intel and GNU Fortran compilers. The upcoming CULA R15 release will include the modules that allow you to use the CULA Device interface with this programming style.
