Matlab and sgesv and segmentation faults

General CULA Dense (LAPACK & BLAS) support and troubleshooting. Use this forum if you are having a general problem or have encountered a bug.

Matlab and sgesv and segmentation faults

Postby Boxed Cylon » Wed Aug 15, 2012 5:22 pm

Thanks for the latest installation of CULA/R15! I've updated my CUDA version to 4.2. I use a GTX 480 on Suse linux 11.4.

I wonder if someone can help me with a nagging issue. I run CULA using a mex file and matlab - I find that while the mex file may run o.k., when the mex function is cleared, or matlab is exited, I get a segmentation fault. This suggests, usually, a memory management problem. I can't seem to sort it out, however. The issue seems to have been evolving with CUDA/CULA version. I had luck once with the "cleanup" routine, which is meant to clear all the variables and shutdown CUDA and CULA for a safe matlab exit.

Here is the mex file, which is a just a call to culaDeviceSgesv. I compile it as gpu_sgesv.mexa64. The routine is designed to be called in a loop, so we need not initialize CUDA/CULA everytime.

Code: Select all
#include "mex.h"
#include "cublas.h"
#include "cula.h"
#include "cula_lapack_device.h"
#include "cuda.h"
#include "sys/time.h"

static int initialized = 0;

void cleanup(void) {
   mexPrintf("MEX-file is terminating, exiting CUDA Thread\n");
   culaShutdown();
   mexPrintf("Exited CUDA Thread.\n");
}


void checkStatus(culaStatus status)
{
    if(!status)
        return;

    if(status == culaArgumentError)
        printf("Invalid value for parameter %d\n", culaGetErrorInfo());
    else if(status == culaRuntimeError)
        printf("Runtime error (%d)\n", culaGetErrorInfo());
    else
        printf("%s\n", culaGetStatusString(status));

    culaShutdown();
    exit(EXIT_FAILURE);
}

void mexFunction( int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])

{
      int I,L;
      int Ic,Lc;
      mwSize dims0[2];

      // INPUT VARIABLES   %%%%%%%%%%%%%%%%%%%%%%%%%
      // A is dimensioned LXL
      // B is dimensioned LXI
      float *A,*B;
 
      // OUTPUT VARIABLE, X=A\B   %%%%%%%%%%%%%%%%%%
      float *X;

      // CUDA/GPU VARIABLES %%%%%%%%%%%%%%%%%%%%%%%%
      float *ga, *gb;
      int* ipiv = 0;

      culaStatus status;

      if (nrhs != 2) {
         mexErrMsgTxt("gpu_sgesv requires 2 input arguments");
      } else if (nlhs != 1) {
         mexErrMsgTxt("gpu_sgesv requires 1 output argument");
      }

      if ( !mxIsSingle(prhs[0]) || !mxIsSingle(prhs[1]) ) {
           mexErrMsgTxt("Input arrays must be single precision.");
      }


// %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
// Single-precision input arrays */
// Dimensions, and then array data
      L = mxGetN(prhs[0]);
      I = mxGetN(prhs[1]);
      A =   (float*) mxGetData(prhs[0]);
      B =   (float*) mxGetData(prhs[1]);

// Left hand side matrix set up    (the solution) 
      dims0[0]=L;
      dims0[1]=I;

      plhs[0] = mxCreateNumericArray(2,dims0,mxSINGLE_CLASS,mxREAL);
      X = (float*) mxGetData(plhs[0]);

      // Make modulo 32 dimensions  - speeds up the sgemm calculations significantly
      // Just used as an example here.
       Ic=I+(32-I%32);
       Lc=L+(32-L%32);

      if (!initialized) {
        printf("Initializing for CULA...\n");
          status = culaInitialize();
      //    Early exit if CULA fails to initialize (no GPU, etc)
          checkStatus(status);
          mexAtExit(cleanup);
          initialized = 1; };

      cudaMalloc ((void**)&ga,Lc*Lc*sizeof(float));
      cudaMalloc ((void**)&gb,Lc*Ic*sizeof(float));
      cudaMemset(ga,0,Lc*Lc*4);  /* zero these since we've padded them */
      cudaMemset(gb,0,Lc*Ic*4);

      cublasSetMatrix (L, L, sizeof(float), A, L, (void*)ga, Lc);
      cublasSetMatrix (L, I, sizeof(float), B, L, (void*)gb, Lc);

    // Allocate for ipiv - a working matrix used by sgesv, and ignored here.
      cudaMalloc ((void**)&ipiv,L*sizeof(int));

    // Ready to go...
    // First numbers L, I pertain only to the non-padded sections of the arrays.
       status = culaDeviceSgesv(L,I,ga,Lc,ipiv,gb,Lc);
       checkStatus(status);

    // Get the solution off the GPU
       cublasGetMatrix (L, I, sizeof(float), gb, Lc, X, L);
    // X has the solution we need; now back to matlab after a bit of clean up.


    // Clear the variables to avoid GPU memory leak (and GPU crash!)

       cudaFree (ga);
       cudaFree (gb);
       cudaFree (ipiv);
       culaFreeBuffers();

}


Here is some simple matlab lines that calls this routine:

Code: Select all
clear all

format compact
maxNumCompThreads(1);


N=[5000];

%start_cuda

for K=N,
  K
  K1=1000;
  A1=randn(K,K);
  B1=randn(K,K1);

  A=single(A1);
  B=single(B1);

  disp(' ')
  disp('CPU:')
  tic
  Lp=A\B;
  T1=toc

  Lp(1,1:3)
  Lp(end,(end-2):end)

disp(' ')
disp('CULA GPU: ')
tic
[X]= gpu_sgesv(A,B);
T2=toc

  X(1,1:3)
  X(end,(end-2):end)

end


Can anyone spot what seems to be the trouble with the mex C code?
Boxed Cylon
 
Posts: 48
Joined: Fri Oct 16, 2009 8:57 pm

Re: Matlab and sgesv and segmentation faults

Postby john » Thu Aug 16, 2012 11:29 am

Hi Boxed,
I'd like to narrow down your problem, and I lack the exact platform to reproduce your issue. Is it possible to ask if you have a test that ONLY uses CUBLAS without CULA? A simple cublasCreate/sgemm/cublasDestroy would be adequate and would determine in which library the problem lies.
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Re: Matlab and sgesv and segmentation faults

Postby kyle » Thu Aug 16, 2012 2:52 pm

Also, was this happening with previous versions of CULA or is it new to R15?
kyle
Administrator
 
Posts: 301
Joined: Fri Jun 12, 2009 7:47 pm

Re: Matlab and sgesv and segmentation faults

Postby Boxed Cylon » Thu Aug 16, 2012 6:00 pm

It is true, that I may have been a little too quick to place the blame on CULA - I suppose the issue has been a generic one with mex files and CUDA. I run into the problem using MAGMA routines as well (MAGMA seems to be something of a mess - a constructive research mess making interesting progress that can be made to work, but a mess...) The issue has been evolving over the various versions; at one point I thought I had the problem sorted out. Perhaps we can identify this annoying culprit.

I suppose my suspicion is that CULA has some hidden array allocations that get crosswired with matlab. But "culaShutdown();" is supposed to handle that.

This routine works using CUDA's sgemm, however. it does not crash when exiting matlab or clearing the function: (Using cudaThreadExit in the CULA/sgesv routine causes a serious Segmentation fault barf.) Without calling cudaThreadExit when the function is cleared, the segmentation fault returns.

Code: Select all
#include "mex.h"
#include "cublas.h"
#include "cuda_runtime.h"
#include "sys/time.h"

void cleanup(void) {
   mexPrintf("MEX-file is terminating, exiting CUDA Thread\n");
   cudaThreadExit();
   mexPrintf("Exited CUDA Thread.\n");
}

/*  gpu_sgemm.cu - Gateway function for subroutine sgemm
  function C = sgemm_cu(transa,transb,single(alpha),single(beta),single(A),single(B),single(C))
  transa,transb = 0/1 for no transpose/transpose of A,B
  Input arrays must be single precision.
*/

void mexFunction( int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])
{
      cublasStatus status;
      int M,K,L,N,MM,NN,KK;
      int Mc,Kc,Lc,Nc,MMc,NNc,KKc;
      int dims0[2];
      int ta,tb;
      float alpha,beta;
      float *a,*b,*c,*cc;
      float *ga,*gb,*gc;
      char transa,transb;
      cublasStatus retStatus;

      if (nrhs != 7) {
          mexErrMsgTxt("sgemm requires 7 input arguments");
      } else if (nlhs != 1) {
          mexErrMsgTxt("sgemm requires 1 output argument");
      }

      if ( !mxIsSingle(prhs[4]) ||
           !mxIsSingle(prhs[5]) ||
           !mxIsSingle(prhs[6]))   {
           mexErrMsgTxt("Input arrays must be single precision.");
      }

      ta = (int) mxGetScalar(prhs[0]);
      tb = (int) mxGetScalar(prhs[1]);
      alpha = (float) mxGetScalar(prhs[2]);
      beta = (float) mxGetScalar(prhs[3]);

      M = mxGetM(prhs[4]);   /* gets number of rows of A */
      K = mxGetN(prhs[4]);   /* gets number of columns of A */
      L = mxGetM(prhs[5]);   /* gets number of rows of B */
      N = mxGetN(prhs[5]);   /* gets number of columns of B */

      if (ta == 0) {
          transa='n';
          MM=M;
          KK=K;
      } else {
          transa='t';
          MM=K;
          KK=M;
      }

      if (tb == 0) {
          transb='n';
          NN=N;
      } else {
          transb='t';
          NN=L;
      }

/*    printf("transa=%c\n",transa);
      printf("transb=%c\n",transb);
      printf("alpha=%f\n",alpha);
      printf("beta=%f\n",beta);     */

/* Left hand side matrix set up */
      dims0[0]=MM;
      dims0[1]=NN;
      plhs[0] = mxCreateNumericArray(2,dims0,mxSINGLE_CLASS,mxREAL);
      cc = (float*) mxGetData(plhs[0]);
     
/* Single-precision arrays */
/* Matrix 1 */
      a = (float*) mxGetData(prhs[4]);
/* Matrix 2 */
      b = (float*) mxGetData(prhs[5]);
/* Matrix 3 */
      c = (float*) mxGetData(prhs[6]);

//      cudaSetDevice(0);

/* STARTUP   CUBLAS */
      retStatus = cublasInit();
     // test for error
     retStatus = cublasGetError ();
     if (retStatus != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr,"CUBLAS: an error occurred in cublasInit\n");
      }


     Mc=M+32-M%32;
     Kc=K+32-K%32;
/* ALLOCATE SPACE ON THE GPU AND COPY a INTO IT */
      cudaMalloc ((void**)&ga,Mc*Kc*sizeof(float));
      // test for error
      retStatus = cublasGetError ();
      if (retStatus != CUBLAS_STATUS_SUCCESS) {
          fprintf(stderr,"CUBLAS: an error occurred in cudaMalloc\n");
      }
      cudaMemset(ga,0,Mc*Kc*4);
      retStatus = cublasSetMatrix (M, K, sizeof(float),
               a, M, (void*)ga, Mc);

      Lc=L+32-L%32;
      Nc=N+32-N%32;
/* SAME FOR B, C */
      cudaMalloc ((void**)&gb,Lc*Nc*sizeof(float));
      cudaMemset(gb,0,Lc*Nc*4);
      retStatus = cublasSetMatrix (L, N, sizeof(float),
               b, L, (void*)gb, Lc);

      MMc=MM+32-MM%32;
      NNc=NN+32-NN%32;
      KKc=KK+32-KK%32;
      cudaMalloc ((void**)&gc,MMc*NNc*sizeof(float));
//      if (beta != 0.0 ) {
         cudaMemset(gc,0,MMc*NNc*4);
         retStatus = cublasSetMatrix (MM, NN, sizeof(float),
                  c, MM, (void*)gc, MMc);
//      }

/*  PADDED ARRAYS */
/*    printf("Op(A) has No. rows = %i\n",MMc);
      printf("Op(B) has No. cols = %i\n",NNc);
      printf("Op(A) has No. cols = %i\n",KKc);
      printf("A has leading dimension = %i\n",Mc);
      printf("B has leading dimension = %i\n",Lc);
      printf("C has leading dimension = %i\n",MMc); */

/* READY TO CALL SGEMM */
    (void) cublasSgemm (transa, transb, MMc, NNc, KKc, alpha,
                                ga, Mc, gb, Lc, beta, gc, MMc);
    status = cublasGetError();
    if (status != CUBLAS_STATUS_SUCCESS) {
        fprintf (stderr, "!!!! kernel execution error.\n");
    }

/* NOW COPY THE RESULTING gc ON THE GPU TO THE LOCAL c */
     retStatus = cublasGetMatrix (MM, NN, sizeof(float), gc, MMc, cc, MM);
     if (retStatus != CUBLAS_STATUS_SUCCESS) {
          fprintf(stderr,"CUBLAS: an error occurred in cublasGetMatrix\n");
     }

/* FREE UP GPU MEMORY AND SHUTDOWN (OPTIONAL?) */
      cudaFree (ga);
      cudaFree (gb);
      cudaFree (gc);
      cublasShutdown(); 
      mexAtExit(cleanup);

}


Here is a driver matlab script for it:

Code: Select all
  clear all

format compact
maxNumCompThreads(1);


N=[5000];
ETR=[];
ETR2=[];

%start_cuda

for K=N,
  K
  K1=1000;
  A1=randn(K,K);
  B1=randn(K,K1);

  A=single(A1);
  B=single(B1);

  disp(' ')
  disp('CPU:')
  tic
  Lp=A*B;
  T1=toc

  Lp(1,1:3)
  Lp(end,(end-2):end)

disp(' ')
disp('CUDA GPU: ')
tic
one=single(1);
zero=single(0);
C=single(zeros(size(Lp)));
[X]= gpu_sgemm(zero,zero,one,zero,A,B,C);
T2=toc

  X(1,1:3)
  X(end,(end-2):end)

end
Boxed Cylon
 
Posts: 48
Joined: Fri Oct 16, 2009 8:57 pm

Re: Matlab and sgesv and segmentation faults

Postby Boxed Cylon » Mon Aug 27, 2012 1:47 pm

I've continued to poke at this problem of segmentation faults - I keep coming back to the conclusion that there is something within the CULA system that is causing the problem.

If I comment out "culaInitialize" (and other lines related to that/culaStatus) the gpu_sgesv runs o.k., with no segmentation fault at "clear" or "exit" from matlab. (it gives the wrong answer from culaDeviceSgesv of course!) I suppose the memory problems could be caused by either arrays on the device, or arrays on the host.

It may be telling that trying to use "cudaDeviceReset" rather than "culaShutdown" to close down CUDA and clear variables causes a more vigorous crash of matlab. Calling culaShutdown before cudaDeviceReset is no help. "cudaDeviceReset" seems to be the all-powerful way to clear memory allocations within CUDA.

One of the puzzlements, perhaps a separate question, is whether when using CULA and CUBLAS, one has to also use cublasInit, in addition to culaInitialize. (hence also cublasShutdown). It looks like not - when using CULA, CUBLAS is initialized by default. But perhaps a complete answer to this question might be noted in the documentation (it might be there, but I couldn't find it in my cursory search).
Boxed Cylon
 
Posts: 48
Joined: Fri Oct 16, 2009 8:57 pm

Re: Matlab and sgesv and segmentation faults

Postby john » Tue Aug 28, 2012 12:49 pm

Hi Cylon, thank you for the updates. I've been working with your code on a few different configurations and unfortunately haven't been able to reproduce your stated behavior. In all cases, the code seems to function cleanly (with and without the terminate call). I can load and unload the mex file with no crashes. I've gotten this working on windows+2010+R14, windows+2012+R15, and RHEL+2011+R14. I unfortunately don't have your exact configuration available to me.

I'll put out as much information as I can here, in the hopes that we can get enough of a discussion going to help us pinpoint some new angles to locate the problem.

culaInitialize does very little, as does culaShutdown. culaInitialize grabs some details about your device and environment (CUDA driver and runtime version) to determine compatibility. CUBLAS is initialized because it's virtually certain to be used in any upcoming CULA calls.

culaShutdown does nothing unless it's the final user thread to shut down. If it is, it clears all memory and then closes down CUBLAS. The cleared memory is only a few megabytes of GPU memory in most cases - you can achieve the same clearing without culaShutdown by calling culaFreeBuffers.

I'm beginning to think that this might be a problem that is very specific to your personal setup. Have you attempted the code on any other systems, maybe a Linux other than SUSE?

For the test you mentioned in your new post (removing culaInitialize), have you tried the same test with just the CUBLAS versions?
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Re: Matlab and sgesv and segmentation faults

Postby Boxed Cylon » Tue Aug 28, 2012 1:09 pm

How peculiar...

I'll keep poking at the problem; perhaps something will occur to me.

For the record, I am using Matlab R2012a, 64-bit on Suse linux 11.4. At the moment I don't have another linux version to try. Perhaps it is time to upgrade Suse linux. I might be able to try an older matlab version.

That on your systems the routine loads and unloads o.k. tells me I am not completely crazy...

It might be important that I have a system with two GPU's - device 0 is a GTX 480, while device 1 is a GT 440. I use the latter for the desktop and the GTX480 as a compute-only device. By default CUDA should/is using device 0.
Boxed Cylon
 
Posts: 48
Joined: Fri Oct 16, 2009 8:57 pm

Re: Matlab and sgesv and segmentation faults

Postby john » Tue Aug 28, 2012 2:39 pm

I want to add quickly that in the next release we will be no longer doing a global cublasInit. This is now a redundant "feature" because CULA no longer needs this particular CUBLAS functionality. I can't hazard a guess whether this will cure your issue or not (since I can't replicate and test it). Which reminds me that I would recommend converting your code forward to the CUBLAS "v2" API, because it's a tighter interface to CUBLAS. In short you no longer use cublasInit, but instead create a specific handle with cublasCreate.
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Re: Matlab and sgesv and segmentation faults

Postby Boxed Cylon » Tue Aug 28, 2012 4:37 pm

I've tried the routine with R2012b (8.0.0.755) (beta) matlab and with the previous version of CULA (R14), all with the same result - segmentation fault when I clear the mex routine.

Curiously, if I code it like this:

Code: Select all
       
if (!initialized) {
         printf("Initializing for CULA...\n");
      //   cudaSetDevice(1);
      //   cublasInit();
          culaInitialize();
          culaFreeBuffers();
      //    Early exit if CULA fails to initialize (no GPU, etc)
         mexAtExit(cleanup);
         initialized = 1; };


with culaFreeBuffers right after culaInitialize, the code still runs correctly...whereas with culaInitialize commented out it runs, but incorrectly. Perhaps there is a small character array set up by culaInitialize that doesn't get cleared? (I can also have culaInitialize, culaFreeBuffers, and cudaDeviceReset one after the other and it runs correctly, but crashes when cleared. This suggests an uncleared array on the host, maybe... )

Matlab is extraordinarily temperamental about memory, with some rather odd behavior at times. Sometimes memory issues crop up only occasionally - it just depends if the memory issue steps on matlab's toes. I once had such an issue in which the mex file would run o.k. continuously for days on end before crashing; eventually traced to a minor allocation problem. I wish I were more of an expert about it.

Its possible the issue resides with the older C libraries on my system; I'm contemplating the long-avoided upgrade...
Boxed Cylon
 
Posts: 48
Joined: Fri Oct 16, 2009 8:57 pm

Re: Matlab and sgesv and segmentation faults

Postby Boxed Cylon » Tue Aug 28, 2012 6:53 pm

Its likely not helpful, but here is an example of a matlab barf - in this case I have executed a cudaDeviceReset at the exit of the mex file, which results in this barf. Otherwise, I get just a quiet "Segmentation fault" and get spat out into the bash shell. As you see at the top, the messages from the mex file at exit are printed, including "Exited CUDA Thread.", meaning that the Reset/Free/Shutdown routines executed without error. The crash therefore happens when matlab goes to clear the mex function at the end.

Code: Select all
>> clear all
MEX-file is terminating, exiting CUDA Thread
Exited CUDA Thread.

------------------------------------------------------------------------
       Segmentation violation detected at Tue Aug 28 18:45:17 2012
------------------------------------------------------------------------

Configuration:
  Crash Decoding  : Disabled
  Current Visual  : 0x21 (class 4, depth 24)
  Default Encoding: UTF-8
  GNU C Library   : 2.11.3 stable
  MATLAB Root     : /usr/local/matlab
  MATLAB Version  : 8.0.0.755 (R2012b)
  Operating System: Linux 2.6.37.6-0.20-default #1 SMP 2011-12-19 23:39:38 +0100 x86_64
  Processor ID    : x86 Family 6 Model 26 Stepping 5, GenuineIntel
  Virtual Machine : Java 1.6.0_17-b04 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
  Window System   : The X.Org Foundation (10903000), display :0

Fault Count: 1


Abnormal termination:
Segmentation violation

Register State (from fault):
  RAX = 0000000000000000  RBX = 00007eff4d809b30
  RCX = 0000000000000000  RDX = 0000000000000000
  RSP = 00007eff4d809af0  RBP = 00007eff4d809b28
  RSI = 0000000000000000  RDI = 00007eff4d809b00

   R8 = 00000000014f64a0   R9 = 0000000000000001
  R10 = 000000000148c220  R11 = 00007eff0848ce00
  R12 = 00007eff0fdd20c0  R13 = 00007eff12242f00
  R14 = 0000000000000000  R15 = 0000000000000004

  RIP = 00007eff0df0b1c7  EFL = 0000000000010246

   CS = 0033   FS = 0000   GS = 0000

Stack Trace (from fault):
[  0] 0x00007eff617a768e           /usr/local/matlab/bin/glnxa64/libmwfl.so+00517774 _ZN2fl4diag15stacktrace_base7captureERKNS0_14thread_contextEm+000158
[  1] 0x00007eff617a8962           /usr/local/matlab/bin/glnxa64/libmwfl.so+00522594
[  2] 0x00007eff617aa4ae           /usr/local/matlab/bin/glnxa64/libmwfl.so+00529582 _ZN2fl4diag13terminate_logEPKcRKNS0_14thread_contextE+000174
[  3] 0x00007eff60a99003          /usr/local/matlab/bin/glnxa64/libmwmcr.so+00557059 _ZN2fl4diag13terminate_logEPKcPK8ucontext+000067
[  4] 0x00007eff60a95b0d          /usr/local/matlab/bin/glnxa64/libmwmcr.so+00543501
[  5] 0x00007eff60a977a5          /usr/local/matlab/bin/glnxa64/libmwmcr.so+00550821
[  6] 0x00007eff60a979c5          /usr/local/matlab/bin/glnxa64/libmwmcr.so+00551365
[  7] 0x00007eff60a9806e          /usr/local/matlab/bin/glnxa64/libmwmcr.so+00553070
[  8] 0x00007eff60a98205          /usr/local/matlab/bin/glnxa64/libmwmcr.so+00553477
[  9] 0x00007eff5f0262d0                             /lib64/libpthread.so.0+00062160
[ 10] 0x00007eff0df0b1c7                              /usr/lib64/libcuda.so+01057223
[ 11] 0x00007eff0deec249                              /usr/lib64/libcuda.so+00930377
[ 12] 0x00007eff0ded3745                              /usr/lib64/libcuda.so+00829253 cuStreamDestroy_v2+000085
[ 13] 0x00007eff0fb84499               /usr/local/cula/lib64/libcudart.so.4+00050329
[ 14] 0x00007eff0fbb8b6f               /usr/local/cula/lib64/libcudart.so.4+00265071 cudaStreamDestroy+000511
[ 15] 0x00007eff01a8a4b5               /usr/local/cula/lib64/libcublas.so.4+00181429 cublasDestroy_v2+000021
[ 16] 0x00007eff0ffd5bc9            /usr/local/cula/lib64/libcula_lapack.so+02108361
[ 17] 0x00007eff104d2bf7            /usr/local/cula/lib64/libcula_lapack.so+07338999
[ 18] 0x00007eff0ffd5d82            /usr/local/cula/lib64/libcula_lapack.so+02108802
[ 19] 0x00007eff5ecdf935                                   /lib64/libc.so.6+00219445 __cxa_finalize+000165
[ 20] 0x00007eff0fdf4876            /usr/local/cula/lib64/libcula_lapack.so+00137334


This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.

If this problem is reproducible, please submit a Service Request via:
    http://www.mathworks.com/support/contact_us/

A technical support engineer might contact you with further information.

Thank you for your help.** This crash report has been saved to disk as /home/cylon/matlab_crash_dump.19575-1 **



MATLAB is exiting because of fatal error
Killed
Boxed Cylon
 
Posts: 48
Joined: Fri Oct 16, 2009 8:57 pm

Re: Matlab and sgesv and segmentation faults

Postby john » Wed Aug 29, 2012 7:35 am

I can see how this would happen on a reset without a culaShutdown. We have an internal CUBLAS handle that's only cleared when you call culaShutdown (otherwise we keep it around assuming you'll be back to call some more routines). If you do a CUDA reset, then our internal handle would be invalidated by that (unbeknownst to us) and so it's a legitmate error when we try to free it.

What appears to be happening in the stack trace is that the clear has forced the unload of the CULA library. This is actually slightly more intense than a culaShutdown. CULA, on unload, has to clear out its internal tracking data. This is basically a small list of CUBLAS handles and some trivial information on which devices and threads we've seen. Our internal buffers are included here as well. Assuming that culaShutdown was called, there's basically no work to do here except to free up a tiny host buffer or two (device IDs, etc). But in your case, I think there might be a lingering invalidated handle that's we're attempting to clear out. In short, I'd recommend culaShutdown before either a cudaDeviceReset or clearing the mex.

Please note that culaFreeBuffers shouldn't have any impact on this. The only purpose of that routine is to free up some GPU memory in case the GPU memory space has become fragmented. This routine does not clear the internal tracking data, nor our handles.
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Re: Matlab and sgesv and segmentation faults

Postby Boxed Cylon » Wed Aug 29, 2012 5:38 pm

I think you are on the right track with the unloading of the cula libraries discussion. I wrote the following small routine as a test:

Code: Select all
#include "mex.h"
#include "cublas.h"
#include "cuda.h"
#include "cula.h"
#include "cula_lapack_device.h"

/* --------------------------- host code ------------------------------*/


/*  end_cula.cu -
    Makes a call to culaShutdown to shut it all down.
    No input, no output.


  B. Cylon  08/2012
*/

void mexFunction( int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])
{

      fprintf (stderr,"*************  Stopping CULA  *************\n");
       culaShutdown();
      fprintf (stderr,"*************  Done!  *************\n");

}             


So after I run the matlab script that calls gpu_sgesv, I then run "end_cula" to run this mex file. After this, I can "clear gpu_sgesv" fine, but then matlab crashes with "clear end_cula"("Segmentation fault", and an abrupt dump out to the bash shell with no other error messages.) "end_cula" has the cula libraries still loaded; unloading them seems to be the trouble.

A similar routine with using cudaDeviceReset(); behaves differently - the one runs o.k., but matlab crashes more vigorously with "clear gpu_sgesv" (hence attempting to unload the cula libraries).
Boxed Cylon
 
Posts: 48
Joined: Fri Oct 16, 2009 8:57 pm

Re: Matlab and sgesv and segmentation faults

Postby john » Thu Aug 30, 2012 7:45 am

Maybe a more fundamental question. Is it possible on your system to do the following:
* Load a mex file that references CULA
* Run init and shutdown
* Unload the mex file
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Re: Matlab and sgesv and segmentation faults

Postby Boxed Cylon » Thu Aug 30, 2012 10:18 am

That's an idea... So, here are three mex files: (1) initialize cula, (2) run cula routine, (3) shutdown cula. After running these three in sequence (which gives the correct answers from sgesv), I can then clear "start_cula" and clear "gpu_sgesv", but clear "stop_cula" causes the usual "Segmentation fault" and the quiet dump to the bash shell.

The "gpu_sgesv" routine has had all Initialize and Shutdown calls commented out.

I would like to add that the issue here is along the lines of obsessive compulsive, rather than mission critical... Still, it would be nice to have it sorted out.

start_cula:
Code: Select all
#include "mex.h"
#include "cublas.h"
#include "cuda.h"
#include "cula.h"
#include "cula_lapack_device.h"

/* --------------------------- host code ------------------------------*/


/*  start_cula.cu -
    Makes a call to culaInitialize.
    No input, no output.


  B. Cylon  08/2012
*/

void mexFunction( int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])
{

      fprintf (stderr,"*************  Starting CULA  *************\n");
       culaInitialize();
      fprintf (stderr,"*************  Done!  *************\n");

}             


gpu_sgesv:
Code: Select all
#include "mex.h"
#include "cublas.h"
#include "cula.h"
#include "cula_lapack_device.h"
#include "cuda.h"
#include "sys/time.h"

static int initialized = 0;

void cleanup(void) {
   mexPrintf("MEX-file is terminating, exiting CUDA Thread\n");
   // cublasShutdown();
   //  culaShutdown();
   //  cudaDeviceReset();
   //  culaFreeBuffers();
   // culaShutdown();
   mexPrintf("Exited CUDA Thread.\n");
}

void mexFunction( int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])

{
      int I,L;
      int Ic,Lc;
      mwSize dims0[2];

      // CUDA/GPU VARIABLES %%%%%%%%%%%%%%%%%%%%%%%%
      int *ipiv;
      float *ga, *gb;

      // INPUT VARIABLES   %%%%%%%%%%%%%%%%%%%%%%%%%
      // A is dimensioned LXL
      // B is dimensioned LXI
      float *A,*B;
 
      // OUTPUT VARIABLE, X=A\B   %%%%%%%%%%%%%%%%%%
      float *X;


      if (nrhs != 2) {
         mexErrMsgTxt("gpu_sgesv requires 2 input arguments");
      } else if (nlhs != 1) {
         mexErrMsgTxt("gpu_sgesv requires 1 output argument");
      }

      if ( !mxIsSingle(prhs[0]) || !mxIsSingle(prhs[1]) ) {
           mexErrMsgTxt("Input arrays must be single precision.");
      }


// %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
// Single-precision input arrays */
// Dimensions, and then array data
      L = mxGetN(prhs[0]);
      I = mxGetN(prhs[1]);
      A =   (float*) mxGetData(prhs[0]);
      B =   (float*) mxGetData(prhs[1]);

// Left hand side matrix set up    (the solution) 
      dims0[0]=L;
      dims0[1]=I;

      plhs[0] = mxCreateNumericArray(2,dims0,mxSINGLE_CLASS,mxREAL);
      X = (float*) mxGetData(plhs[0]);

      // Make modulo 32 dimensions  - speeds up the sgemm calculations significantly
      // Just used as an example here.
       Ic=I+(32-I%32);
       Lc=L+(32-L%32);

       if (!initialized) {
         printf("Initializing for CULA...\n");
      //   cudaSetDevice(1);
      //   cublasInit();
      //    culaInitialize();
      //    Early exit if CULA fails to initialize (no GPU, etc)
          mexAtExit(cleanup);
          initialized = 1; };

       cudaMalloc ((void**)&ga,Lc*Lc*sizeof(float));
       cudaMemset(ga,0,Lc*Lc*sizeof(float));  /* zero these since we've padded them */
       cudaMalloc ((void**)&gb,Lc*Ic*sizeof(float));
       cudaMemset(gb,0,Lc*Ic*sizeof(float));

       cublasSetMatrix (L, L, sizeof(float), A, L, (void*)ga, Lc);
       cublasSetMatrix (L, I, sizeof(float), B, L, (void*)gb, Lc);

    // Allocate for ipiv - a working matrix used by sgesv, and ignored here.
       cudaMalloc ((void**)&ipiv,L*sizeof(int));
       cudaMemset(ipiv,0,L*sizeof(int));

    // First numbers L, I pertain only to the non-padded sections of the arrays.
       culaDeviceSgesv(L,I,ga,Lc,ipiv,gb,Lc);

    // Get the solution off the GPU
       cublasGetMatrix (L, I, sizeof(float), gb, Lc, X, L);
    // X has the solution we need; now back to matlab after a bit of clean up.

    // Clear the variables to avoid GPU memory leak (and GPU crash!)

       cudaFree (ga);
       cudaFree (gb);
       cudaFree (ipiv);

       culaFreeBuffers();
}


end_cula:
Code: Select all
#include "mex.h"
#include "cublas.h"
#include "cuda.h"
#include "cula.h"
#include "cula_lapack_device.h"

/* --------------------------- host code ------------------------------*/


/*  end_cula.cu -
    Makes a call to culaShutdown to shut it all down.
    No input, no output.


  B. Cylon  08/2012
*/

void mexFunction( int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])
{

      fprintf (stderr,"*************  Stopping CULA  *************\n");
       culaShutdown();
      fprintf (stderr,"*************  Done!  *************\n");

}             
Boxed Cylon
 
Posts: 48
Joined: Fri Oct 16, 2009 8:57 pm

Re: Matlab and sgesv and segmentation faults

Postby john » Thu Aug 30, 2012 1:18 pm

Good news - this finally reproduces it for me. I'll update when I know more.

Edit: I take that back. This only crashes for me if I skip the end_cula call.
john
Administrator
 
Posts: 587
Joined: Thu Jul 23, 2009 2:31 pm

Next

Return to CULA Dense Support

Who is online

Users browsing this forum: No registered users and 3 guests

cron