## LU decomposition in CULA

16 posts
• Page

**1**of**2**•**1**, 2### LU decomposition in CULA

Hi,

For a large sparse matrix, there are some tricks to accelerate LU decomposition.

I am wondering if the current LU decomposition solver DGETRF has already incorporate those tricks or it is totally based on the algorithm designed for dense matrix. If not, will it be considered to be the future feature? Thanks

Yu

For a large sparse matrix, there are some tricks to accelerate LU decomposition.

I am wondering if the current LU decomposition solver DGETRF has already incorporate those tricks or it is totally based on the algorithm designed for dense matrix. If not, will it be considered to be the future feature? Thanks

Yu

- athlonshi
**Posts:**10**Joined:**Wed Apr 28, 2010 6:03 am**Location:**Cambridge, MA

### Re: LU decomposition in CULA

I have noticed that there was a discussion regarding if the CULA package had implemented sparse matrix solvers in last year on this forum. The answer was no.

I am a CULA 2.0 premium user and I did see significant speed-up using CULA LU decomposition solver DGETRF compared to DGEFA in LAPACK. But since my matrix is a sparse one, I also used a sparse matrix solver to compare and found no advantage of using CULA.

If the solvers in CULA 2.0 are designed for general matrix operation, that means there are still a lot of room to further improve their performance, which is fantastic.

So, I just want to confirm that CULA 2.0 has not yet implemented any sparse matrix solver. And I hope this has been considered by CULA developers. Thanks!

Yu

I am a CULA 2.0 premium user and I did see significant speed-up using CULA LU decomposition solver DGETRF compared to DGEFA in LAPACK. But since my matrix is a sparse one, I also used a sparse matrix solver to compare and found no advantage of using CULA.

If the solvers in CULA 2.0 are designed for general matrix operation, that means there are still a lot of room to further improve their performance, which is fantastic.

So, I just want to confirm that CULA 2.0 has not yet implemented any sparse matrix solver. And I hope this has been considered by CULA developers. Thanks!

Yu

- athlonshi
**Posts:**10**Joined:**Wed Apr 28, 2010 6:03 am**Location:**Cambridge, MA

### Re: LU decomposition in CULA

You are correct in all your assumptions of the LU decomposition in CULA; the solver is designed for large dense matrices and does not consider sparseness. However, we are working on specialized sparse solvers (iterative and direct) that will be available in a later version of CULA.

- kyle
- Administrator
**Posts:**301**Joined:**Fri Jun 12, 2009 7:47 pm

### Re: LU decomposition in CULA

Regarding your problem specifically, what sparse solver are you currently using?

- kyle
- Administrator
**Posts:**301**Joined:**Fri Jun 12, 2009 7:47 pm

### Re: LU decomposition in CULA

Hi, Kyle:

Thanks for your reply!

I am not quite sure about the details of the sparse matrix solver that I am using.

It is integrated in a commercial software specifically for my research problem.

Yu

Thanks for your reply!

I am not quite sure about the details of the sparse matrix solver that I am using.

It is integrated in a commercial software specifically for my research problem.

Yu

- athlonshi
**Posts:**10**Joined:**Wed Apr 28, 2010 6:03 am**Location:**Cambridge, MA

### Re: LU decomposition in CULA

If you could PM myself and/or Kyle with the name of the commercial package, it might be helpful.

- john
- Administrator
**Posts:**587**Joined:**Thu Jul 23, 2009 2:31 pm

### Re: LU decomposition in CULA

Hi I have some problems with LU decomposition in CULA BASIC, in my opinion counts wrong or my implementation is badly . Could someone help me see the error?

int main()

{

cudaError_t err;

culaStatus status;

// point to host memory

culaFloat* A = NULL;

culaInt* TAU = NULL;

int i;

// point to device memory

culaFloat* Ad = NULL;

culaInt* TAUd = NULL;

printf("Allocating Matrices\n");

A = (culaFloat*)malloc(N*N*sizeof(A[0]));

TAU = (culaInt*)malloc(N*sizeof(TAU[0]));

if(!A || !TAU)

exit(EXIT_FAILURE);

memset(A, 0, N*N*sizeof(A[0]));

memset(TAU, 0, N*sizeof(TAU[0]));

A[0]=3;

A[1]=3;

A[2]=0;

A[3]=0;

A[4]=2;

A[5]=2;

A[6]=1;

A[7]=0;

A[8]=1;

printf("\n\n");

for ( i=0; i<N*N;++i)

{

printf("%f, ",A[i]);

if ( (i+1) % N == 0 )

{

printf("\n");

}

}

/* Allocate device memory for the matrices */

err = cudaMalloc((void**)&Ad, N*N*sizeof(Ad[0]));

checkCudaError(err);

err = cudaMalloc((void**)&TAUd, N*sizeof(TAUd[0]));

checkCudaError(err);

printf("Initializing CULA\n");

status = culaInitialize();

checkStatus(status);

err = cudaMemcpy(Ad, A, N*N*sizeof(Ad[0]), cudaMemcpyHostToDevice);

checkCudaError(err);

err = cudaMemcpy(TAUd, TAU, N*sizeof(TAUd[0]), cudaMemcpyHostToDevice);

checkCudaError(err);

printf("Calling culaDeviceSgetrf\n");

status = culaDeviceSgetrf(N, N, Ad, N, TAUd);

checkStatus(status);

err = cudaMemcpy(A, Ad, N*N*sizeof(Ad[0]), cudaMemcpyDeviceToHost);

checkCudaError(err);

printf("\n\n");

for ( i=0; i<N*N;++i)

{

printf("%f, ",A[i]);

if ( (i+1) % N == 0 )

{

printf("\n");

}

}

err = cudaMemcpy(TAU, TAUd, N*sizeof(TAUd[0]), cudaMemcpyDeviceToHost);

checkCudaError(err);

printf("\n\n");

for ( i=0; i<N;++i)

{

printf("%d, ",TAU[i]);

if ( (i+1) % N == 0 )

{

printf("\n");

}

}

printf("Shutting down CULA\n");

culaShutdown();

free(A);

free(TAU);

cudaFree(Ad);

cudaFree(TAUd);

getchar();

return EXIT_SUCCESS;

}

int main()

{

cudaError_t err;

culaStatus status;

// point to host memory

culaFloat* A = NULL;

culaInt* TAU = NULL;

int i;

// point to device memory

culaFloat* Ad = NULL;

culaInt* TAUd = NULL;

printf("Allocating Matrices\n");

A = (culaFloat*)malloc(N*N*sizeof(A[0]));

TAU = (culaInt*)malloc(N*sizeof(TAU[0]));

if(!A || !TAU)

exit(EXIT_FAILURE);

memset(A, 0, N*N*sizeof(A[0]));

memset(TAU, 0, N*sizeof(TAU[0]));

A[0]=3;

A[1]=3;

A[2]=0;

A[3]=0;

A[4]=2;

A[5]=2;

A[6]=1;

A[7]=0;

A[8]=1;

printf("\n\n");

for ( i=0; i<N*N;++i)

{

printf("%f, ",A[i]);

if ( (i+1) % N == 0 )

{

printf("\n");

}

}

/* Allocate device memory for the matrices */

err = cudaMalloc((void**)&Ad, N*N*sizeof(Ad[0]));

checkCudaError(err);

err = cudaMalloc((void**)&TAUd, N*sizeof(TAUd[0]));

checkCudaError(err);

printf("Initializing CULA\n");

status = culaInitialize();

checkStatus(status);

err = cudaMemcpy(Ad, A, N*N*sizeof(Ad[0]), cudaMemcpyHostToDevice);

checkCudaError(err);

err = cudaMemcpy(TAUd, TAU, N*sizeof(TAUd[0]), cudaMemcpyHostToDevice);

checkCudaError(err);

printf("Calling culaDeviceSgetrf\n");

status = culaDeviceSgetrf(N, N, Ad, N, TAUd);

checkStatus(status);

err = cudaMemcpy(A, Ad, N*N*sizeof(Ad[0]), cudaMemcpyDeviceToHost);

checkCudaError(err);

printf("\n\n");

for ( i=0; i<N*N;++i)

{

printf("%f, ",A[i]);

if ( (i+1) % N == 0 )

{

printf("\n");

}

}

err = cudaMemcpy(TAU, TAUd, N*sizeof(TAUd[0]), cudaMemcpyDeviceToHost);

checkCudaError(err);

printf("\n\n");

for ( i=0; i<N;++i)

{

printf("%d, ",TAU[i]);

if ( (i+1) % N == 0 )

{

printf("\n");

}

}

printf("Shutting down CULA\n");

culaShutdown();

free(A);

free(TAU);

cudaFree(Ad);

cudaFree(TAUd);

getchar();

return EXIT_SUCCESS;

}

- luk4szz
**Posts:**2**Joined:**Tue Oct 19, 2010 1:11 am

### Re: LU decomposition in CULA

luk4szz wrote: Could someone help me see the error?

Is there an error? Remember that CULA expects your data to be column major, not row major like you are printing to the screen.

- kyle
- Administrator
**Posts:**301**Joined:**Fri Jun 12, 2009 7:47 pm

### Re: LU decomposition in CULA

Hello to all,

I am a new user. I am using CULA from UBUNTU. I used the following function to do simple inverse of a matrix. Then, I need to use this inverse matrix for other purpose. But I am not sure how get/extract the inverse matrix. Because the function SGETRI does not returning any pointer. It is not taking any extra pointer even to store the output. It seems to me that the input matrix remains same after computation.

Thanks in advance.

-Jayanta

**************************the program I used for your reference ***************************

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <cula.h>

void checkStatus(culaStatus status)

{

if(!status)

return;

if(status == culaArgumentError)

printf("Invalid value for parameter %d\n", culaGetErrorInfo());

else if(status == culaDataError)

printf("Data error (%d)\n", culaGetErrorInfo());

else if(status == culaBlasError)

printf("Blas error (%d)\n", culaGetErrorInfo());

else if(status == culaRuntimeError)

printf("Runtime error (%d)\n", culaGetErrorInfo());

else

printf("%s\n", culaGetStatusString(status));

culaShutdown();

exit(EXIT_FAILURE);

}

int main() //int argc, char** argv

{

culaStatus status;

status = culaInitialize();

checkStatus(status);

/*

if (!culaInitialize())

{ printf("Error in initializing CULA\n");

return 1;

} */

int N=2,i;

culaFloat* A = NULL;

//culaFloat* B = NULL;

culaInt* TAU = NULL;

//culaFloat* TAU = NULL;

//float *B;

//B=(float*)malloc(N*N*sizeof(float));

A = (culaFloat*)malloc(N*N*sizeof(culaFloat));

//B = (culaFloat*)malloc(N*N*sizeof(culaFloat));

TAU = (culaInt*)malloc(N*sizeof(culaInt));

if(!A || !TAU)

exit(EXIT_FAILURE);

// memset(A, 0, N*N*sizeof(culaFloat));

//printf("type the elements:\n");

//for (i=0;i<N*N;i++) scanf("%f",A+i);

A[0] = 3;

A[1] = -4;

A[2] = -4;

A[3] = 6;

culaSgetri(N, A, N, TAU);

culaShutdown();

for (i=0;i<N*N;i++) printf("%f\t", *(A+i));

printf("\n\n");

free(A);

// free(B);

free(TAU);

return 0;

}

I am a new user. I am using CULA from UBUNTU. I used the following function to do simple inverse of a matrix. Then, I need to use this inverse matrix for other purpose. But I am not sure how get/extract the inverse matrix. Because the function SGETRI does not returning any pointer. It is not taking any extra pointer even to store the output. It seems to me that the input matrix remains same after computation.

Thanks in advance.

-Jayanta

**************************the program I used for your reference ***************************

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <cula.h>

void checkStatus(culaStatus status)

{

if(!status)

return;

if(status == culaArgumentError)

printf("Invalid value for parameter %d\n", culaGetErrorInfo());

else if(status == culaDataError)

printf("Data error (%d)\n", culaGetErrorInfo());

else if(status == culaBlasError)

printf("Blas error (%d)\n", culaGetErrorInfo());

else if(status == culaRuntimeError)

printf("Runtime error (%d)\n", culaGetErrorInfo());

else

printf("%s\n", culaGetStatusString(status));

culaShutdown();

exit(EXIT_FAILURE);

}

int main() //int argc, char** argv

{

culaStatus status;

status = culaInitialize();

checkStatus(status);

/*

if (!culaInitialize())

{ printf("Error in initializing CULA\n");

return 1;

} */

int N=2,i;

culaFloat* A = NULL;

//culaFloat* B = NULL;

culaInt* TAU = NULL;

//culaFloat* TAU = NULL;

//float *B;

//B=(float*)malloc(N*N*sizeof(float));

A = (culaFloat*)malloc(N*N*sizeof(culaFloat));

//B = (culaFloat*)malloc(N*N*sizeof(culaFloat));

TAU = (culaInt*)malloc(N*sizeof(culaInt));

if(!A || !TAU)

exit(EXIT_FAILURE);

// memset(A, 0, N*N*sizeof(culaFloat));

//printf("type the elements:\n");

//for (i=0;i<N*N;i++) scanf("%f",A+i);

A[0] = 3;

A[1] = -4;

A[2] = -4;

A[3] = 6;

culaSgetri(N, A, N, TAU);

culaShutdown();

for (i=0;i<N*N;i++) printf("%f\t", *(A+i));

printf("\n\n");

free(A);

// free(B);

free(TAU);

return 0;

}

- jayanta
**Posts:**3**Joined:**Thu Jun 24, 2010 4:00 pm

### Re: LU decomposition in CULA

getri inverts a matrix inplace, and also requires that you first call getrf on that matrix. Your program should read:

If you are using the inverse to solve a system A*x=b as inv(A)*b, then you should instead use the gesv function in CULA Basic or getrf (factorize) and getrs (backsolve).

- Code: Select all
`getrf(N,N,A,N,ipiv);`

getri(N,A,N,ipiv);

If you are using the inverse to solve a system A*x=b as inv(A)*b, then you should instead use the gesv function in CULA Basic or getrf (factorize) and getrs (backsolve).

- john
- Administrator
**Posts:**587**Joined:**Thu Jul 23, 2009 2:31 pm

### Re: LU decomposition in CULA

Thanks a lot. It worked like magic.

- jayanta
**Posts:**3**Joined:**Thu Jun 24, 2010 4:00 pm

### Re: LU decomposition in CULA

Hi,

Does new version CULA R10 provide optimized LU decomposition for sparse matrices ?

In general, does CULA provide optimized sparse matrix-vector operations ?

Regards,

Sachin

Does new version CULA R10 provide optimized LU decomposition for sparse matrices ?

In general, does CULA provide optimized sparse matrix-vector operations ?

Regards,

Sachin

- avatar
**Posts:**4**Joined:**Tue Nov 23, 2010 3:30 am

### Re: LU decomposition in CULA

avatar wrote:Hi,

Does new version CULA R10 provide optimized LU decomposition for sparse matrices ?

In general, does CULA provide optimized sparse matrix-vector operations ?

Regards,

Sachin

CULA R10 does not provide this, except in the case of banded matrices. Sparse matrix operations will be featured in the coming year. For sparse matrix-vector, please see the CUSPARSE package in your CUDA toolkit.

- john
- Administrator
**Posts:**587**Joined:**Thu Jul 23, 2009 2:31 pm

### Re: LU decomposition in CULA

Do we need to use both calls

to get invert a matrix? Can we call culaDeviceSegetri with the free version?

- Code: Select all
`getrf(N,N,A,N,ipiv);`

getri(N,A,N,ipiv);

to get invert a matrix? Can we call culaDeviceSegetri with the free version?

- luiceur
**Posts:**4**Joined:**Fri Jul 06, 2012 4:06 am

16 posts
• Page

**1**of**2**•**1**, 2Return to General CULA Discussion

### Who is online

Users browsing this forum: No registered users and 3 guests