## sgesv in 1.1 is slow...

### Re:sgesv in 1.1 is slow...

Just wanted to add that the preferred method of reporting your Matlab version is to copy-paste the output of the Matlab "version" command.

- john
- Administrator
**Posts:**587**Joined:**Thu Jul 23, 2009 2:31 pm

### Re:sgesv in 1.1 is slow...

Reporting results based on:

Software:

Matlab: 2009b

Windows XP32.

CULA 1.2 Premium.

Hardware:

GPU: GeForce GTX 285

CPU: Intel Xeon X5450 3.00 GHz

Problem: Find x in Ax=b, if you find it, make it faster...

Observation: A speedup is obtained for systems larger than 1500 using CULA 1.2 (CULA only, no hand-written CUDA at all). Below that Matlab is faster, and for small systems (e.g. size(A) < 200) Matlab is much faster than CULA.

Q: Is it Matlab (R2009b) which slows down the process?

A: I don't think so. I've tried other MEX functions (other methods running on the CPU) for the same problem and have not seen that overhead.

Source code:

Note: single-precision complex only.

// CULASV computes the solution to the system of linear equations AX = B using CULA

//

// Input:

// A: Coefficient Matrix: single/complex precision LxL

// B: single/complex precision LxI

// Output:

// X: single/complex precision LxI

//

// Calling from Matlab

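// X = culasv(complex(single(A)), complex(single(B)))

// (complex() forces complex storage; for purely real input the imaginary-part pointer would be NULL)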

#include <stdlib.h>

#include <stdio.h>

#include <string.h>

#include "mex.h"

#include "culapack.h"

void checkStatus(culaStatus status) {

if(!status)

return;

if(status == culaArgumentError)

mexPrintf("Invalid value for parameter %d\n", culaGetErrorInfo());

else if(status == culaRuntimeError)

mexPrintf("Runtime error (%d)\n", culaGetErrorInfo());

else

mexPrintf("%s\n", culaGetStatusString(status));

culaShutdown();

mexErrMsgTxt("CULA error!");

}

void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])

{

// Input parameters: size(A) = LxL; size(B) = LxI

int ii, jj;

int L, I;

const mwSize *dims;

culaFloatComplex* A;

culaFloatComplex* B;

float *Ar, *Ai;

float *Br, *Bi;

// output X = Xr + Xi*i

float *Xr, *Xi;

// CULA variables

culaInt* ipiv = 0;

culaStatus status;

// check the number of passing data

if (nrhs != 2) {

mexErrMsgTxt("Need two input arguments.");

}

if (nlhs != 1) {

mexErrMsgTxt("Only one output argument allowed.");

}

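if (mxGetNumberOfDimensions(prhs[0]) != 2 || mxGetNumberOfDimensions(prhs[1]) != 2) {

mexErrMsgTxt("2-D matrices required.");

}

// Both inputs must be single-precision complex: for real input the imaginary-part pointer is NULL

if (!mxIsSingle(prhs[0]) || !mxIsComplex(prhs[0]) || !mxIsSingle(prhs[1]) || !mxIsComplex(prhs[1])) {

mexErrMsgTxt("A and B must be single-precision complex, e.g. complex(single(A)).");

}

// Get the dimensions: A is LxL, B (and therefore X) is LxI

L = (int)mxGetM(prhs[0]);

I = (int)mxGetN(prhs[1]);

if ((int)mxGetN(prhs[0]) != L || (int)mxGetM(prhs[1]) != L) {

mexErrMsgTxt("A must be square (LxL) and B must have L rows.");

}

dims = mxGetDimensions(prhs[1]);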

// Get pointers to the real and imaginary parts of the inputs

Ar = (float*)mxGetData(prhs[0]);

Ai = (float*)mxGetImagData(prhs[0]);

Br = (float*)mxGetData(prhs[1]);

Bi = (float*)mxGetImagData(prhs[1]);

A = (culaFloatComplex*)mxMalloc(L*L*sizeof(culaFloatComplex));

B = (culaFloatComplex*)mxMalloc(L*I*sizeof(culaFloatComplex));

// output dimension

// the solution

plhs[0] = mxCreateNumericArray(2, dims, mxSINGLE_CLASS, mxCOMPLEX);

Xr = (float*)mxGetData(plhs[0]);

Xi = (float*)mxGetImagData(plhs[0]);

// Allocate ipiv, the pivot-index array that culaCgesv fills in

ipiv = (culaInt*)mxMalloc(L*sizeof(culaInt));

// Initialize CULA (done on every call, which adds a fixed start-up cost to each solve)

status = culaInitialize();

checkStatus(status);

// Set matrix A,B

for(ii = 0; ii < L*L; ii++) {

A[ii].x = Ar[ii];

A[ii].y = Ai[ii];

}

for(ii = 0; ii < L*I; ii++) {

B[ii].x = Br[ii];

B[ii].y = Bi[ii];

}

// Set ipiv to 0

memset(ipiv, 0, L*sizeof(culaInt));

// Calling culaCgesv

status = culaCgesv(L, I, A, L, ipiv, B, L);

checkStatus(status);

// set mex output

for(ii = 0; ii < I; ii++) {

for(jj = 0; jj < L ; jj++) {

*Xr++ = B[jj+L*ii].x;

*Xi++ = B[jj+L*ii].y;

}

}

mxFree(ipiv);

mxFree(A);

mxFree(B);

culaFreeBuffers();

//Shutdown CULA

culaShutdown();

}
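The copy loops in the MEX function above exist because MATLAB (in releases of this era) stores a complex array as two separate real and imaginary arrays, while CULA's culaFloatComplex interleaves them, with the real part in `.x` and the imaginary part in `.y`. The same conversion as standalone C, as a sketch: `cfloat`, `pack_split_to_interleaved` and `unpack_interleaved_to_split` are hypothetical names, and `cfloat` only stands in for culaFloatComplex so the code compiles without the CULA headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for culaFloatComplex: real part in .x, imaginary in .y */
typedef struct { float x; float y; } cfloat;

/* Pack split storage (separate real/imag arrays, as MATLAB provides them)
   into interleaved (re, im) pairs, as CULA expects. */
void pack_split_to_interleaved(const float *re, const float *im,
                               cfloat *out, size_t n)
{
    for (size_t k = 0; k < n; ++k) {
        out[k].x = re[k];
        out[k].y = im[k];
    }
}

/* Unpack interleaved pairs back into separate real/imag arrays
   (the direction used when writing the solution into the MATLAB output). */
void unpack_interleaved_to_split(const cfloat *in, float *re, float *im,
                                 size_t n)
{
    for (size_t k = 0; k < n; ++k) {
        re[k] = in[k].x;
        im[k] = in[k].y;
    }
}
```

Because both layouts are column-major, a single flat loop over all L*L (or L*I) elements is enough; no row/column index arithmetic is needed.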

Q: Why am I doing this?

A: I need to speed up solving Ax=b when A is no larger than 200x200.

Final Q: Any suggestions for solving this problem (size(A) ~ 200) with GPU computing? jpeinado, could you please give some numbers/details on what you mean by "CULAPACK (sgetrf) + CUBLAS (triangular systems) = OK"?

A: I'll leave that to you...
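As I read it, the "sgetrf + triangular systems" suggestion splits the solve into an LU factorization with partial pivoting (what sgetrf computes) followed by forward and back substitution against those factors (the job CUBLAS's trsm routines would do). The factorization is done once and can then be reused for any number of right-hand sides. A plain-C CPU sketch of that split, single-precision real and column-major like MATLAB; `lu_factor` and `lu_solve` are hypothetical names, not CULA or CUBLAS calls.

```c
#include <math.h>
#include <stddef.h>

/* Factor the n-by-n column-major matrix a in place as P*A = L*U (partial
   pivoting). L has an implicit unit diagonal; ipiv[k] is the row swapped
   with row k. Returns 0 on success, or k+1 if a zero pivot occurs at step k. */
int lu_factor(int n, float *a, int *ipiv)
{
    for (int k = 0; k < n; ++k) {
        int p = k;                                   /* pivot search in column k */
        float best = fabsf(a[k + (size_t)k * n]);
        for (int i = k + 1; i < n; ++i) {
            float v = fabsf(a[i + (size_t)k * n]);
            if (v > best) { best = v; p = i; }
        }
        ipiv[k] = p;
        if (best == 0.0f) return k + 1;
        if (p != k) {                                /* swap full rows k and p */
            for (int j = 0; j < n; ++j) {
                float t = a[k + (size_t)j * n];
                a[k + (size_t)j * n] = a[p + (size_t)j * n];
                a[p + (size_t)j * n] = t;
            }
        }
        float piv = a[k + (size_t)k * n];
        for (int i = k + 1; i < n; ++i) {
            a[i + (size_t)k * n] /= piv;             /* multiplier, stored in L */
            float l = a[i + (size_t)k * n];
            for (int j = k + 1; j < n; ++j)          /* update trailing block */
                a[i + (size_t)j * n] -= l * a[k + (size_t)j * n];
        }
    }
    return 0;
}

/* Overwrite the n-by-nrhs column-major B with X = A \ B, reusing the factors. */
void lu_solve(int n, const float *a, const int *ipiv, int nrhs, float *b)
{
    for (int c = 0; c < nrhs; ++c) {
        float *x = b + (size_t)c * n;
        for (int k = 0; k < n; ++k) {                /* apply the row swaps */
            int p = ipiv[k];
            if (p != k) { float t = x[k]; x[k] = x[p]; x[p] = t; }
        }
        for (int i = 0; i < n; ++i)                  /* forward: L y = P b */
            for (int j = 0; j < i; ++j)
                x[i] -= a[i + (size_t)j * n] * x[j];
        for (int i = n - 1; i >= 0; --i) {           /* back: U x = y */
            for (int j = i + 1; j < n; ++j)
                x[i] -= a[i + (size_t)j * n] * x[j];
            x[i] /= a[i + (size_t)i * n];
        }
    }
}
```

On a GPU, the appeal of this split is that a single factorization can serve a whole block of right-hand sides; it does not by itself remove the transfer overhead for a single 200x200 system.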

BR //CJ


- cjest
**Posts:**12**Joined:**Wed Feb 10, 2010 3:01 pm

### Re:sgesv in 1.1 is slow...

Just to report that CULA 1.2 sgesv seems to be slow with:

Suse linux 11.1 64-bit

CUDA 3.0 (Yes, I know this is not yet supported...CUDA 2.3 gave a slow result also, I believe)

GTX 260

Matlab 7.9.0.529 (R2009b) 64-bit

Perhaps it is Matlab 32-bit vs. 64-bit that is the issue? This is a strange problem... (Perhaps there is a 32-bit flag that one could set when compiling the mex file?)


- Boxed Cylon
**Posts:**48**Joined:**Fri Oct 16, 2009 8:57 pm

### Re:sgesv in 1.1 is slow...

Q: why am I doin' this?

A: I need to speedup Ax=b when A is not larger than 200.

Final Q: Any suggestion to solve the problem (size(A)~200) deploying GPU computing?

@cjest,

Unfortunately, in this form this is not something GPU computing can solve. There is simply too much overhead in getting the data to the GPU and getting it back. For example, for a 256x256 matrix, just transferring the data to and from the GPU takes 4 times as long as the CPU takes to complete the entire calculation. Even if the GPU computed infinitely fast, the overall computation would take 4 times as long for this problem size.
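The reasoning can be made concrete with a rough cost model: the data moved grows like n² while the LU work grows like n³, so the ratio of transfer time to compute time shrinks as n grows, which fits the observation earlier in the thread that CULA only wins for large systems. A minimal sketch, assuming single-precision real data and three separate copies (upload A, upload B, download X), each with a fixed latency. The bus bandwidth, latency and CPU throughput are parameters you would have to measure yourself; none of those numbers come from this thread, and the function names are hypothetical.

```c
/* Bytes moved for an n-by-n single-precision real system with nrhs
   right-hand sides: A and B are uploaded, X (same size as B) is downloaded. */
double transfer_bytes(int n, int nrhs)
{
    double elems = (double)n * n + 2.0 * (double)n * nrhs;
    return elems * sizeof(float);
}

/* Approximate flops: LU factorization (~2/3 n^3) plus forward and back
   substitution (~2 n^2 per right-hand side). */
double gesv_flops(int n, int nrhs)
{
    double dn = (double)n;
    return (2.0 / 3.0) * dn * dn * dn + 2.0 * dn * dn * nrhs;
}

/* Estimated transfer time divided by estimated CPU compute time.
   All rates are caller-supplied assumptions; latency_s is charged
   once per copy, and three copies are assumed. */
double transfer_to_compute_ratio(int n, int nrhs, double bus_bytes_per_s,
                                 double latency_s, double cpu_flops_per_s)
{
    double t_xfer = 3.0 * latency_s + transfer_bytes(n, nrhs) / bus_bytes_per_s;
    double t_cpu = gesv_flops(n, nrhs) / cpu_flops_per_s;
    return t_xfer / t_cpu;
}
```

A ratio above 1 means the copies alone outlast the whole CPU solve, which is the 256x256 situation described above. Ignoring latency, the ratio falls roughly in proportion to 1/n, and a nonzero latency only makes small sizes look worse.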

The only chance GPU computing would have for problems of this size is if you needed to solve many of these small systems in parallel. In that case you could spread the upload/download overhead over the whole batch and make better use of the GPU's parallel resources to get an overall speedup.
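A minimal CPU sketch of the batched layout this implies: all the small matrices sit back to back in one contiguous column-major buffer, and their right-hand sides in another, so a GPU version could move the whole batch with one upload and one download instead of one pair of copies per system. The per-system work here is plain Gaussian elimination with partial pivoting on the CPU; it is not a CULA routine, `solve_one` and `solve_batch` are hypothetical names, and porting the loop to GPU threads is not shown.

```c
#include <math.h>
#include <stddef.h>

/* Solve one n-by-n column-major system by Gaussian elimination with partial
   pivoting. a is overwritten; b (length n) is overwritten with x.
   Returns 0 on success, -1 if a zero pivot is encountered. */
static int solve_one(int n, float *a, float *b)
{
    for (int k = 0; k < n; ++k) {
        int p = k;                                   /* pivot search */
        for (int i = k + 1; i < n; ++i)
            if (fabsf(a[i + (size_t)k * n]) > fabsf(a[p + (size_t)k * n])) p = i;
        if (a[p + (size_t)k * n] == 0.0f) return -1;
        if (p != k) {                                /* swap rows k and p */
            for (int j = k; j < n; ++j) {
                float t = a[k + (size_t)j * n];
                a[k + (size_t)j * n] = a[p + (size_t)j * n];
                a[p + (size_t)j * n] = t;
            }
            float t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < n; ++i) {            /* eliminate below pivot */
            float m = a[i + (size_t)k * n] / a[k + (size_t)k * n];
            for (int j = k; j < n; ++j)
                a[i + (size_t)j * n] -= m * a[k + (size_t)j * n];
            b[i] -= m * b[k];
        }
    }
    for (int i = n - 1; i >= 0; --i) {               /* back substitution */
        for (int j = i + 1; j < n; ++j)
            b[i] -= a[i + (size_t)j * n] * b[j];
        b[i] /= a[i + (size_t)i * n];
    }
    return 0;
}

/* Solve `count` independent systems stored contiguously: system s uses the
   matrix at as + s*n*n and the right-hand side at bs + s*n.
   Returns the number of systems that hit a zero pivot. */
int solve_batch(int n, int count, float *as, float *bs)
{
    int failures = 0;
    for (int s = 0; s < count; ++s)
        if (solve_one(n, as + (size_t)s * n * n, bs + (size_t)s * n) != 0)
            ++failures;
    return failures;
}
```

The systems are independent, so the outer loop is the natural place to parallelize; the contiguous layout is what lets the transfer cost be paid once per batch.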

Why is it that you need to speed up the solve of matrices that are so small?

Dan

- dan
- Administrator
**Posts:**61**Joined:**Thu Jul 23, 2009 2:29 pm

### Re:sgesv in 1.1 is slow...

Boxed Cylon wrote:Just to report that Cula 1.2 Sgesv seems to be slow with:

Suse linux 11.1 64-bit

CUDA 3.0 (Yes, I know this is not yet supported...CUDA 2.3 gave a slow result also, I believe)

GTX 260

Matlab 7.9.0.529 (R2009b) 64-bit

Perhaps it is Matlab 32-bit vs. 64-bit that is the issue? This is a strange problem... (Perhaps there is a 32-bit flag that one could set when compiling the mex file?)

@Boxed Cylon

My results on 64-bit Windows look good, but it's possible that this is a 64-bit issue on Linux. We have a 64-bit Ubuntu machine that I can try this on. If we don't see it there, it could be a SUSE-specific issue.

Dan

- dan

### Re:sgesv in 1.1 is slow...

Hello:

Well, my results were obtained on a 64-bit machine running a CentOS version (I will have to ask which one)....

About the MATLAB problem: yes, there is a problem with MATLAB when using the hybrid algorithms, because as far as I know MATLAB uses its own special 64-bit LAPACK version. So hybrid algorithms don't work in MATLAB.

Also, the problem is with CULA: in the results presented before, the results are bad when using CULA, but good when not using CULA (the UJI libraries use only CUBLAS)....

Cjest, thank you very much for your code.. I will test as soon as possible

Dan, thank you very much for your work. About my results, they are done in:

Soft:

CentOS 64 bit

I am working with MATLAB R2009b.

CUDA 2.3

Hard:

Intel E5430 (2.6 GHz)

Quadro FX5800

jpeinado


- jpeinado
**Posts:**37**Joined:**Mon Sep 14, 2009 10:48 am

### Re:sgesv in 1.1 is slow...

jpeinado wrote:About the MATLAB problem. Yes there is a problem with MATLAB...using the hybrid algorithms because as far as I know MATLAB uses a special 64 bit LAPACK version. And then, hybrid algorithms dont work in MATLAB.

@jpeinado

It is true that CULA's routines are hybrid algorithms, but as our results (and some other users') have shown, it is not the case that there is a slowdown in all versions of Matlab. Right now the common thread appears to be 64-bit Linux versions of Matlab. I'm planning on getting this tested ASAP so when I get some results I'll post them here.

Dan

- dan

### Re:sgesv in 1.1 is slow...

dan wrote:jpeinado wrote:About the MATLAB problem. Yes there is a problem with MATLAB...using the hybrid algorithms because as far as I know MATLAB uses a special 64 bit LAPACK version. And then, hybrid algorithms dont work in MATLAB.

@jpeinado

It is true that CULA's routines are hybrid algorithms, but as our results (and some other users') have shown, it is not the case that there is a slowdown in all versions of Matlab. Right now the common thread appears to be 64-bit Linux versions of Matlab. I'm planning on getting this tested ASAP so when I get some results I'll post them here.

Dan

Thank you very much Dan

jpeinado

- jpeinado

### Re:sgesv in 1.1 is slow...

So, my testing in Matlab 7.9 on Ubuntu 9.10 64-bit has shown no slowdown.

One of the things I've noticed is that the Matlab installer gave you the option of selecting an architecture. I chose x64 (the default) to match the platform; for those that are seeing problems, is it possible you selected something other than this?

Also, when you're doing your runtime link, are you making sure to link against the 64-bit versions of the CULA libs (/usr/local/cula/lib64), as opposed to the 32-bit versions (/usr/local/cula/lib) ?
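Related to the lib vs. lib64 question: a 64-bit MATLAB can only load a 64-bit MEX file, and a 64-bit object can only be linked against 64-bit shared libraries. One quick sanity check is to compile a tiny function with the same compiler flags used for the MEX file and look at its pointer width. A minimal sketch; `pointer_width_bits` and the `REQUIRE_64BIT` guard macro are hypothetical names, not part of CULA or the MEX toolchain.

```c
#include <stdint.h>

/* Optional compile-time guard: build with -DREQUIRE_64BIT to make a 32-bit
   compile fail immediately instead of producing a mismatched binary. */
#if defined(REQUIRE_64BIT) && UINTPTR_MAX != 0xFFFFFFFFFFFFFFFFu
#error "Expected a 64-bit build (to match 64-bit MATLAB and the lib64 CULA libraries)"
#endif

/* Pointer width, in bits, of the code this file was compiled into:
   64 for an x86_64 build, 32 for a 32-bit build. */
int pointer_width_bits(void)
{
    return (int)(sizeof(void *) * 8);
}
```

If this reports 32 under the flags used for the MEX build, the build is not matching a 64-bit MATLAB, regardless of which CULA directory is on the library path.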

@jpeinado

Can you find out what version of centos you are using?


- dan

### Re:sgesv in 1.1 is slow...

dan wrote:So, my testing in Matlab 7.9 on Ubuntu 9.10 64-bit has shown no slowdown.

Happy to hear this...

One of the things I've noticed is that the Matlab installer gave you the option of selecting an architecture. I chose x64 (the default) to match the platform; for those that are seeing problems, is it possible you selected something other than this?

No. In fact, when you use the NVIDIA MATLAB plug-in with its corresponding Makefile, the compiled MEX file gets a different name: the extension is .mexa64.

Also, when you're doing your runtime link, are you making sure to link against the 64-bit versions of the CULA libs (/usr/local/cula/lib64), as opposed to the 32-bit versions (/usr/local/cula/lib) ?

I have this pointing to /usr/local/cula/lib64

@jpeinado

Can you find out what version of centos you are using?

Yes, I will talk with the system administrator as soon as possible, and I will tell you this.

By the way, could you put your programs (makefiles, MEX files, etc.) in a zip file so I can run them on my system?

Thank you very much

jpeinado

P.S. In the next few days I will get a machine with a 64-bit Ubuntu version.

- jpeinado

### Re:sgesv in 1.1 is slow...

@Dan

One test we could run is to have you compile the attached code on your Ubuntu 64-bit installation, and have me run it on my machine (who knows?) At least it will determine whether the problem lies in the compiler environment or the runtime environment, maybe.

This code is my debugging/timing routine; it just calculates X=A\B on the GPU. Call it from MATLAB with [X] = gpu_sgesv(A,B);

Attachment: gpu_sgesv-20100303.txt (size 3876): http://www.culatools.com/images/fbfiles/files/gpu_sgesv-20100303.txt

Rename the file to gpu_sgesv.cu ...


- Boxed Cylon

### Re:sgesv in 1.1 is slow...

dan wrote:@jpeinado

Can you find out what version of centos you are using?

5.2

jpeinado

- jpeinado

### Re:sgesv in 1.1 is slow...

dan wrote:

Why is it that you need to speed up the solve of matrices that are so small?

I have a for-loop that calls the system solver on small matrices, so I do need to solve many small systems, which could be done in parallel. The execution time of the entire loop needs to be reduced.

Any hints?

Can I use Cgesv for that?


- cjest

### Re:sgesv in 1.1 is slow...

Boxed Cylon wrote:One test we could run is to have you compile the attached code on your Ubuntu 64-bit installation, and have me run it on my machine (who knows?) At least it will determine whether the problem lies in the compiler environment or the runtime environment, maybe.

This code is my debugging/timing routine - it just calculates X=A\B on the GPU. Call it from matlab with [X]= gpu_sgesv(A,B ) ;

@Boxed Cylon

I ran your mex file. Here are the results of an example run:

>> A = rand(2048,2048,'single');

>> B = rand(2048,64,'single');

>> tic; A\B; toc;

Elapsed time is 0.932283 seconds.

>> tic; [x] = gpu_sgesv(A,B); toc;

Initializing CULA...

$$$$$$$$$$ 0.334 s

X-top = -9.435948e-02 3.882708e-01 -6.582417e-02

X-bottom = 4.102392e-01 4.056736e-01 -2.996187e-01

Elapsed time is 0.706791 seconds.

>> tic; [x] = gpu_sgesv(A,B); toc;

Initializing CULA...

$$$$$$$$$$ 0.403 s

X-top = -9.435948e-02 3.882708e-01 -6.582417e-02

X-bottom = 4.102392e-01 4.056736e-01 -2.996187e-01

Elapsed time is 0.429575 seconds.

The first run took 0.71 seconds, while the second took 0.43. The difference between the two is the initialization time, which I've measured at around 0.3-0.4 seconds; these results support that.

With these results, I think it's fair to say that the problem isn't in your mex file. I'm attaching your compiled mex to this message (note that I've called it culaGesv2 to compare it against our homegrown mex file). Let me know how you make out with this.

Dan

Attachment: culaGesv2.zip (size 3722): http://www.culatools.com/images/fbfiles/files/culaGesv2.zip

- dan

### Re:sgesv in 1.1 is slow...

Hmm... part of an answer is emerging:


My original routine, running your set of matlab lines

A = rand(2048,2048,'single');

B = rand(2048,64,'single');

tic; A\B; toc;

Elapsed time is 0.656199 seconds.

tic; [x] = gpu_sgesv(A,B); toc;

Initializing CULA...

$$$$$$$$$$ 0.173 s

X-top = -9.435948e-02 3.882708e-01 -6.582417e-02

X-bottom = 4.102392e-01 4.056736e-01 -2.996187e-01

Elapsed time is 0.593104 seconds.

tic; [x] = gpu_sgesv(A,B); toc;

Initializing CULA...

$$$$$$$$$$ 0.179 s

X-top = -9.435948e-02 3.882708e-01 -6.582417e-02

X-bottom = 4.102392e-01 4.056736e-01 -2.996187e-01

Elapsed time is 0.196459 seconds.

Running the same, using your compiled mex:

tic; A\B; toc;

Elapsed time is 0.664783 seconds.

>> tic; [x] = culaGesv2(A,B); toc;

Initializing CULA...

$$$$$$$$$$ 0.196 s

X-top = 9.946308e-01 2.093953e+00 -6.212948e-01

X-bottom = -3.948123e-01 -1.171244e+00 1.621502e+00

Elapsed time is 0.211650 seconds.

>> tic; [x] = culaGesv2(A,B); toc;

Initializing CULA...

$$$$$$$$$$ 0.189 s

X-top = 9.946308e-01 2.093953e+00 -6.212948e-01

X-bottom = -3.948123e-01 -1.171244e+00 1.621502e+00

Elapsed time is 0.201372 seconds.

That is all fabulous, and I think consistent with what I had before!

My test routine was doing something like this, the only difference being 64 -> 5000:

A = rand(2048,2048,'single');

B = rand(2048,5000,'single');

tic; A\B; toc;

Elapsed time is 3.444536 seconds.

tic; A\B; toc;

Elapsed time is 3.426751 seconds.

tic; [x] = culaGesv2(A,B); toc;

Initializing CULA...

$$$$$$$$$$ 5.432 s

X-top = 3.081349e-01 1.812777e-01 3.940895e-01

X-bottom = 1.791551e+00 -8.022842e-01 3.020534e-02

Elapsed time is 5.505551 seconds.

tic; [x] = culaGesv2(A,B); toc;

Initializing CULA...

$$$$$$$$$$ 5.426 s

X-top = 3.081349e-01 1.812777e-01 3.940895e-01

X-bottom = 1.791551e+00 -8.022842e-01 3.020534e-02

Elapsed time is 5.550902 seconds.

So the question is not so much why sgesv is slow, as why it is slow when the second dimension of B is large. In my own application, this dimension is about 1000. This result seems odd: presumably all the computing is in setting up the inverse, so I would have thought the timing would be rather independent of the second dimension of B.
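A rough operation count suggests the second dimension of B matters more than expected. Assuming the standard LAPACK-style algorithm behind sgesv (it factors A as P*A = L*U once and then does forward and back substitution for each column of B; it does not form the inverse), the factorization costs about (2/3)n³ flops and the solves about 2n² flops per column, so solve/LU ≈ 3·nrhs/n. For n = 2048 that ratio is about 0.09 with 64 columns but about 7.3 with 5000 columns, i.e. the triangular solves go from roughly 9% to roughly 88% of the work. This only says the time should grow with nrhs; it does not explain why the GPU lost to the CPU at that size. Function names are hypothetical.

```c
/* Approximate real-arithmetic flop counts for A X = B,
   with A n-by-n and B n-by-nrhs. */
double lu_flops(double n)
{
    return (2.0 / 3.0) * n * n * n;          /* one LU factorization of A */
}

double triangular_solve_flops(double n, double nrhs)
{
    return 2.0 * n * n * nrhs;               /* forward + back substitution, all columns */
}

/* Fraction of the total work spent in the triangular solves.
   Since solve/LU = 3*nrhs/n, this equals r / (1 + r) with r = 3*nrhs/n. */
double solve_fraction(double n, double nrhs)
{
    double f = lu_flops(n);
    double s = triangular_solve_flops(n, nrhs);
    return s / (f + s);
}
```

Complex arithmetic multiplies both counts by roughly the same factor, so the fractions carry over to the Cgesv case.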

It's nice to finally have a bit of a handle on the issue!

Just to complete the story, the graph of T_cpu/T_gpu for the case where the second dimension of B is 64 is:

Incidentally, X-top and X-bottom should be equal to X(1,1:3) and X(end,(end-2:end)) if X=A\B (within numerical error of single precision, etc.).
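That check relies on MATLAB's column-major layout: element X(i,j) (1-based) of an m-by-n matrix sits at offset (i-1) + (j-1)*m. Hypothetical helpers showing which memory locations X(1,1:3) and X(end,end-2:end) come from; gpu_sgesv's source isn't reproduced in this thread, so this is not necessarily how it prints them.

```c
#include <stddef.h>

/* Copy X(1,1:3) from an m-by-n column-major matrix (assumes n >= 3). */
void x_top(const float *x, size_t m, float out[3])
{
    for (size_t j = 0; j < 3; ++j)
        out[j] = x[0 + j * m];               /* row 1, columns 1..3 */
}

/* Copy X(end,end-2:end) from an m-by-n column-major matrix (assumes n >= 3). */
void x_bottom(const float *x, size_t m, size_t n, float out[3])
{
    for (size_t j = 0; j < 3; ++j)
        out[j] = x[(m - 1) + (n - 3 + j) * m];   /* last row, last 3 columns */
}
```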

- Boxed Cylon
