## Least Squares with CULA

### Least Squares with CULA

Hi All,

I am looking for a least squares solver for our CUDA application, and CULA looks quite promising. However, when I try to use the GELS functionality, I get output that is not sensible. I would like to know what I am doing wrong.

Let's say my input data is 6 pairs of X and Y values that roughly correspond to a 2nd-degree polynomial. I form an "A" matrix out of the X values in the following way:

```
1 X1 X1*X1
1 X2 X2*X2
1 X3 X3*X3
1 X4 X4*X4
1 X5 X5*X5
1 X6 X6*X6
```

The matrix is stored in column-major order.

The dependent variables form the vector B with 6 components.

I pass the following arguments to the CULA function:

```c
status = culaSgels('N', M, N, NRHS, A, LDA, B, LDB);
```

where M = 6 (number of rows), N = 3 (number of columns), NRHS = 1 (number of right-hand sides), A is the matrix above, the leading dimension of A is 6, and the leading dimension of B is 6.

If I read the documentation properly, the results should be returned in the memory originally allocated for B. The status is culaNoError, but the results are rubbish.

Where is my mistake?

Thanks!

Vlad

Operating system: Windows Server 2008 R2

CUDA version installed: 4.0

GPU model: Tesla C2070

- ambushed
**Posts:** 9 **Joined:** Thu Oct 13, 2011 5:37 am

### Re: Least Squares with CULA

Hi, can you post a complete program that loads and runs this data in the exact way you are using it in your program?

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re: Least Squares with CULA

Thanks for your reply!

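```c
const int VECTOR_SIZE = 6;  // rows of A (number of observations)
const int NUM_BETAS = 3;    // columns of A (polynomial coefficients)

// A is stored column-major: element (row, col) lives at h_A[row + col * VECTOR_SIZE]
float* h_A = (float*)malloc(sizeof(float) * VECTOR_SIZE * NUM_BETAS);
float* h_Y = (float*)malloc(sizeof(float) * VECTOR_SIZE);

for (int i = 1; i <= VECTOR_SIZE; i++)
{
    int idx = i - 1;
    h_A[idx] = 1;                        // column 0: constant term
    h_A[idx + VECTOR_SIZE] = i;          // column 1: x
    h_A[idx + 2 * VECTOR_SIZE] = i * i;  // column 2: x^2
    h_Y[idx] = 2 * i * i - 4 * i + 2;    // y = 2x^2 - 4x + 2, no error term
}

int NRHS = 1;
int LDA = VECTOR_SIZE;
int LDB = LDA;

culaStatus status = culaSgels('N', VECTOR_SIZE, NUM_BETAS, NRHS, h_A, LDA, h_Y, LDB);
checkStatus(status);
```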

I expect to get back exactly the values that I have provided in the input h_Y since there is no error term in the data.

Vlad

- ambushed
**Posts:** 9 **Joined:** Thu Oct 13, 2011 5:37 am

### Re: Least Squares with CULA

The values returned should be different from the input, if only because the X vector is only 3 long, whereas the B vector is 6 long.

I get the following:

```
A =
1 1 1
1 2 4
1 3 9
1 4 16
1 5 25
1 6 36

B = (input)
0
2
8
18
32
50

X = (output)
2
-4
2
(then 3 unused entries)
```

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re: Least Squares with CULA

John,

My bad, the output is indeed the polynomial coefficients that one would expect out of least squares. Somehow I thought the solution vector would be the vector of fitted Y values.

I hope you don't mind me squeezing in another question. I am now running least squares with the device function and it works great. There is one issue still remaining: when I make a run with 2000 rows, it gives me culaRuntimeError, "Invalid configuration argument". With 1000 rows it works just fine. What could be the problem?

Thanks so much for your help!

Vlad

- ambushed
**Posts:** 9 **Joined:** Thu Oct 13, 2011 5:37 am

### Re: Least Squares with CULA

Yeah, just keep in mind that CULA is a linear algebra library, so curve fitting isn't something we cover.

I don't quite understand your other question, but if you were to post a code example again, then it would probably be easy enough to spot.

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re: Least Squares with CULA

For the second question, the source code is essentially the same, the only difference being the number of rows in the A matrix. The original example has 6 rows, while the realistic problem in my domain has around 2000 rows (and this is where I get the error), i.e.

```c
const int VECTOR_SIZE = 2000;
```

Least squares is, imho, very much a linear algebra problem. It boils down to matrix multiplication, inversion and transposing.

- ambushed
**Posts:** 9 **Joined:** Thu Oct 13, 2011 5:37 am

### Re: Least Squares with CULA

I receive culaNoError when I increase VECTOR_SIZE to 2000. You will need to specify further details, such as GPU type.

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re: Least Squares with CULA

Operating system: Windows Server 2008 R2

CUDA version installed: 4.0

GPU model: Tesla C2070

- ambushed
**Posts:** 9 **Joined:** Thu Oct 13, 2011 5:37 am

### Re: Least Squares with CULA

We've done fairly exhaustive tests using the example code you sent (with VECTOR_SIZE=2000) and haven't turned up any problems. Just to double-check: when you run the example code alone in an executable, do you receive this error?

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re: Least Squares with CULA

There are two Visual Studio projects, i.e. I run the example code from a Boost unit test by making a function call into a statically linked library. Both are 64-bit. The same problem appears when I increase the column count from 3 to 4.

- ambushed
**Posts:** 9 **Joined:** Thu Oct 13, 2011 5:37 am

### Re: Least Squares with CULA

Thank you very much for your help. The issue is resolved and is rooted in human error. Both culaDeviceSgels and culaSgels work as expected with 2000 matrix rows and more.

- ambushed
**Posts:** 9 **Joined:** Thu Oct 13, 2011 5:37 am

### Re: Least Squares with CULA

Glad to hear it! In case other users have the same problem, would you mind explaining what went wrong?

Thanks!

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re: Least Squares with CULA

The device function didn't work because my kernel didn't initialize device memory correctly: I was, very carelessly, trying to create more than 1024 threads per block. Every row was initialized in a separate thread, which explains why it worked for 1000 rows and failed for 2000. As for the host interface, I must have been drunk; it should have worked, and it works fine.

We are working in a shared environment where several people are contending for the GPUs, so it is easy to get confused. Thanks for the help!

- ambushed
**Posts:** 9 **Joined:** Thu Oct 13, 2011 5:37 am
