Band Matrix Solver in R10

Posted: Wed Feb 02, 2011 10:26 am
by aeschoen
I cannot find a solver associated with ...Sgbtrf. xTBTRF is mentioned in the announcement but it isn't in the manual or the header file.

Re: Band Matrix Solver in R10

Posted: Wed Feb 02, 2011 12:39 pm
by dan
Hi aeschoen,

The function you're interested in is in the premium version of CULA; you can see a list of the extra functions in the premium version here: http://www.culatools.com/html_guide/#cu ... -functions

Dan

Re: Band Matrix Solver in R10

Posted: Thu Feb 03, 2011 9:11 am
by aeschoen
I am using the premium version. The factorization routine, ...Sgbtrf, is there, but I can't find the associated solver. I use the device variants. Perhaps I'm not looking for the right name; I assume it ends with ..btrs, but nothing in the header file or the documentation matches that pattern.

Re: Band Matrix Solver in R10

Posted: Thu Feb 03, 2011 9:36 am
by dan
You're right, we don't currently offer an implementation of the solver you're referring to. The after-factorization solver would require a lot of work to implement and would give little payoff over a CPU solver. We'd recommend that, after factorizing on the GPU, you simply download the factored data to the host and run the final solve step on the CPU. You should still see a significant speedup over running both steps on the CPU.

Dan

Re: Band Matrix Solver in R10

Posted: Sun Jul 06, 2014 11:57 am
by stumarcus
Were you able to write some code that used CULADense's pbtrf on the GPU, pulled the Cholesky factorization back to the CPU, and then used that factorization in LAPACK's pbtrs on the CPU to solve the linear system of equations?

Re: Band Matrix Solver in R10

Posted: Tue Nov 11, 2014 2:12 am
by sabahat
I realized that I had overlooked something. The program that printed the SVD matrices (which I did not write) was truncating the digits after the decimal separator, so naturally the result was not exact when the original matrix was reconstructed via A = U * S * VT.