Posted: Sat Mar 09, 2013 8:14 pm
by gvarela13
Hi, I just want to confirm that I understood correctly what I read on the forum:

1- As I understand it, "culaDeviceSgesvd" cannot be called from a __device__ function, so it cannot run per thread inside a kernel.

Because of this, I cannot use CULA R.16b to compute the SVD of 128*128*70 (= 1,146,880) 3x3 matrices.

Any suggestions or comments are welcome.

Regards! :)

Posted: Wed Mar 13, 2013 8:41 am
by john
That's correct - we have no __device__ calls in our library. We do have device-pointer calls, but they are called from the host. Our algorithms are designed for larger matrices; tuning for a large batch of 3x3 matrices requires completely different code. We could build this as custom code for you, but based on prior experiments, I'd advise you that the solution time would likely be break-even against the CPU.
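For anyone who wants to try the CPU route mentioned above, here is a minimal sketch of a batched 3x3 singular-value computation in plain C++. It is not CULA code and not the custom code referred to in this thread. The names `Mat3`, `Vec3`, `singular_values_3x3` and `batched_singular_values` are made up for illustration, and the one-sided (Hestenes) Jacobi method is simply an assumed choice that suits tiny matrices. It returns singular values only; accumulating U and V would need extra bookkeeping.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major 3x3 matrix
using Vec3 = std::array<double, 3>;

// Hypothetical helper (not part of CULA): singular values of one 3x3 matrix
// via one-sided Jacobi rotations, returned in descending order.
// The matrix is taken by value and overwritten as working storage.
Vec3 singular_values_3x3(Mat3 a) {
    const double eps = 1e-15;
    for (int sweep = 0; sweep < 30; ++sweep) {  // hard cap on sweeps
        bool rotated = false;
        for (int p = 0; p < 2; ++p) {
            for (int q = p + 1; q < 3; ++q) {
                // Squared norms of columns p and q, and their dot product.
                double alpha = 0.0, beta = 0.0, gamma = 0.0;
                for (int i = 0; i < 3; ++i) {
                    alpha += a[i][p] * a[i][p];
                    beta  += a[i][q] * a[i][q];
                    gamma += a[i][p] * a[i][q];
                }
                // Columns already (numerically) orthogonal: nothing to do.
                if (std::fabs(gamma) <= eps * std::sqrt(alpha * beta)) continue;
                rotated = true;
                // Rotation angle that makes the two columns orthogonal.
                const double zeta = (beta - alpha) / (2.0 * gamma);
                const double t = (zeta >= 0.0 ? 1.0 : -1.0) /
                                 (std::fabs(zeta) + std::sqrt(1.0 + zeta * zeta));
                const double c = 1.0 / std::sqrt(1.0 + t * t);
                const double s = c * t;
                for (int i = 0; i < 3; ++i) {
                    const double tmp = a[i][p];
                    a[i][p] = c * tmp - s * a[i][q];
                    a[i][q] = s * tmp + c * a[i][q];
                }
            }
        }
        if (!rotated) break;  // a full sweep without rotations: converged
    }
    // After convergence the column norms are the singular values.
    Vec3 sigma{};
    for (int j = 0; j < 3; ++j) {
        sigma[j] = std::sqrt(a[0][j] * a[0][j] + a[1][j] * a[1][j] +
                             a[2][j] * a[2][j]);
    }
    std::sort(sigma.begin(), sigma.end(), std::greater<double>());
    return sigma;
}

// Hypothetical batch driver: one independent 3x3 problem per element.
std::vector<Vec3> batched_singular_values(const std::vector<Mat3>& batch) {
    std::vector<Vec3> out(batch.size());
    for (std::size_t k = 0; k < batch.size(); ++k) {
        out[k] = singular_values_3x3(batch[k]);
    }
    return out;
}
```

Each matrix is independent, so the loop in `batched_singular_values` can be split across CPU threads. Whether a one-matrix-per-thread GPU port of the rotation loop would beat that is something to measure rather than assume, given the break-even estimate above.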