Posted: Thu Oct 31, 2013 8:50 pm
by mark_joshi
I have a complex derivatives pricing application using Monte Carlo on the GPU. Almost
everything is done on the GPU, so there is very little data transfer between CPU and GPU.

The main bottleneck is


which is called about 200 times. A typical call has 320k rows and 10 columns.

On the K20, the application spends about 4.4 seconds in this routine in total. On the Quadro FX 5800, it's about 3.6 seconds.

Is this behaviour to be expected?

Posted: Fri Nov 01, 2013 8:56 am
by john
Our GELS isn't specifically optimized for the extremely rectangular cases, like the one you have here.

Posted: Sun Nov 03, 2013 4:04 pm
by mark_joshi
Do you have any suggestions on how to proceed?