General discussion for CULA.

Post by headfirst » Mon Oct 31, 2011 12:51 am

Hi, I found something very interesting when comparing a standard CULA function with my own CUDA function. I used "culaSgemm", my own CUDA kernel, and a plain C matrix multiplication on small matrices. When I compared their running times, "culaSgemm" and the C version both measured 0 s, but my CUDA matrix multiplication took 0.025 s.
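For reference, the CPU side of a comparison like this can be sketched roughly as follows. This is only a minimal sketch under assumptions, not the exact code used for the numbers above: the names `matmul` and `time_matmul` and the `reps` parameter are hypothetical, and the timer is the standard C `clock()`. It repeats the multiplication and reports the average time per call, since a single small multiplication may finish faster than the timer can resolve and then read as 0.

```c
#include <time.h>

/* Naive row-major single-precision multiply: c = a * b, all n x n.
   (Hypothetical helper, not part of CULA.) */
void matmul(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Average CPU seconds per call over `reps` calls, measured with clock().
   Timing many calls and dividing avoids reading 0 when one call is
   shorter than the clock's resolution. */
double time_matmul(const float *a, const float *b, float *c, int n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        matmul(a, b, c, n);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

Calling `time_matmul` with, say, a few thousand repetitions on the same small matrices gives a per-call figure that can be set against the GPU timings. The GPU versions would need the same treatment, with the timer stopped only after the device has finished its work.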
How does this happen? When I consulted the CULA Programmer's Guide, I found that CULA uses an internal memory mechanism. But what is this internal memory mechanism? When I searched for it on Google, I found nothing.
I am very interested in the internal memory mechanism, and I hope someone can tell me something about it.
Thank you in advance.
