
matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Mon Aug 22, 2011 4:00 pm
by chester1248
No one seems to have noticed this (or am I doing something wrong?):
The CULA link interface for Matlab
(as discussed at http://www.culatools.com/blog/category/interfaces/)
fails to engage the GPU when calling the MATLAB svd command.

The log that the interface dumps shows that the MATLAB svd command is now calling LAPACK sgesdd instead of sgesvd. Looking at the LAPACK SVD functions supported in CULA, sgesdd is not one of the implemented routines, so it makes sense that the link interface just falls back to the Intel MKL BLAS/LAPACK for sgesdd (on the CPU only, so no GPU speedup for MATLAB svd, for now).
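For readers who want to see the two LAPACK drivers side by side outside MATLAB, here is a minimal sketch in Python, assuming NumPy and SciPy are installed (neither is part of the original setup). SciPy's `scipy.linalg.svd` takes a `lapack_driver` argument that selects `gesdd` (divide-and-conquer, SciPy's default) or `gesvd`; for `float32` input these resolve to sgesdd and sgesvd. The helper name `compare_drivers` is hypothetical.

```python
import numpy as np
from scipy.linalg import svd


def compare_drivers(n=500, seed=1):
    """Singular values of one random n-by-n float32 matrix from both LAPACK drivers.

    compare_drivers is a hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n)).astype(np.float32)
    # Full decomposition (U, s, Vt) via divide-and-conquer: sgesdd for float32 input.
    _, s_dd, _ = svd(A, lapack_driver="gesdd")
    # Same decomposition via the classic QR-iteration driver: sgesvd.
    _, s_vd, _ = svd(A, lapack_driver="gesvd")
    return s_dd, s_vd


s_dd, s_vd = compare_drivers()
# The two drivers should agree to roughly single-precision accuracy.
print("max |s_dd - s_vd| / s_max =", float(np.max(np.abs(s_dd - s_vd)) / s_dd[0]))
```

Both drivers return the same quantities; they differ only in how the bidiagonal SVD step is solved, which is exactly why a library that implements one but not the other can be bypassed.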

It seems that sgesdd is not even supported in CULA Premium, which is especially disappointing.

Has anyone else noticed this problem? Is anyone using CULA link with MATLAB?

Given that sgesvd is slower than sgesdd, I'm also noticing that the user-defined MEX routines (also mentioned on the interfaces blog page above) are not faster than the MATLAB svd command. That is, the culaSvd(A) MEX/CULA routine is not faster than the default MATLAB svd(A) (with the link interface not used).

So there is no version of CULA (link interface or user MEX call to CULA routines) that I know of that is faster than MATLAB's (version 2011a) own built-in svd (running on 8 cores on a dual Xeon), even with high-end GPU cards (I tried both a Tesla 2050 and a GTX 580). Is this really true, or am I doing something wrong?
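A minimal timing sketch along the same lines as the tic/toc runs in this thread, again in Python with NumPy and SciPy as assumptions. It times singular-values-only and full decompositions with each driver and keeps the best of several repeats, so one-time costs such as library loading on the first call are not counted. `time_svd` and `compare` are hypothetical helper names, and the numbers depend entirely on the local BLAS/LAPACK build and hardware.

```python
import time

import numpy as np
from scipy.linalg import svd


def time_svd(A, driver, compute_uv, repeats=3):
    """Best-of-`repeats` wall-clock time (s) for one SVD of A (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        svd(A, compute_uv=compute_uv, lapack_driver=driver)
        best = min(best, time.perf_counter() - t0)
    return best


def compare(n=2000, seed=1):
    """Time values-only and full SVDs of an n-by-n float32 matrix with both drivers."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n)).astype(np.float32)
    results = {}
    for compute_uv in (False, True):  # False ~ svd(A); True ~ [u,s,v] = svd(A)
        for driver in ("gesdd", "gesvd"):
            results[(compute_uv, driver)] = time_svd(A, driver, compute_uv)
    return results
```

Calling `compare(2000)` and printing the returned dictionary gives four timings to set against MATLAB's. With a multithreaded BLAS such as MKL, keep the thread count fixed between runs so the comparison is fair.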

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Mon Aug 22, 2011 6:45 pm
by kyle
Are you calling a full SVD, [U,S,V] = svd(A), or a partial one, [~,S,~] = svd(A)? Also, what matrix size are you using?

I don't have access to MATLAB at the moment, but I know off the top of my head that in MATLAB 2011a, a full SVD calls xGESVD under the hood.

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Mon Aug 22, 2011 9:18 pm
by chester1248
Thanks for the quick feedback -- but I do think sgesdd is being called in 2011a:

I don't have access to my Windows 7 x64 machine running MATLAB 2011a, but I just checked and I get the same result on my Mac OS X (Snow Leopard) machine at home, also running MATLAB 2011a. Here are some details (which look identical to what I get on Windows):

Using the CULA link interface (with the latest CULA R12 Free, for now) when I start up MATLAB:

>> randn('seed',1);A=randn(2000,2000,'single');
>> tic;svd(A);toc
cpu_id: x86 Family 6 Model 5 Stepping 5, GenuineIntel
libmwlapack: trying environment...
libmwlapack: loading libcula_link.dylib
libmwlapack: loaded libcula_link.dylib@0x1305950c0
libmwlapack: libcula_link.dylib is not a compatibility layer.
Elapsed time is 4.572858 seconds.
>> tic;svd(A);toc
Elapsed time is 2.981600 seconds.
>> tic;[u,s,v]=svd(A);toc
Elapsed time is 5.100251 seconds.
>> tic;[u,s,v]=svd(A);toc
Elapsed time is 5.145978 seconds.

The log dumped by the CULA link interface indicates that svd(A) with no output arguments does call sgesvd, but that [u,s,v]=svd(A) calls sgesdd instead. This might be something MathWorks changed very recently (I think sgesdd is supposed to be faster than sgesvd in many cases, so they probably got wise), so does CULA need to provide sgesdd to keep up? Here is the log:

cula info: sgesvd (N, N, 2000, 2000, 0x13407b000, 2000, 0x107553000, 0x0, 2000, 0x0, 2000)
cula info: issuing to CPU (work query)
cula info: CPU library is lapackcpu.dylib
cula info: work query returned 70000
cula info: done
cula info: sgesvd (N, N, 2000, 2000, 0x13407b000, 2000, 0x107553000, 0x0, 2000, 0x0, 2000)
cula info: issuing to GPU (over threshold)
cula info: done
cula info: sgesvd (N, N, 2000, 2000, 0x13407b000, 2000, 0x14337e000, 0x0, 2000, 0x0, 2000)
cula info: issuing to CPU (work query)
cula info: work query returned 70000
cula info: done
cula info: sgesvd (N, N, 2000, 2000, 0x13407b000, 2000, 0x14337e000, 0x0, 2000, 0x0, 2000)
cula info: issuing to GPU (over threshold)
cula info: done
cula info: sgesdd ()
cula info: issuing to CPU (no GPU function available)
cula info: work query returned 12014000
cula info: work query returned 0
cula info: done
cula info: sgesdd ()
cula info: issuing to CPU (no GPU function available)
cula info: done
cula info: sgesdd ()
cula info: issuing to CPU (no GPU function available)
cula info: work query returned 12014000
cula info: work query returned 0
cula info: done
cula info: sgesdd ()
cula info: issuing to CPU (no GPU function available)
cula info: done
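When the log gets long, it can help to tally which routines went to the GPU and which fell back to the CPU. A minimal Python sketch, assuming only the line format visible in the log above (`cula info: <routine> (...)` followed by `cula info: issuing to CPU|GPU (<reason>)`); `tally_dispatch` is a hypothetical helper, not part of CULA.

```python
import re
from collections import Counter

# A routine call, e.g. "cula info: sgesdd ()" or "cula info: sgesvd (N, N, 2000, ...)".
CALL_RE = re.compile(r"^cula info: (\w+) \(")
# A dispatch decision, e.g. "cula info: issuing to GPU (over threshold)".
ISSUE_RE = re.compile(r"^cula info: issuing to (CPU|GPU) \((.*)\)$")


def tally_dispatch(log_text):
    """Count (routine, device, reason) triples in a CULA link log (hypothetical helper)."""
    counts = Counter()
    routine = None
    for raw in log_text.splitlines():
        line = raw.strip()
        call = CALL_RE.match(line)
        if call:
            # Remember the most recent routine; the next "issuing to" line belongs to it.
            routine = call.group(1)
            continue
        issue = ISSUE_RE.match(line)
        if issue and routine is not None:
            counts[(routine, issue.group(1), issue.group(2))] += 1
    return counts
```

Run on the log above, it should report sgesvd issued to the GPU twice (plus two CPU work queries) and sgesdd issued to the CPU four times with the reason "no GPU function available".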

BTW, without cula link, matlab gives these times:

>> randn('seed',1); A=randn(2000,2000,'single');
>> tic;svd(A);toc
Elapsed time is 3.238710 seconds.
>> tic;svd(A);toc
Elapsed time is 3.228221 seconds.
>> tic;[u,s,v]=svd(A);toc
Elapsed time is 5.330816 seconds.
>> tic;[u,s,v]=svd(A);toc
Elapsed time is 5.363748 seconds.

So the GPU (a 330M, in a 2010 MacBook Pro) gets 2.98 s [for the nargout=0 call to svd, the only case that actually runs CULA GPU code], whereas the CPU (dual core, with both cores at 100%) gets 3.2 s. They are about the same speed, which I think is about right given the peak GFLOPS of the 330M. [On my Windows 8-core dual-Xeon machine, the CULA svd (with nargout=0) running on the GPU is about 3x faster than MATLAB's built-in svd; most recently I was testing a GTX 560 Ti, for which CUBLAS SGEMM gets me about 500 GFLOPS.]
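The comparison can be made explicit with the steady-state times posted above. A small Python sketch; the first CULA-link call (4.57 s) is left out on the assumption that it includes the one-time load of libcula_link.dylib logged at that point.

```python
def speedup(baseline_s, candidate_s):
    """Ratio of baseline time to candidate time; > 1 means the candidate is faster."""
    return baseline_s / candidate_s


# Best observed times (s) for the 2000x2000 single-precision matrix on the 2010 MBP.
no_link = {"svd(A)": 3.228221, "[u,s,v]=svd(A)": 5.330816}
cula_link = {"svd(A)": 2.981600, "[u,s,v]=svd(A)": 5.100251}

for call, t_cpu in no_link.items():
    print(f"{call}: {speedup(t_cpu, cula_link[call]):.2f}x")
```

This prints about 1.08x for svd(A), the GPU path, and 1.05x for [u,s,v]=svd(A). The second figure compares two CPU runs of sgesdd, so it presumably reflects run-to-run variation rather than any GPU benefit.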

This same result occurs for other matrix sizes, on both Windows 7 x64 and OS X, so I think MATLAB 2011a *does* now call sgesdd when outputs ARE requested.

Hopefully sgesdd is something CULA can add in the next release; I would love a good excuse to buy CULA Premium, and then get dgesdd as well... [and then, hopefully in the near future, see CULA get some SpMV routines too :)]

--Dennis

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Mon Aug 22, 2011 9:43 pm
by chester1248
BTW, I also just now tried Kyle's [~,s,~]=svd(A) example (I never realized the convenient "~" output syntax existed in Matlab ...), but the CULA log indicates that in that case Matlab's svd also calls sgesdd().

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Tue Aug 23, 2011 5:11 am
by kyle
Hmm, this must be something new in 2011a.

The "SVDD" (LAPACK's xGESDD) is simply the divide-and-conquer method. The great majority of the code is similar to a "normal SVD" (xGESVD); that is, the algorithmic flow is the same:

1) Bidiagonalization
2) Orthogonalization
3) Iterative singular value extraction + singular vectors (if requested)
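Step 1 can be sketched in a few lines of NumPy (Python is an assumption here; LAPACK's own xGEBRD is a blocked version of the same reduction). The sketch reduces A to upper-bidiagonal form with alternating left and right Householder reflectors, and it skips step 2 (forming the orthogonal matrices) because it only checks that the bidiagonal matrix keeps A's singular values. Each reflector update is a matrix-vector product, i.e. a memory-bound, BLAS2-style operation. `householder` and `bidiagonalize` are hypothetical names.

```python
import numpy as np


def householder(x):
    """Return (v, beta) such that (I - beta * v v^T) @ x is a multiple of e_1."""
    v = np.array(x, dtype=float)  # copy, so the caller's slice is not modified
    # Choose the sign that avoids cancellation in v[0].
    alpha = -np.copysign(np.linalg.norm(v), v[0])
    v[0] -= alpha
    vv = v @ v
    beta = 0.0 if vv == 0.0 else 2.0 / vv
    return v, beta


def bidiagonalize(A):
    """Golub-Kahan reduction of an m-by-n matrix (m >= n) to upper-bidiagonal B = Q^T A P.

    Q and P are not formed; only B is returned.
    """
    B = np.array(A, dtype=float)
    m, n = B.shape
    if m < n:
        raise ValueError("this sketch assumes m >= n")
    for k in range(n):
        # Left reflector: zero the entries below the diagonal in column k.
        v, beta = householder(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            # Right reflector: zero the entries right of the superdiagonal in row k.
            v, beta = householder(B[k, k + 1:])
            B[k:, k + 1:] -= beta * np.outer(B[k:, k + 1:] @ v, v)
    return B


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
B = bidiagonalize(A)
print(np.round(B, 3))
```

Because B = Q^T A P with Q and P orthogonal, `np.linalg.svd(B, compute_uv=False)` should match the singular values of A to rounding error; step 3 then only has to work on the two nonzero diagonals.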

In the "SVDD", the 1st and 2nd steps are identical to those of the "normal SVD". However, more parallelism can be extracted from the 3rd step since, depending on the data, the problem can be broken into independent sub-problems.
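A minimal Python/NumPy illustration of how a bidiagonal problem breaks into independent pieces: wherever a superdiagonal entry is zero, the matrix decouples into blocks whose SVDs can be computed separately (and in parallel). This shows only that simplest, data-dependent split. It is not the divide-and-conquer algorithm itself (xBDSDC, used by xGESDD, roughly divides the bidiagonal problem into halves and merges the results by solving a secular equation), which the sketch does not implement. `bidiag` and `split_singular_values` are hypothetical names.

```python
import numpy as np


def bidiag(d, e):
    """Dense upper-bidiagonal matrix with diagonal d and superdiagonal e."""
    return np.diag(d) + np.diag(e, 1)


def split_singular_values(d, e, tol=0.0):
    """Singular values of bidiag(d, e), computed block by block (hypothetical helper).

    Wherever |e[i]| <= tol the matrix decouples into two bidiagonal blocks that
    share no entries, so each block's SVD is an independent sub-problem that
    could run in parallel.
    """
    d = np.asarray(d, dtype=float)
    e = np.asarray(e, dtype=float)
    cuts = [i + 1 for i, ei in enumerate(e) if abs(ei) <= tol]
    bounds = [0] + cuts + [len(d)]
    parts = [
        np.linalg.svd(bidiag(d[lo:hi], e[lo:hi - 1]), compute_uv=False)
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]
    # Merge the independent results into one descending list.
    return np.sort(np.concatenate(parts))[::-1]
```

For example, with d = [3, 1, 4, 1, 5, 9] and e = [0.5, 0, 2, 0, 0.7] the 6x6 problem becomes three independent 2x2 problems, and the merged result should equal the singular values of the full matrix.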

So the point is that we already have the majority of the work done. We'll look into what is required to implement the divide-and-conquer portion of step 3.

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Tue Aug 23, 2011 5:14 am
by kyle
Also, the scaling of SVD is fairly poor until larger sizes (over 4k) are reached. Below this, memory-bound routines like matrix-vector products dominate the runtime. Speedups at the largest sizes are obtained because compute-bound routines like matrix-matrix products begin to dominate the total runtime.
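To make the memory-bound versus compute-bound distinction concrete, here is a back-of-the-envelope Python sketch of arithmetic intensity (flops per byte of memory traffic) in single precision. It assumes an idealized model in which each matrix or vector is moved to or from memory exactly once; real kernels differ, so read the numbers as orders of magnitude. GEMV stays just under 0.5 flop/byte at any size, while GEMM grows as n/6, which is why matrix-matrix work can approach a GPU's peak and matrix-vector work cannot.

```python
def gemv_intensity(n, bytes_per_elem=4):
    """Flops per byte for y = A @ x with A n-by-n, moving A, x and y once each."""
    flops = 2 * n * n
    traffic = bytes_per_elem * (n * n + 2 * n)
    return flops / traffic


def gemm_intensity(n, bytes_per_elem=4):
    """Flops per byte for C = A @ B with n-by-n matrices, moving A, B and C once each."""
    flops = 2 * n ** 3
    traffic = bytes_per_elem * 3 * n * n
    return flops / traffic


for n in (1000, 2000, 4000, 8000):
    print(f"n={n}: GEMV {gemv_intensity(n):.3f} flop/byte, "
          f"GEMM {gemm_intensity(n):.1f} flop/byte")
```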

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Tue Aug 23, 2011 6:09 am
by john
It's an interesting design decision from MathWorks, since they have made a change that will result in users observing different behavior (and possibly different quality of results) from version to version. I'd have preferred a flag or a different routine name myself, but I guess the routine is called "SVD", not "SVD via GESVD". Interesting find, thanks for writing in.

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Wed Nov 02, 2011 5:14 am
by yujif
Hi Kyle, I am wondering if the CULA team has any plans to provide an SVDS function like MATLAB's, which performs an SVD but computes only a selected number of singular values. So far I have not found one, and it is really important for us to use it instead of a complete SVD.
Thank you in advance.
---JiFeng

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Wed Nov 02, 2011 6:02 am
by john
To my knowledge, there is no LAPACK equivalent of this. In my testing, at least in MATLAB, it's faster to run the full SVD and then cut the U, S, V matrices down to the number of values you want than it is to run the SVDS command.
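The "full SVD, then truncate" approach looks like this as a NumPy sketch (Python is an assumption; in MATLAB the slices would be U(:,1:k), S(1:k,1:k), V(:,1:k)). `full_matrices=False` requests the economy-size SVD so U is not padded to m-by-m. `truncated_svd_from_full` is a hypothetical name.

```python
import numpy as np


def truncated_svd_from_full(A, k):
    """Rank-k factors from the full (economy-size) SVD, keeping the top k triplets.

    truncated_svd_from_full is a hypothetical helper for illustration only.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is in descending order
    return U[:, :k], s[:k], Vt[:k, :]


rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
Uk, sk, Vtk = truncated_svd_from_full(A, 5)
A5 = (Uk * sk) @ Vtk  # best rank-5 approximation of A
print(Uk.shape, sk.shape, Vtk.shape)
```

The cost is that of the full decomposition no matter how small k is, which is the trade-off the following posts raise for large matrices when U and V are needed.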

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Wed Nov 02, 2011 6:19 am
by yujif
Hi John, thank you for your quick reply. You are probably talking about computing singular values only, or a small matrix (say, 100x100); in that case, yes, the full SVD is quicker than SVDS. But if you also need U and V and the matrix is somewhat larger, then the full SVD becomes much slower. That's why we really need SVDS. What do you think?

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Wed Nov 02, 2011 6:50 am
by kyle
SVDS isn't a LAPACK routine under the hood. I believe it calls an iterative routine from a package called PROPACK. SVD, on the other hand, calls the LAPACK routine xGESVD or xGESDD.

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Thu Nov 03, 2011 10:28 pm
by yujif
Yeah, I took a look at the SVDS source: it calls the function EIGS (similar to EIG), which uses the ARPACK library instead of LAPACK. I will look into it in more detail. Thanks for the replies.
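The EIGS route can be sketched in Python with SciPy (an assumption; this is not MATLAB's svds source). One standard way to get singular values from a symmetric eigensolver is the augmented matrix [[0, A], [A^T, 0]], whose eigenvalues are plus and minus the singular values of A, padded with |m - n| zeros. As far as I recall, older MATLAB svds used this formulation, but treat that as unverified. `eigsh` wraps ARPACK's symmetric Lanczos solver and only needs matrix-vector products. `top_singular_values_via_eigsh` is a hypothetical helper; in practice `scipy.sparse.linalg.svds` does this job directly.

```python
import numpy as np
from scipy.sparse import bmat, csr_matrix
from scipy.sparse.linalg import eigsh


def top_singular_values_via_eigsh(A, k):
    """Largest k singular values of A from the augmented matrix [[0, A], [A^T, 0]].

    top_singular_values_via_eigsh is a hypothetical helper for illustration only.
    """
    A = csr_matrix(A)
    # Symmetric (m+n)-by-(m+n) matrix; its positive eigenvalues are the singular values.
    aug = bmat([[None, A], [A.T, None]], format="csr")
    # ARPACK Lanczos iteration for the k largest algebraic eigenvalues.
    vals = eigsh(aug, k=k, which="LA", return_eigenvectors=False)
    return np.sort(vals)[::-1]


rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))
print(top_singular_values_via_eigsh(A, 4))
```

Only a handful of singular values are computed, and the work is dominated by matrix-vector products with the augmented matrix, which is where an iterative method can beat a full dense SVD when k is small.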

Re: matlab svd (cula_link) fails (sgesdd vs sgesvd)

Posted: Fri Nov 04, 2011 5:31 am
by john
The ARPACK library uses a few LAPACK and BLAS3 calls under the hood. It might (or might not) gain speedups if you try the CULA Link Interface.