This guide documents CULA’s Programming Interface. CULA™ is an implementation of the Linear Algebra PACKage (LAPACK) interface for CUDA™-enabled NVIDIA® graphics processing units (GPUs). It is a companion document to the CULA Programmer’s Guide.
This guide is split into the following sections:
- Framework Functions - These section documents functions are used in initializing CULA, shutting it down, and querying information about errors.
- Linear Algebra Routines - This section documents the Linear Algebra functions that CULA provides
- Differences Between CULA and LAPACK - This section lists some of the ways in which CULA differs from LAPACK.
- Common Errors - This section lists some of the common errors that apply to usage of several functions.
This work has been made possible by the NASA Small Business Innovation Research (SBIR) program. We recognize NVIDIA for their support.
CULA is built on NVIDIA CUDA 2.3 and NVIDIA CUBLAS.
CULA uses the Intel® Math Kernel Library (MKL) internally. For more information, please see the MKL product page at http://www.intel.com/software/products/mkl.
The original version of LAPACK from which CULA implements a similar interface can be obtained at http://www.netlib.org/lapack. Much of this Reference Manual is based upon the documentation released with netlib.
This section documents the functions that are used in initializing CULA, shutting it down, and querying information about errors.
Description
Initializes CULA Must be called before using any other function. Some functions have an exception to this rule: culaGetDeviceCount, culaSelectDevice
Returns
culaNoError on a successful initialization or the culaStatus enum that specifies an error
Description
Returns the last status code returned from a CULA function
Returns
The last CULA status code
Description
Associates a culaStatus enum with a readable error string
Parameters
Returns
A string that corresponds with the specified culaStatus enum
Description
This function is used to provide extended functionality that LAPACK’s info parameter typically provides
Returns
Extended information about the last error or zero if it is unavailable
Description
Reports the number of GPU devices Can be called before culaInitialize
Parameters
Returns
culaNoError on sucess, culaArgumentError on invalid pointer
Description
Selects a device with which CULA will operate To bind without error, this function must be called before culaInitialize
Parameters
Returns
culaNoError on sucess, culaArgumentError on an invalid device id, culaRuntimeError if the running thread has already been bound to a GPU device
Description
Reports the id of the GPU device executing CULA
Parameters
Returns
culaNoError on sucess, culaArgumentError on invalid pointer
Description
Prints information to a buffer about a specified device
Parameters
Returns
culaNoError on sucess, culaArgumentError on invalid buf pointer, invalid device id, or invalid bufsize
Description
Calculates a pitch that is optimal for CULA when using the device interface
Parameters
Returns
culaNoError on successful allocation, culaInsufficientMemory on failure
Description
Allocates memory on the device in a pitch that is optimal for CULA
Parameters
Returns
culaNoError on successful allocation, culaInsufficientMemory on failure
Description
Frees memory that has been allocated with culaDeviceMalloc
Parameters
Returns
culaNoError on successful free, culaArgumentError on failure
This section documents the Linear Algebra functions that CULA provides. For each function, a high-level description of that function is given, followed by a listing of each of the function’s parameters. Where applicable, differences from LAPACK will also be listed.
CULA provides 4 data types with which you can perform computations, with one function for each data type. Rather than document each routine separately, this guide includes only one reference for each of the functions, and instead documents the differences between each of these functions (if any) in the generic function description.
Most functions only take pointers to one data type that applies to the S, D, C, or Z variant of the function in question. For the majority of functions, these parameters will be denoted as S/D/C/Z. For those functions that have parameters that differ from their variant, the difference will be noted. For example, for a ‘C/Z’ function that has real (non-complex) parameters, these parameters will be denoted as S/D (although others may be denoted as S/D/C/Z). For an ‘S’ function that has complex parameters, these parameters will be denoted as (C/Z).
| Symbol | Host Interface Type | Device Interface Type |
|---|---|---|
| S | culaFloat | culaDeviceFloat |
| D | culaDouble | culaDeviceDouble |
| C | culaFloatComplex | culaDeviceFloatComplex |
| Z | culaDoubleComplex | culaDeviceDoubleComplex |
For Host interface functions, all matrices/vectors are submitted as pointers to host data. For the Device interface, all matrices/vectors are submitted as pointers to GPU data, as allocated by either the CUDA toolkit or via culaDeviceMalloc (available in CULA premium). All Device interface routines have the word Device as part of the function name; all other functions are Host interface.
There are some pointer arguments for which this not the case; these will often be output scalar arguments rather than matrices or vectors. These are denoted as “(type) Pointer, always host” etc.
All LAPACK matrices are specified as a pointer and a “leading dimension” parameter. The leading dimension describes the allocated size of the matrix, which may be equal to or larger than the actual matrix height. Thus if a matrix input is described as size “(LDA,N)” it simply means that the storage for the matrix is at least LDA x N in size. The section of that array that contains valid data will be described by other parameters, often M and N. There will typically be a note differentiating between these.
CULA Routines
The BDSQR functionality is implemented by the following CULA routines:
Description
BDSQR computes the singular values and, optionally, the right and/or left singular vectors from the singular value decomposition (SVD) of a real N-by-N (upper or lower) bidiagonal matrix B using the implicit zero-shift QR algorithm. The SVD of B has the form
B = Q * S * PT
where S is the diagonal matrix of singular values, Q is an orthogonal matrix of left singular vectors, and P is an orthogonal matrix of right singular vectors. If left singular vectors are requested, this subroutine actually returns U*Q instead of Q, and, if right singular vectors are requested, this subroutine returns PT * VT instead of PT, for given real input matrices U and VT. When U and VT are the orthogonal/unitary matrices that reduce a general matrix A to bidiagonal form: A = U * B * VT, as computed by GEBRD, then
A = (U * Q) * S * (PT * VT)
is the SVD of A. Optionally, the subroutine may also compute QT * C for a given real input matrix C.
See “Computing Small Singular Values of Bidiagonal Matrices With Guaranteed High Relative Accuracy,” by J. Demmel and W. Kahan, LAPACK Working Note #3 (or SIAM J. Sci. Statist. Comput. vol. 11, no. 5, pp. 873-912, Sept 1990) and “Accurate singular values and differential qd algorithms,” by B. Parlett and V. Fernando, Technical Report CPAM-554, Mathematics Department, University of California at Berkeley, July 1992 for a detailed description of the algorithm.
Parameters
uplo
- Type: char
- Direction: Input
= ‘U’: B is upper bidiagonal;
= ‘L’: B is lower bidiagonal.
n
- Type: int
- Direction: Input
The order of the matrix B. N >= 0.
ncvt
- Type: int
- Direction: Input
The number of columns of the matrix VT. NCVT >= 0.
nru
- Type: int
- Direction: Input
The number of rows of the matrix U. NRU >= 0.
ncc
- Type: int
- Direction: Input
The number of columns of the matrix C. NCC >= 0. NCC > 0 is currently not supported and will return culaFeatureNotImplemented.
d
- Type: S/D Pointer
- Direction: Input/Output
- Dimension: (N)
On entry, the n diagonal elements of the bidiagonal matrix B.
On exit, if culaNoError is returned, the singular values of B in decreasing order.
e
- Type: S/D Pointer
- Direction: Input/Output
- Dimension: (N-1)
On entry, the N-1 offdiagonal elements of the bidiagonal matrix B.
On exit, if culaNoError is returned, E is destroyed; if culaDataError is returned, D and E will contain the diagonal and superdiagonal elements of a bidiagonal matrix orthogonally equivalent to the one given as input.
vt
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDVT, NCVT)
On entry, an N-by-NCVT matrix VT.
On exit, VT is overwritten by PT * VT. Not referenced if NCVT = 0.
ldvt
- Type: int
- Direction: Input
The leading dimension of the array VT. LDVT >= max(1,N) if NCVT > 0; LDVT >= 1 if NCVT = 0.
u
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDU, N)
On entry, an NRU-by-N matrix U.
On exit, U is overwritten by U * Q.
Not referenced if NRU = 0.
ldu
- Type: int
- Direction: Input
The leading dimension of the array U. LDU >= max(1,NRU).
c
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDC, NCC)
On entry, an N-by-NCC matrix C.
On exit, C is overwritten by QT * C.
Not referenced if NCC = 0.
ldc
- Type: int
- Direction: Input
The leading dimension of the array C. LDC >= max(1,N) if NCC > 0; LDC >=1 if NCC = 0.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GEBRD functionality is implemented by the following CULA routines:
Description
GEBRD reduces a general real M-by-N matrix A to upper or lower bidiagonal form B by an orthogonal/unitary transformation: QT * A * P = B.
If m >= n, B is upper bidiagonal; if m < n, B is lower bidiagonal.
Parameters
m
- Type: int
- Direction: Input
The number of rows in the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns in the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N general matrix to be reduced.
On exit, if m >= n, the diagonal and the first superdiagonal are overwritten with the upper bidiagonal matrix B; the elements below the diagonal, with the array TAUQ, represent the orthogonal/unitary matrix Q as a product of elementary reflectors, and the elements above the first superdiagonal, with the array TAUP, represent the orthogonal/unitary matrix P as a product of elementary reflectors;
On exit, if m < n, the diagonal and the first subdiagonal are overwritten with the lower bidiagonal matrix B; the elements below the first subdiagonal, with the array TAUQ, represent the orthogonal/unitary matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal/unitary matrix P as a product of elementary reflectors. See Further Details.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
d
- Type: S/D Pointer
- Direction: Output
- Dimension: (min(M,N))
The diagonal elements of the bidiagonal matrix B: D(i) = A(i,i).
e
- Type: S/D Pointer
- Direction: Output
- Dimension: (min(M,N)-1)
The off-diagonal elements of the bidiagonal matrix B: if m >= n, E(i) = A(i,i+1) for i = 1,2,...,n-1; if m < n, E(i) = A(i+1,i) for i = 1,2,...,m-1.
tauq
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (min(M,N))
The scalar factors of the elementary reflectors which represent the orthogonal/unitary matrix Q. See Further Details.
taup
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (min(M,N))
The scalar factors of the elementary reflectors which represent the orthogonal/unitary matrix P. See Further Details.
Further Details
The matrices Q and P are represented as products of elementary reflectors:
If m >= n,
Q = H(1) H(2) . . . H(n) and P = G(1) G(2) . . . G(n-1)
Each H(i) and G(i) has the form:
H(i) = I - tauq * v * v’ and G(i) = I - taup * u * u’
where tauq and taup are real scalars, and v and u are real vectors; v(1:i-1) = 0, v(i) = 1, and v(i+1:m) is stored on exit in A(i+1:m,i); u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in A(i,i+2:n); tauq is stored in TAUQ(i) and taup in TAUP(i).
If m < n,
Q = H(1) H(2) . . . H(m-1) and P = G(1) G(2) . . . G(m)
Each H(i) and G(i) has the form:
H(i) = I - tauq * v * v’ and G(i) = I - taup * u * u’
where tauq and taup are real scalars, and v and u are real vectors; v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in A(i+2:m,i); u(1:i-1) = 0, u(i) = 1, and u(i+1:n) is stored on exit in A(i,i+1:n); tauq is stored in TAUQ(i) and taup in TAUP(i).
The contents of A on exit are illustrated by the following examples:
m = 6 and n = 5 (m > n): m = 5 and n = 6 (m < n):
( d e u1 u1 u1 ) ( d u1 u1 u1 u1 u1 )
( v1 d e u2 u2 ) ( e d u2 u2 u2 u2 )
( v1 v2 d e u3 ) ( v1 e d u3 u3 u3 )
( v1 v2 v3 d e ) ( v1 v2 e d u4 u4 )
( v1 v2 v3 v4 d ) ( v1 v2 v3 e d u5 )
( v1 v2 v3 v4 v5 )
where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of the vector defining H(i), and ui an element of the vector defining G(i).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GEEV functionality is implemented by the following CULA routines:
Description
GEEV computes for an N-by-N real nonsymmetric matrix A, the eigenvalues and, optionally, the left and/or right eigenvectors.
The right eigenvector v(j) of A satisfies
A * v(j) = lambda(j) * v(j)
where lambda(j) is its eigenvalue.
The left eigenvector u(j) of A satisfies
u(j)H * A = lambda(j) * u(j)H
where u(j)H denotes the conjugate transpose of u(j).
The computed eigenvectors are normalized to have Euclidean norm equal to 1 and largest component real.
Parameters
jobvl
- Type: char
- Direction: Input
= ‘N’: left eigenvectors of A are not computed;
= ‘V’: left eigenvectors of A are computed.
jobvr
- Type: char
- Direction: Input
= ‘N’: right eigenvectors of A are not computed;
= ‘V’: right eigenvectors of A are computed.
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the N-by-N matrix A.
On exit, A has been overwritten.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
wr (Real variants (S/D) only)
- Type: S/D Pointer
- Direction: Output
- Dimension: (N)
wi (Real variants (S/D) only)
- Type: S/D Pointer
- Direction: Output
- Dimension: (N)
WR and WI contain the real and imaginary parts, respectively, of the computed eigenvalues. Complex conjugate pairs of eigenvalues appear consecutively with the eigenvalue having the positive imaginary part first.
w (Complex variants (C/Z) only)
- Type: C/Z Pointer
- Direction: Output
- Dimension: (N)
W contains the computed eigenvalues.
Note: the wi and wr parameters only appear in the real (non-complex) variants of geev; in the complex variants these are bundled into one w.
vl
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (LDVL,N)
If JOBVL = ‘V’, the left eigenvectors u(j) are stored one after another in the columns of VL, in the same order as their eigenvalues.
If JOBVL = ‘N’, VL is not referenced.
For real variants (S/D): If the j-th eigenvalue is real, then u(j) = VL(:,j), the j-th column of VL. If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then u(j) = VL(:,j) + i*VL(:,j+1) and u(j+1) = VL(:,j) - i*VL(:,j+1).
For complex variants (C/Z): u(j) = VL(:,j), the j-th column of VL.
ldvl
- Type: int
- Direction: Input
The leading dimension of the array VL. LDVL >= 1; if JOBVL = ‘V’, LDVL >= N.
vr
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (LDVR,N)
If JOBVR = ‘V’, the right eigenvectors v(j) are stored one after another in the columns of VR, in the same order as their eigenvalues.
If JOBVR = ‘N’, VR is not referenced.
If the j-th eigenvalue is real, then v(j) = VR(:,j), the j-th column of VR. If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then v(j) = VR(:,j) + i*VR(:,j+1) and v(j+1) = VR(:,j) - i*VR(:,j+1).
For complex variants (C/Z): v(j) = VR(:,j), the j-th column of VR.
ldvr
- Type: int
- Direction: Input
The leading dimension of the array VR. LDVR >= 1; if JOBVR = ‘V’, LDVR >= N.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GEHRD functionality is implemented by the following CULA routines:
Description
GEHRD reduces a real/complex general matrix A to upper Hessenberg form H by an unitary similarity transformation: Q’ * A * Q = H .
Parameters
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
ilo
- Type: int
- Direction: Input
ihi
- Type: int
- Direction: Input
It is assumed that A is already upper triangular in rows and columns 1:ILO-1 and IHI+1:N. ILO and IHI are normally set by a previous call to GEBAL; otherwise they should be set to 1 and N respectively. See Further Details. 1 <= ILO <= IHI <= N, if N > 0; ILO=1 and IHI=0, if N=0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the N-by-N general matrix to be reduced.
On exit, the upper triangle and the first subdiagonal of A are overwritten with the upper Hessenberg matrix H, and the elements below the first subdiagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors. See Further Details.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
tau
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (N-1)
The scalar factors of the elementary reflectors (see Further Details). Elements 1:ILO-1 and IHI:N-1 of TAU are set to zero.
Further Details
The matrix Q is represented as a product of (ihi-ilo) elementary reflectors
Q = H(ilo) H(ilo+1) . . . H(ihi-1).
Each H(i) has the form
H(i) = I - tau * v * v’
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0, v(i+1) = 1 and v(ihi+1:n) = 0; v(i+2:ihi) is stored on exit in A(i+2:ihi,i), and tau in TAU(i).
The contents of A are illustrated by the following example, with n = 7, ilo = 2 and ihi = 6:
on entry, on exit,
( a a a a a a a ) ( a a h h h h a )
( a a a a a a ) ( a h h h h a )
( a a a a a a ) ( h h h h h h )
( a a a a a a ) ( v2 h h h h h )
( a a a a a a ) ( v2 v3 h h h h )
( a a a a a a ) ( v2 v3 v4 h h h )
( a ) ( a )
where a denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GELQF functionality is implemented by the following CULA routines:
Description
GELQF computes an LQ factorization of a real M-by-N matrix A: A = L * Q.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix A.
On exit, the elements on and below the diagonal of the array contain the m-by-min(m,n) lower trapezoidal matrix L (L is lower triangular if m <= n); the elements above the diagonal, with the array TAU, represent the orthogonal/unitary matrix Q as a product of elementary reflectors (see Further Details).
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
tau
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (min(M,N))
The scalar factors of the elementary reflectors (see Further Details).
Further Details
The matrix Q is represented as a product of elementary reflectors
Q = H(k) . . . H(2) H(1), where k = min(m,n).
Each H(i) has the form
H(i) = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:n) is stored on exit in A(i,i+1:n), and tau in TAU(i).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GELS functionality is implemented by the following CULA routines:
Description
GELS solves overdetermined or underdetermined real linear systems involving an M-by-N matrix A, or its transpose, using a QR or LQ factorization of A. It is assumed that A has full rank.
The following options are provided:
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
Parameters
trans
- Type: char
- Direction: Input
= ‘N’: the linear system involves A;
= ‘T’: the linear system involves AT.
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix A. N >= 0.
nrhs
- Type: int
- Direction: Input
The number of right hand sides, i.e., the number of columns of the matrices B and X. NRHS >=0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix A.
On exit, if M >= N, A is overwritten by details of its QR factorization as returned by GEQRF;
if M < N, A is overwritten by details of its LQ factorization as returned by GELQF.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
b
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,NRHS)
On entry, the matrix B of right hand side vectors, stored columnwise; B is M-by-NRHS if TRANS = ‘N’, or N-by-NRHS if TRANS = ‘T’.
On exit, if culaNoError is returned, B is overwritten by the solution vectors, stored columnwise:
if TRANS = ‘N’ and m >= n, rows 1 to n of B contain the least squares solution vectors; the residual sum of squares for the solution in each column is given by the sum of squares of elements N+1 to M in that column;
if TRANS = ‘N’ and m < n, rows 1 to N of B contain the minimum norm solution vectors;
if TRANS = ‘T’ and m >= n, rows 1 to M of B contain the minimum norm solution vectors;
if TRANS = ‘T’ and m < n, rows 1 to M of B contain the least squares solution vectors; the residual sum of squares for the solution in each column is given by the sum of squares of elements M+1 to N in that column.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= MAX(1,M,N).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GEQLF functionality is implemented by the following CULA routines:
Description
GEQLF computes a QL factorization of a real/complex M-by-N matrix A: A = Q * L.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer,
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix A.
On exit, if m >= n, the lower triangle of the subarray A(m-n+1:m,1:n) contains the N-by-N lower triangular matrix L; if m <= n, the elements on and below the (n-m)-th superdiagonal contain the M-by-N lower trapezoidal matrix L; the remaining elements, with the array TAU, represent the orthogonal/unitary matrix Q as a product of elementary reflectors (see Further Details).
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
tau
- Type: S/D/C/Z Pointer,
- Direction: Output
- Dimension: (min(M,N))
The scalar factors of the elementary reflectors (see Further Details).
Further Details
The matrix Q is represented as a product of elementary reflectors
Q = H(k) . . . H(2) H(1), where k = min(m,n).
Each H(i) has the form
H(i) = I - tau * v * v’
where tau is a real/complex scalar, and v is a real/complex vector with v(m-k+i+1:m) = 0 and v(m-k+i) = 1; v(1:m-k+i-1) is stored on exit in A(1:m-k+i-1,n-k+i), and tau in TAU(i).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GEQRF functionality is implemented by the following CULA routines:
Description
GEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix A.
On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal/unitary matrix Q as a product of min(m,n) elementary reflectors (see Further Details).
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
tau
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (min(M,N))
The scalar factors of the elementary reflectors (see Further Details).
Further Details
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GEQRS functionality is implemented by the following CULA routines:
Description
Solve the least squares problem
min || A*X - B ||
using the QR factorization
A = Q*R
computed by GEQRF.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix A. M >= N >= 0.
nrhs
- Type: int
- Direction: Input
The number of columns of B. NRHS >= 0.
a
- Type: S/D/C/Z Pointer,
- Direction: Input
- Dimension: (LDA,N)
Details of the QR factorization of the original matrix A as returned by GEQRF.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= M.
tau
- Type: S/D/C/Z Pointer,
- Direction: Input
- Dimension: (N)
Details of the orthogonal matrix Q.
b
- Type: S/D/C/Z Pointer,
- Direction: Input/Output
- Dimension: (LDB,NRHS)
On entry, the M-by-NRHS right hand side matrix B.
On exit, the N-by-NRHS solution matrix X.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= M.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GERQF functionality is implemented by the following CULA routines:
Description
GERQF computes an RQ factorization of a real M-by-N matrix A: A = R * Q.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix A.
On exit, if m <= n, the upper triangle of the subarray A(1:m,n-m+1:n) contains the M-by-M upper triangular matrix R; if m >= n, the elements on and above the (m-n)-th subdiagonal contain the M-by-N upper trapezoidal matrix R; the remaining elements, with the array TAU, represent the orthogonal/unitary matrix Q as a product of min(m,n) elementary reflectors (see Further Details).
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
tau
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (min(M,N))
The scalar factors of the elementary reflectors (see Further Details).
Further Details
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(n-k+i+1:n) = 0 and v(n-k+i) = 1; v(1:n-k+i-1) is stored on exit in A(m-k+i,1:n-k+i-1), and tau in TAU(i).
Differences from LAPACK
See No Workspace Parameters section.
The GESV functionality is implemented by the following CULA routines:
Host Memory
- culaSgesv
- culaDgesv
- culaCgesv
- culaZgesv
- culaGesv (C++ style, type overloaded)
Device Memory
- culaDeviceSgesv
- culaDeviceDgesv
- culaDeviceCgesv
- culaDeviceZgesv
- culaDeviceGesv (C++ style, type overloaded)
Description
GESV computes the solution to a real system of linear equations
A * X = B,
where A is an N-by-N matrix and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as
A = P * L * U,
where P is a permutation matrix, L is unit lower triangular, and U is upper triangular. The factored form of A is then used to solve the system of equations A * X = B.
Parameters
n
- Type: int
- Direction: Input
The number of linear equations, i.e., the order of the matrix A. N >= 0.
nrhs
- Type: int
- Direction: Input
The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the N-by-N coefficient matrix A.
On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
ipiv
- Type: int Pointer
- Direction: Output
- Dimension: (N)
The pivot indices that define the permutation matrix P; row i of the matrix was interchanged with row IPIV(i).
b
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,NRHS)
On entry, the N-by-NRHS matrix of right hand side matrix B.
On exit, if culaNoError is returned, the N-by-NRHS solution matrix X.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= max(1,N).
Further Details
See Pivot Arrays section.
The GESV (Iteratively Refined) functionality is implemented by the following CULA routines:
Host Memory
- culaDsgesv
- culaZcgesv
- culaGesv (C++ style, type overloaded)
Device Memory
- culaDeviceDsgesv
- culaDeviceZcgesv
- culaDeviceGesv (C++ style, type overloaded)
Description
GESV computes the solution to a real/complex system of linear equations
A * X = B,
where A is an N-by-N matrix and X and B are N-by-NRHS matrices.
GESV first attempts to factorize the matrix in single-precision and use this factorization within an iterative refinement procedure to produce a solution with double-precision normwise backward error quality (see below). If the approach fails the method switches to a double-precision factorization and solve.
The iterative refinement process is stopped if
ITER > ITERMAX
or for all the RHS we have:
RNRM < SQRT(N)*XNRM*ANRM*EPS*BWDMAX
where
- ITER is the number of the current iteration in the iterative refinement process
- RNRM is the infinity-norm of the residual
- XNRM is the infinity-norm of the solution
- ANRM is the infinity-operator-norm of the matrix A
- EPS is the machine epsilon for double precision
The value ITERMAX and BWDMAX are fixed to 30 and 1.0D+00 respectively.
Parameters
n
- Type: int
- Direction: Input
The number of linear equations, i.e., the order of the matrix A. N >= 0.
nrhs
- Type: int
- Direction: Input
The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
a
- Type: D/Z Pointer
- Direction: Input or Input/Output
- Dimension: (LDA,N)
On entry, the N-by-N coefficient matrix A.
On exit, if iterative refinement has been successfully used (culaNoError is returned and iter > 0, see description below), then A is unchanged, if double precision factorization has been used (culaNoError is returned and iter < 0, see description below), then the array A contains the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
ipiv
- Type: int Pointer
- Direction: Output
- Dimension: (N)
The pivot indices that define the permutation matrix P; row i of the matrix was interchanged with row IPIV(i). Corresponds either to the single precision factorization (if culaNoError is returned and iter > 0) or the double precision factorization (if culaNoError is returned and iter < 0).
b
- Type: D/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,NRHS)
The N-by-NRHS right hand side matrix B.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= max(1,N).
x
- Type: D/Z Pointer
- Direction: Input/Output
- Dimension: (LDX,NRHS)
If culaNoError is returned, the N-by-NRHS solution matrix X.
ldx
- Type: int
- Direction: Input
The leading dimension of the array X. LDX >= max(1,N).
iter
- Type: int Pointer
- Direction: Output
- Dimension: (1)
> 0: iterative refinement has been sucessfully used. Returns the number of iterations
< 0: iterative refinement has failed, double-precision factorization has been performed
| Value | Operation |
|---|---|
| -1 | the routine fell back to full precision for implementation- or machine-specific reasons |
| -2 | narrowing the precision induced an overflow, the routine fell back to full precision |
| -3 | failure of single precision GETRF |
| -31 | stop the iterative refinement after the 30th iterations |
Differences from LAPACK
See No Workspace Parameters section. See Pivot Arrays section.
CULA Routines
The GESVD functionality is implemented by the following CULA routines:
Description
GESVD computes the singular value decomposition (SVD) of a real M-by-N matrix A, optionally computing the left and/or right singular vectors. The SVD is written
A = U * SIGMA * transpose(V)
where SIGMA is an M-by-N matrix which is zero except for its min(m,n) diagonal elements, U is an M-by-M orthogonal/unitary matrix, and V is an N-by-N orthogonal/unitary matrix. The diagonal elements of SIGMA are the singular values of A; they are real and non-negative, and are returned in descending order. The first min(m,n) columns of U and V are the left and right singular vectors of A.
Note that the routine returns VT, not V. This Toutput of GESVD is notable because other implementations, such as Matlab, return the non-transposed version via syntax like [U S V] = svd(A);. The V matrix then needs to be transposed to reconstruct the original matrix, such as U*S*V’. LAPACK avoids this by pre-transposing this output, but for those working with both LAPACK and Matlab code, this is a common pitfall.
Parameters
jobu
- Type: char
- Direction: Input
Specifies options for computing all or part of the matrix U:
= ‘A’: all M columns of U are returned in array U:
= ‘S’: the first min(m,n) columns of U (the left singular vectors) are returned in the array U;
= ‘O’: the first min(m,n) columns of U (the left singular vectors) are overwritten on the array A;
= ‘N’: no columns of U (no left singular vectors) are computed.
jobvt
- Type: char
- Direction: Input
Specifies options for computing all or part of the matrix VT :
= ‘A’: all N rows of VT are returned in the array VT;
= ‘S’: the first min(m,n) rows of VT (the right singular vectors) are returned in the array VT;
= ‘O’: the first min(m,n) rows of VT (the right singular vectors) are overwritten on the array A;
= ‘N’: no rows of VT (no right singular vectors) are computed.
JOBVT and JOBU cannot both be ‘O’.
m
- Type: int
- Direction: Input
The number of rows of the input matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the input matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix A.
On exit,
if JOBU = ‘O’, A is overwritten with the first min(m,n) columns of U (the left singular vectors, stored columnwise);
if JOBVT = ‘O’, A is overwritten with the first min(m,n) rows of VT (the right singular vectors, stored rowwise);
if JOBU != ‘O’ and JOBVT != ‘O’, the contents of A are destroyed.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
s
- Type: S/D Pointer
- Direction: Output
- Dimension: (min(M,N))
The singular values of A, sorted so that S(i) >= S(i+1).
u
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (LDU,UCOL)
(LDU,M) if JOBU = ‘A’ or (LDU,min(M,N)) if JOBU = ‘S’. If JOBU = ‘A’, U contains the M-by-M orthogonal/unitary matrix U; if JOBU = ‘S’, U contains the first min(m,n) columns of U (the left singular vectors, stored columnwise); if JOBU = ‘N’ or ‘O’, U is not referenced.
ldu
- Type: int
- Direction: Input
The leading dimension of the array U. LDU >= 1; if JOBU = ‘S’ or ‘A’, LDU >= M.
vt
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (LDVT,N)
If JOBVT = ‘A’, VT contains the N-by-N orthogonal/unitary matrix VT ; if JOBVT = ‘S’, VT contains the first min(m,n) rows of VT (the right singular vectors, stored rowwise); if JOBVT = ‘N’ or ‘O’, VT is not referenced.
ldvt
- Type: int
- Direction: Input
The leading dimension of the array VT. LDVT >= 1; if JOBVT = ‘A’, LDVT >= N; if JOBVT = ‘S’, LDVT >= min(M,N).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GETRF functionality is implemented by the following CULA routines:
Host Memory
- culaSgetrf
- culaDgetrf
- culaCgetrf
- culaZgetrf
- culaGetrf (C++ style, type overloaded)
Device Memory
- culaDeviceSgetrf
- culaDeviceDgetrf
- culaDeviceCgetrf
- culaDeviceZgetrf
- culaDeviceGetrf (C++ style, type overloaded)
Description
GETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form
A = P * L * U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix to be factored.
On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
ipiv
- Type: int Pointer
- Direction: Output
- Dimension: (min(M,N))
The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
Further Details
See Pivot Arrays section.
CULA Routines
The GETRI functionality is implemented by the following CULA routines:
Description
GETRI computes the inverse of a matrix using the LU factorization computed by GETRF.
This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
If solution to a system of linear equations, please favor the routines GESV, GETRF/GETRS, or GELS instead as they will provide more accurate answers in this case.
Parameters
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the factors L and U from the factorization A = P*L*U as computed by GETRF.
On exit, if culaNoError is returned, the inverse of the original matrix A.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
ipiv
- Type: int Pointer
- Direction: Input
- Dimension: (N)
The pivot indices from GETRF; for 1<=i<=N, row i of the matrix was interchanged with row IPIV(i).
Further Details
See Pivot Arrays section.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GETRS functionality is implemented by the following CULA routines:
Description
GETRS solves a system of linear equations
A * X = B or A’ * X = B
with a general N-by-N matrix A using the LU factorization computed by GETRF.
Parameters
trans
- Type: char
- Direction: Input
Specifies the form of the system of equations: = ‘N’: A * X = B (No transpose)
= ‘T’: A’* X = B (Transpose)
= ‘C’: A’* X = B (Conjugate transpose = Transpose)
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
nrhs
- Type: int
- Direction: Input
The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (LDA,N)
The factors L and U from the factorization A = P*L*U as computed by GETRF.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
ipiv
- Type: int Pointer
- Direction: Input
- Dimension: (N)
The pivot indices from GETRF; for 1<=i<=N, row i of the matrix was interchanged with row IPIV(i).
b
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,NRHS)
On entry, the right hand side matrix B.
On exit, the solution matrix X.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= max(1,N).
Further Details
See Pivot Arrays section.
CULA Routines
The GGLSE functionality is implemented by the following CULA routines:
Description
GGLSE solves the linear equality-constrained least squares (LSE) problem:
minimize || c - A*x ||_2 subject to B*x = d
where A is an M-by-N matrix, B is a P-by-N matrix, c is a given M-vector, and d is a given P-vector. It is assumed that P <= N <= M+P, and
- rank(B) = P and rank( (A) ) = N.
- ( (B) )
These conditions ensure that the LSE problem has a unique solution, which is obtained using a generalized RQ factorization of the matrices (B, A) given by
B = (0 R)*Q, A = Z*T*Q.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrices A and B. N >= 0.
p
- Type: int
- Direction: Input
The number of rows of the matrix B. 0 <= P <= N <= M+P.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix A.
On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix T.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
b
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,N)
On entry, the P-by-N matrix B.
On exit, the upper triangle of the subarray B(1:P,N-P+1:N) contains the P-by-P upper triangular matrix R.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= max(1,P).
c
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (M)
On entry, C contains the right hand side vector for the least squares part of the LSE problem.
On exit, the residual sum of squares for the solution is given by the sum of squares of elements N-P+1 to M of vector C.
d
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (P)
On entry, D contains the right hand side vector for the constrained equation.
On exit, D is destroyed.
x
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (N)
On exit, X is the solution of the LSE problem.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The GGRQF functionality is implemented by the following CULA routines:
Description
GGRQF computes a generalized RQ factorization of an M-by-N matrix A and a P-by-N matrix B:
A = R*Q, B = Z*T*Q,
where Q is an N-by-N orthogonal/unitary matrix, Z is a P-by-P orthogonal/unitary matrix, and R and T assume one of the forms:
where R12 or R21 is upper triangular, and
where T11 is upper triangular.
In particular, if B is square and nonsingular, the GRQ factorization of A and B implicitly gives the RQ factorization of A*inv(B):
A*inv(B) = (R*inv(T))*Z’
where inv(B) denotes the inverse of the matrix B, and Z’ denotes the transpose of the matrix Z.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix A. M >= 0.
p
- Type: int
- Direction: Input
The number of rows of the matrix B. P >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrices A and B. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the M-by-N matrix A.
On exit, if M <= N, the upper triangle of the subarray A(1:M,N-M+1:N) contains the M-by-M upper triangular matrix R; if M > N, the elements on and above the (M-N)-th subdiagonal contain the M-by-N upper trapezoidal matrix R; the remaining elements, with the array TAUA, represent the orthogonal/unitary matrix Q as a product of elementary reflectors (see Further Details).
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
taua
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (min(M,N))
The scalar factors of the elementary reflectors which represent the orthogonal/unitary matrix Q (see Further Details).
b
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,N)
On entry, the P-by-N matrix B.
On exit, the elements on and above the diagonal of the array contain the min(P,N)-by-N upper trapezoidal matrix T (T is upper triangular if P >= N); the elements below the diagonal, with the array TAUB, represent the orthogonal/unitary matrix Z as a product of elementary reflectors (see Further Details).
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= max(1,P).
taub
- Type: S/D/C/Z Pointer
- Direction: Output
- Dimension: (min(P,N))
The scalar factors of the elementary reflectors which represent the orthogonal/unitary matrix Z (see Further Details).
Further Details
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I - taua * v * v’
where taua is a real scalar, and v is a real vector with v(n-k+i+1:n) = 0 and v(n-k+i) = 1; v(1:n-k+i-1) is stored on exit in A(m-k+i,1:n-k+i-1), and taua in TAUA(i). To form Q explicitly, use LAPACK subroutine ORGRQ. To use Q to update another matrix, use LAPACK subroutine ORMRQ/UNMRQ.
The matrix Z is represented as a product of elementary reflectors
Z = H(1) H(2) . . . H(k), where k = min(p,n).
Each H(i) has the form
H(i) = I - taub * v * v’
where taub is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:p) is stored on exit in B(i+1:p,i), and taub in TAUB(i). To form Z explicitly, use LAPACK subroutine ORGQR/UNGQR. To use Z to update another matrix, use LAPACK subroutine ORMQR/UNMQR.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORGBR/UNGBR functionality is implemented by the following CULA routines:
Description
ORGBR/UNGBR generates one of the real/complex orthogonal/unitary matrices Q or PT determined by GEBRD when reducing a real matrix A to bidiagonal form: A = Q * B * PT. Q and PT are defined as products of elementary reflectors H(i) or G(i) respectively.
If VECT = ‘Q’, A is assumed to have been an M-by-K matrix, and Q is of order M: if m >= k, Q = H(1) H(2) . . . H(k) and ORGBR returns the first n columns of Q, where m >= n >= k; if m < k, Q = H(1) H(2) . . . H(m-1) and ORGBR returns Q as an M-by-M matrix.
If VECT = ‘P’, A is assumed to have been a K-by-N matrix, and PT is of order N: if k < n, PT = G(k) . . . G(2) G(1) and ORGBR returns the first m rows of PT , where n >= m >= k; if k >= n, PT = G(n-1) . . . G(2) G(1) and ORGBR returns PT as an N-by-N matrix.
Parameters
vect
- Type: char
- Direction: Input
Specifies whether the matrix Q or the matrix PT is required, as defined in the transformation applied by GEBRD:
= ‘Q’: generate Q;
= ‘P’: generate PT.
m
- Type: int
- Direction: Input
The number of rows of the matrix Q or PT to be returned. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix Q or PT to be returned. N >= 0. If VECT = ‘Q’, M >= N >= min(M,K); if VECT = ‘P’, N >= M >= min(N,K).
k
- Type: int
- Direction: Input
If VECT = ‘Q’, the number of columns in the original M-by-K matrix reduced by GEBRD. If VECT = ‘P’, the number of rows in the original K-by-N matrix reduced by GEBRD. K >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the vectors which define the elementary reflectors, as returned by GEBRD.
On exit, the M-by-N matrix Q or PT.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,M).
tau
- Type: S/D/C/Z Pointer, dimension
- Direction: Input
(min(M,K)) if VECT = ‘Q’ (min(N,K)) if VECT = ‘P’
TAU(i) must contain the scalar factor of the elementary reflector H(i) or G(i), which determines Q or PT, as returned by GEBRD in its array argument TAUQ or TAUP.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORGHR/UNGHR functionality is implemented by the following CULA routines:
Description
ORGHR/UNGHR generate a real/complex orthogonal/unitary matrix Q which is defined as the product of IHI-ILO elementary reflectors of order N, as returned by GEHRD:
Q = H(ilo) H(ilo+1) . . . H(ihi-1).
Parameters
n
- Type: int
- Direction: Input
The order of the matrix Q. N >= 0.
ilo
- Type: int
- Direction: Input
ihi
- Type: int
- Direction: Input
ILO and IHI must have the same values as in the previous call of GEHRD. Q is equal to the unit matrix except in the submatrix Q(ilo+1:ihi,ilo+1:ihi). 1 <= ILO <= IHI <= N, if N > 0; ILO=1 and IHI=0, if N=0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the vectors which define the elementary reflectors, as returned by GEHRD.
On exit, the N-by-N unitary matrix Q.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
tau
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (N-1)
TAU(i) must contain the scalar factor of the elementary reflector H(i), as returned by GEHRD.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORGLQ/UNGLQ functionality is implemented by the following CULA routines:
Description
ORGLQ/UNGLQ generates an M-by-N real matrix Q with orthonormal rows, which is defined as the first M rows of a product of K elementary reflectors of order N
Q = H(k) . . . H(2) H(1)
as returned by GELQF.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix Q. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix Q. N >= M.
k
- Type: int
- Direction: Input
The number of elementary reflectors whose product defines the matrix Q. M >= K >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the i-th row must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by GELQF in the first k rows of its array argument A.
On exit, the M-by-N matrix Q.
lda
- Type: int
- Direction: Input
The first dimension of the array A. LDA >= max(1,M).
tau
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (K)
TAU(i) must contain the scalar factor of the elementary reflector H(i), as returned by GELQF.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORGQL/UNGQL functionality is implemented by the following CULA routines:
Description
ORGQL/UNGQL generates an M-by-N real/complex matrix Q with orthonormal columns, which is defined as the last N columns of a product of K elementary reflectors of order M
Q = H(k) . . . H(2) H(1)
as returned by GELQF.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix Q. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix Q. M >= N >= 0.
k
- Type: int
- Direction: Input
The number of elementary reflectors whose product defines the matrix Q. N >= K >= 0.
a
- Type: S/D/C/Z Pointer,
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the (n-k+i)-th column must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by GELQF in the last k columns of its array argument A.
On exit, the M-by-N matrix Q.
lda
- Type: int
- Direction: Input
The first dimension of the array A. LDA >= max(1,M).
tau
- Type: S/D/C/Z Pointer,
- Direction: Input
- Dimension: (K)
TAU(i) must contain the scalar factor of the elementary reflector H(i), as returned by GELQF.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORGQR/UNGQR functionality is implemented by the following CULA routines:
Description
ORGQR/UNGQR generates an M-by-N real/complex matrix Q with orthonormal columns, which is defined as the first N columns of a product of K elementary reflectors of order M
Q = H(1) H(2) . . . H(k)
as returned by GEQRF.
Parameters
m
- Type: int
- Direction: Input
The number of rows of the matrix Q. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix Q. M >= N >= 0.
k
- Type: int
- Direction: Input
The number of elementary reflectors whose product defines the matrix Q. N >= K >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the i-th column must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by GEQRF in the first k columns of its array argument A.
On exit, the M-by-N matrix Q.
lda
- Type: int
- Direction: Input
The first dimension of the array A. LDA >= max(1,M).
tau
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (K)
TAU(i) must contain the scalar factor of the elementary reflector H(i), as returned by GEQRF.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORMLQ/UNMLQ functionality is implemented by the following CULA routines:
Description
ORMLQ/UNMLQ overwrite the general real/complex M-by-N matrix C with
| SIDE = ‘L’ | SIDE = ‘R’ | |
|---|---|---|
| TRANS = ‘N’ | Q * C | C * Q |
| TRANS = ‘T’ | QT * C | C * QT |
where Q is a real/complex orthogonal/unitary matrix defined as the product of k elementary reflectors
Q = H(k) . . . H(2) H(1)
as returned by GELQF. Q is of order M if SIDE = ‘L’ and of order N if SIDE = ‘R’.
Parameters
side
- Type: char
- Direction: Input
= ‘L’: apply Q or QT from the Left;
= ‘R’: apply Q or QT from the Right.
trans
- Type: char
- Direction: Input
= ‘N’: No transpose, apply Q;
= ‘T’: Transpose, apply QT.
m
- Type: int
- Direction: Input
The number of rows of the matrix C. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix C. N >= 0.
k
- Type: int
- Direction: Input
The number of elementary reflectors whose product defines the matrix Q. If SIDE = ‘L’, M >= K >= 0; if SIDE = ‘R’, N >= K >= 0.
a
- Type: S/D/C/Z Pointer, dimension
- Direction: Input
(LDA,M) if SIDE = ‘L’, (LDA,N) if SIDE = ‘R’
The i-th row must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by GELQF in the first k rows of its array argument A. A is modified by the routine but restored on exit.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,K).
tau
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (K)
TAU(i) must contain the scalar factor of the elementary reflector H(i), as returned by GELQF.
c
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDC,N)
On entry, the M-by-N matrix C.
On exit, C is overwritten by Q * C or QT * C or C * QT or C * Q.
ldc
- Type: int
- Direction: Input
The leading dimension of the array C. LDC >= max(1,M).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORMQL/UNMQL functionality is implemented by the following CULA routines:
Description
ORMQL/UNMQL overwrites the general real/complex M-by-N matrix C with
| SIDE = ‘L’ | SIDE = ‘R’ | |
|---|---|---|
| TRANS = ‘N’: | Q * C | C * Q |
| TRANS = ‘C’: | Q**H * C | C * Q**H |
where Q is a real/complex orthogonal/unitary matrix defined as the product of k elementary reflectors
Q = H(k) . . . H(2) H(1)
as returned by GELQF. Q is of order M if SIDE = ‘L’ and of order N if SIDE = ‘R’.
Parameters
side
- Type: char
- Direction: Input
= ‘L’: apply Q or Q**H from the Left;
= ‘R’: apply Q or Q**H from the Right.
trans
- Type: char
- Direction: Input
= ‘N’: No transpose, apply Q;
= ‘C’: Transpose, apply Q**H.
m
- Type: int
- Direction: Input
The number of rows of the matrix C. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix C. N >= 0.
k
- Type: int
- Direction: Input
The number of elementary reflectors whose product defines the matrix Q.
If SIDE = ‘L’, M >= K >= 0;
if SIDE = ‘R’, N >= K >= 0.
a
- Type: S/D/C/Z Pointer,
- Direction: Input
- Dimension: (LDA,K)
The i-th column must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by GELQF in the last k columns of its array argument A. A is modified by the routine but restored on exit.
lda
- Type: int
- Direction: Input
The leading dimension of the array A.
If SIDE = ‘L’, LDA >= max(1,M);
if SIDE = ‘R’, LDA >= max(1,N).
tau
- Type: S/D/C/Z Pointer,
- Direction: Input
- Dimension: (K)
TAU(i) must contain the scalar factor of the elementary reflector H(i), as returned by GELQF.
c
- Type: S/D/C/Z Pointer,
- Direction: Input/Output
- Dimension: (LDC,N)
On entry, the M-by-N matrix C.
On exit, C is overwritten by Q*C or Q**H*C or C*Q**H or C*Q.
ldc
- Type: int
- Direction: Input
The leading dimension of the array C. LDC >= max(1,M).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORMQR/UNMQR functionality is implemented by the following CULA routines:
Description
ORMQR/UNMQR overwrites the general real M-by-N matrix C with
| SIDE = ‘L’ | SIDE = ‘R’ | |
|---|---|---|
| TRANS = ‘N’ | Q * C | C * Q |
| TRANS = ‘T’ | QT * C | C * QT |
where Q is a real/complex orthogonal/unitary matrix defined as the product of k elementary reflectors
Q = H(1) H(2) . . . H(k)
as returned by GEQRF. Q is of order M if SIDE = ‘L’ and of order N if SIDE = ‘R’.
Parameters
side
- Type: char
- Direction: Input
= ‘L’: apply Q or QT from the Left;
= ‘R’: apply Q or QT from the Right.
trans
- Type: char
- Direction: Input
= ‘N’: No transpose, apply Q;
= ‘T’: Transpose, apply QT.
m
- Type: int
- Direction: Input
The number of rows of the matrix C. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix C. N >= 0.
k
- Type: int
- Direction: Input
The number of elementary reflectors whose product defines the matrix Q. If SIDE = ‘L’, M >= K >= 0; if SIDE = ‘R’, N >= K >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (LDA,K)
The i-th column must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by GEQRF in the first k columns of its array argument A. A is modified by the routine but restored on exit.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. If SIDE = ‘L’, LDA >= max(1,M); if SIDE = ‘R’, LDA >= max(1,N).
tau
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (K)
TAU(i) must contain the scalar factor of the elementary reflector H(i), as returned by GEQRF.
c
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDC,N)
On entry, the M-by-N matrix C.
On exit, C is overwritten by Q * C or QT * C or C * QT or C * Q.
ldc
- Type: int
- Direction: Input
The leading dimension of the array C. LDC >= max(1,M).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The ORMRQ/UNMRQ functionality is implemented by the following CULA routines:
Description
ORMRQ/UNMRQ overwrites the general real M-by-N matrix C with
| SIDE = ‘L’ | SIDE = ‘R’ | |
|---|---|---|
| TRANS = ‘N’ | Q * C | C * Q |
| TRANS = ‘T’ | QT * C | C * QT |
where Q is a real/complex orthogonal/unitary matrix defined as the product of k elementary reflectors
Q = H(1) H(2) . . . H(k)
as returned by GERQF. Q is of order M if SIDE = ‘L’ and of order N if SIDE = ‘R’.
Parameters
side
- Type: char
- Direction: Input
= ‘L’: apply Q or QT from the Left;
= ‘R’: apply Q or QT from the Right.
trans
- Type: char
- Direction: Input
= ‘N’: No transpose, apply Q;
= ‘T’: Transpose, apply QT.
m
- Type: int
- Direction: Input
The number of rows of the matrix C. M >= 0.
n
- Type: int
- Direction: Input
The number of columns of the matrix C. N >= 0.
k
- Type: int
- Direction: Input
The number of elementary reflectors whose product defines the matrix Q. If SIDE = ‘L’, M >= K >= 0; if SIDE = ‘R’, N >= K >= 0.
a
- Type: S/D/C/Z Pointer, dimension
- Direction: Input
(LDA,M) if SIDE = ‘L’, (LDA,N) if SIDE = ‘R’
The i-th row must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by GERQF in the last k rows of its array argument A. A is modified by the routine but restored on exit.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,K).
tau
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (K)
TAU(i) must contain the scalar factor of the elementary reflector H(i), as returned by GERQF.
c
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDC,N)
On entry, the M-by-N matrix C.
On exit, C is overwritten by Q * C or QT * C or C * QT or C * Q.
ldc
- Type: int
- Direction: Input
The leading dimension of the array C. LDC >= max(1,M).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The POSV functionality is implemented by the following CULA routines:
Description
POSV computes the solution to a real system of linear equations
A * X = B,
where A is an N-by-N symmetric positive definite matrix and X and B are N-by-NRHS matrices.
The Cholesky decomposition is used to factor A as
A = UT * U, if UPLO = ‘U’, or A = L * LT, if UPLO = ‘L’,
where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of A is then used to solve the system of equations A * X = B.
Parameters
uplo
- Type: char
- Direction: Input
= ‘U’: Upper triangle of A is stored;
= ‘L’: Lower triangle of A is stored.
n
- Type: int
- Direction: Input
The number of linear equations, i.e., the order of the matrix A. N >= 0.
nrhs
- Type: int
- Direction: Input
The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the symmetric matrix A. If UPLO = ‘U’, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = ‘L’, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced.
On exit, if culaNoError is returned, the factor U or L from the Cholesky factorization A = UT * U or A = L * LT .
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
b
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,NRHS)
On entry, the N-by-NRHS right hand side matrix B.
On exit, if culaNoError is returned, the N-by-NRHS solution matrix X.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= max(1,N).
CULA Routines
The POTRF functionality is implemented by the following CULA routines:
Description
POTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
The factorization has the form
A = UT * U, if UPLO = ‘U’, or A = L * LT, if UPLO = ‘L’,
where U is an upper triangular matrix and L is lower triangular.
Parameters
uplo
- Type: char
- Direction: Input
= ‘U’: Upper triangle of A is stored;
= ‘L’: Lower triangle of A is stored.
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the symmetric matrix A. If UPLO = ‘U’, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = ‘L’, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced.
On exit, if culaNoError is returned, the factor U or L from the Cholesky factorization A = UT * U or A = L * LT.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
CULA Routines
The POTRS functionality is implemented by the following CULA routines:
Description
POTRS solves a system of linear equations A*X = B with a symmetric positive definite matrix A using the Cholesky factorization A = UT * U or A = L * LT computed by POTRF.
Parameters
uplo
- Type: char
- Direction: Input
= ‘U’: Upper triangle of A is stored;
= ‘L’: Lower triangle of A is stored.
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
nrhs
- Type: int
- Direction: Input
The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (LDA,N)
The triangular factor U or L from the Cholesky factorization A = UT * U or A = L * LT, as computed by POTRF.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
b
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,NRHS)
On entry, the right hand side matrix B.
On exit, the solution matrix X.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= max(1,N).
CULA Routines
The STEBZ functionality is implemented by the following CULA routines:
Description
STEBZ computes the eigenvalues of a symmetric tridiagonal matrix T. The user may ask for all eigenvalues, all eigenvalues in the half-open interval (VL, VU], or the IL-th through IU-th eigenvalues.
To avoid overflow, the matrix must be scaled so that its largest element is no greater than overflow**(1/2) * underflow**(1/4) in absolute value, and for greatest accuracy, it should not be much smaller than that.
See W. Kahan “Accurate Eigenvalues of a Symmetric Tridiagonal Matrix”, Report CS41, Computer Science Dept., Stanford University, July 21, 1966.
Please note that complex versions of this function do not exist.
Parameters
range
- Type: char
- Direction: Input
= ‘A’: (“All”) all eigenvalues will be found.
= ‘V’: (“Value”) all eigenvalues in the half-open interval (VL, VU] will be found.
= ‘I’: (“Index”) the IL-th through IU-th eigenvalues (of the entire matrix) will be found.
order
- Type: char
- Direction: Input
= ‘B’: (“By Block”) the eigenvalues will be grouped by split-off block (see IBLOCK, ISPLIT) and ordered from smallest to largest within the block. This code behaves similarly to “E” and will report only one block.
= ‘E’: (“Entire matrix”) the eigenvalues for the entire matrix will be ordered from smallest to largest.
n
- Type: int
- Direction: Input
The order of the tridiagonal matrix T. N >= 0.
vl
- Type: S/D Value
- Direction: Input
vu
- Type: S/D Value
- Direction: Input
If RANGE=’V’, the lower and upper bounds of the interval to be searched for eigenvalues. Eigenvalues less than or equal to VL, or greater than VU, will not be returned. VL < VU. Not referenced if RANGE = ‘A’ or ‘I’.
il
- Type: int
- Direction: Input
iu
- Type: int
- Direction: Input
If RANGE=’I’, the indices (in ascending order) of the smallest and largest eigenvalues to be returned. 1 <= IL <= IU <= N, if N > 0; IL = 1 and IU = 0 if N = 0. Not referenced if RANGE = ‘A’ or ‘V’.
abstol
- Type: S/D
- Direction: Input
The absolute tolerance for the eigenvalues. An eigenvalue (or cluster) is considered to be located if it has been determined to lie in an interval whose width is ABSTOL or less. If ABSTOL is less than or equal to zero, then ULP*| T | will be used, where | T | means the 1-norm of T.
Eigenvalues will be computed most accurately when ABSTOL is set to twice the underflow threshold, not zero.
d
- Type: S/D Pointer
- Direction: Input
- Dimension: (N)
The N diagonal elements of the tridiagonal matrix T.
e
- Type: S/D Pointer
- Direction: Input
- Dimension: (N-1)
The (N-1) off-diagonal elements of the tridiagonal matrix T. Not referenced if N <= 1.
m
- Type: int Pointer, always host
- Direction: Output
The actual number of eigenvalues found. 0 <= M <= N.
nsplit
- Type: int Pointer, always host
- Direction: Output
The number of diagonal blocks in the matrix T. 1 <= NSPLIT <= N.
w
- Type: S/D Pointer
- Direction: Output
- Dimension: (N)
On exit, the first M elements of W will contain the eigenvalues. (STEBZ may use the remaining N-M elements as workspace.)
iblock
- Type: int Pointer
- Direction: Output
- Dimension: (N)
At each row/column j where E(j) is zero or small, the matrix T is considered to split into a block diagonal matrix. On exit, if culaNoError is returned, IBLOCK(i) specifies to which block (from 1 to the number of blocks) the eigenvalue W(i) belongs. (STEBZ may use the remaining N-M elements as workspace.)
isplit
- Type: int Pointer
- Direction: Output
- Dimension: (N)
The splitting points, at which T breaks up into submatrices. The first submatrix consists of rows/columns 1 to ISPLIT(1), the second of rows/columns ISPLIT(1)+1 through ISPLIT(2), etc., and the NSPLIT-th consists of rows/columns ISPLIT(NSPLIT-1)+1 through ISPLIT(NSPLIT)=N.
(Only the first NSPLIT elements will actually be used, but since the user cannot know a priori what value NSPLIT will have, N words must be reserved for ISPLIT.)
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The STEQR functionality is implemented by the following CULA routines:
Description
STEQR computes all eigenvalues and, optionally, eigenvectors of a symmetric tridiagonal matrix using the implicit QL or QR method. The eigenvectors of a full or band real/complex symmetric/Hermitian matrix can also be found if HETRD or HPTRD or HBTRD has been used to reduce this matrix to tridiagonal form.
Parameters
compz
- Type: char
- Direction: Input
= ‘N’: Compute eigenvalues only.
= ‘V’: Compute eigenvalues and eigenvectors of the original symmetric/Hermitian matrix. On entry, Z must contain the unitary matrix used to reduce the original matrix to tridiagonal form.
= ‘I’: Compute eigenvalues and eigenvectors of the tridiagonal matrix. Z is initialized to the identity matrix.
n
- Type: int
- Direction: Input
The order of the matrix. N >= 0.
d
- Type: S/D Pointer,
- Direction: Input/Output
- Dimension: (N)
On entry, the diagonal elements of the tridiagonal matrix.
On exit, if culaNoError is returned, the eigenvalues in ascending order.
e
- Type: S/D Pointer,
- Direction: Input/Output
- Dimension: (N-1)
On entry, the (n-1) subdiagonal elements of the tridiagonal matrix.
On exit, E has been destroyed.
z
- Type: S/D/C/Z Pointer,
- Direction: Input/Output
- Dimension: (LDZ, N)
On entry, if COMPZ = ‘V’, then Z contains the unitary matrix used in the reduction to tridiagonal form.
On exit, if culaNoError is returned, then if COMPZ = ‘V’, Z contains the orthonormal eigenvectors of the original symmetric/Hermitian matrix, and if COMPZ = ‘I’, Z contains the orthonormal eigenvectors of the symmetric tridiagonal matrix. If COMPZ = ‘N’, then Z is not referenced.
ldz
- Type: int
- Direction: Input
The leading dimension of the array Z. LDZ >= 1, and if eigenvectors are desired, then LDZ >= max(1,N).
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The SYEV/HEEV functionality is implemented by the following CULA routines:
Description
SYEV/HEEV computes all eigenvalues and, optionally, eigenvectors of a real/complex symmetric/Hermitian matrix A.
Parameters
jobz
- Type: char
- Direction: Input
= ‘N’: Compute eigenvalues only;
= ‘V’: Compute eigenvalues and eigenvectors.
uplo
- Type: char
- Direction: Input
= ‘U’: Upper triangle of A is stored;
= ‘L’: Lower triangle of A is stored.
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA, N)
On entry, the symmetric/Hermitian matrix A. If UPLO = ‘U’, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A. If UPLO = ‘L’, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A.
On exit, if JOBZ = ‘V’, then if culaNoError is returned, A contains the orthonormal eigenvectors of the matrix A. If JOBZ = ‘N’, then on exit the lower triangle (if UPLO=’L’) or the upper triangle (if UPLO=’U’) of A, including the diagonal, is destroyed.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
w
- Type: S/D Pointer
- Direction: Output
- Dimension: (N)
If culaNoError is returned, the eigenvalues in ascending order.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The SYEVX/HEEVX functionality is implemented by the following CULA routines:
Description
SYEVX/HEEVX computes selected eigenvalues and, optionally, eigenvectors of a real/complex symmetric/Hermitian matrix A. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for the desired eigenvalues.
Parameters
jobz
- Type: char
- Direction: Input
= ‘N’: Compute eigenvalues only;
= ‘V’: Compute eigenvalues and eigenvectors.
range
- Type: char
- Direction: Input
= ‘A’: all eigenvalues will be found.
= ‘V’: all eigenvalues in the half-open interval (VL,VU] will be found.
= ‘I’: the IL-th through IU-th eigenvalues will be found.
uplo
- Type: char
- Direction: Input
= ‘U’: Upper triangle of A is stored;
= ‘L’: Lower triangle of A is stored.
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer,
- Direction: Input/Output
- Dimension: (LDA, N)
On entry, the symmetric/Hermitian matrix A. If UPLO = ‘U’, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A. If UPLO = ‘L’, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A.
On exit, the lower triangle (if UPLO=’L’) or the upper triangle (if UPLO=’U’) of A, including the diagonal, is destroyed.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
vl
- Type: S/D/C/Z
- Direction: Input
vu
- Type: S/D/C/Z
- Direction: Input
If RANGE=’V’, the lower and upper bounds of the interval to be searched for eigenvalues. VL < VU. Not referenced if RANGE = ‘A’ or ‘I’.
il
- Type: int
- Direction: Input
iu
- Type: int
- Direction: Input
If RANGE=’I’, the indices (in ascending order) of the smallest and largest eigenvalues to be returned. 1 <= IL <= IU <= N, if N > 0; IL = 1 and IU = 0 if N = 0. Not referenced if RANGE = ‘A’ or ‘V’.
abstol
- Type: S/D/C/Z
- Direction: Input
The absolute error tolerance for the eigenvalues. An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width less than or equal to
ABSTOL + EPS * max( | a | , | b | ) ,
where EPS is the machine precision. If ABSTOL is less than or equal to zero, then EPS*| T | will be used in its place, where | T | is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when ABSTOL is set to twice the underflow threshold, not zero. If this routine returns with culaDataError, indicating that some eigenvectors did not converge, try setting ABSTOL to the recommended value.
See “Computing Small Singular Values of Bidiagonal Matrices with Guaranteed High Relative Accuracy,” by Demmel and Kahan, LAPACK Working Note #3.
m
- Type: int Pointer, always host
- Direction: Output
The total number of eigenvalues found. 0 <= M <= N. If RANGE = ‘A’, M = N, and if RANGE = ‘I’, M = IU-IL+1.
w
- Type: S/D Pointer,
- Direction: Output
- Dimension: (N)
On normal exit, the first M elements contain the selected eigenvalues in ascending order.
z
- Type: S/D/C/Z Pointer,
- Direction: Output
- Dimension: (LDZ, max(1,M))
If JOBZ = ‘V’, then if culaNoError is returned, the first M columns of Z contain the orthonormal eigenvectors of the matrix A corresponding to the selected eigenvalues, with the i-th column of Z holding the eigenvector associated with W(i).
If an eigenvector fails to converge, then that column of Z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in IFAIL.
If JOBZ = ‘N’, then Z is not referenced.
Note: the user must ensure that at least max(1,M) columns are supplied in the array Z; if RANGE = ‘V’, the exact value of M is not known in advance and an upper bound must be used.
ldz
- Type: int
- Direction: Input
The leading dimension of the array Z. LDZ >= 1, and if JOBZ = ‘V’, LDZ >= max(1,N).
ifail
- Type: int Pointer
- Direction: Output
- Dimension: (N)
If JOBZ = ‘V’, then if culaNoError is returned, the first M elements of IFAIL are zero. If culaDataError is returned, then IFAIL contains the indices of the eigenvectors that failed to converge.
If JOBZ = ‘N’, then IFAIL is not referenced.
Differences from LAPACK
See No Workspace Parameters section.
CULA Routines
The SYRDB/HERDB functionality is implemented by the following CULA routines:
Description
Given a symmetric/Hermitian N-by-N matrix A, SYRDB/HERDB reduces A to a symmetric tridiagonal form (T) using a series of orthogonal transformations (Q), such that
A = Q * T * QT.
Note: This function should be used in place of SSYTRD, DSYTRD, CHETRD, and ZHETRD when an orthogonal Q is not needed as the performance of these functions does not scale to large matrix sizes.
See “A framework for symmetric band reduction,” by Bischof, Lang, and Sun, ACM Transactions on Mathematical Software (TOMS) archive Volume 26, Issue 4.
Parameters
jobz
- Type: char
- Direction: Input
Specifies which matrices outputs are contained within the outputs A and Z.
= ‘N’ : Forms the symmetric tridiagonal matrix T. Overwrites A with the banded matrix, B, and the information needed to construct QB.
= ‘V’ : Forms the symmetric tridiagonal matrix T. Overwrites A with Q.
= ‘U’ : Forms the symmetric tridiagonal matrix T. Overwrites A with the banded matrix, B, and the information needed to construct QB. Also, overwrites Z with Z*Q.
uplo
- Type: char
- Direction: Input
Specifies the triangular region where the symmetric input matrix is defined.
= ‘U’ : defined within the upper triangular part of A
= ‘L’ : defined within the lower triangular part of A
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
kd
- Type: int
- Direction: Input
The bandwidth of the banded output matrix, B. KD >= 1.
a
- Type: S/D/C/Z Pointer,
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, A is a real/complex symmetric/Hermitian matrix described by UPLO.
On exit, given the job code of ‘V’, A is overwritten by Q. Given a job code of ‘N’ or ‘U’, A is overwritten by B and QB as defined by UPLO and KD.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
d
- Type: S/D/C/Z Pointer,
- Direction: Output
- Dimension: max(1,N)
Holds the diagonal elements of the symmetric tridiagonal matrix T.
e
- Type: S/D/C/Z Pointer,
- Direction: Output
- Dimension: max(N-1)
Holds the off-diagonal elements of the symmetric tridiagonal matrix T.
tau
- Type: S/D/C/Z Pointer,
- Direction: Output
- Dimension: max(1,N-KD-1)
The scalar factors of the elementary reflectors that form QB.
z
- Type: S/D Pointer,
- Direction: Input/Output
- Dimension: (N-1)
An optional matrix that can be multiplied by Q given a ‘U’ job code.
= ‘U’ : contains the product of itself and Q (Z*Q).
= ‘N’, ‘V’ : not referenced
ldz
- Type: int
- Direction: Input
The leading dimension of the array Z. LDZ >= max(1,N).
Differences from LAPACK
See No Workspace Parameters section.
This function is not implemented in CULA. If a tridiagonal reduction is required, please see SYRDB/HERDB
CULA Routines
The TRTRI functionality is implemented by the following CULA routines:
Description
TRTRI computes the inverse of a real upper or lower triangular matrix A.
Parameters
uplo
- Type: char
- Direction: Input
= ‘U’: A is upper triangular;
= ‘L’: A is lower triangular.
diag
- Type: char
- Direction: Input
= ‘N’: A is non-unit triangular;
= ‘U’: A is unit triangular.
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDA,N)
On entry, the triangular matrix A. If UPLO = ‘U’, the leading N-by-N upper triangular part of the array A contains the upper triangular matrix, and the strictly lower triangular part of A is not referenced. If UPLO = ‘L’, the leading N-by-N lower triangular part of the array A contains the lower triangular matrix, and the strictly upper triangular part of A is not referenced. If DIAG = ‘U’, the diagonal elements of A are also not referenced and are assumed to be 1.
On exit, the (triangular) inverse of the original matrix, in the same storage format.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
CULA Routines
The TRTRS functionality is implemented by the following CULA routines:
Description
TRTRS solves a triangular system of the form
A * X = B or AT * X = B,
where A is a triangular matrix of order N, and B is an N-by-NRHS matrix. A check is made to verify that A is nonsingular.
Parameters
uplo
- Type: char
- Direction: Input
= ‘U’: A is upper triangular;
= ‘L’: A is lower triangular.
trans
- Type: char
- Direction: Input
Specifies the form of the system of equations: = ‘N’: A * X = B (No transpose)
= ‘T’: AT * X = B (Transpose)
= ‘C’: AH * X = B (Conjugate transpose = Transpose)
diag
- Type: char
- Direction: Input
= ‘N’: A is non-unit triangular;
= ‘U’: A is unit triangular.
n
- Type: int
- Direction: Input
The order of the matrix A. N >= 0.
nrhs
- Type: int
- Direction: Input
The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
a
- Type: S/D/C/Z Pointer
- Direction: Input
- Dimension: (LDA,N)
The triangular matrix A. If UPLO = ‘U’, the leading N-by-N upper triangular part of the array A contains the upper triangular matrix, and the strictly lower triangular part of A is not referenced. If UPLO = ‘L’, the leading N-by-N lower triangular part of the array A contains the lower triangular matrix, and the strictly upper triangular part of A is not referenced. If DIAG = ‘U’, the diagonal elements of A are also not referenced and are assumed to be 1.
lda
- Type: int
- Direction: Input
The leading dimension of the array A. LDA >= max(1,N).
b
- Type: S/D/C/Z Pointer
- Direction: Input/Output
- Dimension: (LDB,NRHS)
On entry, the right hand side matrix B.
On exit, if culaNoError is returned, the solution matrix X.
ldb
- Type: int
- Direction: Input
The leading dimension of the array B. LDB >= max(1,N).
The usage of some CULA functions differ slightly from their LAPACK equivalents, though they perform the same operations. This section details some of the API-wide ways that CULA and LAPACK differ.
Many LAPACK functions require a workspace for internal operation. For those LAPACK functions that utilize a workspace, workspace sizes are queried by providing a -1 argument to what is typically an LWORK parameter. Upon inspecting this parameter, the LAPACK function will determine the workspace required for this particular problem size and will return the value in the WORK parameter. LAPACK (and other similar packages) then require the programmer to provide a pointer to memory of sufficient size, which often requires that the programmer allocate new memory.
CULA uses both main and GPU workspace memories, and as such, LAPACK’s workspace query is not appropriate, as the LAPACK interface allows for the specification of only one workspace. Instead of providing a more complicated interface that adds parameters for both main and GPU workspace memories, CULA requires neither. Instead, any workspaces that are required are allocated and tracked internally. This organization yields no significant performance loss, and furthermore reduces the number of function calls by removing the need for a workspace query.
Note
Any workspaces that have been allocated internally may be cleared by calling culaFreeBuffers().
This section lists some of the common errors users make when using CULA and similar LAPACK packages.
This section applies to functions in the LU Family (getrf, gesv, getrs, etc.).
CULA pivot arrays follow LAPACK conventions. These arrays are created for serial evaluation and describe a series of row interchanges. The array [2 3 3] states “swap the first row with the second, then the second row with the third, then the third row is unchanged.” For those working with Matlab, note that Matlab follows a different convention, in which the pivot array describes a set of parallel row interchanges. The Matlab array [2 3 1] is equivalent to the first example, and states “for the first row, obtain the second row; for the second row obtain the third; and for the third row obtain the first.”
A common GPU usage pattern is to pad matrices to multiples of 8/16/32 elements in order to achieve performance. Routines such as GEMM (matrix-matrix multiply, as found in CUBLAS) are data-insensitive to these extra elements if they are zero. CULA routines function differently and in many cases will react poorly if the zeros are included in the computation space; for instance, solving a system in which the coefficient matrix has a full row or column of zeros will result in a culaDataError as the matrix is indeed singular. To avoid this problem, please be sure to set the LDx parameters to the padded size, but to set the remainder of the inputs that describe sizes (M, N, etc) to match the size of the valid data for the computation (that is: the size before padding.) This will avoid many data errors.