## Selecting a Sparse Solver and Preconditioner

Selecting the "best" sparse iterative solver and preconditioner is often a difficult decision. Rarely can one simply know which combination will converge most quickly within the given constraints. Often the best answer requires knowledge of the structure of the matrix and the properties it exhibits. To aid in this selection, we have constructed some flow charts to help gauge which solver and preconditioner might work best. Again, since there is no single correct answer for a given system, we encourage users to experiment with different solvers, preconditioners, and options. These charts are designed to give *suggestions,* not absolute answers.

## Sparse 101: Calling a Sparse System Solve

In this post, I’ll show you how to call a matrix solve using our soon-to-be-released CULA Sparse package. For this example, I will walk through the main steps in calling our Cg solver with a Block Jacobi preconditioner on a CSR matrix with double-precision data.

The first set of parameters to consider is the matrix system to be solved, Ax=b. For these inputs, you need to consider which matrix format and which precision you are using; see the matrix formats post below for a discussion of the available formats. The relevant parameters for this system are:

- n - size of the matrix system
- nnz - number of non-zero elements
- A, colInd, rowPtr - a CSR representation of the matrix to be solved
- x - the solution vector
- b - the right-hand-side to solve against

These parameters will be passed directly to a function with “Dcsr” in its name to denote the double-precision data (D) and CSR representation (csr), such as in the line below:

```
culaDcsr{Function}(..., N, nnz, A, colInd, rowPtr, x, b, ...);
```

Now that I’ve discussed the matrix system, the next parameter to consider is the configuration structure for setting options that are common to all the solvers. Among these options are the solution relative tolerance, maximum number of iterations, maximum runtime, indexing-format, and whether debugging mode has been enabled. An example configuration may look like:

```
culaIterativeConfig config;
culaIterativeConfigInit(&config);
config.maxIterations = 1000;
config.tolerance = 1e-6;
config.indexing = 1;
```

As you can see, setting up problem options is an easy task; each option is clear and self-documenting. The config initialization routine sets sensible defaults for most problems, but it’s worth double-checking that they meet your needs and overriding them, as we have done above, where they do not.

The last set of options to consider are the solver- and preconditioner-specific options. These are set through structures that are initialized similarly to the general configuration structure. To use the Cg solver with a Block Jacobi preconditioner, you would write:

```
culaCgOptions solverOptions;
culaCgOptionsInit(&solverOptions);

culaBlockjacobiOptions preOptions;
culaBlockjacobiOptionsInit(&preOptions);
preOptions.blockSize = 4;
```

Above, we have default-initialized both structures and then overridden the Block Jacobi preconditioner’s block size. Because each solver and preconditioner is initialized in the same way, trying out different solvers and preconditioners is an easy task.

Putting all of the parameters together, we end up with the following call:

```
culaDcsrCgBlockjacobi(&config, &solverOptions, &preOptions, N, nnz, A, colInd, rowPtr, x, b, &result);
```

That’s it! We built CULA Sparse so that it would be easy to set up and work with different options while making sure that the code is always clear about what it is doing.

## Sparse 101: Matrix Formats

With the release of the CULA Sparse beta, we thought it would be useful to present an introduction to sparse matrix formats. Traditionally, a matrix is considered sparse when the number of non-zero elements is significantly less than the number of zero elements. When represented in a program, sparse matrices, unlike the dense matrices used in CULA’s LAPACK functions, are a logical, compressed representation of a matrix. Whereas a dense matrix represents all elements of a matrix, including zeroes, a sparse matrix will represent only non-zero elements. An immediate benefit of this approach is that algorithmic speed improvements can be made by disregarding the zero elements for many operations. An arguably more important benefit (and a focus of this article) is that a representation that stores only non-zero elements allows the total memory used by a sparse matrix to be significantly less than it would be if it were stored densely.

Consider an 8x8 matrix in which only 12 of the 64 entries (about 19%) are populated. If we were to adopt a sparse storage format for this matrix, we could reduce the storage by roughly 60%: from 512 bytes for the dense double-precision representation down to 192 bytes with a compressed format (assuming 8-byte values and 4-byte indices).

The simplest compressed format, coordinate (COO), represents a matrix by its non-zero values and an index at which each non-zero is located. For example, in the matrix above, the value 3.0 is located at (2,2) using 1-based indexing. These indices do add to the storage cost of the matrix, but because the number of non-zeros is small, there is a net gain when compared with a dense representation. The full representation of this matrix in COO is the following:

```
values       = [ 1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0  10.0  11.0  12.0 ]
column index = [ 1    1    2    2    3    3    4    4    5    6     7     8    ]
row index    = [ 1    6    2    8    1    4    5    7    3    6     2     8    ]
```

There are several other popular sparse matrix formats in addition to coordinate format. Although coordinate format is the easiest to understand and implement, it is not always the preferred format, because other formats, such as compressed sparse row format (CSR), can increase the compression at the expense of a little bit more work. In fact, CSR is the preferred format for CULA Sparse, because of its size advantage and amenability to GPU acceleration.

In the above example, about 19% of the entries were non-zero, but for many problem domains matrices are far sparser still. Fewer non-zeros lead to lower storage requirements, which means that we can fit larger and larger problems within the memory available to us. For example, whereas dense matrices solved with CULA on a typical workstation max out at about 16K by 16K, CULA Sparse matrices can be as large as 100M by 100M, depending on their sparsity.