## Interpreting CULA Sparse Results

One design goal for CULA Sparse was to give users informative output so that they do not have to write verbose checking routines. The routine culaIterativeResultString() is key here. It accepts the culaIterativeResult structure that every CULA Sparse solver fills in through its last parameter. The output it produces is shown below:

    Solver: Cg
    Precond: Block Jacobi (block size 16)
    Flag: Converged successfully in 27 iterations
    Residual: 8.424304e-07

    Total Time: 0.02827s (overhead + precond + solve)
       Overhead: 0.000569s
       Precond: 2.8e-05s
       Solve: 0.02767s

You will notice that basic stats are produced, such as the solver and preconditioner used. The Flag field helps to interpret the mathematical status of the solve process. The example here shows a successful convergence in 27 iterations, but the Flag can also indicate conditions such as solver stagnation (failing to make progress for several consecutive iterations) or numerical breakdown. The Residual field indicates the quality of the final answer.

There is then a timing output block, which shows the total execution time plus a breakdown of where that time was spent. The Overhead field shows time spent on GPU-specific operations such as device memory allocation and transfer. The Precond field shows the total time required to *generate* the preconditioner. It is reported separately because this time can vary wildly among different matrices and different preconditioners. The final field, Solve, shows the time taken for the actual system solution.
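
As a quick check on how the timing block fits together, here is a minimal sketch in plain C that adds the three phases and computes the share of each. The `phase_times` struct and both function names are illustrative only and are not part of CULA Sparse.

```c
/* Hypothetical container for the three timing fields in the report.
   This is not a CULA Sparse type. */
typedef struct {
    double overhead;  /* seconds: device allocation and transfer */
    double precond;   /* seconds: generating the preconditioner */
    double solve;     /* seconds: the actual system solution */
} phase_times;

/* Total Time is the sum of the three phases. */
double total_time(const phase_times *t)
{
    return t->overhead + t->precond + t->solve;
}

/* Percentage of the total spent in one phase (0 if the total is 0). */
double phase_percent(double phase, const phase_times *t)
{
    double total = total_time(t);
    return total > 0.0 ? 100.0 * phase / total : 0.0;
}
```

With the sample values above (0.000569, 2.8e-05, and 0.02767), the sum is 0.028267s, which matches the reported Total Time of 0.02827s after rounding. The solve phase accounts for roughly 98% of it.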

In addition to filling in the culaIterativeResult structure, each solver *returns* a culaStatus that indicates important runtime information, such as incorrect parameters (specifying a matrix size less than zero, for example) or not having the proper version of the CUDA driver installed. Users of CULA Dense will already be familiar with this return value. In all cases, we recommend checking the returned status first and only then obtaining the iterative result string. The examples in your CULA Sparse installation show how to integrate this into your code.
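
The order of the two checks can be sketched without the library at all. Everything prefixed `demo_` or `DEMO_` below is a hypothetical stand-in and not the CULA Sparse API. The mock only mimics the two channels described above: a returned status for runtime problems, and a result structure for the mathematical outcome. In real code the status would be the culaStatus returned by the solver, and the report would come from culaIterativeResultString().

```c
#include <stdio.h>

/* Hypothetical stand-ins, NOT CULA Sparse types. */
typedef enum { DEMO_STATUS_OK = 0, DEMO_STATUS_BAD_ARGUMENT } demo_status;
typedef enum { DEMO_CONVERGED = 0, DEMO_MAX_ITERATIONS, DEMO_STAGNATED } demo_flag;

typedef struct {
    demo_flag flag;      /* mathematical outcome of the solve */
    int       iterations;
    double    residual;
} demo_result;

/* Mock solver: rejects a negative size at the runtime level,
   otherwise pretends the iteration converged. */
demo_status demo_solve(int n, demo_result *result)
{
    if (n < 0 || result == NULL)
        return DEMO_STATUS_BAD_ARGUMENT;
    result->flag = DEMO_CONVERGED;
    result->iterations = 27;
    result->residual = 8.424304e-07;
    return DEMO_STATUS_OK;
}

/* Check the returned status first; look at the result only if the call ran.
   Returns 0 if the solution is usable, 1 if the solver did not converge,
   2 if the call itself failed. */
int demo_check(demo_status status, const demo_result *result)
{
    if (status != DEMO_STATUS_OK) {
        fprintf(stderr, "runtime error: status %d\n", (int)status);
        return 2;  /* the result was never filled in, so do not read it */
    }
    if (result->flag != DEMO_CONVERGED) {
        fprintf(stderr, "no convergence: flag %d\n", (int)result->flag);
        return 1;
    }
    printf("converged in %d iterations, residual %e\n",
           result->iterations, result->residual);
    return 0;
}
```

The point of the ordering is that a failed call may leave the result structure untouched, so reading it before checking the status could report stale or uninitialized data.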

## Selecting a Sparse Solver and Preconditioner

Selecting the "best" sparse iterative solver and preconditioner is often a difficult decision. It is rarely possible to know in advance which combination will converge quickest to a solution within the given constraints. The best answer often depends on the structure of the matrix and the properties it exhibits. To aid in the selection, we have constructed some flow charts that help gauge which solver and preconditioner might work best. Since there is no single correct answer for a given system, we encourage users to experiment with different solvers, preconditioners, and options. These charts are designed to give *suggestions*, not absolute answers.
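
One matrix property that is cheap to test before choosing a solver is symmetry, since the conjugate gradient method is intended for symmetric positive definite systems. The sketch below is plain C rather than anything from CULA Sparse. It assumes a zero-based CSR matrix (values, column indices, row pointers) with no duplicate entries, and the function names are my own.

```c
/* Look up A(i,j) in a zero-based CSR matrix; entries not stored are zero. */
double csr_get(const double *a, const int *col_ind, const int *row_ptr,
               int i, int j)
{
    for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        if (col_ind[k] == j)
            return a[k];
    return 0.0;
}

/* Returns 1 if |A(i,j) - A(j,i)| <= tol for every stored entry, else 0.
   Visiting every stored entry is enough: a stored A(j,i) whose mirror
   is missing is caught when the loop reaches row j. */
int csr_is_symmetric(int n, const double *a, const int *col_ind,
                     const int *row_ptr, double tol)
{
    for (int i = 0; i < n; ++i) {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            double diff = a[k] - csr_get(a, col_ind, row_ptr, col_ind[k], i);
            if (diff < 0.0)
                diff = -diff;
            if (diff > tol)
                return 0;
        }
    }
    return 1;
}
```

A symmetric result does not establish positive definiteness, so this is only a first filter. It rules solvers out rather than picking one.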

## Sparse 101: Calling a Sparse System Solve

In this post, I’ll show you how to call a matrix solve using our soon-to-be-released CULA Sparse package. For this example, I will walk through the main steps in calling our Cg solver with a Block Jacobi preconditioner for a CSR matrix with double-precision data.

The first set of parameters to consider is the matrix system to be solved, Ax=b. For these inputs, you need to consider which matrix formats and which precision you are using; see this blog post for a discussion of matrix formats. The relevant parameters for this system are:

- n - size of the matrix system
- nnz - number of non-zero elements
- A, colInd, rowPtr - a CSR representation of the matrix to be solved
- x - the solution vector
- b - the right-hand-side to solve against

These parameters will be passed directly to a function with “Dcsr” in its name to denote the double-precision data (D) and CSR representation (csr), such as in the line below:

    culaDcsr{Function}(..., n, nnz, A, colInd, rowPtr, x, b, ...);
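
To make the three CSR arrays concrete, here is a small worked example in plain C with no CULA calls. It stores a 3x3 matrix in CSR form and multiplies it by a vector. The `base` argument is my own addition so that the same routine reads zero-based or one-based index arrays. The names `csr_matvec` and `demo_*` are illustrative.

```c
/* The 3x3 matrix
 *     [  4 -1  0 ]
 *     [ -1  4 -1 ]
 *     [  0 -1  4 ]
 * has n = 3 and nnz = 7. Zero-based CSR arrays:
 *   A      : the non-zero values, row by row
 *   colInd : the column of each value
 *   rowPtr : where each row starts in A, plus one final entry (nnz)
 */
static const double demo_A[]      = { 4, -1,  -1, 4, -1,  -1, 4 };
static const int    demo_colInd[] = { 0,  1,   0, 1,  2,   1, 2 };
static const int    demo_rowPtr[] = { 0, 2, 5, 7 };

/* y = A * x for a CSR matrix whose indices start at `base` (0 or 1). */
void csr_matvec(int n, const double *A, const int *colInd, const int *rowPtr,
                int base, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = rowPtr[i] - base; k < rowPtr[i + 1] - base; ++k)
            sum += A[k] * x[colInd[k] - base];
        y[i] = sum;
    }
}
```

For x = (1, 2, 3) this gives y = (2, 4, 10). A one-based version of the same matrix would use colInd = {1, 2, 1, 2, 3, 2, 3} and rowPtr = {1, 3, 6, 8} with `base` set to 1.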

Now that I’ve discussed the matrix system, the next parameter to consider is the configuration structure, which sets options that are common to all the solvers. These options include the relative tolerance of the solution, the maximum number of iterations, the maximum runtime, the indexing format, and whether debugging mode is enabled. An example configuration may look like:

    culaIterativeConfig config;
    culaIterativeConfigInit(&config);
    config.maxIterations = 1000;
    config.tolerance = 1e-6;
    config.indexing = 1;

As you can see, setting up problem options is an easy task. Each option is clear and self-documenting. The config initialization routine sets sensible defaults for most problems, but it’s worth double-checking that they meet your needs and overriding them where they don’t, as we have done above.

The last set of options to consider are the solver- and preconditioner-specific options. These options are set through structures that are initialized in the same way as the general configuration structure. To use the Cg solver with a Block Jacobi preconditioner, you would write:

    culaCgOptions solverOptions;
    culaCgOptionsInit(&solverOptions);
    culaBlockjacobiOptions preOptions;
    culaBlockjacobiOptionsInit(&preOptions);
    preOptions.blockSize = 4;

Above, we have default-initialized both structures and then overridden the Block Jacobi preconditioner's block size. Because every solver and preconditioner is initialized in the same way, trying out different solvers and preconditioners is an easy task.

Putting all of the parameters together, we end up with the following call:

    culaDcsrCgBlockjacobi(&config, &solverOptions, &preOptions,
                          n, nnz, A, colInd, rowPtr, x, b, &result);

That’s it! We built CULA Sparse so that it would be easy to set up and work with different options while making sure that the code is always clear about what it is doing.
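
For readers curious about what a call like this computes, below is a textbook preconditioned conjugate gradient loop in plain C. It is a CPU sketch under my own assumptions and not CULA Sparse's implementation. It uses zero-based CSR, a diagonal Jacobi preconditioner (which is Block Jacobi with a block size of 1), and a stopping test of ||r|| <= tol * ||b||. The names `pcg_jacobi`, `spmv`, and `dot` are hypothetical.

```c
#include <stdlib.h>

/* y = A * x for a zero-based CSR matrix. */
static void spmv(int n, const double *a, const int *col_ind,
                 const int *row_ptr, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        y[i] = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            y[i] += a[k] * x[col_ind[k]];
    }
}

static double dot(int n, const double *u, const double *v)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += u[i] * v[i];
    return s;
}

/* Solves A x = b for symmetric positive definite A, starting from the
   guess already in x. Returns the number of iterations used, or -1 on
   allocation failure, a zero diagonal entry, or no convergence. */
int pcg_jacobi(int n, const double *a, const int *col_ind, const int *row_ptr,
               double *x, const double *b, double tol, int max_iter)
{
    double *w = malloc(5 * (size_t)n * sizeof *w);
    if (w == NULL)
        return -1;
    double *r = w, *z = w + n, *p = w + 2 * n, *q = w + 3 * n, *dinv = w + 4 * n;

    /* Jacobi preconditioner: the inverse of each diagonal entry. */
    for (int i = 0; i < n; ++i) {
        double d = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            if (col_ind[k] == i)
                d = a[k];
        if (d == 0.0) {
            free(w);
            return -1;
        }
        dinv[i] = 1.0 / d;
    }

    /* Initial residual r = b - A x, preconditioned residual z, direction p. */
    spmv(n, a, col_ind, row_ptr, x, q);
    for (int i = 0; i < n; ++i) {
        r[i] = b[i] - q[i];
        z[i] = dinv[i] * r[i];
        p[i] = z[i];
    }
    double limit = tol * tol * dot(n, b, b);  /* compare squared norms */
    double rz = dot(n, r, z);
    int iters = -1;

    for (int it = 0; it <= max_iter; ++it) {
        if (dot(n, r, r) <= limit) {
            iters = it;              /* converged after `it` updates */
            break;
        }
        if (it == max_iter)
            break;                   /* iteration budget exhausted */
        spmv(n, a, col_ind, row_ptr, p, q);
        double alpha = rz / dot(n, p, q);
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            z[i] = dinv[i] * r[i];
        }
        double rz_new = dot(n, r, z);
        for (int i = 0; i < n; ++i)
            p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
    }
    free(w);
    return iters;
}
```

The `tol` and `max_iter` arguments play the same role as the tolerance and maxIterations options in the configuration structure, although the library's exact stopping rule may differ from the one assumed here.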