Page 1 of 1

Solve matrix equations with different RHS on version S5

PostPosted: Sun Mar 30, 2014 11:20 am
by guanjian
Hello, I am current using cula sparse to solve matrix equations with different RHS. Since the system matrix is not changed, the construction of preconditioner should be performed once no matter of the change of RHS. I just want to solve multiple times without repeating construction of preconditioner. But I fail to do that. Does anyone know how to organize the plan interfaces to get that? Thank you very much.

I just build a testing code from "iterativeSystemSolve" in S5 version
Code: Select all


/*
* CULA Example: iterativeSystemSolve
*
* This example demonstrates how to use the "expert interface" which allows for
* more fine grained control over the direction of the solve.
*/

#include <vector>
#include <algorithm>
#include <iostream>
#include <complex>

#include <cula_sparse.h>

using namespace std;

std::vector<int> get_square_grid(int n);
int build_lapacian(int n, std::vector<complex<double> >& values, std::vector<int>& rowInd, std::vector<int>& colInd);

class StatusChecker
{
public:
    StatusChecker(culaSparseHandle handle);
    void operator=(culaSparseStatus status);

private:
    culaSparseHandle handle_;
};

int main(int argc, char** argv)
{

    // string buffer
    const int bufsize = 512;
    char buffer[bufsize];

    // initialize cula sparse library
    culaSparseHandle handle;
    if (culaSparseCreate(&handle) != culaSparseNoError)
    {
        // this should only fail under extreme conditions
        std::cout << "fatal error: failed to create library handle!" << std::endl;
        exit(EXIT_FAILURE);
    }

    // error handling class
    StatusChecker sc(handle);

    // create a plan
    culaSparsePlan plan;
    sc = culaSparseCreatePlan(handle, &plan);

    // create a 2d laplacian matrix from a square grid
    const int grid_size = 512;
    std::vector<complex<double> > values;
    std::vector<int> rowInd;
    std::vector<int> colInd;
    int n = build_lapacian(grid_size, values, rowInd, colInd);
    int nnz = int(values.size());

    // allocate vectors
    complex<double> *x = new complex<double> [n];
    complex<double> *b = new complex<double> [n];

    // set configuration
    culaSparseConfig config;
    sc = culaSparseConfigInit(handle, &config);
    config.relativeTolerance = 1e-3;
    config.maxIterations = 1000;

    // print configuration information
    std::cout << "testing Laplacian matrix with n=" << n << " nnz=" << nnz << "...\n\n";
    sc = culaSparseGetConfigString(handle, &config, buffer, bufsize);
    std::cout << buffer << std::endl;

    culaSparseResult result;

    // this will gracefully return an error if no cuda platform is found
    if (culaSparsePreinitializeCuda(handle) == culaSparseNoError)
    {
        // change to cuda accelerated platform
        sc = culaSparseSetCudaPlatform(handle, plan, 0);

        // set ilu0 preconditioner
        sc = culaSparseSetIlu0Preconditioner(handle, plan, 0);

        sc = culaSparseSetBicgstabSolver(handle, plan, 0);

//1st time
        for(int i=0; i<n; i++) b[i] = complex<double>(1.0,2.0);
        sc = culaSparseSetZcooData(handle, plan, 0, n, nnz, (culaSparseComplexDouble*)&values[0], &rowInd[0], &colInd[0],
                (culaSparseComplexDouble*)x, (culaSparseComplexDouble*)b);
        sc = culaSparseExecutePlan(handle, plan, &config, &result);
        sc = culaSparseGetResultString(handle, &result, buffer, bufsize);
        std::cout << buffer << std::endl;
        for(int i=1; i<10; i++) cout<<i<<"\t"<<x[i]<<endl;

//2nd time
        for(int i=0; i<n; i++) b[i] = complex<double>(1.0,4.0);
        sc = culaSparseSetZcooData(handle, plan, 0, n, nnz, (culaSparseComplexDouble*)&values[0], &rowInd[0], &colInd[0],
                (culaSparseComplexDouble*)x, (culaSparseComplexDouble*)b);
        sc = culaSparseExecutePlan(handle, plan, &config, &result);
        sc = culaSparseGetResultString(handle, &result, buffer, bufsize);
        std::cout << buffer << std::endl;
        for(int i=1; i<10; i++) cout<<i<<"\t"<<x[i]<<endl;

//3rd time
        for(int i=0; i<n; i++) b[i] = complex<double>(3.0,4.0);
        sc = culaSparseExecutePlan(handle, plan, &config, &result);
        sc = culaSparseGetResultString(handle, &result, buffer, bufsize);
        std::cout << buffer << std::endl;
        for(int i=1; i<10; i++) cout<<i<<"\t"<<x[i]<<endl;

      
    }
    else
    {
        std::cout << "alert: no cuda capable gpu found" << std::endl;
    }

    // cleanup plan
    culaSparseDestroyPlan(plan);

    // cleanup handle
    culaSparseDestroy(handle);
}

std::vector<int> get_square_grid(int n)
{
    // allocate square grid with -1
    std::vector<int> grid(n*n, -1);

    // fill interior index map
    int idx = 0;
    for (int i = 1; i < n-1; i++)
        for (int j = 1; j < n-1; j++)
            grid[i*n + j] = idx++;

    return grid;
}

int build_lapacian(int n, std::vector<complex<double> >& values, std::vector<int>& rowInd, std::vector<int>& colInd)
{
    // get square grid
    std::vector<int> grid = get_square_grid(n);

    // build index set list
    std::vector<int> index_set;
    for (size_t i = 0; i < grid.size(); i++)
        if (grid[i] != -1)     
            index_set.push_back(int(i));

    // fill sparse vectors
    for (size_t i = 0; i < index_set.size(); i++)
    {
        // add self
        int self_index = grid[index_set[i]];
        values.push_back(4.0);
        rowInd.push_back(self_index);
        colInd.push_back(self_index);

        // add adjacent sides
        int degree = 1;
        int adj_index[4];
        adj_index[0] = grid[index_set[i] - 1];
        adj_index[1] = grid[index_set[i] + 1];
        adj_index[2] = grid[index_set[i] - n];
        adj_index[3] = grid[index_set[i] + n];
        for (int j = 0; j < 4; j++)
        {
            if (adj_index[j] != -1)
            {
                // values.push_back(-1.0);
                values.push_back(-1.0);
                rowInd.push_back(self_index);
                colInd.push_back(adj_index[j]);
                degree++;
            }
        }

    }

    return int(index_set.size());
}

StatusChecker::StatusChecker(culaSparseHandle handle) : handle_(handle) {}

void StatusChecker::operator=(culaSparseStatus status)
{
    // expected cases
    if (status == culaSparseNoError || status == culaSparseNonConvergence)
        return;

    // there was an error; get additional information
    const int buffer_size = 256;
    char buffer[buffer_size];

    if (culaSparseGetLastStatusString(handle_, buffer, buffer_size) != culaSparseNoError)
        std::cout << "error: could not retrieve status string" << std::endl;
    else
        std::cout << "error: " << buffer << std::endl;
}




I execute the solver 3 times. The result of 3rd time is the exact same with one in 2nd time, and no preconditioner construction cost in 3rd time. That means the data in system equation does not updated. While the result from 2nd time is different from 1st time, because the culaSparseSetZcooData is performed. But this way will reconstruct the preconditioner for system matrix, it is a redundancy. I guess the functionality of culaSparseSetZcooData is to copy the host data to device. Thus when I change the RHS at host, there is no update for device data. I do not known if the culaDevice platform will be helpful.

Please let me know if anyone has an approach to fix that. Thank you very much.

Re: Solve matrix equations with different RHS on version S5

PostPosted: Mon Mar 31, 2014 9:57 pm
by guanjian
Using cudaDevice interface can fix the problem as described.

Re: Solve matrix equations with different RHS on version S5

PostPosted: Mon Sep 15, 2014 11:39 pm
by prince66
very nice forum,,,,,,,,,,,,,,,,,,,,,,,