## Insufficient memory

### Insufficient memory

Hello.

1. We are performing a full SVD of a 12500 × 12500 matrix in double precision using culaDgesvd.

We are facing the problem:

"Insufficient memory to complete this operation"

but the memory taken by the input matrix is only 12 500 × 12 500 × 8 bytes = 1.25 × 10^9 bytes (about 1.16 GiB).

System specifications: Dell Precision T7600 workstation, Dual 8-core Xeon (E5-2650) 2.0GHz, 64GB DDR3, Quadro K5000 (4 GB), CentOS release 6.4 (Final).

2. How can we debug CULA programs? Is there something similar to gdb, or would cuda-gdb work for inspecting variables in GPU memory? Is there a GUI-based debugger or IDE on Linux, similar to Visual Studio, for stepping through CULA programs one operation at a time?

Thank you very much.

- megadata
**Posts:** 3 **Joined:** Sun Apr 07, 2013 7:28 pm

### Re: Insufficient memory

The memory used in SVD is highly dependent on the job flags you've passed to it. For example, the A/A case (jobu = 'A', jobvt = 'A', i.e. all of U and all of V^T) requires three whole matrices to fit on the GPU (plus a bit of scratch space), which is pushing the limits of your 4GB card.
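As a rough check of that estimate, the C sketch below adds up the three dense double-precision matrices of a full SVD: A (m × n), U (m × m) and V^T (n × n). The function names are hypothetical, scratch space is not modeled, and CULA's actual internal allocations may differ, so treat the result as a lower bound only.

```c
#include <stddef.h>

/* Bytes needed for a dense rows x cols matrix of doubles. */
unsigned long long matrix_bytes(unsigned long long rows, unsigned long long cols)
{
    return rows * cols * (unsigned long long)sizeof(double);
}

/* Lower bound on the device footprint of a full SVD with all of U and V^T:
 * A (m x n) + U (m x m) + V^T (n x n). Scratch space is NOT included
 * (assumption: the library needs some, but the amount is not documented here). */
unsigned long long full_svd_bytes(unsigned long long m, unsigned long long n)
{
    return matrix_bytes(m, n) + matrix_bytes(m, m) + matrix_bytes(n, n);
}

/* Convert a byte count to GiB (2^30 bytes) for comparison with card capacity. */
double bytes_to_gib(unsigned long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For the 12500 × 12500 case, `matrix_bytes(12500, 12500)` is 1.25 × 10^9 bytes and `full_svd_bytes(12500, 12500)` is 3.75 × 10^9 bytes, roughly 3.49 GiB, before any scratch space or memory used by the display.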

As for debugging, CULA is designed to be black-box, which makes third-party debugging mostly futile, unfortunately.

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re: Insufficient memory

Thank you, John.

1. Are there CULA library functions for out-of-core multiplication, QR and SVD of very large matrices (ones that exceed GPU memory capacity but fit in main memory)? For multiplication, it appears we would have to write our own Dgemm that blocks the input matrices. Is that right?

2. We have some very large but sparse matrices. Currently, we expand the matrix stored in a file into its full form and then use culaDgemm with 'n' or 't' as required for the input matrices (we checked CULA SPARSE but could not find a multiplication or SVD routine for multiplying matrices stored in sparse format, i.e., matrix is stored in a file with 3 columns similar to MATLAB - row index (int starting with 1), column index (int starting with 1), entry (double)).

3. How exactly is the SVD data-dependent? (We performed large double-precision SVDs at the GPU memory limit; it succeeds on some synthetic data but fails at other times with other datasets.)

4. If we have 2 GPUs, we use culaSelectDevice to choose the GPU for executing a particular program. Can we write code such that the 2 GPUs effectively appear as a single GPU to the programmer, so that resources can be shared (for example, our available GPU memory would double from 4GB to 8GB)?

5. Are there advanced/special gcc flags to optimize CULA code (currently, we are using some custom CUDA functions compiled using nvcc and linked with gcc with O3, loop unrolling, etc for the C code)?
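To illustrate point 2, here is a minimal CPU-only sketch of multiplying a matrix stored as (row, column, value) triplets by a dense matrix without first expanding it to full form. It is plain C, not a CULA or CULA Sparse routine; the `Triplet` type and `coo_times_dense` name are hypothetical, and the dense operands are assumed to be column-major as in LAPACK-style interfaces.

```c
#include <stddef.h>

/* One nonzero entry, matching the three-column file format:
 * 1-based row index, 1-based column index, value. */
typedef struct {
    int row;
    int col;
    double val;
} Triplet;

/* C (m x k) = A * B, where A is m x n and given as nnz triplets,
 * and B (n x k) and C are dense, column-major, with leading
 * dimensions ldb and ldc. C is overwritten. */
void coo_times_dense(const Triplet *a, size_t nnz, int m, int k,
                     const double *b, int ldb, double *c, int ldc)
{
    /* Start from a zero result. */
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < m; ++i)
            c[(size_t)j * ldc + i] = 0.0;

    /* Each nonzero A(i,p) contributes A(i,p) * B(p,j) to C(i,j) for every j. */
    for (size_t e = 0; e < nnz; ++e) {
        int i = a[e].row - 1;   /* convert 1-based file index to 0-based */
        int p = a[e].col - 1;
        for (int j = 0; j < k; ++j)
            c[(size_t)j * ldc + i] += a[e].val * b[(size_t)j * ldb + p];
    }
}
```

The work is proportional to nnz × k rather than m × n × k, and only the triplets plus the dense operand and result need to be held in memory.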

Thank you very much.

- megadata
**Posts:** 3 **Joined:** Sun Apr 07, 2013 7:28 pm

### Re: Insufficient memory

1. Not at present; all data must fit on the GPU.

2. I'm afraid we don't have SVD for sparse at this time.

3. There is an iterative component to the calculation of the singular vectors, if you are invoking that option.

4. Splitting a problem across GPUs is in the domain of pCULA. There is a preview of this in the current CULA builds, but not for SVD. (Some routines in LAPACK split to multiple compute elements much more easily than others.)

5. CULA is distributed binary, so compiler flags will have no effect on it.
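Since point 1 means any out-of-core multiplication has to be blocked by the caller, the following is a minimal sketch of that blocking in plain C. The `gemm_tile` function is a hypothetical CPU stand-in for the step where, in a real code, three tiles would be copied to the GPU and a device GEMM called on them; the names, the square tile size `bs`, and the column-major layout are all assumptions for illustration.

```c
#include <stddef.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Stand-in for a device GEMM on one tile: C_tile += A_tile * B_tile.
 * All matrices are column-major. In an out-of-core code this is where
 * the tiles would be transferred to the GPU and multiplied there. */
void gemm_tile(int m, int n, int k,
               const double *a, int lda,
               const double *b, int ldb,
               double *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p) {
            double bpj = b[(size_t)j * ldb + p];
            for (int i = 0; i < m; ++i)
                c[(size_t)j * ldc + i] += a[(size_t)p * lda + i] * bpj;
        }
}

/* C (m x n) = A (m x k) * B (k x n), processed in tiles of at most
 * bs x bs so that only three tiles are needed at any one time. */
void blocked_gemm(int m, int n, int k, int bs,
                  const double *a, int lda,
                  const double *b, int ldb,
                  double *c, int ldc)
{
    /* Zero the result; each tile product is then accumulated into it. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            c[(size_t)j * ldc + i] = 0.0;

    for (int j0 = 0; j0 < n; j0 += bs) {
        int nb = min_int(bs, n - j0);
        for (int i0 = 0; i0 < m; i0 += bs) {
            int mb = min_int(bs, m - i0);
            for (int p0 = 0; p0 < k; p0 += bs) {
                int kb = min_int(bs, k - p0);
                gemm_tile(mb, nb, kb,
                          a + (size_t)p0 * lda + i0, lda,
                          b + (size_t)j0 * ldb + p0, ldb,
                          c + (size_t)j0 * ldc + i0, ldc);
            }
        }
    }
}
```

The tile size would be chosen so that three `bs × bs` double-precision tiles fit comfortably in device memory. Blocking QR or SVD this way is considerably harder than blocking GEMM, and this sketch does not attempt it.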

- john
- Administrator
**Posts:** 587 **Joined:** Thu Jul 23, 2009 2:31 pm

### Re: Insufficient memory

Hi, I'm currently working with CULA's SVD function and am facing the same problem. We need to work with matrices larger than GPU memory. Is there any solution to this memory problem?

Thank you.

- gabusleon11
**Posts:** 1 **Joined:** Thu Nov 05, 2015 9:07 am
