memory leak in culaDeviceSsyev
Hi.
I've found that with CULA R11 on Windows 7 32-bit and VS2008, culaDeviceSsyev leaks host memory (NOT video memory). My program has to compute eigenvalues/eigenvectors frequently, so I run out of host memory right away. The relevant part of the code is as follows:
float *dw;
float *dE;
CUDA_CALL(cudaMalloc((void**)&dw, N*sizeof(float)));
CUDA_CALL(cudaMalloc((void**)&dE, N*N*sizeof(float)));
CUDA_CALL(cudaMemcpy(dE, dA, N*N*sizeof(float), cudaMemcpyDeviceToDevice));
CULA_CALL(culaDeviceSsyev('V', 'L', N, dE, N, dw));
Please advise.
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am
Re: memory leak in culaDeviceSsyev
Hello, we would like to investigate, but will need a complete program that demonstrates this behavior. SYEV's operations can vary dramatically based on the data and the size of N.
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: memory leak in culaDeviceSsyev
Hi. It is great that someone is looking at this problem.
I can't give you the package we are working on because of confidentiality issues. However, I will probably be able to give you an example that shows how to reproduce it.
By the way, this problem happened with CUDA 3.2 + CULA R11. I think it can also happen with CUDA 4.0 + CULA R12. However, my configuration for CUDA 4.0 and CULA R12 differs from the official recommendations from NVIDIA and CULA, so I can't confirm it.
Also, I believe this problem can also happen on Ubuntu 10.04 64-bit + CUDA 4.0. I am still looking into it.
Please let me know how I can send you the test code.
Sincerely yours,
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am
Re: memory leak in culaDeviceSsyev
Hi, I would like to give a status update.
I found the memory leak on a Windows 7 32-bit laptop with CUDA 3.2 and CULA R11, where the video card shares memory with the host.
The situation is the same on Windows 7 32-bit with CUDA 4.0.13 and CULA R12 (where I made copies of DLLs to satisfy the DLL version requirement of CULA R12).
Today I recompiled my code for Ubuntu 10.04 64-bit with CUDA 4.0.17 and CULA R12. The host memory leak has disappeared. However, I lose about 1 MB of GPU memory in each iteration. Sometimes the lost memory is released, but overall the leak grows steadily (about 1 MB per iteration).
On the Ubuntu machine, I have two GPU cards with 1 GB of memory each. On my laptop, the video card has only about 256 MB. This makes a huge difference.
I guess it is some sort of cache mechanism inside CULA?
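As a rough back-of-the-envelope check on why the card size matters so much, the sketch below estimates how many iterations fit before memory runs out. It assumes a constant loss of about 1 MB per iteration on both machines (that figure was only observed on the Ubuntu machine) and that all of the card's memory is free at the start. The function name iterations_until_exhausted is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: rough number of iterations before memory is
 * exhausted, assuming a constant loss per iteration. Returns (size_t)-1
 * when nothing is lost, meaning "never" under this simple model. */
size_t iterations_until_exhausted(size_t free_bytes, size_t loss_per_iter)
{
    if (loss_per_iter == 0)
        return (size_t)-1;
    return free_bytes / loss_per_iter;
}
```

Under these assumptions, a 256 MB card would last roughly 256 iterations and a 1 GB card roughly 1024, so the laptop would fail about four times sooner.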
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am
Re: memory leak in culaDeviceSsyev
Yes, there is some memory caching present in CULA. The cache will shrink itself periodically, or you can trigger that manually with culaFreeBuffers().
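One possible way to tell a shrinking cache from a real leak is to record free memory once per iteration and watch the low-water mark. The sketch below is only an illustration: the helper names min_of and low_water_drop are hypothetical, and it assumes the readings come from something like cudaMemGetInfo called after each culaDeviceSsyev (for the shared-memory laptop case, a host-memory reading would be needed instead).

```c
#include <stddef.h>

/* Smallest value in samples[0..n-1]; n must be at least 1. */
static size_t min_of(const size_t *samples, size_t n)
{
    size_t m = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < m)
            m = samples[i];
    return m;
}

/* Hypothetical helper: given free-memory readings taken once per
 * iteration, return how far the low-water mark dropped between the
 * first and second half of the run (0 if it did not drop).
 * A cache that periodically shrinks should keep the low-water mark
 * roughly flat; a real leak should keep lowering it. */
size_t low_water_drop(const size_t *free_bytes, size_t n)
{
    if (n < 2)
        return 0;
    size_t half = n / 2;
    size_t first = min_of(free_bytes, half);
    size_t second = min_of(free_bytes + half, n - half);
    return first > second ? first - second : 0;
}
```

A sawtooth pattern whose floor stays put gives a result near zero, while readings that keep sinking give a drop that grows with the length of the run.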
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: memory leak in culaDeviceSsyev
Hi,
I did test culaFreeBuffers(). In fact, I call initialize and shutdown in each iteration, as follows:
int initialize() {
    cublasInit();
    culaInitialize();
    return EXIT_SUCCESS;
}
int shutdown() {
    culaFreeBuffers();
    culaShutdown();
    cublasShutdown();
    return EXIT_SUCCESS;
}
However, the memory leak is still there... I think either culaFreeBuffers() or culaShutdown() doesn't release the memory right away.
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am
Re: memory leak in culaDeviceSsyev
OK. With that tested, we will need a test example from you that exhibits this behavior. Please post it here, or send it to me via forum PM if you would like to keep it hidden.
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: memory leak in culaDeviceSsyev
Hi, this piece of code will eat all of the memory (on my MacBook it is host memory; I think on a standalone GPU card it will be GPU memory).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CULA_USE_CUDA_COMPLEX
#include <culapackdevice.h>
#include <cuda_runtime.h>
#ifdef _MSC_VER
# pragma comment(lib, "cudart.lib")
#endif
void checkStatus(culaStatus status)
{
    char buf[80];
    if(!status)
        return;
    culaGetErrorInfoString(status, culaGetErrorInfo(), buf, sizeof(buf));
    printf("%s\n", buf);
    culaShutdown();
    exit(EXIT_FAILURE);
}
void checkCudaError(cudaError_t err)
{
    if(!err)
        return;
    printf("%s\n", cudaGetErrorString(err));
    culaShutdown();
    exit(EXIT_FAILURE);
}
int main(int argc, char** argv)
{
#ifdef NDEBUG
    int M = 8192;
#else
    int M = 1024;
#endif
    int N = M;
    int i;
    cudaError_t err;
    culaStatus status;
    // point to host memory
    float* A = NULL;
    float* TAU = NULL;
    // point to device memory
    float* Ad = NULL;
    float* TAUd = NULL;
    printf("Allocating Matrices\n");
    A = (float*)malloc(M*N*sizeof(float));
    TAU = (float*)malloc(N*sizeof(float));
    if(!A || !TAU)
        exit(EXIT_FAILURE);
    err = cudaMalloc((void**)&Ad, M*N*sizeof(float));
    checkCudaError(err);
    err = cudaMalloc((void**)&TAUd, N*sizeof(float));
    checkCudaError(err);
    printf("Initializing CULA\n");
    status = culaInitialize();
    checkStatus(status);
    memset(A, 0, M*N*sizeof(float));
    err = cudaMemcpy(Ad, A, M*N*sizeof(float), cudaMemcpyHostToDevice);
    checkCudaError(err);
    // printf("Calling culaDeviceSgeqrf\n");
    // status = culaDeviceSgeqrf(M, N, Ad, M, TAUd);
    for(i = 0; i < 10000; i ++) {
        printf("Calling culaDeviceSsyev: %d\n", i);
        status = culaDeviceSsyev('V', 'L', N, Ad, N, TAUd);
        culaFreeBuffers();
        checkStatus(status);
    }
    err = cudaMemcpy(A, Ad, M*N*sizeof(float), cudaMemcpyDeviceToHost);
    checkCudaError(err);
    err = cudaMemcpy(TAU, TAUd, N*sizeof(float), cudaMemcpyDeviceToHost);
    checkCudaError(err);
    printf("Shutting down CULA\n");
    culaShutdown();
    cudaFree(Ad);
    cudaFree(TAUd);
    free(A);
    free(TAU);
    return EXIT_SUCCESS;
}
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am
Re: memory leak in culaDeviceSsyev
I have run this program to completion (10000 SYEV calls) now on two platforms:
* Linux 64, C2050
* Mac OSX 10.6 (64-bit) on a Macbook Pro (9600 GPU)
Both machines are CUDA 4 and CULA R12.
In neither case do I observe a leak; on the Mac the usage sits at 56.7 MB of allocated RAM for the full duration. Hopefully we can find a configuration that exhibits the error.
- john
- Administrator
- Posts: 587
- Joined: Thu Jul 23, 2009 2:31 pm
Re: memory leak in culaDeviceSsyev
Oh, this example is not good enough. I will give you another one later.
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am
Re: memory leak in culaDeviceSsyev
Hi, I will PM you an example.
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am
Re: memory leak in culaDeviceSsyev
OK, one new thing I just discovered: with CUDA 4 and CULA R12 (Windows 7 32-bit), the machine doesn't crash, but it still runs out of memory.
With CUDA 3 and CULA R11, it just crashes my computer. I think the CUDA driver has something to do with this.
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am
Re: memory leak in culaDeviceSsyev
Also, I've found that it is somehow related to the loop. For example, if you set M to 16, the program runs faster and also consumes memory faster.
- huangchbii
- Posts: 15
- Joined: Wed Jul 07, 2010 1:27 am