GPU programming - Memory Error in CUDA Program for Fermi GPU
I am facing the following problem on a GeForce GTX 580 (Fermi-class) GPU.
Just to give some background, I am reading single-byte samples packed in the following manner in a file: real(signal 1), imaginary(signal 1), real(signal 2), imaginary(signal 2). (Each byte is a signed char, taking values between -128 and 127.) I read these into a char4 array and use the kernel given below to copy them to 2 float2 arrays, one for each signal. (This is an isolated part of a larger program.)
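To make the layout concrete, here is a minimal CPU-side sketch of the same de-interleaving, which can also serve as a reference to compare GPU output against. The function name `reference_deinterleave` is hypothetical, and it assumes each output buffer holds 2*n floats in re, im, re, im order (the same memory layout as n `float2` values). CUDA's `char4` members are `signed char`, so the conversion keeps the sign; plain `char` on the host has implementation-defined signedness, which is why the sketch uses `signed char` explicitly.

```cuda
#include <stddef.h>

/* Hypothetical CPU reference for the file layout described above.
 * Each 4-byte record is re(sig1), im(sig1), re(sig2), im(sig2).
 * x and y each receive 2*n floats as re, im, re, im, ... */
void reference_deinterleave(const signed char *raw, float *x, float *y, size_t n)
{
    for (size_t k = 0; k < n; ++k) {
        const signed char *rec = raw + 4 * k;  /* one char4-sized record */
        x[2 * k]     = (float) rec[0];         /* real, signal 1 */
        x[2 * k + 1] = (float) rec[1];         /* imaginary, signal 1 */
        y[2 * k]     = (float) rec[2];         /* real, signal 2 */
        y[2 * k + 1] = (float) rec[3];         /* imaginary, signal 2 */
    }
}
```

Every value from -128 to 127 is exactly representable as a float, so a correct GPU copy should match this reference bit for bit.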
When I run the program using cuda-memcheck, I get either an unqualified "unspecified launch failure", or the same message along with "User Stack Overflow or Breakpoint Hit", or "Invalid __global__ write of size 8" at random thread and block indices.
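Kernel launches are asynchronous, so an "unspecified launch failure" usually surfaces at whatever CUDA call happens to come next rather than at the launch itself. A minimal sketch of checking right after the launch is below; the names `check_cuda`, `CHECK_CUDA` and `check_last_launch` are hypothetical helpers, while `cudaGetLastError`, `cudaDeviceSynchronize` (available from CUDA 4.0) and `cudaGetErrorString` are standard runtime API calls.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: print a CUDA runtime error with its location.
 * Returns 0 on success and 1 on failure so the caller decides what to do. */
static int check_cuda(cudaError_t err, const char *what, const char *file, int line)
{
    if (err == cudaSuccess)
        return 0;
    fprintf(stderr, "%s:%d: %s failed: %s\n", file, line, what, cudaGetErrorString(err));
    return 1;
}

/* Wrap any runtime call, e.g. CHECK_CUDA(cudaMalloc(...)). */
#define CHECK_CUDA(call) check_cuda((call), #call, __FILE__, __LINE__)

/* Call immediately after a <<<...>>> launch.
 * cudaGetLastError() reports launch-configuration errors;
 * cudaDeviceSynchronize() waits for the kernel and reports errors raised
 * while it ran, such as an unspecified launch failure. */
static int check_last_launch(const char *kernel_name)
{
    if (check_cuda(cudaGetLastError(), kernel_name, __FILE__, __LINE__))
        return 1;
    return check_cuda(cudaDeviceSynchronize(), kernel_name, __FILE__, __LINE__);
}
```

For example, `check_last_launch("copydataforfft")` placed right after the launch pins a failure to that kernel instead of to a later `cudaMemcpy`.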
The main kernel and the launch-related code are reproduced below. The strange thing is that the code works (and cuda-memcheck throws no error) on a non-Fermi-class GPU that I have access to. Another thing I observed is that the Fermi card gives no error for N less than 16384.
```cuda
#include <stdlib.h>

#define N 32768

__global__ void copydataforfft(char4 *pc4data, float2 *pf2fftinx, float2 *pf2fftiny);

int main(int argc, char *argv[])
{
    char4 *pc4buf_h = NULL;
    char4 *pc4buf_d = NULL;
    float2 *pf2inx_d = NULL;
    float2 *pf2iny_d = NULL;
    dim3 dimbcopy(1, 1, 1);
    dim3 dimgcopy(1, 1);

    ...

    /* check errors in actual code */
    pc4buf_h = (char4 *) malloc(N * sizeof(char4));
    (void) cudaMalloc((void **) &pc4buf_d, N * sizeof(char4));
    (void) cudaMalloc((void **) &pf2inx_d, N * sizeof(float2));
    (void) cudaMalloc((void **) &pf2iny_d, N * sizeof(float2));

    ...

    dimbcopy.x = 1024;      /* number of threads in a block, for my GPU */
    dimgcopy.x = N / 1024;  /* N is a multiple of 1024, so the grid is exact */
    copydataforfft<<<dimgcopy, dimbcopy>>>(pc4buf_d, pf2inx_d, pf2iny_d);

    ...
}

/* One thread per sample: split each char4 into two float2 values. */
__global__ void copydataforfft(char4 *pc4data, float2 *pf2fftinx, float2 *pf2fftiny)
{
    int i = (blockIdx.x * blockDim.x) + threadIdx.x;

    pf2fftinx[i].x = (float) pc4data[i].x;  /* real, signal 1 */
    pf2fftinx[i].y = (float) pc4data[i].y;  /* imaginary, signal 1 */
    pf2fftiny[i].x = (float) pc4data[i].z;  /* real, signal 2 */
    pf2fftiny[i].y = (float) pc4data[i].w;  /* imaginary, signal 2 */

    return;
}
```
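The launch above relies on N being an exact multiple of 1024 (the maximum threads per block on compute capability 2.x). A more defensive pattern rounds the grid size up and passes the element count so surplus threads in the last block return early. This is a sketch only: `blocks_for` and `copydataforfft_checked` are hypothetical names, and since N = 32768 already divides evenly, the guard would not by itself explain the failures reported here.

```cuda
#include <cuda_runtime.h>

/* Blocks needed to cover n elements with `threads` threads per block.
 * Rounds up, so n need not be a multiple of the block size. */
static unsigned int blocks_for(unsigned int n, unsigned int threads)
{
    return (n + threads - 1) / threads;
}

/* Same copy as copydataforfft, plus an element count and a bounds check
 * so threads past the end of the data never touch memory. */
__global__ void copydataforfft_checked(const char4 *pc4data, float2 *pf2fftinx,
                                       float2 *pf2fftiny, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;
    pf2fftinx[i].x = (float) pc4data[i].x;  /* real, signal 1 */
    pf2fftinx[i].y = (float) pc4data[i].y;  /* imaginary, signal 1 */
    pf2fftiny[i].x = (float) pc4data[i].z;  /* real, signal 2 */
    pf2fftiny[i].y = (float) pc4data[i].w;  /* imaginary, signal 2 */
}
```

A launch would then look like `copydataforfft_checked<<<blocks_for(N, 1024), 1024>>>(pc4buf_d, pf2inx_d, pf2iny_d, N);`. On compute capability 2.x the grid's x dimension is limited to 65535 blocks, which N = 65536 with 1024 threads per block (64 blocks) stays well within.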
One other thing I noticed in my program: if I comment out either the first 2 or the last 2 char-to-float assignment statements in my kernel, there's no memory error. If I comment out one of the first 2 (`pf2fftinx`) and one of the second 2 (`pf2fftiny`), errors still crop up, but less frequently. The kernel uses 6 registers with all 4 assignment statements uncommented, and 5 or 4 registers with 2 assignment statements commented out.
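The "write of size 8" reported by cuda-memcheck matches one `float2` store, which suggests the compiler merged each pair of `.x`/`.y` assignments into a single 8-byte write. The sketch below makes that memory traffic explicit: one 4-byte load of the `char4` and one 8-byte store per output via `make_float2`. The name `copydataforfft_vec` is hypothetical, and this is a stylistic variant, not a fix: it is intended to compute exactly what the original kernel computes, so errors that come and go with changes like commenting out lines or adding `__syncthreads()` (which alter register use and instruction mix, not the result) point away from the code itself.

```cuda
#include <cuda_runtime.h>

/* Hypothetical variant: load each char4 once into a register and write
 * each float2 with a single 8-byte store. */
__global__ void copydataforfft_vec(const char4 *pc4data, float2 *pf2fftinx,
                                   float2 *pf2fftiny, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        char4 s = pc4data[i];                                  /* one 4-byte load */
        pf2fftinx[i] = make_float2((float) s.x, (float) s.y);  /* signal 1: re, im */
        pf2fftiny[i] = make_float2((float) s.z, (float) s.w);  /* signal 2: re, im */
    }
}
```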
I tried the 32-bit toolkit in place of the 64-bit toolkit, 32-bit compilation with the -m32 compiler option, running without X Windows, etc. The program's behaviour is the same.
I use the CUDA 4.0 driver and runtime (I also tried CUDA 3.2) on RHEL 5.6. The GPU's compute capability is 2.0.
Please help! I can post the entire code if anyone is interested in running it on their Fermi cards.
Update: Just for the heck of it, I inserted `__syncthreads()` between the `pf2fftinx` and `pf2fftiny` assignment statements, and the memory errors disappeared for N = 32768, although at N = 65536 I still got errors. This didn't last long; I'm still getting errors.
Update: Continuing the weird behaviour, when I run the program using cuda-memcheck, I see 16x16 blocks of multi-coloured pixels distributed randomly on the screen. This does not happen if I run the program directly.
The problem was a bad GPU card (see the comments on the question). [I'm adding this as an answer to remove the question from the unanswered list and make it more useful.]
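Since the culprit turned out to be hardware, one way to gather evidence before swapping cards is to run the same copy many times, copy the results back with `cudaMemcpy`, and compare them with a CPU reference. A card that produces occasional mismatches while another card with identical code produces none is a strong hint of a hardware fault. The helper below is a hypothetical sketch of the comparison step; because the char-to-float conversion is exact, a plain `!=` comparison is appropriate and any mismatch indicates a fault rather than rounding.

```cuda
#include <stddef.h>

/* Hypothetical helper: count positions where GPU output differs from a CPU
 * reference. Exact comparison is valid because every signed char value is
 * exactly representable as a float. */
static size_t count_mismatches(const float *gpu, const float *ref, size_t count)
{
    size_t bad = 0;
    for (size_t k = 0; k < count; ++k)
        if (gpu[k] != ref[k])
            ++bad;
    return bad;
}
```

In practice the `float2` output would be viewed as 2*N floats and compared against the output of a CPU de-interleaving pass over the same input bytes.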