[CUDA] Determining threads per block, no. of blocks and grid
Posted: Mon Mar 15, 2010 3:12 pm
Hi everyone,
How do I determine the number of threads per block and the number of blocks for a kernel launch on a CUDA device?
For example: I need to multiply two one-dimensional arrays, A and B, and copy the result into array C.
Code:
int N = 10; // each array holds at most 10 elements
size_t size = N*sizeof(float);
...
cudaMalloc((void **)&a_d, size);
cudaMalloc((void **)&b_d, size);
cudaMalloc((void **)&c_d, size);
...
...
cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
cudaMemcpy(b_d, b_h, size, cudaMemcpyHostToDevice);
//How to determine no. of threads here???
int threadsPerBlock = ???
int noOfBlocks = ??
fmultiply<<<noOfBlocks, threadsPerBlock>>>(a_d, b_d, c_d);
cudaMemcpy(c_h, c_d, size, cudaMemcpyDeviceToHost); // destination (host) first, then source (device)
...
...
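
For reference, here is a minimal sketch of one common way to fill in those two values. It is not the only way. It assumes a fixed block size of 256 threads (a multiple of the 32-thread warp and below the per-block limit that cudaGetDeviceProperties reports in maxThreadsPerBlock), and it derives the block count by ceiling division so that every element gets a thread. The kernel body, the helper blocksFor, and the extra length parameter n are assumptions for illustration, since the post does not show fmultiply itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel body: element-wise product c[i] = a[i] * b[i].
// The extra parameter n is an assumption; it lets surplus threads exit safely.
__global__ void fmultiply(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // skip threads past the end
        c[i] = a[i] * b[i];
}

// Hypothetical helper: smallest number of blocks whose threads cover n elements.
static int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
}

int main()
{
    const int N = 10;
    const size_t size = N * sizeof(float);

    float a_h[N], b_h[N], c_h[N];
    for (int i = 0; i < N; ++i) { a_h[i] = (float)i; b_h[i] = 2.0f; }

    // Error checking is omitted to keep the sketch short.
    float *a_d, *b_d, *c_d;
    cudaMalloc((void **)&a_d, size);
    cudaMalloc((void **)&b_d, size);
    cudaMalloc((void **)&c_d, size);

    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b_h, size, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;                        // assumed block size
    int noOfBlocks = blocksFor(N, threadsPerBlock);   // 1 block when N = 10

    fmultiply<<<noOfBlocks, threadsPerBlock>>>(a_d, b_d, c_d, N);

    // This copy waits for the kernel to finish before reading c_d.
    cudaMemcpy(c_h, c_d, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; ++i)
        printf("c[%d] = %f\n", i, c_h[i]);

    cudaFree(a_d);
    cudaFree(b_d);
    cudaFree(c_d);
    return 0;
}
```

With N = 10 a single block is enough, and the i < n check keeps the other 246 threads of that block from touching memory outside the arrays. For larger arrays the same formula scales up the block count, while the block size itself is usually tuned by experiment.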