Open
Description
I am very new to ManagedCUDA. I am trying to compute the sum of squared deviations from a given mean over a large 2D float array.
Here is the C# method:
```csharp
public static float CalculateSumDSQ(float[,] Maxtrix, float DataMean)
{
    int numRows = Maxtrix.GetLength(0);
    int numColumns = Maxtrix.GetLength(1);
    #region Use GPU
    // Initialize the CUDA context
    CudaContext ctx = new CudaContext();
    // Allocate GPU memory for the data
    CudaDeviceVariable<float> dataMatrixUmDev = new CudaDeviceVariable<float>(Maxtrix.Length);
    dataMatrixUmDev.CopyToDevice(Maxtrix); // Copy data to the GPU
    // Module loading from precompiled .ptx in a project output folder
    CUmodule cumodule = ctx.LoadModule(@"CUDAFunctions\sumdsq.ptx");
    CudaKernel sumdsqKernel = new CudaKernel("Sumdsq_Kernel", cumodule, ctx);
    #region Setup Blocks
    //**********************************************************************************************************
    //URGENT:Need to figure out how to set the BlockDimensions and GridDimensions based on the size of the array
    //**********************************************************************************************************
    int maxThreadsPerBlock = CudaContext.GetDeviceInfo(0).MaxThreadsPerBlock; // Result = 1024
    var maxBlockDim = CudaContext.GetDeviceInfo(0).MaxBlockDim; //Result = 1024:64:64
    int blockSizeX = 16;
    int blockSizeY = 16;
    int gridDimX = (numRows + blockSizeX - 1) / blockSizeX;
    int gridDimY = (numColumns + blockSizeY - 1) / blockSizeY;
    sumdsqKernel.BlockDimensions = new dim3(gridDimX, gridDimY);
    sumdsqKernel.GridDimensions = new dim3(blockSizeX, blockSizeY);
    #endregion
    CudaDeviceVariable<float> sumdsqResultDev = new CudaDeviceVariable<float>(1);
    sumdsqKernel.Run(dataMatrixUmDev.DevicePointer, numRows, numColumns, DataMean, sumdsqResultDev.DevicePointer);
    // Copy the result back to the host
    float result = 0f;
    sumdsqResultDev.CopyToHost(ref result);
    dataMatrixUmDev.Dispose();
    sumdsqResultDev.Dispose();
    ctx.Dispose();
    return result;
    #endregion
}
```
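In ManagedCUDA, `CudaKernel.BlockDimensions` is the number of threads per block (the kernel's `blockDim`), and `GridDimensions` is the number of blocks (`gridDim`). The method above assigns them the other way round, which is likely why the launch only works for some sizes.

With the swap, a 3000×3000 matrix requests a 188×188 block, which is 35,344 threads. The device reports a limit of 1024 threads per block and at most 64 threads in y (`1024:64:64`). For a square n×n input, ceil(n/16)² stays within 1024 only up to n = 512, so smaller matrices happen to launch.

The following sketch is written in CUDA C++ rather than C#. It keeps the block at a fixed 16×16 and grows the grid with the matrix. `LaunchConfig`, `makeLaunchConfig`, and `blockFitsLimits` are hypothetical helper names, not ManagedCUDA API. The axis mapping follows `Sumdsq_Kernel`, where x indexes rows (`i`) and y indexes columns (`j`).

```cuda
#include <cuda_runtime.h>

// Hypothetical helper type: the block shape is fixed, the grid grows with the matrix.
struct LaunchConfig {
    dim3 block;  // threads per block  (ManagedCUDA: CudaKernel.BlockDimensions)
    dim3 grid;   // blocks per grid    (ManagedCUDA: CudaKernel.GridDimensions)
};

// x covers rows and y covers columns, matching Sumdsq_Kernel's i/j indexing.
static LaunchConfig makeLaunchConfig(int numRows, int numCols)
{
    const int blockSizeX = 16;  // 16 * 16 = 256 threads per block, under the 1024 limit
    const int blockSizeY = 16;
    LaunchConfig cfg;
    cfg.block = dim3(blockSizeX, blockSizeY, 1);
    // Ceiling division: a partially filled tile at the edge still gets a block,
    // and the kernel's bounds check (i < numRows && j < numCols) skips the extra threads.
    cfg.grid = dim3((numRows + blockSizeX - 1) / blockSizeX,
                    (numCols + blockSizeY - 1) / blockSizeY, 1);
    return cfg;
}

// Checks a block shape against the limits the device reports
// (maxThreadsPerBlock and the per-axis maximum, e.g. 1024 and 1024:64:64).
static bool blockFitsLimits(dim3 block, int maxThreadsPerBlock, const int maxDim[3])
{
    unsigned long long threads = (unsigned long long)block.x * block.y * block.z;
    return threads <= (unsigned long long)maxThreadsPerBlock
        && block.x <= (unsigned)maxDim[0]
        && block.y <= (unsigned)maxDim[1]
        && block.z <= (unsigned)maxDim[2];
}
```

In the C# method, the equivalent assignment would be `sumdsqKernel.BlockDimensions = new dim3(blockSizeX, blockSizeY);` and `sumdsqKernel.GridDimensions = new dim3(gridDimX, gridDimY);`. The rest of the launch stays unchanged.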
Here is my Kernel:
```cuda
extern "C" __global__ void Sumdsq_Kernel(float* dataMatrixUm, int numRows, int numCols, float dataMeanUm, float* sumdsq)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < numRows && j < numCols)
    {
        float val = dataMatrixUm[i * numCols + j];
        if (!isnan(val))
        {
            atomicAdd(sumdsq, powf(val - dataMeanUm, 2));
        }
    }
}
```
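Two other details in this kernel path may matter once the launch succeeds.

First, `new CudaDeviceVariable<float>(1)` is not guaranteed to be zero-initialized, so `sumdsq` should be cleared before the launch. In ManagedCUDA this could be done, for example, by copying `new float[] { 0f }` into it.

Second, every element does its own `atomicAdd` on the same address, which means up to 9 million contended updates for a 3000×3000 input. Also, because `i` (x) indexes rows, neighbouring threads read floats `numCols` apart.

The sketch below uses CUDA C++ with the runtime API instead of ManagedCUDA. It makes these changes:
- maps x to columns, so neighbouring threads read neighbouring floats;
- squares with `d * d` instead of `powf`;
- reduces each 16×16 block in shared memory;
- issues one `atomicAdd` per block.

`SumdsqBlockReduce_Kernel` and `sumdsqOnDevice` are hypothetical names. The shared-memory size assumes the 16×16 block the wrapper launches.

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define BLOCK_X 16
#define BLOCK_Y 16

// x indexes columns, so a warp reads contiguous floats of one row (row-major layout).
// The shared array and tree reduction assume blockDim == (BLOCK_X, BLOCK_Y), a power of two.
extern "C" __global__ void SumdsqBlockReduce_Kernel(const float* data, int numRows, int numCols,
                                                    float mean, float* sumdsq)
{
    __shared__ float partial[BLOCK_X * BLOCK_Y];
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int tid = threadIdx.y * blockDim.x + threadIdx.x;

    float v = 0.0f;  // out-of-range threads and NaNs contribute 0
    if (row < numRows && col < numCols) {
        float val = data[row * numCols + col];
        if (!isnan(val)) {
            float d = val - mean;
            v = d * d;
        }
    }
    partial[tid] = v;
    __syncthreads();

    // Halve the active range each step; every thread reaches each __syncthreads().
    for (unsigned int stride = (blockDim.x * blockDim.y) / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(sumdsq, partial[0]);  // one atomic per block
}

// Hypothetical host wrapper: copies the matrix, zeroes the accumulator, launches, reads back.
static cudaError_t sumdsqOnDevice(const float* hostData, int numRows, int numCols,
                                  float mean, float* result)
{
    size_t n = (size_t)numRows * numCols;
    float *dData = nullptr, *dSum = nullptr;
    cudaError_t err = cudaMalloc((void**)&dData, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void**)&dSum, sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(dData, hostData, n * sizeof(float), cudaMemcpyHostToDevice);
    // cudaMalloc does not initialize memory; all-zero bytes are 0.0f.
    if (err == cudaSuccess) err = cudaMemset(dSum, 0, sizeof(float));
    if (err == cudaSuccess) {
        dim3 block(BLOCK_X, BLOCK_Y);  // fixed block: 256 threads
        dim3 grid((numCols + BLOCK_X - 1) / BLOCK_X, (numRows + BLOCK_Y - 1) / BLOCK_Y);
        SumdsqBlockReduce_Kernel<<<grid, block>>>(dData, numRows, numCols, mean, dSum);
        err = cudaGetLastError();  // reports invalid launch configurations
    }
    // cudaMemcpy waits for the kernel and surfaces execution errors.
    if (err == cudaSuccess) err = cudaMemcpy(result, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dData);  // cudaFree(nullptr) is a no-op
    cudaFree(dSum);
    return err;
}
```

Compiled to PTX for ManagedCUDA, the `extern "C"` keeps the kernel name unmangled, as in the original. The host side would then set `BlockDimensions` to 16×16 and `GridDimensions` to (ceil(numCols/16), ceil(numRows/16)).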
The input matrix dimensions vary between 100×100 and 3000×3000. I am struggling with how to define GridDimensions and BlockDimensions on CudaKernel. I am sure I am doing something wrong, because depending on the size of the array I pass in, it either works or fails. I have tried multiple ways to configure the grid and block, but I just do not understand how they work.
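Because the input size varies from 100×100 to 3000×3000, another option is a grid-stride loop. This is a swapped-in technique, not what the original kernel does. It decouples the launch configuration from the matrix size:
- treat the row-major matrix as a flat array of `numRows * numCols` floats;
- launch a fixed-size 1D grid;
- let each thread step through the array by the total number of launched threads.

Any grid size then covers every element. The kernel and wrapper names are hypothetical, and the 256-thread block and the cap of 1024 blocks are illustrative assumptions rather than tuned values.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Grid-stride loop: each thread handles elements k, k + stride, k + 2*stride, ...
// so the result is correct for any n regardless of how many blocks are launched.
extern "C" __global__ void SumdsqGridStride_Kernel(const float* data, long long n,
                                                   float mean, float* sumdsq)
{
    float local = 0.0f;
    long long stride = (long long)blockDim.x * gridDim.x;
    for (long long k = (long long)blockIdx.x * blockDim.x + threadIdx.x; k < n; k += stride) {
        float val = data[k];
        if (!isnan(val)) {
            float d = val - mean;
            local += d * d;
        }
    }
    atomicAdd(sumdsq, local);  // one atomic per thread rather than one per element
}

// Hypothetical host wrapper using the CUDA runtime API.
static cudaError_t sumdsqGridStride(const float* hostData, int numRows, int numCols,
                                    float mean, float* result)
{
    const long long n = (long long)numRows * numCols;
    const int threadsPerBlock = 256;  // 1D block, well under the 1024-thread limit
    long long blocksNeeded = (n + threadsPerBlock - 1) / threadsPerBlock;
    int blocks = (int)(blocksNeeded > 1024 ? 1024 : (blocksNeeded < 1 ? 1 : blocksNeeded));

    float *dData = nullptr, *dSum = nullptr;
    cudaError_t err = cudaMalloc((void**)&dData, (size_t)n * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void**)&dSum, sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(dData, hostData, (size_t)n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(dSum, 0, sizeof(float));  // accumulator starts at 0.0f
    if (err == cudaSuccess) {
        SumdsqGridStride_Kernel<<<blocks, threadsPerBlock>>>(dData, n, mean, dSum);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(result, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dSum);
    return err;
}
```

With this layout, the ManagedCUDA side would use a 1D `BlockDimensions` of 256 and a 1D `GridDimensions` of at most 1024, whatever the matrix size. The trade-off is one more loop in the kernel in exchange for a launch that never has to be recomputed.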
Any suggestions? And thank you for any help!