Issues setting GridDimensions and BlockDimensions on CudaKernel for large 2D array #117

I am very new to ManagedCUDA. I am trying to compute the sum of squared deviations from the mean over a large 2D float array.

Here is the C# method:

```
        public static float CalculateSumDSQ(float[,] Maxtrix, float DataMean)
        {
            int numRows = Maxtrix.GetLength(0);
            int numColumns = Maxtrix.GetLength(1);

            #region Use GPU

            // Initialize the CUDA context
            CudaContext ctx = new CudaContext();

            // Allocate GPU memory for the data
            CudaDeviceVariable<float> dataMatrixUmDev = new CudaDeviceVariable<float>(Maxtrix.Length);
            dataMatrixUmDev.CopyToDevice(Maxtrix);  // Copy data to the GPU

            //Module loading from precompiled .ptx in a project output folder
            CUmodule cumodule = ctx.LoadModule(@"CUDAFunctions\sumdsq.ptx");

            CudaKernel sumdsqKernel = new CudaKernel("Sumdsq_Kernel", cumodule, ctx);

            #region Setup Blocks

            //**********************************************************************************************************
            //URGENT:Need to figure out how to set the BlockDimensions and GridDimensions based on the size of the array
            //**********************************************************************************************************

            int maxThreadsPerBlock = CudaContext.GetDeviceInfo(0).MaxThreadsPerBlock; // Result = 1024
            var maxBlockDim = CudaContext.GetDeviceInfo(0).MaxBlockDim; //Result = 1024:64:64

            int blockSizeX = 16;
            int blockSizeY = 16;
            int gridDimX = (numRows + blockSizeX - 1) / blockSizeX;
            int gridDimY = (numColumns + blockSizeY - 1) / blockSizeY;

            sumdsqKernel.BlockDimensions = new dim3(gridDimX, gridDimY);
            sumdsqKernel.GridDimensions = new dim3(blockSizeX, blockSizeY);

            #endregion

            CudaDeviceVariable<float> sumdsqResultDev = new CudaDeviceVariable<float>(1);

            sumdsqKernel.Run(dataMatrixUmDev.DevicePointer, numRows, numColumns, DataMean, sumdsqResultDev.DevicePointer);

            // Copy the result back to the host
            float result = 0f;
            sumdsqResultDev.CopyToHost(ref result);

            dataMatrixUmDev.Dispose();
            sumdsqResultDev.Dispose();
            ctx.Dispose();

            return result;
            #endregion
        }

```

Here is my Kernel:
```
extern "C" __global__ void Sumdsq_Kernel(float* dataMatrixUm, int numRows, int numCols, float dataMeanUm, float* sumdsq)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i < numRows && j < numCols)
    {
        float val = dataMatrixUm[i * numCols + j];
        
        if (!isnan(val))
        {
            atomicAdd(sumdsq, powf(val - dataMeanUm, 2));
        }
    }
}
```
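
For checking the GPU output, a CPU reference for the same quantity can help. The following is a minimal sketch in plain C++ and is not part of the project: `sumDsqReference` is a hypothetical name, and it assumes the matrix is flattened row-major, as the kernel's `i * numCols + j` indexing implies. It accumulates in `double`, so small differences from the kernel's `float` `atomicAdd` result are to be expected.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (illustration only): sums (val - mean)^2 over
// all non-NaN elements of a row-major numRows x numCols matrix, mirroring
// the indexing and NaN handling of Sumdsq_Kernel.
inline double sumDsqReference(const std::vector<float>& data,
                              int numRows, int numCols, float mean)
{
    assert(data.size() == static_cast<std::size_t>(numRows) *
                          static_cast<std::size_t>(numCols));

    double sum = 0.0;  // double accumulator to limit rounding error
    for (int i = 0; i < numRows; ++i)
    {
        for (int j = 0; j < numCols; ++j)
        {
            float val = data[static_cast<std::size_t>(i) * numCols + j];
            if (!std::isnan(val))
            {
                double d = static_cast<double>(val) - mean;
                sum += d * d;
            }
        }
    }
    return sum;
}
```

Comparing this value against the kernel result with a relative tolerance, rather than exact equality, is probably the safer check, since the order of floating-point additions on the GPU is not fixed.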

The input matrix dimensions can vary between 100x100 and 3000x3000. I am struggling with how to define GridDimensions and BlockDimensions on the CudaKernel. I am sure I am doing something wrong, because the call either works or fails depending on the size of the array I pass in. I have tried multiple ways to configure the grid and block, but I do not understand how they work.
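
For context, the usual CUDA convention is that the block dimensions give the number of threads per block, while the grid dimensions give the number of blocks, normally obtained by ceiling division of the problem size by the block size. Below is a minimal host-side sketch of that arithmetic in plain C++. The names `Dim2`, `ceilDiv`, `blockDims`, `gridDims`, and `blockFits` are hypothetical helpers for illustration, not ManagedCuda API, and the 16x16 block size is an assumption carried over from the method above.

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not ManagedCuda's dim3.
struct Dim2 { int x; int y; };

// Ceiling division: the smallest number of blocks of size `block`
// that covers `n` items.
inline int ceilDiv(int n, int block)
{
    assert(n > 0 && block > 0);
    return (n + block - 1) / block;
}

// Threads per block: fixed, independent of the matrix size (assumed 16x16).
inline Dim2 blockDims() { return Dim2{16, 16}; }

// Blocks per grid: grows with the matrix size. As in the kernel,
// x indexes rows and y indexes columns.
inline Dim2 gridDims(int numRows, int numCols, Dim2 block)
{
    return Dim2{ceilDiv(numRows, block.x), ceilDiv(numCols, block.y)};
}

// A block can only be launched if its total thread count stays within
// the device limit (MaxThreadsPerBlock, reported as 1024 above).
inline bool blockFits(Dim2 block, int maxThreadsPerBlock)
{
    return block.x * block.y <= maxThreadsPerBlock;
}
```

Under these assumptions a 3000x3000 matrix gives a 188x188 grid of 16x16 blocks (256 threads per block, within the 1024 limit), whereas a 188x188 block would need 35,344 threads and could not be launched. A 100x100 matrix gives a 7x7 grid.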

Any suggestions?  And thank you for any help!


