Issues setting GridDimensions and BlockDimensions on CudaKernel for large 2D array #117

@jdanielpa

Description

I am very new to ManagedCUDA. I am trying to compute the sum of squared deviations from the mean over a large 2D float array.

Here is the C# method:

       public static float CalculateSumDSQ(float[,] Maxtrix, float DataMean)
        {
            int numRows = Maxtrix.GetLength(0);
            int numColumns = Maxtrix.GetLength(1);

            #region Use GPU

            // Initialize the CUDA context
            CudaContext ctx = new CudaContext();

            // Allocate GPU memory for the data
            CudaDeviceVariable<float> dataMatrixUmDev = new CudaDeviceVariable<float>(Maxtrix.Length);
            dataMatrixUmDev.CopyToDevice(Maxtrix);  // Copy data to the GPU

            //Module loading from precompiled .ptx in a project output folder
            CUmodule cumodule = ctx.LoadModule(@"CUDAFunctions\sumdsq.ptx");

            CudaKernel sumdsqKernel = new CudaKernel("Sumdsq_Kernel", cumodule, ctx);

            #region Setup Blocks

            //**********************************************************************************************************
            //URGENT:Need to figure out how to set the BlockDimensions and GridDimensions based on the size of the array
            //**********************************************************************************************************

            int maxThreadsPerBlock = CudaContext.GetDeviceInfo(0).MaxThreadsPerBlock; // Result = 1024
            var maxBlockDim = CudaContext.GetDeviceInfo(0).MaxBlockDim; //Result = 1024:64:64

            int blockSizeX = 16;
            int blockSizeY = 16;
            int gridDimX = (numRows + blockSizeX - 1) / blockSizeX;
            int gridDimY = (numColumns + blockSizeY - 1) / blockSizeY;

            sumdsqKernel.BlockDimensions = new dim3(gridDimX, gridDimY);
            sumdsqKernel.GridDimensions = new dim3(blockSizeX, blockSizeY);

            #endregion

            CudaDeviceVariable<float> sumdsqResultDev = new CudaDeviceVariable<float>(1);

            sumdsqKernel.Run(dataMatrixUmDev.DevicePointer, numRows, numColumns, DataMean, sumdsqResultDev.DevicePointer);

            // Copy the result back to the host
            float result = 0f;
            sumdsqResultDev.CopyToHost(ref result);

            dataMatrixUmDev.Dispose();
            sumdsqResultDev.Dispose();
            ctx.Dispose();

            return result;
            #endregion
        }

Here is my Kernel:

extern "C" __global__ void Sumdsq_Kernel(float* dataMatrixUm, int numRows, int numCols, float dataMeanUm, float* sumdsq)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i < numRows && j < numCols)
    {
        float val = dataMatrixUm[i * numCols + j];
        
        if (!isnan(val))
        {
            atomicAdd(sumdsq, powf(val - dataMeanUm, 2));
        }
    }
}
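
For checking the kernel's output on small inputs, a host-side reference of the same quantity can help. The following is only a sketch in plain C++: `sumdsqReference` is a hypothetical helper name, and it assumes the matrix is stored row-major in a flat buffer, as the kernel's `i * numCols + j` indexing implies. It accumulates in double, so it may differ slightly from the GPU result, where float additions happen in no fixed order.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: computes the same quantity as Sumdsq_Kernel,
// the sum of (val - mean)^2 over a row-major numRows x numCols buffer,
// skipping NaN entries.
float sumdsqReference(const std::vector<float>& data, int numRows, int numCols, float mean)
{
    assert(numRows >= 0 && numCols >= 0);
    assert(data.size() == static_cast<std::size_t>(numRows) * static_cast<std::size_t>(numCols));

    double sum = 0.0;  // accumulate in double to limit rounding error
    for (int i = 0; i < numRows; ++i) {
        for (int j = 0; j < numCols; ++j) {
            float val = data[static_cast<std::size_t>(i) * numCols + j];
            if (!std::isnan(val)) {
                double d = static_cast<double>(val) - mean;
                sum += d * d;
            }
        }
    }
    return static_cast<float>(sum);
}
```

Comparing this value against the kernel's result with a small relative tolerance, rather than exact equality, is probably the safer check.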

The input matrix dimensions can vary between 100x100 and 3000x3000. I am struggling with how to set GridDimensions and BlockDimensions on CudaKernel. I am sure I am doing something wrong because, depending on the size of the array I pass in, the kernel either works or fails. I have tried several ways to configure the grid and block, but I do not understand how they work.

Any suggestions? And thank you for any help!
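
A note on the launch configuration in the C# method: the ceil-division results (gridDimX, gridDimY) are assigned to BlockDimensions and the fixed 16x16 size to GridDimensions, which looks reversed relative to the usual convention. With that assignment a 3000x3000 input requests 188 x 188 = 35344 threads per block, well above the reported limit of 1024. Inputs up to about 512x512 stay within the limit, which would be consistent with the "works or fails" behavior. The sketch below shows the conventional arrangement in plain C++. `Dim3`, `LaunchConfig`, `ceilDiv` and `makeLaunchConfig` are hypothetical names used only for illustration (`Dim3` stands in for CUDA's `dim3` so the code compiles without the toolkit), and the 1024 limit is taken from the MaxThreadsPerBlock value quoted in the post.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so this compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Hypothetical pair of launch parameters: threads per block, and blocks per grid.
struct LaunchConfig { Dim3 block; Dim3 grid; };

// Integer division rounded up: the number of blocks of size d needed to cover n items.
constexpr unsigned ceilDiv(unsigned n, unsigned d) { return (n + d - 1) / d; }

// The block size is fixed and small; the grid size grows with the matrix.
LaunchConfig makeLaunchConfig(unsigned numRows, unsigned numCols,
                              unsigned blockX = 16, unsigned blockY = 16)
{
    assert(blockX > 0 && blockY > 0);
    assert(blockX * blockY <= 1024);  // assumed limit: MaxThreadsPerBlock from the post

    LaunchConfig cfg;
    cfg.block = {blockX, blockY, 1};                       // threads per block
    cfg.grid  = {ceilDiv(numRows, blockX),                 // blocks along x (rows in the kernel)
                 ceilDiv(numCols, blockY), 1};             // blocks along y (columns in the kernel)
    return cfg;
}
```

In the C# method, the equivalent would presumably be to give `new dim3(blockSizeX, blockSizeY)` to BlockDimensions and `new dim3(gridDimX, gridDimY)` to GridDimensions. The kernel's bounds check (`i < numRows && j < numCols`) already handles the extra threads in the last partial blocks.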
