Sunday, September 25, 2011
In my opinion, the most important OpenCL function to know is clEnqueueNDRangeKernel, which enqueues a kernel-execution command. Once you understand how its parameters affect the kernel's execution, the rest of OpenCL is fairly straightforward.
The second most important function, in my opinion, is clSetKernelArg. This configures the kernel's arguments so it has data to process while it executes. Its signature is given as follows:
cl_int clSetKernelArg(cl_kernel kernel, cl_uint arg_index,
size_t arg_size, const void *arg_value)
When I first encountered it, I thought this function was simple. But in addition to specifying the data for a kernel argument, this function also configures which address space (global/constant, local, or private) the data will be stored in. There are three main rules:
- If arg_value points to a memory object, the argument's data will be stored in global or constant memory, depending on the modifier used by the kernel function.
- If arg_value points to something that isn't a memory object, the argument's data will be stored in private memory.
- If arg_value is NULL, the argument's data won't be initialized, but arg_size bytes will be allocated from local memory to store the data.
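As a rough way to keep the three rules straight, they can be written down as a small decision function. This is only a model in plain C, not part of the OpenCL API: the arg_space enum and the space_for_arg helper are hypothetical names, and the comments show the kind of clSetKernelArg call each case corresponds to, with kernel, buffer, and scale as assumed variables.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for where an argument's data ends up. */
typedef enum {
    SPACE_GLOBAL_OR_CONSTANT, /* chosen by the kernel's __global/__constant qualifier */
    SPACE_PRIVATE,
    SPACE_LOCAL
} arg_space;

/* Model of the three rules. is_mem_object says whether arg_value points
 * to a cl_mem handle; the host code has to know this itself. */
arg_space space_for_arg(const void *arg_value, bool is_mem_object)
{
    if (arg_value == NULL) {
        /* e.g. clSetKernelArg(kernel, 2, 256 * sizeof(float), NULL); */
        return SPACE_LOCAL;
    }
    if (is_mem_object) {
        /* e.g. clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer); */
        return SPACE_GLOBAL_OR_CONSTANT;
    }
    /* e.g. clSetKernelArg(kernel, 1, sizeof(float), &scale); */
    return SPACE_PRIVATE;
}
```

Note how arg_size changes meaning across the cases: it is sizeof(cl_mem) for a memory object, the size of the value for a private argument, and the number of bytes to allocate for a local argument.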
But there's a problem. Unless the total number of work-items is a multiple of the maximum number of work-items per work-group, I have no way of knowing how many work-items will be assigned to each group. I can set the local_size parameter of clEnqueueNDRangeKernel, but sometimes I don't even know the total number of work-items I'll need. This happens frequently in OpenCL-OpenGL interoperability, in which I don't know in advance how many triangles each vertex buffer object will contain.
The solution isn't too hard: allocate all the local memory the device can provide. To find out how much local memory is available, call clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE. The following code shows how this works:
clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE, /* device: placeholder cl_device_id */
sizeof(local_mem_size), &local_mem_size, NULL);
Remember that the data returned by the function is a cl_ulong, not a size_t. I made this mistake repeatedly, and though it worked fine on my Linux system, my Windows development system didn't like it at all.
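A likely explanation for the platform difference is that size_t and cl_ulong are both 8 bytes on a 64-bit build but differ on a 32-bit one, where size_t is only 4 bytes. Because clSetKernelArg takes its size as a size_t, the queried value still has to be narrowed at some point. Here is a minimal sketch of doing that safely in plain C. It assumes uint64_t as a stand-in for cl_ulong (which OpenCL defines as an unsigned 64-bit integer), and both the local_arg_size helper and its reserved parameter are hypothetical. The reserved parameter leaves room for any local memory the kernel declares on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes that can be requested for a __local kernel argument.
 * local_mem_size: value reported for CL_DEVICE_LOCAL_MEM_SIZE
 *                 (uint64_t stands in for cl_ulong here).
 * reserved:       hypothetical allowance for local memory the kernel
 *                 already uses by itself.
 * The result is clamped so that it always fits in a size_t. */
size_t local_arg_size(uint64_t local_mem_size, uint64_t reserved)
{
    uint64_t avail = (local_mem_size > reserved) ? local_mem_size - reserved : 0;
    return (avail > SIZE_MAX) ? SIZE_MAX : (size_t)avail;
}
```

The result could then be passed as the arg_size of a clSetKernelArg call whose arg_value is NULL.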