Friday, October 3, 2014
Recently, someone asked me about the difference between the two methods of allocating local memory. That is, why would you use one method over the other? This isn't an easy question, and before I answer it here, I'd like to explain what the two methods are.
The first allocation method is performed inside the kernel. If a variable is declared in the kernel function, its declaration must identify which memory region contains its data. If the declaration is preceded by __local, the variable will be allocated from local memory. For example, the following declaration sets aside 1024 bytes of local memory to be accessed as an array named foo.
__local float foo[256];
The second method is performed in host code. clSetKernelArg assigns data to be passed into the kernel function as an argument. The last parameter of clSetKernelArg points to the argument's data. If the last argument points to a memory object (cl_mem), the kernel will access the data from global/constant memory. If the argument points to a primitive like an int, the kernel will access the data from private memory. But if the last argument is NULL, no data will be passed to the kernel. In this case, the purpose of clSetKernelArg is to allocate local memory for the kernel's use.
An example will help make this clear. Consider the following calls to clSetKernelArg:
clSetKernelArg(kernel, 0, sizeof(cl_mem), &example_buffer);
clSetKernelArg(kernel, 1, sizeof(int), &example_int);
clSetKernelArg(kernel, 2, 256 * sizeof(float), NULL);
These lines define the first three arguments of the kernel. If the kernel function's name is foo, the following code shows how foo's arguments might be accessed as function parameters:
__kernel void foo(__global float4* ex_buffer,
                  int ex_int,
                  __local float* local_var)
In the third call to clSetKernelArg, the data pointer is set to NULL. This tells the kernel that its third argument must be allocated from local memory, and the size parameter (here, 256 * sizeof(float)) sets how many bytes are reserved. This memory is uninitialized; the kernel will read from and write to it as it performs its computation.
So, the two methods of allocating local memory are declaring a local variable in a kernel and calling clSetKernelArg with a NULL data pointer. When do you use one over the other? The answer is straightforward.
If the size of the local memory is constant, use the first method because it's simpler. But if the kernel must be executed multiple times with different sizes of local memory, use the second method. This is because the second method makes it possible to control the local memory size with a variable.
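To make the second case a little more concrete, here is a minimal host-side sketch of sizing local memory with a variable. The helper name local_bytes_for is my own invention for illustration and is not part of OpenCL. It also assumes that each work-item needs exactly one float of local memory and that the host's float is the same size as the device's. The clSetKernelArg call appears only in a comment so the snippet compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL): returns the number of bytes of
 * local memory needed to give each work-item in a work-group one float.
 * Assumption: the host's float has the same size as the device's float. */
size_t local_bytes_for(size_t work_group_size)
{
    /* Guard against overflow in the multiplication below. */
    assert(work_group_size <= SIZE_MAX / sizeof(float));
    return work_group_size * sizeof(float);
}

/* With a kernel in hand, the result would be passed as the size parameter,
 * with NULL as the data pointer:
 *
 *     size_t bytes = local_bytes_for(work_group_size);
 *     clSetKernelArg(kernel, 2, bytes, NULL);
 */
```

Calling clSetKernelArg again with a different size before the next launch lets the same kernel run with a different amount of local memory, which a fixed-size declaration inside the kernel can't do.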