Is Your Local Memory Really Local?

Friday, March 4, 2011

CUDA makes a clear distinction between shared memory, which is SRAM located on the GPU chip, and local memory, which resides in device DRAM, off the chip. Both memory spaces are private to a given processing unit, but shared memory has much lower latency than local memory or global memory. For this reason, CUDA routines generally copy input data from global memory to shared memory, process the data in shared memory, and write the output to global memory.

OpenCL, on the other hand, doesn't make any distinction between shared memory and local memory. Both types are referred to as local. So here's the question: how do you know if the local memory you're working with is high-speed memory on the GPU or low-speed memory off the GPU?

It turns out that the clGetDeviceInfo function accepts a parameter called CL_DEVICE_LOCAL_MEM_TYPE, and the value it returns is either CL_LOCAL or CL_GLOBAL. If the type is CL_GLOBAL, there's no point copying data from global memory to local memory, because both memory types are essentially global. But if the type is CL_LOCAL, the memory is close to the processing unit, and it's a good idea to store frequently accessed data there.
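As a sketch, the query looks something like this. It assumes the first platform's first GPU device and an OpenCL development setup (compile with -lOpenCL); error handling is kept minimal:

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_device_local_mem_type mem_type;

    /* Grab the first platform and the first GPU device on it. */
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1,
                       &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "No OpenCL GPU device found\n");
        return 1;
    }

    /* Ask where the device's "local" memory actually lives. */
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_TYPE,
                    sizeof(mem_type), &mem_type, NULL);

    printf("Local memory is %s\n",
           mem_type == CL_LOCAL
               ? "dedicated on-chip memory (CL_LOCAL)"
               : "carved out of global memory (CL_GLOBAL)");
    return 0;
}
```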

Kind of a nuisance, isn't it? It seems like the only way to ensure high performance is to check a device's local memory type and send it a different kernel depending on whether the type is CL_LOCAL or CL_GLOBAL.
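One way to avoid maintaining two kernel sources is to branch at build time instead: pass a preprocessor flag to clBuildProgram based on the queried type, and let the kernel #ifdef away the global-to-local copy when it buys nothing. A minimal sketch of that decision, where build_options is a hypothetical helper and the constants mirror the values defined in CL/cl.h:

```c
/* These values mirror CL/cl.h; if you include that header,
 * drop the defines and use the ones it provides. */
#define CL_LOCAL  0x1
#define CL_GLOBAL 0x2

/* Hypothetical helper: choose clBuildProgram options from the
 * result of a CL_DEVICE_LOCAL_MEM_TYPE query. A kernel can then
 * test #ifdef USE_LOCAL_MEM to decide whether copying data into
 * local memory is worthwhile on this device. */
static const char *build_options(unsigned int mem_type)
{
    return (mem_type == CL_LOCAL) ? "-DUSE_LOCAL_MEM" : "";
}
```

The resulting string would be passed as the options argument of clBuildProgram, so a single kernel source serves both device classes.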


sadhana,  February 9, 2015 at 3:14 AM  

Hi, I am using an ATI Mobility Radeon 4650. I got CL_GLOBAL for the query. Does that mean my global/constant memory performs the same as local memory, and there is no point in optimizing by transferring data into local memory?

Matt Scarpino February 9, 2015 at 7:56 PM  

Yes. If the memory type is CL_GLOBAL, it isn't dedicated local memory. You can still access it like regular local memory, but it's physically located in global memory, so you won't get increased performance.
