Eureka

>> Wednesday, April 27, 2011

NOTE: Though theoretically sound, this mutex code fails to execute on some systems; in particular, it hangs on my Linux/AMD test platform. Be warned.

I've spent some time coding a global memory barrier using OpenCL's atomic functions, and I think I've succeeded. This is different from barrier(CLK_GLOBAL_MEM_FENCE), which only synchronizes work-items within a single work-group. In my tests, this code synchronizes work-items across different work-groups. Here's the code:

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

#define LOCK(a) atom_cmpxchg(a, 0, 1)
#define UNLOCK(a) atom_xchg(a, 0)

/* The pointers are volatile so the compiler re-reads *count on every pass
   through the waiting loop (the atom_* functions take volatile pointers) */
__kernel void mutex_test(volatile __global int *count, volatile __global int *mutex) {

   if(get_local_id(0) == 0) {

      /* Increment the count */
      while(LOCK(mutex));
      *count += 1;
      UNLOCK(mutex);

      /* Wait for everyone else to increment the count */
      int waiting = 1;
      while(waiting) {
         while(LOCK(mutex));
         if(*count == get_num_groups(0)) {
            waiting = 0;
         }
         UNLOCK(mutex);
      }
   }
}

In this code, LOCK atomically compares the value at mutex to 0 (unlocked). If it's unlocked, atom_cmpxchg stores 1 (locked) and returns the old value, 0, so the while loop exits and the work-item enters the critical section. If the mutex is already locked, atom_cmpxchg returns 1 and the loop keeps spinning. UNLOCK then atomically stores 0, releasing the mutex.

The barrier works in two stages. First, work-item 0 in each work-group atomically increments the count variable. Then it spins, waiting to see whether every other work-item 0 has done the same. Once count equals the result of get_num_groups(0), every work-group has passed the barrier. (This assumes the host initializes both count and the mutex to zero before launching the kernel.)

It's late, so I'll post this on the Khronos OpenCL forum tomorrow. I'll be interested to see whether it works on other systems. If it does, I think it will be a big deal: it would mean kernels no longer have to be relaunched just to synchronize global memory across work-groups.
