OpenCL Image Filtering on the Nexus 10

>> Wednesday, March 27, 2013

I've coded a practical OpenCL example for the Nexus 10, and the code is on GitHub. The app works in six stages:

  1. The user clicks a button to take a picture.
  2. The Activity receives image data from the camera, places it in a ByteBuffer, and passes it to the native function.
  3. The native function sends the buffer data to the GPU, which executes the OpenCL kernel.
  4. The kernel performs 3x3 spatial convolution and places the result in an output buffer.
  5. The native function reads the GPU's result and places the data in a ByteBuffer.
  6. The Activity creates a bitmap from the ByteBuffer containing the filtered image.
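
I tried using OpenCL's image processing routines, but the Mali GPU doesn't support the CL_ARGB format. I also tried using two-dimensional work-items, but that gave me an odd error, so the kernel uses a one-dimensional index space and derives each pixel's row and column from its global ID. Here's the kernel:
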
__kernel void camera_filter(__global int *input_img,
                            __global int *out_img,
                            int width, int height) {

   int offset, red, green, blue;
   int row = get_global_id(0)/width;
   int col = get_global_id(0) - row * width;
   int4 pixels[3], color_vec;

   /* Set filter coefficients */
   int4 k0 = (int4)(-1, -1,  0,  0);
   int4 k1 = (int4)(-1,  0,  1,  0);
   int4 k2 = (int4)( 0,  1,  1,  0);

   /* Set filter denominator */
   int denom = 1;

   if((row > 0) && (col > 0) && (col < width-2) && (row < height-1)) {

      /* Read input image data into private memory */
      offset = (row-1) * width + (col-1);
      pixels[0] = vload4(0, input_img + offset);
      pixels[1] = vload4(0, input_img + offset + width);
      pixels[2] = vload4(0, input_img + offset + 2*width);

      /* Compute red component */
      color_vec = ((pixels[0] & 0x00ff0000) >> 16) * k0 +
                  ((pixels[1] & 0x00ff0000) >> 16) * k1 +
                  ((pixels[2] & 0x00ff0000) >> 16) * k2;
      red = clamp((color_vec.s0 + color_vec.s1 + color_vec.s2)/denom, 0, 255);

      /* Compute green component */
      color_vec = ((pixels[0] & 0x0000ff00) >> 8) * k0 +
                  ((pixels[1] & 0x0000ff00) >> 8) * k1 +
                  ((pixels[2] & 0x0000ff00) >> 8) * k2;
      green = clamp((color_vec.s0 + color_vec.s1 + color_vec.s2)/denom, 0, 255);

      /* Compute blue component */
      color_vec = (pixels[0] & 0x000000ff) * k0 +
                  (pixels[1] & 0x000000ff) * k1 +
                  (pixels[2] & 0x000000ff) * k2;
      blue = clamp((color_vec.s0 + color_vec.s1 + color_vec.s2)/denom, 0, 255);

      /* Update output pixel in global memory */
      out_img[get_global_id(0)] = 0xff000000 + (red << 16) + (green << 8) + blue;
   } else {
      /* Copy border pixels to the output unchanged */
      out_img[get_global_id(0)] = input_img[get_global_id(0)];
   }
}

The filter's operation is determined by the coefficients in the three int4 vectors. In this case, the filter gives a 3D shadow to the image. The coefficients are defined in the kernel at the moment, but ideally, they'd be set by the user at runtime. I'll see what I can do.
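
For checking the GPU's output, it can help to have a plain C version of the same filter that takes the coefficients as an argument instead of hard-coding them. The following is only a sketch under stated assumptions: `filter_reference`, `clamp_int`, and `channel` are hypothetical names that aren't part of the app, and the pixels are assumed to be packed as ARGB in 32-bit words, as in the kernel. The border test mirrors the kernel's, so the column at `width-2` is copied rather than filtered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp v to the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: extract one 8-bit channel from a packed ARGB pixel. */
static int channel(uint32_t pixel, int shift) {
    return (int)((pixel >> shift) & 0xffu);
}

/*
 * CPU reference for a 3x3 filter over packed ARGB pixels.
 * coeffs holds the nine coefficients in row-major order and
 * denom divides each channel sum before clamping to 0..255.
 * Pixels outside the kernel's interior test are copied unchanged.
 */
void filter_reference(const uint32_t *in, uint32_t *out,
                      int width, int height,
                      const int coeffs[9], int denom) {
    assert(denom != 0);
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            int idx = row * width + col;

            /* Same interior test as the kernel */
            if (!(row > 0 && col > 0 && col < width - 2 && row < height - 1)) {
                out[idx] = in[idx];
                continue;
            }

            int sum[3] = {0, 0, 0};   /* red, green, blue */
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    uint32_t p = in[(row + dy) * width + (col + dx)];
                    int c = coeffs[(dy + 1) * 3 + (dx + 1)];
                    sum[0] += channel(p, 16) * c;
                    sum[1] += channel(p, 8) * c;
                    sum[2] += channel(p, 0) * c;
                }
            }

            uint32_t red   = (uint32_t)clamp_int(sum[0] / denom, 0, 255);
            uint32_t green = (uint32_t)clamp_int(sum[1] / denom, 0, 255);
            uint32_t blue  = (uint32_t)clamp_int(sum[2] / denom, 0, 255);
            out[idx] = 0xff000000u | (red << 16) | (green << 8) | blue;
        }
    }
}
```

Calling it with the coefficients `{-1, -1, 0, -1, 0, 1, 0, 1, 1}` and a denominator of 1 corresponds to the vectors k0, k1, and k2 in the kernel. Passing the coefficients as a parameter is one way the filter could be made selectable at runtime.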

On a scale from 1 to 10, I'd give this project's difficulty a solid 10. The problem isn't the OpenCL, but interfacing Java and OpenCL through the JNI. It would be nice if Android had something like Aparapi, which makes it possible to directly execute OpenCL kernels from Java.


Google and OpenCL

>> Wednesday, March 20, 2013

As of Android 4.1, the Renderscript graphics engine has been deprecated in favor of OpenGL. Since then, I've hoped Renderscript computation would be deprecated in favor of OpenCL. But this message from a Google engineer explains why this hasn't happened.

His concern with OpenCL/CUDA is that "it gets peak performance at the expense of performance portability." He goes on to say that an application targeting one device won't execute optimally on another due to differences in work-group size, shared memory availability, and so on.

With proper coding, this isn't an issue. clGetDeviceInfo and clGetKernelWorkGroupInfo examine the device at runtime and return information about its work-group size, shared memory availability, preferred vector widths, and the sizes of its memory buffers. With these functions, applications can tailor their operation to take full advantage of the target device, no matter what its characteristics are.

For example, the following code identifies how many work-items should be generated per group to execute a given kernel on a given device:

size_t group_size;
clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                         sizeof(group_size), &group_size, NULL);
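
Once the group size is known, the application still has to pick a global size that fits it. In OpenCL 1.x, when a local size is passed to clEnqueueNDRangeKernel, the global size must be evenly divisible by it. A minimal sketch of the usual rounding step follows; `round_up_global_size` is a hypothetical helper name, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the smallest multiple of group_size
   that is greater than or equal to num_items. */
size_t round_up_global_size(size_t num_items, size_t group_size) {
    assert(group_size > 0);
    size_t remainder = num_items % group_size;
    return remainder == 0 ? num_items
                          : num_items + (group_size - remainder);
}
```

The result would be passed as the global size, with group_size as the local size. Because rounding up can create work-items beyond the last data element, a kernel launched this way would need to check its global ID against the element count before reading or writing.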

If Google is going to stick with Renderscript for high performance, I'll grit my teeth and learn it. But I wish they'd support other options. I also wish they'd join the Khronos Group.



>> Monday, March 11, 2013

I'm working on an app that reads image data from an Android device's camera and processes it with OpenCL. In the meantime, I thought I'd mention three upcoming conferences:

  1. AMD Developer Summit (11/11/13 in San Jose) - a conference devoted to OpenCL and AMD technology. Presentation topics can be submitted here and the deadline is March 15.
  2. LEAP Conference (5/21/13 in London) - LEAP stands for Low Energy Application Parallelism, and the conference focuses on high-performance computing with low-power hardware. The call for papers is here and the deadline is March 15.
  3. GTC (3/18/13 in San Jose) - Nvidia calls it the GPU Technology Conference, but it's really just for Nvidia GPUs.


OpenCL Device Test for Android (Nexus 10)

>> Sunday, March 3, 2013

I've received a few e-mails asking about the OpenCL-Android example. In response, I've taken two steps:

  • I've uploaded a free app called OpenCL Device Test to Google Play. It checks for OpenCL-compliant devices, and if it finds one, it lists the device's processing characteristics.
  • I've uploaded the source code for the OpenCL Device Test to GitHub. This contains the entire SDK project, including the Java and JNI code.

The first task was a lot harder than I'd expected. To deploy an app on Google Play, you have to manually exclude every device that shouldn't be listed as a target. In this case, I excluded every device except the Nexus 10. This is because the app requires OpenCL support, which isn't available on other platforms except the Nexus 4 (which I didn't test).

I just checked Google Play, and the app isn't available yet. This is my first time trying this, so I hope you'll bear with me.

