OpenCL Disabled on Android

>> Saturday, August 10, 2013

Alas, Rahul Garg's assessment of OpenCL on Android is accurate. As of version 4.3, Android devices do not allow OpenCL kernels to be compiled. I've verified this on the Nexus 10, but I can't speak for other devices.

Google has tolerated OpenCL in the past, but as of version 4.3, they've put their foot down. For high-performance computing on Android devices, Renderscript Compute is now the only option.

ETA: Vincent Hindriksen has an excellent write-up on the topic here. I agree wholeheartedly.


Article on Missing Libraries

I submitted this post without fully examining OpenCL on Android. Please see the next post.

I have a high opinion of Rahul Garg, but his article conflicts with my experience. I just installed the 4.3 factory image for the Nexus 10, and using adb, I verified that was still in the system/vendor/lib/egl directory. When I launched my OpenCL testing app, it told me the GPU supports OpenCL 1.1. For the Nexus 10 at least, it appears that OpenCL development is still available.

I don't own a Nexus 7, so I can't answer as to whether its libraries are still available or not.



>> Saturday, July 27, 2013

I received an e-mail pointing out that the links to my old articles are broken. In response, I've transplanted three articles to

Outside of codeproject, here are two OpenCL articles of note:


OpenCL 2.0

>> Monday, July 22, 2013

Hooray! The OpenCL 2.0 provisional specification has been released and it can be downloaded here (PDF). Many incredible changes, including pipes, dynamic parallelism, shared virtual memory, improved atomic functions, and read/write image objects (it's about time). But the best part is the Android Installable Client Driver Extension, which "enables OpenCL implementations to be discovered and loaded as a shared object on Android systems." Anything that simplifies OpenCL on Android will be very welcome.

I'm particularly interested in version 2.0 of the SPIR (Standard Portable Intermediate Representation). I've read about it but I've never seen it used in practice. Now it appears that Clang can generate SPIR. I'll have to look into this.



>> Friday, June 28, 2013

I apologize for not writing sooner. This may sound odd, but I was writing a paper on differential equations. I'd hoped to submit it to, but I don't know any professional mathematicians to serve as sponsors. So I submitted the paper to the Journal of Differential Equations.

The journal decided the paper wasn't worthy, but I still think it's interesting. So here it is.

I'm about to start a new job, so my updates may be even less frequent than they've been in the past. But I'll do my best to keep up on things.


A Sense of hUMA

>> Saturday, May 4, 2013

In OpenCL, transferring data between the CPU and GPU can take a significant amount of time. This is a problem for all GPGPU applications, and to solve it, AMD has developed the Heterogeneous Uniform Memory Access, or hUMA. Ars Technica has a great article on the subject here.

It looks wonderful to me, particularly the cache coherence. With hUMA, CPU-GPU data transfer will only consist of sending pointers back and forth. And if this will be available for the upcoming PS4, that will be even better.


WebCL News

>> Tuesday, April 23, 2013

Here are two items of interest related to WebCL, the browser-based implementation of OpenCL:

  1. I wrote an article that explains how to execute WebCL kernels from Firefox.
  2. The Khronos Group has a Request for Quote (PDF) for a tool capable of validating WebCL kernels. The goal is to prevent bad kernels from crashing the user's computer.

I considered submitting a quote, but I just don't have the time. And I'd imagine validating a kernel is a tricky task.


Ten Reasons Why Android Should Support OpenCL

>> Wednesday, April 3, 2013

In an earlier post, I explained why a Google engineer's grievance with OpenCL was mistaken. Now here are ten reasons why Google should make OpenCL the high-performance language of choice on Android:

10. GPU vendors are the driving force behind OpenCL and their tools provide stable, high-speed kernel execution. When GPU technology improves, OpenCL improves immediately.

9. If Android becomes a desktop OS, it should be able to access desktop GPUs and mobile GPUs. OpenCL has broad support on both platforms.

8. Aparapi makes it easy to launch OpenCL kernels from Java. It's open-source and GSS Mahadevan has successfully used it on Android.

7. In addition to CPUs and GPUs, OpenCL kernels can be executed on DSPs and FPGAs. Future high-performance devices will be more likely to support OpenCL than any other language.

6. When OpenCL devices are added to a context, they can work together to execute kernels. With OpenCL, embedded devices have the potential of accessing more powerful systems to crunch data.

5. One of OpenCL's chief advantages is OpenCL-OpenGL interoperability, which allows OpenCL kernels to process OpenGL buffer data before rendering starts. It would take a lot of work to add a similar capability for Renderscript.

4. Currently, native Android developers need to learn x86 and ARM/NEON instructions to ensure high performance. Intel and ARM are both strong supporters of OpenCL, so if Android adopted OpenCL, native developers would only have to learn one language.

3. Google has put aside O3D in favor of WebGL and the Renderscript graphics engine in favor of OpenGL. If history is any guide, Google will choose OpenCL over Renderscript computation.

2. The general-purpose GPU (GPGPU) community is small and fragmented. It's unlikely that a new, OS-specific language will attract a developer base large enough to justify its existence.

1. If iOS supports OpenCL and Android doesn't, GPU-accelerated apps will run faster on iOS. High-performance mobile computing isn't a big deal yet, but there's no telling what the future may bring.


OpenCL Image Filtering on the Nexus 10

>> Wednesday, March 27, 2013

I've coded a practical OpenCL example for the Nexus 10 and the code is on Github. The app works in six stages:

  1. The user clicks a button to take a picture.
  2. The Activity receives image data from the camera, places it in a ByteBuffer, and passes it to the native function.
  3. The native function sends the buffer data to the GPU, which executes the OpenCL kernel.
  4. The kernel performs 3x3 spatial convolution and places the result in an output buffer.
  5. The native function reads the GPU's result and places the data in a ByteBuffer.
  6. The Activity creates a bitmap from the ByteBuffer containing the filtered image.

I tried using OpenCL's image processing routines, but the Mali GPU doesn't support the CL_ARGB format. I tried using two-dimensional work-items, but that gave me an odd error. So here's the kernel:
__kernel void camera_filter(__global int *input_img,
                            __global int *out_img,
                            int width, int height) {

   int offset, red, green, blue;
   int row = get_global_id(0)/width;
   int col = get_global_id(0) - row * width;
   int4 pixels[3], color_vec;

   /* Set filter coefficients */
   int4 k0 = (int4)(-1, -1,  0,  0);
   int4 k1 = (int4)(-1,  0,  1,  0);
   int4 k2 = (int4)( 0,  1,  1,  0);

   /* Set filter denominator */
   int denom = 1;

   if((row > 0) && (col > 0) && (col < width-2) && (row < height-1)) {

      /* Read input image data into private memory */
      offset = (row-1) * width + (col-1);
      pixels[0] = vload4(0, input_img + offset);
      pixels[1] = vload4(0, input_img + offset + width);
      pixels[2] = vload4(0, input_img + offset + 2*width);

      /* Compute red component */
      color_vec = ((pixels[0] & 0x00ff0000) >> 16) * k0 +
                  ((pixels[1] & 0x00ff0000) >> 16) * k1 +
                  ((pixels[2] & 0x00ff0000) >> 16) * k2;
      red = clamp((color_vec.s0 + color_vec.s1 + color_vec.s2)/denom, 0, 255);

      /* Compute green component */
      color_vec = ((pixels[0] & 0x0000ff00) >> 8) * k0 +
                  ((pixels[1] & 0x0000ff00) >> 8) * k1 +
                  ((pixels[2] & 0x0000ff00) >> 8) * k2;
      green = clamp((color_vec.s0 + color_vec.s1 + color_vec.s2)/denom, 0, 255);

      /* Compute blue component */
      color_vec = (pixels[0] & 0x000000ff) * k0 +
                  (pixels[1] & 0x000000ff) * k1 +
                  (pixels[2] & 0x000000ff) * k2;
      blue = clamp((color_vec.s0 + color_vec.s1 + color_vec.s2)/denom, 0, 255);

      /* Update output pixel in global memory */
      out_img[get_global_id(0)] = 0xff000000 + (red << 16) + (green << 8) + blue;
   } else {
      /* Border pixels are copied through unchanged */
      out_img[get_global_id(0)] = input_img[get_global_id(0)];
   }
}

The filter's operation is determined by the coefficients in the three int4 vectors. In this case, the filter gives a 3D shadow to the image. The coefficients are defined in the kernel at the moment, but ideally, they'd be set by the user at runtime. I'll see what I can do.
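To make the coefficient layout concrete, here is a minimal CPU sketch of the same 3x3 filter in plain C. It is not part of the project's code: the names `camera_filter_ref`, `filter_channel`, and `clamp_int` are hypothetical, and the sketch assumes pixels are packed as 0xAARRGGBB, as the kernel's masks suggest. The coefficients and denominator are passed as arguments, which is one way they could be supplied at runtime. The border test mirrors the kernel's, including `col < width-2`, so the output can be compared pixel for pixel with the GPU result.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp v to the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Apply the 3x3 coefficients to one colour channel of the pixel at
   (row, col). shift selects the channel: 16 = red, 8 = green, 0 = blue. */
static int filter_channel(const uint32_t *img, int width, int row, int col,
                          const int k[3][3], int denom, int shift) {
    int sum = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            uint32_t p = img[(size_t)(row + dr) * (size_t)width + (size_t)(col + dc)];
            sum += (int)((p >> shift) & 0xffu) * k[dr + 1][dc + 1];
        }
    }
    return clamp_int(sum / denom, 0, 255);
}

/* Hypothetical CPU reference for the camera_filter kernel.
   Interior pixels are filtered; border pixels are copied unchanged. */
void camera_filter_ref(const uint32_t *in, uint32_t *out,
                       int width, int height,
                       const int k[3][3], int denom) {
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            size_t i = (size_t)row * (size_t)width + (size_t)col;
            if (row > 0 && col > 0 && col < width - 2 && row < height - 1) {
                uint32_t r = (uint32_t)filter_channel(in, width, row, col, k, denom, 16);
                uint32_t g = (uint32_t)filter_channel(in, width, row, col, k, denom, 8);
                uint32_t b = (uint32_t)filter_channel(in, width, row, col, k, denom, 0);
                out[i] = 0xff000000u | (r << 16) | (g << 8) | b;
            } else {
                out[i] = in[i];
            }
        }
    }
}
```

With the kernel's coefficients, rows (-1, -1, 0), (-1, 0, 1), and (0, 1, 1), the nine values sum to zero, so a flat region of the image comes out black and only edges survive.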

On a scale from 1 to 10, I'd give this project's difficulty a solid 10. The problem isn't the OpenCL, but interfacing Java and OpenCL through the JNI. It would be nice if Android had something like Aparapi, which makes it possible to directly execute OpenCL kernels from Java.


Google and OpenCL

>> Wednesday, March 20, 2013

As of Android 4.1, the Renderscript graphics engine has been deprecated in favor of OpenGL. Since then, I've hoped Renderscript computation would be deprecated in favor of OpenCL. But this message from a Google engineer explains why this hasn't happened.

His concern with OpenCL/CUDA is that "it gets peak performance at the expense of performance portability." He goes on to say that an application targeting one device won't execute optimally on another due to differences in work-group size, shared memory availability, and so on.

With proper coding, this isn't an issue. clGetDeviceInfo and clGetKernelWorkGroupInfo examine the device at runtime and return information about its work-group size, shared memory availability, preferred vector widths, and the sizes of its memory buffers. With these functions, applications can tailor their operation to take full advantage of the target device, no matter what its characteristics are.

For example, the following code identifies how many work-items should be generated per group to execute a given kernel on a given device:

size_t group_size;
clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                         sizeof(group_size), &group_size, NULL);
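
One common way to use the returned value is sketched below. In OpenCL 1.x, when a local size is passed to clEnqueueNDRangeKernel, the global size must be a multiple of it, so the number of work-items is usually rounded up to the next multiple of the group size. The helper name `round_up_global_size` is hypothetical and is not part of the OpenCL API.

```c
#include <stddef.h>

/* Hypothetical helper: round global_size up to the next multiple of
   group_size so it can be used together with that local size.
   A group_size of zero leaves global_size unchanged. */
size_t round_up_global_size(size_t global_size, size_t group_size) {
    if (group_size == 0) {
        return global_size;
    }
    size_t remainder = global_size % group_size;
    return remainder == 0 ? global_size
                          : global_size + (group_size - remainder);
}
```

Because rounding up can create more work-items than there are data elements, the kernel would then need to compare its global ID against the real element count before touching memory.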

If Google is going to stick with Renderscript for high performance, I'll grit my teeth and learn it. But I wish they'd support other options. I also wish they'd join the Khronos Group.



>> Monday, March 11, 2013

I'm working on an app that reads image data from an Android device's camera and processes it with OpenCL. In the meantime, I thought I'd mention three upcoming conferences:

  1. AMD Developer Summit (11/11/13 in San Jose) - a conference devoted to OpenCL and AMD technology. Presentation topics can be submitted here and the deadline is March 15.
  2. LEAP Conference (5/21/13 in London) - LEAP stands for Low Energy Application Parallelism, and the conference focuses on high-performance computing with low-power hardware. The call for papers is here and the deadline is March 15.
  3. GTC (3/18/13 in San Jose) - Nvidia calls it the GPU Technology Conference, but it's really just for Nvidia GPUs.


OpenCL Device Test for Android (Nexus 10)

>> Sunday, March 3, 2013

I've received a few e-mails asking about the OpenCL-Android example. In response, I've taken two steps:

  • I've uploaded a free app called OpenCL Device Test to Google Play. It checks for compliant devices and if it finds one, it lists the device's processing characteristics.
  • I've uploaded the source code for the OpenCL Device Test to Github. This contains the entire SDK project, including the Java and JNI code.

The first task was a lot harder than I'd expected. To deploy an app on Google Play, you have to manually exclude every device that shouldn't be listed as a target. In this case, I excluded every device except the Nexus 10. This is because the app requires, which isn't available on other platforms except the Nexus 4 (which I didn't test).

I just checked Google Play, and the app isn't available yet. This is my first time trying this, so I hope you'll bear with me.


OpenCL on the Nexus 10, A Simple Example

>> Saturday, February 23, 2013

It's taken a while, but I finished coding an Android app that uses OpenCL to access the Mali GPU on the Nexus 10. It's not exciting, but here's the code for the top-level Activity:

public class TestNdkActivity extends Activity {

  static {

  }

  private native int getNumDevices();

  public void onCreate(Bundle b) {
    TextView tv = (TextView)findViewById(;
    tv.setText("Number of connected devices: " + getNumDevices());
  }
}

To execute, the application needs two shared libraries: and The first can be found in the /system/vendor/lib/egl directory on the Nexus 10. The second is compiled by ndk-build using the following makefile (
LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)

LOCAL_LDLIBS      := $(LOCAL_PATH)/../external/
LOCAL_SRC_FILES   := test_ndk.c
LOCAL_ARM_MODE    := arm


To get the compilation to work, I put a copy of in the project's 'external' directory. I also copied the $MALI_SDK/include/CL folder into the project's 'include' directory. There must be a better way to do this.

Here's my simple JNI code (test_ndk.c). It finds the first OpenCL platform and returns the number of available devices.
#include <CL/cl.h>
#include "test_ndk.h"

JNIEXPORT jint JNICALL Java_com_testndk_TestNdkActivity_getNumDevices
  (JNIEnv *env, jobject obj) {

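    cl_platform_id platform;
    cl_uint num_devices = 0;

    /* Find the first platform; report zero devices if none is found */
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) {
        return 0;
    }

    /* A NULL device list asks only for the number of devices */
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, NULL, &num_devices);
    return (jint)num_devices;
}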

When I run the app, the TextView tells me that one device is connected. I'll try to get a more interesting example working in the next few days. For now, I'm content.



>> Sunday, February 17, 2013

I just got back from a training class, so I haven't had time to try out OpenCL on the Nexus 10. But I want to thank everyone who sent me this link, which provides more information. I'll get started tomorrow and I'll post my findings within the week.

On an unrelated note, my favorite book on the history of technology is called Accidental Empires. The author, Robert Cringely, has freely released the first few chapters on his site, and he may release the entire book.


ARM and OpenCL, Part 2

>> Thursday, February 7, 2013

I've just installed the Mali OpenCL SDK, which makes it possible to execute OpenCL kernels on the Mali GPU (the GPU inside the Nexus 10). I can't wait to get started.

Thank you, ARM!



>> Tuesday, February 5, 2013

I apologize for being distant but I'm still coming up to speed on Android/ARM programming. Not much has happened lately, but here are some points of interest:

  • Amdahl Software has released their CodeBench tools. Free trials are available.
  • This fascinating blog entry discusses Renderscript development from the perspective of an OpenCL/CUDA developer.
  • This blog entry discusses why we should use WebGL, a topic I've wondered about for some time.

