ARM and OpenCL

>> Sunday, December 2, 2012

On the whole, I'm happy with the Nexus 10. The Android SDK is easy to use and I've been climbing steadily up the learning curve. I've coded some basic and intermediate applications and I'm currently writing code to access Near Field Communication (NFC).

But in one respect, I'm deeply disappointed. The tablet's GPU is OpenCL-compliant but I can't use it to execute OpenCL kernels. ARM has stated that unless Samsung purchases the drivers, they will remain unavailable.

So an idea struck me: why don't I code an OpenCL driver myself? It's just a matter of translating OpenCL routines into ARM-Mali functions. How hard could it be? To find out, I've downloaded an evaluation copy of ARM's DS-5 development tools.

Of course, I'll also need a run-time compiler. Thankfully, the folks at have released oclTools, which looks like it will be helpful. Their site is minimalistic and light-hearted, and I was interested in learning more about them. But when I went to the About Us page, the only link leads to Apple, champion of all that is closed-source and litigious. Bizarre.


Interesting Times

>> Tuesday, November 13, 2012

  • I managed to order a Nexus 10 this morning, but I'm concerned. Where is the OpenCL development kit for the Mali T-604? I've searched through ARM's Mali developer site, but none of their tools mention OpenCL. Roberto Mijat gave a presentation on OpenCL programming at the recent ARM TechCon, but there were no announcements related to OpenCL. Strange. Why would ARM go to the trouble of seeking OpenCL-compliance for their GPU if they're not going to let anyone program it?

  • AMD has denied that they're hoping to sell the company, but there's no question that they're in deep financial trouble. It's heartbreaking. I had high hopes for the Fusion, but the performance just wasn't there. Throughout the Fusion Developer Summit, AMD's corporate officers said they were "betting the company" on OpenCL. Perhaps they lost.

  • To pressure gamers to upgrade, Microsoft has stated that their upcoming release of DirectX, version 11.1, will only be available for Windows 8. I hope that this behavior, along with Gabe Newell's efforts, will make game developers choose OpenGL over Direct3D. But Microsoft has always been very persuasive.


OpenCL and the Dot Product

>> Sunday, November 11, 2012

In an earlier post, I whined about OpenCL's lack of atomic functions for floating-point operations. This makes it hard to code a high-performance dot product in OpenCL, but by using vectors and local memory, we can still do pretty well.

I've coded an application that computes the dot product of two vectors with 2^18 floating-point values each. The source files are on github and the kernel looks like this:

__kernel void dot_product(__global float4* a_vec, __global float4* b_vec,
                          __global float* output, __local float4* partial_dot) {

   int gid = get_global_id(0);
   int lid = get_local_id(0);
   int group_size = get_local_size(0);

   /* Place product of global values into local memory */
   partial_dot[lid] = a_vec[gid] * b_vec[gid];
   /* Wait until every work item in the group has stored its product */
   barrier(CLK_LOCAL_MEM_FENCE);

   /* Repeatedly add values in local memory */
   for(int i = group_size/2; i>0; i >>= 1) {
      if(lid < i) {
         partial_dot[lid] += partial_dot[lid + i];
      }
      /* Finish this pass before the next one reads the sums */
      barrier(CLK_LOCAL_MEM_FENCE);
   }

   /* Transfer final result to global memory */
   if(lid == 0) {
      output[get_group_id(0)] = dot(partial_dot[0], (float4)(1.0f));
   }
}

When this kernel executes, the device doesn't compute the entire dot product. Instead, each work group writes one partial sum to global memory, and the host adds those partial sums to get the final result. My tests have shown that this runs much faster than a basic multiply-and-add algorithm. Still, I'm sure there's room for improvement.
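
To make the two-stage idea concrete, here is a rough CPU-side sketch in plain C. It is not the code from the repository: GROUP_SIZE, reduce_group, and dot_by_groups are names made up for illustration, scalar floats stand in for the kernel's float4 values, and the loops run serially where the device would run work items in parallel.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 8   /* hypothetical work-group size; must be a power of two */

/* Tree reduction over one group's local memory, as in the kernel's for loop.
   On the device the inner loop runs in parallel, one work item per lid. */
float reduce_group(float *partial, size_t group_size) {
    assert(group_size > 0 && (group_size & (group_size - 1)) == 0);
    for (size_t i = group_size / 2; i > 0; i >>= 1) {
        for (size_t lid = 0; lid < i; lid++) {
            partial[lid] += partial[lid + i];
        }
    }
    return partial[0];
}

/* Dot product computed group by group. The accumulation into total plays
   the role of the host-side step that adds the per-group results. */
float dot_by_groups(const float *a, const float *b, size_t n) {
    assert(n % GROUP_SIZE == 0);
    float partial[GROUP_SIZE];
    double total = 0.0;
    for (size_t group = 0; group < n / GROUP_SIZE; group++) {
        for (size_t lid = 0; lid < GROUP_SIZE; lid++) {
            size_t gid = group * GROUP_SIZE + lid;
            partial[lid] = a[gid] * b[gid];
        }
        total += reduce_group(partial, GROUP_SIZE);
    }
    return (float)total;
}
```

Calling dot_by_groups(a, b, n) with n a multiple of GROUP_SIZE gives the same kind of result the host would assemble from the kernel's output array. The halving loop only works when the group size is a power of two, which is why the sketch asserts it.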

I've decided to open this blog for comments. If you have any thoughts on this kernel or anything else on this blog, feel free to write.


OpenCL and Android

>> Sunday, October 28, 2012

I can't wait for the Nexus 10 to be released. I'm looking forward to the new Android 4.2 OS, but my main interest is its graphics processor. According to leaked reports, the Nexus 10 contains a Mali-T604 GPU. A few months ago, ARM submitted this device to the Khronos Group to verify its compliance with the OpenCL 1.1 standard. Not just the embedded version of OpenCL, but the full profile.

In other words, this tablet's GPU will be able to execute the same OpenCL kernels as a desktop GPU. Neat, huh? I've never forgiven myself for not having jumped on the Android bandwagon the moment it left the station. But it's not too late...

ETA: One reader has pointed out that a development board is about to be released that contains the Mali-T604 GPU and Samsung's Exynos processor (the CPU of the Nexus 10). Unfortunately, ARM hasn't provided any OpenCL drivers or development tools. It seems odd that they'd go to the trouble of obtaining OpenCL compliance and not release an SDK. Maybe ARM will announce its OpenCL strategy at the ARM TechCon. Hmm.


CLKernels Update

I've updated the CLKernels site, so it shouldn't cause any problems for current versions of Firefox. I'm working on a similar WebCL implementation for Chrome using its Native Client, but it's hard going and my time is limited.


Recent Events

>> Sunday, October 14, 2012

Interesting news:

  • Tom's Hardware has a great article on OpenCL processing on AMD and Nvidia graphics cards. It's thorough in many respects, but it doesn't mention that OpenCL 1.2 is supported by many AMD cards while Nvidia barely supports version 1.1. (Thanks to Don for sending me the link.)
  • AMD has released its CodeXL tool for debugging/profiling applications on heterogeneous hardware. The demonstration at the Fusion Developer Summit was very impressive, and as soon as I find the time, I'll use CodeXL to examine my OpenCL-OpenGL applications.
  • Nokia hasn't updated WebCL to support Firefox 16, but as soon as they do, I'll update my CLKernels site. Just for grins, I uploaded a video demonstration to YouTube.


Online Petition

>> Wednesday, September 19, 2012

I just signed an online petition urging Nvidia to provide real support for OpenCL in its upcoming CUDA release. I doubt it will have any effect, as OpenCL and CUDA are in direct competition with one another. But I felt obligated to support the cause all the same.



>> Sunday, August 12, 2012

I've tested my site with a number of OpenCL kernels and it works really well. I'll put up a video demonstration on YouTube as soon as Amazon ships me a pop filter for the audio. Again, I'm grateful to Nokia for providing the WebCL extension for Firefox.

The site is great for executing small-to-intermediate kernels, but there's no way to execute multiple kernels in succession. This means there's no way to synchronize global memory as an application runs. So I can't test multi-stage applications like the bitonic sort or the fast Fourier transform. But so far, all the matrix kernels I've tested work fine.



>> Monday, August 6, 2012

  1. Nokia has updated its WebCL extension to support Firefox 14. If you install this, you can try out my CLKernels site, which makes it possible to configure and execute OpenCL applications in a browser. It still needs work, but I should have everything fixed in about a week.
  2. The Khronos Group has released new specs for OpenGL 4.3 and OpenGL ES 3.0. OpenGL 4.3 now features compute shaders, which can perform general-purpose processing like volume and physics computation. Yes, it looks like these shaders make OpenGL-OpenCL interoperability unnecessary.
  3. The COLLADA graphics format has become an ISO standard. Specifically, COLLADA is now ISO/PAS 17506:2012, which deals with industrial automation systems and integration.
  4. I have Van Morrison's Moondance inexplicably stuck in my head. I hope it never gets out.


User Interfaces and WebCL

>> Sunday, August 5, 2012

Some time ago, I pitched an idea to AMD to design an Eclipse-based graphical user interface for OpenCL. The GUI would allow the user to select target devices, enter kernel code, and graphically configure the kernel's arguments. Given this information, the application would automatically generate the host code. I thought this would greatly simplify OpenCL development, but AMD wasn't interested.

Recently, I've been working with Nokia's WebCL implementation for Firefox, and it occurred to me that I could use this to implement my OpenCL GUI as a web application. This way, anyone can select a device (probably their GPU) and execute/profile kernels without coding the host application. The site is called, and if you visit, you can see the overall method.

Unfortunately, Nokia hasn't updated its WebCL release to support the latest version of Firefox (v. 14), so the web application isn't usable just yet. But I've been told that Nokia will release a new WebCL extension in the next few days.


Windows 8, Valve, and Linux

I've been reading a lot about Windows 8. Gunnar Berger and Peter Bright say that it's great for tablets but disconcerting for mouse-and-keyboard users. Gabe Newell, CEO of Valve, calls it a "catastrophe for everyone in the PC space." This is because Windows 8 comes with the Windows Store, which competes with Steam, Valve's popular online store for PC games. Mr. Newell is understandably concerned that Microsoft may use its home field advantage to handicap non-Microsoft applications or suppress them altogether. After all, that's how Apple does business.

So Mr. Newell has become interested in Linux. He wants Steam's games to run as well on Linux as they do on Windows, and his developers are working hard to improve Linux graphics drivers. I think this is wonderful, but Valve isn't the first company to try Linux gaming. Loki Software, which ported Windows games to Linux, went bankrupt after three years of operation. id Software made multiple attempts to sell games on Linux, but met with failure each time. You can watch John Carmack's discussion here.

When I consider Windows 8's strange new interface and its closed-shop policies, I think this could be a golden opportunity for Linux on the desktop. But there are two important problems:

  1. User interface - neither of the two main environments (GNOME and KDE) is sufficiently polished and intuitive for widespread adoption
  2. Marketing - traditionally, Mac OS is for hipsters, Windows is for corporate types, and Linux is for nerds

These problems can be overcome. For the user interface, it occurs to me that Linux could do well by adopting Microsoft's old interface, Windows 7. There will be patent/copyright hurdles, but it would be hilarious if Linux gave users an environment that was more familiar to them than Microsoft's new environment.

For marketing, I think Linux should emphasize its anti-establishment, pro-individual stance. Tired of police states and corporate tyranny? Try Linux, the free OS for free spirits! Sick of Apple and Microsoft telling you what applications you can and can't install? Use Linux and download whatever you like!

But to get Linux seriously accepted on the desktop, a company needs to expend serious money and development time. Red Hat is the company most associated with Linux, but they're focused on servers, not desktop computers. But if Valve were to put its sizable resources behind Linux on the desktop, wonderful things could happen.


Interesting Interviews

>> Sunday, July 8, 2012

I just watched two videos that have given me a great deal to think about. The first is a twenty-minute interview with Andrew Richards about the future of OpenCL. Andrew is the CEO of Codeplay, which makes C/C++ compiler tools for vector languages like OpenCL. He's given OpenCL a great deal of thought and I was pleasantly surprised by how optimistic he is about its future.

The second video is Steve Jobs: The Lost Interview, which is available through iTunes. I'd always thought Steve Wozniak was the brain behind the Apple PC and Jobs was just the mouth. But I was very mistaken; Steve Jobs was a serious programmer with an extraordinary technical background.

Of course, the best part of the Jobs interview is that it took place after he was let go at Apple and before he was brought back. So it's fascinating to hear his thoughts on where and why Apple went astray.


Technical Sessions in Brief

>> Saturday, June 16, 2012

All of the topics at the AFDS were enlightening, but there were four that I found noteworthy:

  • Code XL is a development tool that can debug and profile applications on CPUs and GPUs. Avi Shapira showed how Code XL tabulates data for every work item executing a kernel. AMD will release the tool in a few weeks, and I can't wait. I've been trying to figure out which routines should be performed by kernels versus shaders, and this will be a big help.

  • Edward Callway explained how the Fusion APU can be integrated in hardware. I was amazed by how small the Fusion-based desktop computers were, and if the Fusion can be used in embedded devices, that will be a wonderful thing.

  • Kenneth Russell, Google's only presenter, discussed the Chrome browser and its reliance on WebGL for graphics. The talk was interesting, but the topic of WebCL was conspicuously absent. It looks like WebCL's capability will remain limited to Firefox and Safari.

  • Amit Mookerjee discussed the AMD Media SDK, which makes it possible to access MPEGs in code. From the sound of it, developers can use the SDK to access MPEGs without worrying about MPEG licensing. I don't see how that's possible, but it's great if it's true.

I'd hoped to attend the presentation on HSA Bolt, a new library of primitive math routines for CPUs and GPUs. Unfortunately, it was scheduled at the same time as my own presentation, which went swimmingly.

Addendum: one reader who attended the HSA Bolt presentation described the library as follows:

"They have two versions: One C++ AMP version, and one OpenCL version, both with essentially the same API. It is essentially a heterogenous replacement for algorithms part of STL. For example it provides a bolt::sort for C++ AMP, or clbolt::sort for OpenCL, which takes regular CPU data strucutres as inputs, relying on HSA based system to avoid data transfer. Also, AMD said while it will work on any OpenCL or AMP compatible device, AMD is providing AMD-optimized ports. However, it will be open-source so theoretically someone can write optimized ports to other chips."


The HSA Foundation

One of the high points of the AMD Fusion Developer Summit was Phil Rogers' keynote address concerning the HSA (Heterogeneous System Architecture) Foundation. AMD, ARM, Texas Instruments, MediaTek, and Imagination Technologies have joined forces to "make it easy to program for parallel computing." A full write-up can be found here.

At first I was surprised that Intel wasn't included. But from what I've gleaned since the announcement, the whole point of the foundation is to dislodge Intel from its dominant position in the computing world. It will be a tough battle, especially considering how well Intel's Ivy Bridge chip is performing.

While I enjoyed Phil Rogers' address, he said something I strongly disagree with. In discussing OpenCL, he said the main problem is OpenCL's difficulty. OpenCL is unquestionably hard, but in my opinion, the main problem is that the programming world still doesn't know what OpenCL is. None of the engineers I work with know about OpenCL, and of the attendees I spoke to at the conference, none had even attempted to code an OpenCL kernel. Programmers aren't giving up on OpenCL because of its difficulty -- they're not even getting their feet wet.

If I was Phil Rogers, I would take (at least) two steps to raise awareness. First, I'd send engineers to major universities to demonstrate the technology. Second, I'd find open-source projects that could be accelerated using OpenCL (such as GNU math, NumPy, and OpenFOAM) and provide free releases based on OpenCL. This will not only impress users with OpenCL's power, but also inspire them to harness the power for themselves.


AMD Fusion Developer Summit 2012

>> Saturday, June 9, 2012

AMD is holding its Fusion Developer Summit in Bellevue, WA next week from Monday through Thursday. The posted agenda doesn't say anything about what subjects will be discussed, but you can read about the session topics here.

I'm interested in the talks related to holography, computer vision, sparse matrices, and "actually building stuff." There's also a talk on AMD's new Media Lab toolset, which should be worth hearing about. Of course, I'm particularly interested in the Trinity APU. I have a laptop with an A8-3500M APU, and though I haven't run any graphics-intensive applications, I'm very happy with it.

My presentation on solid modeling will be on Wednesday at 5:15 in Hyatt: Evergreen G. I'll explain the overall theory and then demonstrate what I've accomplished with OpenCL-accelerated NURBS processing.


Two Tips on OpenCL-OpenGL Interoperability

>> Monday, May 28, 2012

I thought I'd share two points related to coding OpenCL-OpenGL applications:

  1. Combine shared data into as few buffers as possible and combine them into an array or vector.

    It takes time for the CPU to coordinate the GPU's OpenCL/OpenGL processing, so it's good to keep this to a minimum. More specifically, you want to call clEnqueueAcquireGLObjects as few times as possible. This function can synchronize multiple buffers with a single call, but only if they're placed in contiguous memory locations (such as inside an array or a vector).

    For example, the following code creates and synchronizes two distinct buffers:

    GLuint vbos[2];
    cl_mem buff_1, buff_2;
    glGenBuffers(2, vbos);
    buff_1 = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, vbos[0], &err);
    buff_2 = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, vbos[1], &err);
    clEnqueueAcquireGLObjects(queue, 1, &buff_1, 0, NULL, NULL);
    clEnqueueAcquireGLObjects(queue, 1, &buff_2, 0, NULL, NULL);
    clEnqueueReleaseGLObjects(queue, 1, &buff_1, 0, NULL, NULL);
    clEnqueueReleaseGLObjects(queue, 1, &buff_2, 0, NULL, NULL);

    In contrast, this code creates and synchronizes two buffers in an array:

    GLuint vbos[2];
    cl_mem buffs[2];
    glGenBuffers(2, vbos);
    buffs[0] = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, vbos[0], &err);
    buffs[1] = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, vbos[1], &err);
    clEnqueueAcquireGLObjects(queue, 2, buffs, 0, NULL, NULL);
    clEnqueueReleaseGLObjects(queue, 2, buffs, 0, NULL, NULL);

    The second example is simpler because it only calls clEnqueueAcquireGLObjects once. This reduces the synchronization time required for OpenCL-OpenGL interoperability.

  2. Place OpenCL and OpenGL function calls in separate classes.

    My C++/Qt project has become very complex, and if I placed all the OpenCL/OpenGL function calls in the same class, it would be impossible to read. So, in addition to my GLWidget class, I created two helper classes: GLUtils and CLUtils. The first contains static functions related to OpenGL and the second contains static functions related to OpenCL.

    These classes are friends of the GLWidget class, which means they can access its private resources. This separation has made my code easier to read and debug.

On an unrelated note, the AMD Fusion Developer Summit is two weeks away, but the agenda doesn't say anything about the session topics. Does AMD think people are going to attend without knowing what's going to be discussed? It's like advertising a rock concert without telling people who's playing.



>> Monday, May 14, 2012

I downloaded and recompiled the Qt source code, installed the Qt Add-In for Visual Studio 2010, and created a completely new project in Visual Studio. Now that I've stopped using Qt Creator, OpenCL-OpenGL interoperability works in Windows 7.

So I take back what I wrote earlier. Qt-OpenCL-OpenGL applications can work on Windows, but you have to be careful.


QCLContextGL and Frustrations Therewith

>> Saturday, May 5, 2012

My OpenCL-OpenGL-Qt application works wonderfully in Linux, but AFDS wants presentations given using Windows computers. So I've spent a great deal of time trying to build my application on Windows 7 using Qt Creator. This means dealing with GLEW, DLL incompatibilities, and spaces in directory names like C:\Program Files. Yuck.

I finally got the application to compile, but the OpenCL-OpenGL interoperability doesn't work. The OpenGL component works and the OpenCL component works, but they're not talking to one another. I invoked clGetGLContextInfoKHR, which verified that my GPU is capable of OpenCL-OpenGL interoperability. I've tried every coding trick I can think of, but it's still not working.

As a last resort, I decided to try Qt's QCLContextGL class, which is supposed to make it easy to combine OpenCL and OpenGL. This is part of the QtOpenCL module, whose build instructions say that it should be compiled with the Qt source code instead of the Qt SDK. I downloaded the Qt source code and tried to build it, but the zlib.dll library was missing. I downloaded the zlib source code, compiled it with Visual Studio, and placed the *.lib/*.dll/*.h files in the right folders. But I'd misread the error. Qt Creator wants zdll.lib, not zlib.dll. Ha. Silly me.

After fixing all the problems with the source code compilation, I attempted to compile the QtOpenCL module. Qt Creator gave me two errors, and they're both described here. There are no solutions at this time, so I'm going to give up on Windows and stick to Linux. If anyone asks whether OpenGL-OpenCL interoperability works on Windows, I'll tell them to use GLUT instead of Qt. It hurts to type that.

Tomorrow, the wonderful BBC series Sherlock will be available for American audiences. I can't wait. I may be a moron, but I can still live vicariously through Sherlock Holmes.


Boundary Representation, COLLADA, and STEP

>> Sunday, April 22, 2012

There are two main methods of forming computer models of three-dimensional figures: mesh representations (meshes) and boundary representations (breps). A mesh is an ordered collection of points in space and its chief advantage is simplicity. Meshes are easy to manage in code and modern GPUs excel at processing vast amounts of mesh data at high speed.

Boundary representations are more complicated. A brep of a figure takes into account its edges and surfaces, and instead of storing every point, it defines the figure's geometry using mathematical relationships. Boundary representation isn't a popular topic but it's vital to the field of computer-aided design. For example, if you use Sketchup to model a house, you don't want to deal with individual points. You want to move edges, extrude faces, insert holes, and manipulate curves. Brep makes these operations possible, and it's also important in computer vision, where computers recognize figures by measuring how light reflects and refracts from their surfaces.

There are plenty of ways to store mesh data, and the OBJ format is one of many free, popular formats. For boundary representation, I've only come across two non-proprietary formats: COLLADA and STEP. I've already discussed how COLLADA stores mesh data, but the latest version of COLLADA can also define a figure using edges, curves, faces, shells, wires, and solids. The COLLADA format is based on XML, and you can download the schema (*.xsd) and specification here. The specification is complex, but it's all regular XML and I found the brep documentation in Chapter 9 to be well-written.

The ISO 10303 standard, commonly referred to as the Standard for the Exchange of Product model data or STEP, is nearly two decades older than COLLADA and has a much broader scope. While COLLADA focuses on digital assets (graphics), the STEP standard is concerned with modeling real-world products. So in addition to boundary representation, a STEP design may include a product's method of manufacture, physical properties, and results of finite element analysis. Because the scope is so broad, you can't download a single STEP specification. Instead, ISO 10303 is split into nearly one hundred parts.

I became interested in STEP because I wanted my modeling tool to be ISO compliant, but I was surprised by how little useful information I could find. Many sources recommend STEP, but no one provides any non-trivial examples. So I opened my wallet and bought Part 28 of the standard, which discusses the XML representation of STEP data. I learned three important points:

  1. Each part of the STEP standard costs between 200 and 300 dollars.
  2. XML is not the primary format for STEP designs. The main language is EXPRESS, which looks like it's based on Pascal.
  3. There is no STEP schema. That is, you can't download an XSD file that defines how STEP-compliant designs should be formatted.

This last point was the most discouraging. Instead of defining a schema, STEP presents a language that allows you to create your own schema. This is fine for companies like Autodesk, whose software already uses a model for storing product data. But without a hard specification, there's no way to ensure file compatibility. That is, two applications may produce design files based on STEP, but that doesn't mean that they'll be able to read one another's files.

So that's why I'm using COLLADA instead of STEP. I wouldn't mind getting my money back from ANSI, but I guess I'll just chalk it up to experience.



>> Sunday, April 15, 2012

The GPU Technology Conference 2012 is being held in San Jose from May 14-17. Given the name, you might expect a broad coverage of GPU technologies, from traditional offerings by AMD and Nvidia to ARM's mobile GPUs and the embedded GPU in Intel's new Ivy Bridge processors. But this is not the case. GTC 2012 is an Nvidia-centric event, and as the schedule makes clear, no GPU manufacturer but Nvidia will be discussed.

This isn't the first time Nvidia has tried to convince people that they're the only game in town. At SC11, many attendees told me they'd never consider OpenCL because they'd heard it was too experimental and not as widely supported as CUDA (the reverse is true). After I gave my talk on OpenCL-OpenGL interoperability, every question from the audience focused on my choice of language: "Why would you use OpenCL instead of CUDA?" and "Don't you know how easy CUDA is to program?" and "Don't you care about performance?"

I hope the audience at the AMD Fusion Developer Summit will be more receptive. My breakout session, "Solid Modeling on the Fusion," will consist of three parts. The first two are theoretical, and present the basics of solid modeling and the mathematics underlying non-uniform rational B-splines (NURBS). In the last part, I'll demonstrate how NURBS can be processed and rendered at high speed using OpenCL, OpenGL, and AMD's heterogeneous processor architecture. I think it will be a great time.



>> Sunday, April 1, 2012

My spare time has fallen precipitously, but I feel compelled to mention a few points:

  • Current versions of Mac OS do support OpenCL 1.1. I'm not certain which OS versions or hardware configurations are compliant, but this link provides a good way to check using clGetDeviceInfo.
  • I received an e-mail saying that floating-point values can be added atomically by using as_int, which tells OpenCL to interpret floating-point values as integers. This is possible, but you have to make sure every value has the same sign and exponent. I think it would be easier to multiply the floating-point values by an integer (say 10000), round the products to integer values, and add them together using atomic_add.
  • I'm still working on a NURBS modeling tool based on OpenGL, OpenCL, and Qt. I'd originally planned to rely on COLLADA for file persistence, but many have recommended the ISO 10303 format, commonly called STEP. Unfortunately, STEP's underlying language (EXPRESS) is based on Pascal, there are few examples of its usage online, and each part of the ISO standard costs about $300. One part of the standard discusses XML formatting, but instead of providing a schema, it explains how to create a schema of your own. Frustrating. I think I'll stick with COLLADA.
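
The scale-and-round idea from the second point can be sketched on the host in plain C. This is only an illustration under assumptions: C11's atomic_fetch_add stands in for OpenCL's atomic_add, and FIXED_SCALE, to_fixed, accumulate_fixed, and from_fixed are hypothetical names. The sum keeps four decimal places at most and can overflow if the scaled values grow too large.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FIXED_SCALE 10000   /* the multiplier suggested in the text */

/* Scale a float and round it to the nearest integer. */
int to_fixed(float value) {
    float scaled = value * FIXED_SCALE;
    /* Guard against values that would not fit in an int after scaling */
    assert(scaled < 2147483000.0f && scaled > -2147483000.0f);
    return (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Add every scaled value to a shared integer sum with an atomic add.
   In a kernel, each work item would perform one such atomic_add. */
void accumulate_fixed(atomic_int *sum, const float *values, size_t count) {
    for (size_t i = 0; i < count; i++) {
        atomic_fetch_add(sum, to_fixed(values[i]));
    }
}

/* Convert the integer sum back to a floating-point value. */
float from_fixed(int fixed) {
    return (float)fixed / FIXED_SCALE;
}
```

A caller would initialize an atomic_int to zero, pass it to accumulate_fixed, and convert the loaded total with from_fixed. Unlike the as_int approach, this works for values of mixed sign and magnitude, at the cost of fixed precision.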


GLSL, Qt, and OpenCL

>> Monday, February 20, 2012

I've coded vertex and fragment shaders in GLSL, but I've never worked with the new geometry and tessellation shaders. To make up for this, I've been reading the OpenGL 4.0 Shading Language Cookbook by David Wolff. It's well written and I'm learning a lot. It's more than just a book of recipes; Dr. Wolff does a fine job explaining the underlying concepts.

One aspect of the book I find particularly interesting is the use of Qt in the example code. This is the first OpenGL/GLSL book I've seen that relies on Qt instead of GLUT, and I strongly approve. I have nothing against GLUT, but it was created as a quick-and-dirty way to demonstrate OpenGL coding. There are efforts to build on top of GLUT (see GLOW), but it's still not a professional toolset for building full-featured applications. In contrast, Qt provides a wealth of features including a new method of multithreading that improves the performance of OpenGL rendering. Unfortunately, because so much OpenGL code relies on GLUT, many conflate the two and assume GLUT's limitations apply to OpenGL.

Another thought struck me as I read Dr. Wolff's book. GLSL shaders and OpenCL kernels have a lot in common: similar purposes, operations, and datatypes. But the APIs are so different that you'd never guess they were released by the same group. For OpenCL 2.0, it would be helpful if the kernel API was changed to resemble GLSL. This would make it easier to code GPU drivers and compilers and would probably increase the developer base of both languages. Besides, in my OpenCL-OpenGL applications, the OpenCL kernels act like pre-shaders, manipulating vertices and colors before the real shaders begin processing.


Mac OS X 10.8 - Mountain Lion

>> Thursday, February 16, 2012

Apple has announced the upcoming release of Mac OS, version 10.8. I've searched as best I can, but I can't find any hard information regarding OpenCL/OpenGL support.

Many articles mention a "new graphics infrastructure" involving OpenCL and OpenGL. I don't know what that means, but I'll keep an eye out.


Source Code

>> Monday, February 13, 2012

Over the weekend, I received an e-mail asking for the source code for my OpenCL FFT. Like most of the projects mentioned on this blog, this is freely available at Manning's web site here. There are three archives: one for Visual Studio, one for Linux, and one for Mac OS. The reason for the separate Linux and Mac OS releases is that Linux supports OpenCL 1.1 and Mac OS doesn't.

Speaking of Mac OS, Apple has several job openings for OpenCL developers, and it doesn't look like they're being filled. I'm sure this is partly due to OpenCL's difficulty and obscurity, but I bet a large part of it has to do with mindset. If you've chosen OpenCL over CUDA, it probably means you prefer standards-based, cross-platform tools over tools developed by and for a single company. If this is the case, Apple is the last place you'll want to work. When I consider how cult-like Apple's consumers are, I can only imagine what their employees must be like.

If I was AMD, I'd lend my best OpenCL coders to Apple to help with driver development. If OpenCL is ever fully supported by Mac OS and iOS, the size of the developer base will skyrocket.


OpenCL, OpenGL, and NURBS

>> Sunday, January 29, 2012

I'm making progress coding an application to evaluate non-uniform rational B-splines (NURBS) on the GPU using OpenCL-OpenGL interoperability. NURBS processing is one of the most computationally-intensive tasks of modern CAD tools and the theory isn't easy, but my main problem is code complexity: the application uses Qt, OpenCL, and OpenGL, but because the three APIs are so different, I can only keep two of the three in mind at any time.

My goal is to develop an open-source solid modeling tool with greater capability and better performance than ACIS or Parasolid. I know there's a lot of disagreement regarding OpenGL vs. Direct3D, but once people see the power of combining OpenGL and OpenCL, I think they'll be stunned.


News in Brief

>> Monday, January 23, 2012

  • I've found new resources for OpenGL programmers. OGLplus provides a header file that makes it possible to code OpenGL applications with C++ functions. Megabyte Softworks has released a tutorial for OpenGL 3.3 here and Durian software has a tutorial here. As before, Jason McKesson's magnificent tutorial is here.
  • AMD has recently released version 2.6 of its SDK, which can be obtained here.
  • In addition to OpenCL support, ARM will provide an open-source, reverse-engineered graphics driver for its upcoming Mali GPU. The Phoronix article is here.
  • AMD has decided to change the name of its Fusion architecture to the Heterogeneous Systems Architecture. I couldn't find a press release on the AMD site, but the brief mention at is here.


Humble Pie

>> Friday, January 6, 2012

A few weeks ago, I had a great idea: write OpenCL code to test prime numbers and execute it on Amazon's GPGPU platform, which provides high-performance computing at low cost. Once I found the first prime with 100,000 digits, I'd win the EFF computing award and all the trappings of victory.

Rather than search for Mersenne primes with the Lucas-Lehmer test, I planned to test regular numbers with the new AKS method. I'd start with 10^100000 + 1 and continue on from there.

But as I read more about testing primes, I became interested in the underlying theory. I wanted to understand topics like congruence relations and the Prime Number Theorem. And why not? I loved writing the chapters on matrix operations and the FFT. How hard could number theory be?

Very hard, it turns out. I've spent days squinting at proofs of the Prime Number Theorem, and I'm still baffled. I could hold this book upside-down and understand the material just as well. There's nothing elegant or intuitive about number theory, especially when it involves integration on the complex plane.

So I'm giving up. I'm going back to evaluating NURBS with OpenCL-OpenGL interoperability. I don't want to read about zeta functions or Dirichlet series or the Abel summation formula ever again.

To those interested in implementing the AKS algorithm, here's a word of warning. At its theoretical best, the AKS algorithm requires O(log^3 n) operations. If your system sustains 1 TFLOPS, you can test a 100k-digit number in about 1000 seconds (taking the logarithm base 10 and ignoring constant factors), or about seventeen minutes. Not bad.

The problem is that, according to the Prime Number Theorem, the density of prime numbers near N is about 1/ln(N). For N = 10^100000, you can expect to find one prime per 230,258 numbers. Taking away even values leaves about 115,000 candidates, so a 1 TFLOPS system will require about four years.

Personally, I have better things to do with my time.
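
For anyone who wants to rerun the estimate with other numbers, here is a small C sketch of the arithmetic. It follows the same simplifications as above (a cost of exactly digits^3 operations per test, with no constant factor), and the function names are made up for illustration.

```c
#include <assert.h>

#define LN_10 2.302585092994046
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

/* Operations for one test, assuming a cost of (log10 n)^3 = digits^3. */
double aks_ops_per_test(double digits) {
    assert(digits > 0.0);
    return digits * digits * digits;
}

/* Time for one test on a machine running ops_per_second operations. */
double seconds_per_test(double digits, double ops_per_second) {
    return aks_ops_per_test(digits) / ops_per_second;
}

/* Prime Number Theorem: near N = 10^digits, about one integer in
   ln(N) = digits * ln(10) is prime. */
double expected_gap(double digits) {
    return digits * LN_10;
}

/* Expected search time in years when only odd candidates are tested. */
double expected_search_years(double digits, double ops_per_second) {
    double odd_candidates = expected_gap(digits) / 2.0;
    return odd_candidates * seconds_per_test(digits, ops_per_second)
           / SECONDS_PER_YEAR;
}
```

With digits = 100000 and ops_per_second = 1e12, seconds_per_test gives 1000 seconds and expected_search_years comes out a little over three and a half years, in line with the rough figure of four.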

