Intel, FPGAs, and DLC

>> Sunday, June 29, 2014

I designed FPGA circuits early in my career and I was surprised by how difficult it was. The logic elements in an FPGA operate independently, so designers have to keep track of their input/output signals to make sure they're all in step. If Signal A reaches a gate before B and C are valid, the element may produce errors. Timing errors are hard to detect and very difficult to debug. Tools like gdb can't help, so designers use virtual logic analyzers like those provided by ModelSim.

OpenCL can reduce the risk and difficulty of FPGA design, but given the small developer base, Intel might not allow developers to access the Xeon's integrated FPGAs. Instead, Intel could assemble a catalog of prebuilt, fully-tested FPGA designs for special tasks. If Intel's C/C++ compiler (icc) notices that an application could be accelerated with one of these designs, it could alert the developer with a friendly dialog box:

Howdy, developer! I see you're sorting database records and performing statistical analysis. If you install Intel's RapidCore on your Xeon, this application will execute 7,364 times faster.

Buy RapidCore (Y/N)?

After the purchase is completed, the compiler downloads the core from the Internet and automatically installs it on the Xeon's embedded FPGA. This way, the developer doesn't need to understand OpenCL, logic design, or timing analysis.

The principle is similar to the downloadable content (DLC) provided by game publishers. After customers buy a game, they can pay extra to make the game easier or more interesting. With Xeon DLC, developers buy the compiler, and then they can improve performance with special-purpose FPGA designs. Similar improvements could be made available to end-users.


Intel, FPGAs, and OpenCL

>> Monday, June 23, 2014

Intel has announced that upcoming releases of the Xeon processor will have integrated Field Programmable Gate Arrays (FPGAs). At first, this amazed me. The primary languages for FPGA design are Verilog and VHDL, and both are beyond the experience of most Intel programmers. In fact, the process of designing an FPGA circuit with Verilog/VHDL is completely different from that of building a C/C++ application.

Then a thought struck me. The two main FPGA vendors, Xilinx and Altera, are developing toolsets for creating FPGA designs with OpenCL. I wouldn't be surprised if the Xeon's FPGA is intended to be accessed through OpenCL, not Verilog or VHDL.

The announcement doesn't say whose FPGAs will be integrated in the Xeon, but it's noteworthy that Intel is manufacturing Altera's latest generation of FPGAs, which includes the Arria 10 and the Stratix 10. These are the first FPGAs to provide dedicated logic for floating-point DSP. Further, Altera is working hard on its OpenCL support, and I can state from experience that their SDK is functional and polished.

So here's my prediction: Intel's new Xeons will have integrated FPGAs from Altera. Developers will be able to access the FPGAs' dedicated DSP blocks using OpenCL.

This sounds fine, but I foresee three problems:

  1. No matter what language you use, compiling an FPGA design takes hours. Are developers willing to wait that long?
  2. Altera's OpenCL SDK is great, but it's not free. Also, it requires installation of Altera's Quartus II application.
  3. Despite my best efforts, the OpenCL developer community is pretty small. Integrating OpenCL-accessible FPGAs into high-end CPUs seems like a big risk.

Wait a minute. What if these Xeons are intended for Apple? Apple is a fervent believer in OpenCL and they probably know which floating-point routines need FPGA acceleration. Hmm.

ETA: I received a link to a post that accuses Intel of copying Microsoft's effort to use FPGAs to accelerate web searching. This may be the case, but I suspect Intel is trying to compete with Nvidia's high-speed number-crunching servers. We'll see...


Trial of the Century: Oracle v. Google

>> Friday, May 9, 2014

The US Court of Appeals for the Federal Circuit has ruled that the Java API is copyrightable, and that Google used it improperly in its Android devices. The terrible ramifications of this decision can't be overstated. From now on, programmers will have to ask who owns the language before they start coding. Also, Oracle can (and probably will) go after everyone who develops applications with Java, which is one of the most popular programming languages in the world. And remember: while patents last 14-20 years, copyrights last for the author's lifetime plus seventy years (95 years from publication for works made for hire).

Apple owns the trademark for OpenCL, but who owns the copyright for the API? What would happen if the owner(s) decided to sue for infringement?

If someone developed a programming language based on the English language, could they sue everyone who writes English? How would you legally distinguish a computer program from regular text?


SYCL and Other Announcements

>> Sunday, March 23, 2014

The Game Developers Conference took place last week and there were many announcements related to OpenCL. For one thing, the WebCL 1.0 standard has been released. The page says that "Security is top priority" but I haven't seen any tools or programming constructs that prevent kernels from locking up the GPU. And it doesn't look like WebGL-WebCL interoperability is a significant concern. Ah well.

The GDC announcement that I found especially interesting involves a new programming layer called SYCL. According to its Khronos page, the goal is to make OpenCL and SPIR accessible through C++. I thought Benedict Gaster did a fine job with his cl.hpp wrapper, but I look forward to using the new SYCL API.

The SYCL effort is led by Codeplay, whose CEO, Andrew Richards, discussed OpenCL in an interview I mentioned in a previous post. He appreciates the importance of OpenCL-OpenGL interoperability, and if SYCL can simplify the coding process, that will be a wonderful thing. The FAQ for SYCL is here, but it doesn't answer the burning question: What does SYCL stand for? If you're going to make up a *CL acronym, why use four letters instead of three?


Chromium and WebCL

>> Sunday, March 16, 2014

Some time ago, I was very interested in using Google's Native Client to enable OpenCL processing in Chrome. As it turns out, the Native Client doesn't allow that sort of thing, but AMD hasn't given up. They've added WebCL to Google's Chromium project, which is the open-source version of Chrome.

AMD has made the source code for Chromium-WebCL available here. I've downloaded it and I'll give it my full attention when the time presents itself.


Book News

>> Wednesday, February 12, 2014

A coworker pointed out that two of my book's examples don't work properly: device_ext_test and buffer_check. After testing both on my Linux/AMD system, I was forced to agree. So I fixed the code and sent the updated zip files to the publisher.

The device_ext_test application fails because of clGetDeviceIDs. This doesn't seem to work properly when you want to determine how many devices are present. To be specific, the following line of code causes the error:

clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, NULL, &num_devices);

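When this executes, it returns CL_INVALID_VALUE. I'm not sure why, though the usual idiom for counting devices passes 0 rather than 1 as num_entries when the devices pointer is NULL, so the driver may be rejecting that combination. I removed the call to clGetDeviceIDs and the error disappeared.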

My second error is more interesting. buffer_check fails because of the call to clCreateSubBuffer. My original code set the sub-buffer's origin to 120. This offset isn't aligned to the device's base address alignment, and when I wrote the application, alignment wasn't a concern. But now my call to clCreateSubBuffer produces CL_MISALIGNED_SUB_BUFFER_OFFSET, an error code that the OpenCL 1.1 specification defines alongside clCreateSubBuffer. To clear this, I set the origin to 0x100 (256 in decimal). Now all is well.
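
The alignment rule can be sketched in plain C. This is only an illustration, not the book's code: the helper names are hypothetical, and the 2048-bit alignment used in the note below is an assumed example value. On a real device the value comes from clGetDeviceInfo with CL_DEVICE_MEM_BASE_ADDR_ALIGN, which is reported in bits rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Convert an alignment given in bits (the unit used by
   CL_DEVICE_MEM_BASE_ADDR_ALIGN) to bytes. Hypothetical helper. */
size_t align_bytes(size_t align_bits) {
    assert(align_bits >= 8 && align_bits % 8 == 0);
    return align_bits / 8;
}

/* Return 1 if origin is a multiple of the alignment, 0 otherwise. */
int origin_is_aligned(size_t origin, size_t align_bits) {
    return origin % align_bytes(align_bits) == 0;
}

/* Round origin up to the next offset that satisfies the alignment. */
size_t align_origin_up(size_t origin, size_t align_bits) {
    size_t a = align_bytes(align_bits);
    return (origin + a - 1) / a * a;
}
```

Assuming a device that reports 2048 bits (256 bytes), origin_is_aligned(120, 2048) is 0, origin_is_aligned(0x100, 2048) is 1, and align_origin_up(120, 2048) gives 256, which matches the origin that cleared the error.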

In other news, Manning has made my book the Deal of the Day for February 13. Woo-hoo! That Oculus Rift book looks pretty incredible as well...


RenderScript and OpenCL

>> Monday, January 20, 2014

I decided it was time to learn RenderScript, so I spent the weekend reading through documentation and testing code on my Galaxy S4. For those unfamiliar, a RenderScript project requires at least three files:

  • a native file (*.rs) containing high-performance C code
  • a Java file automatically generated from the *.rs file
  • a Java file that calls the methods in the generated file

This is complicated, but RenderScript is much easier to deal with than the Java Native Interface.

Code in a *.rs file can operate on scalar and vector types and can call functions like dot, sin, and ilogb. The functions are declared in RenderScript headers (*.rsh) and one of the most prominent headers is rs_cl.rsh.

Looking through rs_cl.rsh, I was surprised by how similar its functions are to OpenCL's kernel functions. That's when it dawned on me—the 'cl' in rs_cl.rsh refers to OpenCL. So RenderScript isn't really competing with OpenCL. RenderScript is a Java layer on top of OpenCL's kernel API.

As I investigated further, the parallels between the two languages became more apparent. RenderScript's Allocations serve the same role as OpenCL's memory objects. In OpenCL, work-items have identifiers with one, two, or three dimensions. In RenderScript, kernels access similar dimensions as function parameters.
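
The difference in how kernels learn their coordinates can be sketched with a CPU emulation in plain C. Nothing here is real OpenCL or RenderScript code: emu_get_global_id stands in for OpenCL's get_global_id, the two kernels and the run_* drivers are hypothetical, and the 4x3 grid is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

#define WIDTH 4
#define HEIGHT 3

/* Emulated work-item coordinates; a real OpenCL runtime supplies these. */
static size_t current_id[2];

/* Stand-in for OpenCL's get_global_id(), emulated on the CPU. */
size_t emu_get_global_id(unsigned dim) {
    assert(dim < 2);
    return current_id[dim];
}

/* OpenCL style: the kernel asks the runtime for its own coordinates. */
void opencl_style_kernel(const int *in, int *out) {
    size_t x = emu_get_global_id(0);
    size_t y = emu_get_global_id(1);
    out[y * WIDTH + x] = in[y * WIDTH + x] + (int)(x + y);
}

/* RenderScript style: the coordinates arrive as function parameters. */
int renderscript_style_kernel(int in, size_t x, size_t y) {
    return in + (int)(x + y);
}

/* Emulated launch: set the coordinates, then invoke the kernel. */
void run_opencl_style(const int *in, int *out) {
    for (size_t y = 0; y < HEIGHT; y++) {
        for (size_t x = 0; x < WIDTH; x++) {
            current_id[0] = x;
            current_id[1] = y;
            opencl_style_kernel(in, out);
        }
    }
}

/* Emulated launch: pass each element and its coordinates to the kernel. */
void run_renderscript_style(const int *in, int *out) {
    for (size_t y = 0; y < HEIGHT; y++) {
        for (size_t x = 0; x < WIDTH; x++) {
            out[y * WIDTH + x] =
                renderscript_style_kernel(in[y * WIDTH + x], x, y);
        }
    }
}
```

Both drivers produce the same output for the same input; the only difference is whether the kernel queries its position or receives it as arguments.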

Of course, RenderScript differs from OpenCL in many respects. RenderScript doesn't let you choose the target device and each kernel can only access one or two Allocations (memory objects). Also, developers can't specify the usage of global, local, or private memory.

