Saturday, November 1, 2014
Over three and a half years ago, I completed the OpenCL FFT that I discussed in Chapter 14. I tested it with data sets of varying sizes on different graphics cards and operating systems. It ran successfully every time, but recent comments make it seem likely that there's a race condition that needs to be addressed.
The problem with debugging an FFT is that it requires long stretches of uninterrupted concentration, which usually involves me lying on the floor and squinting up at the ceiling for hours on end. Unfortunately, I'm busy at the moment and don't have the time. But because I'm so ashamed, I'm going to take the week of 11/10 off from work and I'll do my best to resolve the problem.
It looks like the root cause is my bit-reversal routine, and I'll explain why this is particularly jarring. If you're familiar with FFT code, then you know that many routines perform bit-reversal with code like the following:
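ans = x & 1;    /* take the lowest bit of x */
x >>= 1;        /* move on to the next bit of x */
ans <<= 1;      /* make room in ans for it */
ans += x & 1;   /* append that bit; repeat for the remaining bits */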
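The fragment above is one step of a pattern that repeats once per bit. As a rough sketch of how the whole scalar routine might look in plain C (the function name, signature, and the num_bits parameter are my assumptions for illustration, not code from the book):

```c
#include <stdint.h>

/* Reverse the lowest num_bits bits of x; any higher bits are dropped.
   Name and signature are hypothetical, chosen only for this sketch. */
uint32_t bit_reverse(uint32_t x, unsigned num_bits)
{
    uint32_t ans = 0;
    for (unsigned i = 0; i < num_bits; i++) {
        ans <<= 1;        /* make room for the next bit */
        ans |= x & 1u;    /* copy the current low bit of x */
        x >>= 1;          /* advance to the next bit of x */
    }
    return ans;
}
```

For an 8-point FFT, num_bits would be 3, so index 1 (001) maps to 4 (100) and index 6 (110) maps to 3 (011).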
Rather than operate on scalars, I devised a routine that bit-reverses all four elements of a uint4 vector at the same time. I thought it was clever, but if it causes a race condition, it has to go.
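To give a sense of what reversing four values at once means, here is a minimal sketch in ordinary C. It is not the routine from the book. The uint4_sketch struct is a hypothetical stand-in for OpenCL's uint4, and bit_reverse4 is a name I made up for this example. In OpenCL C, the shift and mask operators work component-wise on a uint4, so the inner lane loop would reduce to single vector statements.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint4 so this compiles as plain C. */
typedef struct { uint32_t s[4]; } uint4_sketch;

/* Reverse the lowest num_bits bits of each of the four lanes.
   Illustrative only; the book's vector routine may differ. */
uint4_sketch bit_reverse4(uint4_sketch x, unsigned num_bits)
{
    uint4_sketch ans = { {0, 0, 0, 0} };
    for (unsigned i = 0; i < num_bits; i++) {
        /* Apply the same shift-and-mask step to every lane. */
        for (int lane = 0; lane < 4; lane++) {
            ans.s[lane] = (ans.s[lane] << 1) | (x.s[lane] & 1u);
            x.s[lane] >>= 1;
        }
    }
    return ans;
}
```

Each lane is computed independently of the others here, which is the property a vector version relies on.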
I apologize to everyone who was or is disappointed with the code. If you're still looking for a good OpenCL FFT, I recommend the clFFT project. It was once part of AMD's Accelerated Parallel Processing Math Libraries (APPML), but it looks like APPML is no longer supported.