>> Saturday, November 15, 2014
Because of the comments I received, I decided to test my FFT on new systems with new hardware and new drivers. My FFT passed every test, so I wrote a self-satisfied post stating that the commenter's problem was caused by using work-groups whose sizes weren't a power of two.

Then it dawned on me. In the fft_init kernel, work items read data from bit-reversed addresses and write the processed data to unreversed addresses in the same buffer. Since there's no guarantee about the order in which work items execute, one work item can read an element that another work item has already overwritten. This is the race condition to which the commenter was referring.

Thankfully, this problem is easy to fix. I'll add a second buffer to fft_init so that every work item reads from the first buffer and writes to the second. I'll get this coded tomorrow morning and I'll contact Manning to get it uploaded to their software site.

I'd like to thank the commenter for their assistance. I'd also like to point out that my bit-reversal algorithm, while idiosyncratic, is perfectly functional.