The Fast and the Fourier

>> Monday, March 28, 2011

Five days. It's taken five long, painful days, but it's done. I finished coding the Fast Fourier Transform (FFT) for Chapter 12 this evening, and it works. Not just the forward FFT either, but also the inverse. And if there's anything more agonizing than debugging an inverse FFT running on a GPU, I haven't experienced it.

This isn't the first time I've coded an FFT, but it's the first time I've coded an FFT with OpenCL's work-items, work-groups, vector operations, and memory synchronization. Unbelievable. I look like something that staggered out of a zombie movie, but as I write this, I feel a tranquility and lightness of spirit that I can't describe.

I took one break on Saturday afternoon, and rented 13 Assassins on iTunes. Fine movie. A little more characterization would have made the lengthy battle scenes more meaningful, but other than that, it's a fine addition to the men-on-a-mission genre.

5 comments:

author October 22, 2014 at 1:43 AM  

Hi, thanks for sharing the code. I found a serious race condition in it. I wonder how you were able to run it.

Matt Scarpino October 22, 2014 at 5:44 AM  

It ran on every platform I tested it on, but there may still be a problem. You may want to use the (free) FFT code at https://github.com/clMathLibraries/clFFT.

author October 29, 2014 at 6:20 AM  

I found that the init kernel crashes when there are more than 4096 input points. You might say that you have already tested the code and it works, and you are right, but it works on your graphics card. It took me some time to find the reason.
The reason is that your local memory size is 32768 bytes, but mine is 49152 bytes. The difference is that your points_per_group is 32768/8 = 4096, but mine is 49152/8 = 6144, which causes the problem.
So I also set my points per group to 4096 when the number of points is larger than 4096, and it no longer crashes,
but the result is still not correct.
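The arithmetic in this comment can be checked with a short sketch. It assumes each point takes 8 bytes (for example, a complex value stored as two 32-bit floats), which is what the division by 8 suggests; the function names and the `cap` parameter are hypothetical and are not taken from the book's code. The sketch also shows that 6144 is not a power of two. Whether that is what actually breaks the kernel is not confirmed by anything in this thread.

```python
BYTES_PER_POINT = 8  # assumption: one complex point = two 32-bit floats


def points_per_group(local_mem_size, cap=None):
    """Points that fit in one work-group's local memory, optionally capped."""
    points = local_mem_size // BYTES_PER_POINT
    if cap is not None:
        points = min(points, cap)
    return points


def is_power_of_two(n):
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


for local_mem_size in (32768, 49152):
    p = points_per_group(local_mem_size)
    print(local_mem_size, p, is_power_of_two(p))
# prints:
# 32768 4096 True
# 49152 6144 False

# The commenter's workaround: clamp to 4096 points per group.
print(points_per_group(49152, cap=4096))  # prints 4096
```

On a real device the local memory size would come from a query such as clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE rather than from hard-coded constants.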

author October 29, 2014 at 6:25 AM  

There is also another problem, more serious and important than the first one: the race condition in the Init_Kernel. You read from global memory and then write the result back to a different address in the same global memory buffer (because of the indexing). It works if you have only one work-group, but with more than one it won't work. I solved this problem by adding another global buffer. As you know, it is not easy to synchronize between work-groups. If you want, I can send the modified code. However, you are the master :)

author October 29, 2014 at 6:31 AM  

Pseudocode to show the race condition:

read(global_mem_buff1)
do_calculation
write(global_mem_buff1) // different address, same buffer

Without synchronization between the work-groups, it won't work. And believe me, it didn't work for me.

Solution

read(global_mem_buff1)
do_calculation
write(global_mem_buff2) // different address, different buffer
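The two pseudocode patterns above can be imitated on the CPU. This is only a sketch under stated assumptions: the init kernel itself is not shown in this thread, so a bit-reversal permutation stands in for "the indexing", and work-groups are run one after another, which is just one of the orderings a GPU might choose. All function names are hypothetical.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def permute_in_place(data, group_size):
    """Each group reads its chunk, then writes into the SAME buffer."""
    buf = list(data)
    bits = (len(buf) - 1).bit_length()  # assumes a power-of-two length
    for start in range(0, len(buf), group_size):
        chunk = buf[start:start + group_size]  # read(global_mem_buff1)
        for k, value in enumerate(chunk):
            # write(global_mem_buff1): may clobber data that a later
            # group has not read yet
            buf[bit_reverse(start + k, bits)] = value
    return buf


def permute_out_of_place(data, group_size):
    """Each group reads from one buffer and writes into a second one."""
    src = list(data)
    dst = [None] * len(src)
    bits = (len(src) - 1).bit_length()  # assumes a power-of-two length
    for start in range(0, len(src), group_size):
        chunk = src[start:start + group_size]  # read(global_mem_buff1)
        for k, value in enumerate(chunk):
            dst[bit_reverse(start + k, bits)] = value  # write(global_mem_buff2)
    return dst


points = list(range(8))
print(permute_in_place(points, 8))      # one group:  [0, 4, 2, 6, 1, 5, 3, 7]
print(permute_in_place(points, 4))      # two groups: [0, 1, 2, 3, 1, 5, 3, 7]
print(permute_out_of_place(points, 4))  # two groups: [0, 4, 2, 6, 1, 5, 3, 7]
```

With a single group, everything is read before anything is written, so the in-place version happens to be correct. With two groups, the first group overwrites elements 4 and 6 before the second group has read them. Writing to a second buffer removes the dependency between groups, at the cost of extra global memory.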
