The Fast and the Fourier

>> Monday, March 28, 2011

Five days. It's taken five long, painful days, but it's done. I finished coding the Fast Fourier Transform (FFT) for Chapter 12 this evening, and it works. Not just the forward FFT either, but also the inverse. And if there's anything more agonizing than debugging an inverse FFT running on a GPU, I haven't experienced it.

This isn't the first time I've coded an FFT, but it's the first time I've coded an FFT with OpenCL's work-items, work-groups, vector operations, and memory synchronization. Unbelievable. I look like something that staggered out of a zombie movie, but as I write this, I feel a tranquility and lightness of spirit that I can't describe.
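Stripped of all the OpenCL machinery, the transform reduces to the classic radix-2 butterfly. Here's a minimal sequential sketch in plain Python (the function names and recursive structure are my own simplification, not the chapter's OpenCL code) — each loop iteration corresponds roughly to the butterfly a single work-item computes, and flipping the twiddle-factor sign plus a final scale gives the inverse:

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)   # transform of even-indexed samples
    odd = fft(x[1::2], inverse)    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine even/odd halves with the twiddle factor
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def ifft(x):
    """Inverse FFT: conjugate-sign twiddles plus a 1/n scale."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]
```

Round-tripping a signal through fft and then ifft should reproduce it to within floating-point error — that round trip is exactly the test that finally passed this evening.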

I took one break on Saturday afternoon, and rented 13 Assassins on iTunes. Fine movie. A little more characterization would have made the lengthy battle scenes more meaningful, but other than that, it's a fine addition to the men-on-a-mission genre.


Sparse Matrices and OpenCL

>> Saturday, March 12, 2011

I finally finished implementing the Conjugate Gradient (CG) algorithm in OpenCL, and judging from a casual web search, I think this is the first time it's been done. The theory isn't simple by any means, and despite Jonathan Shewchuk's excellent paper on the subject, there are still a few places where I'm not satisfied.

Thankfully, Matlab provides an m-file that shows how the CG algorithm is implemented, so I was able to check my work. Last night, I tested my code with the BCSSTK05 sparse matrix from NIST's Harwell-Boeing collection, and it converged with a final residual of 0.067. Happy day.
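For reference, the core CG iteration looks like this as a bare-bones Python sketch with dense row-major lists (the function name and dense storage are my own simplifications — the real implementation stores the matrix in sparse form and parallelizes the matrix-vector product and dot products across work-items):

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve Ax = b for a symmetric positive-definite matrix A."""
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]            # residual of the initial guess x = 0
    p = r[:]            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)               # optimal step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:                   # converged on residual norm
            break
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # new A-conjugate direction
        rs_old = rs_new
    return x
```

In exact arithmetic CG converges in at most n iterations, so a small symmetric positive-definite system like A = [[4, 1], [1, 3]], b = [1, 2] makes a quick sanity check: the exact solution is x = (1/11, 7/11).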

I was also planning to implement the biconjugate gradient stabilized algorithm, also known as BiCGSTAB, in OpenCL, but when I tried the Matlab bicgstab routine on NIST's LNS_131 matrix, the algorithm didn't converge. Even after 20,000 iterations, the residual stayed above ten thousand.

This amazes me. Finite element analysis (FEA) has been around since the 1960s, so I figured all the mathematical theory had been worked out years ago. But from what I've seen, coding sparse matrix solvers is still more of an art than a science.

GPUs excel at highly parallel algorithms, so it might be better to have them solve sparse matrix systems using direct methods instead of iterative methods. Need to give this more thought.
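Either way, the operation that dominates the iterative approach is the sparse matrix-vector product. In compressed sparse row (CSR) form — one common storage format for matrices like these — it looks like the following plain-Python sketch (my own illustration, not code from the chapter); on a GPU, each row would typically be handed to its own work-item:

```python
def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A*x for a matrix A stored in CSR format.

    row_ptr has (number of rows + 1) entries; the nonzeros of row i
    live in values[row_ptr[i]:row_ptr[i+1]], with their column
    positions in the matching slice of col_indices.
    """
    y = []
    for i in range(len(row_ptr) - 1):          # one row per "work-item"
        total = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[col_indices[k]]
        y.append(total)
    return y
```

For the matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]], the CSR arrays are values = [1, 2, 3, 4, 5], col_indices = [0, 2, 1, 0, 2], row_ptr = [0, 2, 3, 5].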


Is Your Local Memory Really Local?

>> Friday, March 4, 2011

CUDA makes a clear distinction between shared memory, which is SRAM located on the GPU die, and local memory, which is located in DRAM off the die. Both memory types are specific to a given processing unit, but shared memory has much lower latency than local memory or global memory. For this reason, CUDA routines generally copy input data from global memory to shared memory, process the data in shared memory, and write the output back to global memory.

OpenCL, on the other hand, doesn't make any distinction between shared memory and local memory. Both types are referred to as local. So here's the question: how do you know if the local memory you're working with is high-speed memory on the GPU or low-speed memory off the GPU?

It turns out that the clGetDeviceInfo function has a field called CL_DEVICE_LOCAL_MEM_TYPE, which can be either CL_LOCAL or CL_GLOBAL. If the type is CL_GLOBAL, then there's no point copying data from global memory to local memory because both memory types are essentially global. But if the type is CL_LOCAL, then the memory is close to the processing unit and it's a good idea to store frequently-accessed data there.

Kind of a nuisance, isn't it? It seems like the only way to ensure high performance is to check a device's local memory type and send it a different kernel depending on whether that type is CL_LOCAL or CL_GLOBAL.


The 83rd Annual Academy Awards

>> Wednesday, March 2, 2011

The Oscars were given out this past weekend, and I'm glad that my only serious prediction turned out to be correct: Melissa Leo won for Best Supporting Actress in The Fighter. She was the best thing about that movie, and she made everyone else look like amateurs. I'd hoped that Geoffrey Rush would win for Best Supporting Actor, but that's more because he's my favorite actor than because of his performance.

I enjoyed The King's Speech, partly for the story and partly for the performances. Guy Pearce and Helena Bonham Carter were wonderful. It took me a while to recognize Jennifer Ehle, but I hadn't seen her since the last time she'd starred with Colin Firth. They even cast Anthony Andrews, who had played King Edward VIII, to play Stanley Baldwin. Ha.

I had two problems with the movie, though:

  • It should have ended with the speech's conclusion. The ten minutes of smug back-slapping weren't necessary, and given the gravity of the subject, it seemed odd that everyone was so happy. And Winston Churchill saying "I couldn't have said it better myself" was just silly.
  • If you're going to cast Claire Bloom as Queen Mary, you should give her some lines. I saw her in Limelight some time ago, and I've been a fan ever since.

I knew that one of the two little girls was Queen Elizabeth II, but it wasn't until I walked out of the theater that I realized Helena Bonham Carter's character was the Queen Mum. I just read her Wikipedia entry, and she was a fascinating woman.

I should see Limelight again. I wonder if I'll enjoy it as much as I once did.

