The State of GPU Computing: Is the CPU Dead Yet?
AMD: The Mainstreaming of GPU Compute
AMD was a little late to the GPU compute party, but it has been working feverishly to catch up. ATI Stream was the company’s equivalent to Nvidia’s CUDA. The first AMD FireStream cards for dedicated GPU compute were the model 580s, built on the Radeon X1900 GPU, which saw fairly limited pickup. It wasn’t until the Radeon HD 4000 series shipped that AMD really had competitive hardware for GPU compute. The HD 5000 series improved on that substantially, and the latest Radeon HD 6000 series has significant enhancements specifically geared toward general-purpose parallel programming.
Philosophically, though, AMD has taken a slightly different road. At first, the company tried to mimic Nvidia’s CUDA efforts, but eventually discarded that approach and fully embraced open standards like OpenCL and DirectCompute. (We’ll discuss the software platforms in more detail next.)

AMD is taking GPU computing mainstream by building Radeon-class shader cores into the CPU die, as seen in this Fusion die shot.
Recently, AMD has shifted its GPU compute focus more to the mainstream. While AMD still ships dedicated compute accelerators under the FireStream moniker, the company is trying to capitalize on its efforts to integrate Radeon graphics technology into mainstream CPUs. The Fusion APUs (accelerated processing units) are available in both mobile and desktop flavors. Even the high-end A8-3800, sporting a quad-core x86 CPU and 400 Radeon-class programmable shaders, costs less than $150.
AMD calls its approach to mainstream GPU compute App Acceleration. It’s a risky approach, since the mainstream applications ecosystem isn’t exactly rich with products that take advantage of GPU compute. The few applications that exist can run much faster on the GPU side of the APU, but the modest performance of the x86 side of the equation makes it difficult to compete with Intel’s x86 performance dominance. AMD is betting that more software developers will take advantage of GPU compute, shifting the performance equation for the APUs.
Intel: Bridges to GPU Compute
Intel has been watching the GPU compute movement with some understandable concern. The company tried to get into discrete graphics with Larrabee, but that project died on the vine. The technology behind Larrabee is now relegated to limited use in some high-performance parallel compute applications, but you can’t go out and buy a Larrabee board.
On the other hand, Intel has made waves with the integrated graphics built into its current Sandy Bridge CPUs. The Intel HD Graphics GPU is pretty average for Intel graphics, but the fixed-function video block is startlingly good. Video decode and transcode are very fast—even faster than most GPU-accelerated transcode. Of course, it’s a fixed-function unit, so it isn’t useful with non-standard codecs. But since a big part of the consumer GPU compute efforts from Nvidia and AMD focus on video encode and transcode, Sandy Bridge graphics stole a little thunder from the traditional graphics companies.

The GPU in Sandy Bridge is fairly mediocre—except for the fixed-function video engine, which is purely awesome.
Intel’s upcoming 22nm CPU, code-named Ivy Bridge, may actually change the balance. The x86 CPU itself will offer modest enhancements over Sandy Bridge, but the GPU is being re-architected to be fully DirectX 11 compliant. When asked if GPU compute code could run entirely on the Ivy Bridge graphics core, the lead architect for Intel said it could. Performance is unknown at this point, but if Intel can couple a GPU core that’s equal to the AMD GPU inside Fusion APUs with its raw x86 CPU capabilities, then it may signal a sunset on the era of entry-level discrete graphics cards.
The API Story
If you can’t write software to take advantage of great hardware, you essentially have really expensive paperweights. Early attempts to turn GPUs into general purpose parallel processors were bootstrapping efforts, requiring programmers to figure out how to write a graphics shader program that would do something other than graphics.
As the hardware evolved, the need for standard programming interfaces became critical. What happened is a recapitulation of graphics history: proprietary technology first, then a steady shift to more open standards.
CUDA
Nvidia’s CUDA platform was one of the first attempts to build a standard programming interface for GPU compute. Nvidia has always maintained that CUDA isn’t really “Nvidia-only,” but neither AMD nor Intel has really taken up the company’s offer to accept it as a standard. Some of Nvidia’s third-party partners, however, have chipped in, enabling support for Intel CPUs as a fallback for some CUDA-based middleware.
CUDA started out small, consisting of libraries and a C compiler to write parallel‑processing code for the GPU. Over the years, CUDA has evolved into an ecosystem of Nvidia and third-party compilers, debugging tools, and full integration with Microsoft Visual Studio.
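To give a flavor of the programming model that compiler targets, here is a minimal sketch in plain Python (not CUDA itself) that emulates how a kernel runs once per thread, with each thread working out its own element index from its block and thread coordinates. The names `saxpy_kernel` and `launch` and the block size of 4 are illustrative assumptions, not part of any Nvidia API, and a real GPU would run the threads in parallel rather than in a loop.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y, out):
    """Body executed by one emulated thread: out[i] = a * x[i] + y[i]."""
    # Same global-index arithmetic a CUDA C kernel would use:
    # blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    # The last block may overshoot the data, so guard the access.
    if i < n:
        out[i] = a * x[i] + y[i]


def launch(kernel, n, block_dim, *args):
    """Hypothetical launcher: runs the kernel once per (block, thread) pair."""
    # Round up so every element is covered by some thread.
    grid_dim = (n + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, n, *args)
    return grid_dim


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0] * 6
out = [0.0] * 6
blocks = launch(saxpy_kernel, len(x), 4, 2.0, x, y, out)
```

With six elements and four threads per block, the launcher needs two blocks, and the two surplus threads in the second block fall through the bounds check without touching memory.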
CUDA has seen most of its success in the HPC and academic supercomputing markets, but it has a broader reach than just deskside supercomputers. Adobe has used CUDA in Premiere Pro CS4 and later versions to accelerate high-definition video transcode and some transitions. MotionDSP uses CUDA to help reduce the shaky-cam effect in home videos. We’ll highlight a few GPU-accelerated apps later in this article.
ATI Stream
We’ll just mention AMD’s Stream software platform briefly, since AMD is no longer pushing it, choosing to focus instead on OpenCL and DirectCompute.
Stream was AMD’s attempt to compete with CUDA, but the company obviously feels that the greater accessibility offered by standards-based platforms is more appealing.
DirectCompute
DirectCompute shipped with Microsoft’s DirectX 11 API framework, so it is available only on Windows Vista and Windows 7. It will also be available on Windows 8 when that OS ships. That means there’s no support for DirectCompute on non-Microsoft operating systems. DirectCompute won’t run on Windows XP, either, nor on Windows Phone 7 or the Xbox 360.
DirectCompute works across all GPUs capable of supporting DirectX 11. Today, that means only Nvidia GTX 400 series or later and AMD Radeon HD 5000 series or later. Intel will support DirectX 11 compute shaders when Ivy Bridge ships in 2012.
DirectCompute’s key advantage is that it uses an enhanced version of the same shader language, HLSL, for GPU compute programming as it does for graphics programming. This makes it substantially easier for the large number of programmers already fluent in Direct3D to write GPU compute code. It also runs across graphics processors from both AMD and Nvidia, giving it broad graphics hardware support.
On the downside, DirectCompute has no CPU fallback. So code specifically written for DirectCompute simply fails if a DirectX 11-capable GPU isn’t available. That means programmers need a separate code path if they want to replicate the results of the DirectCompute code on a system running an older GPU.
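The shape of that second code path can be sketched as follows. This is plain Python rather than Direct3D code: `gpu_supports_dx11`, `box_blur_gpu`, and `box_blur_cpu` are hypothetical stand-ins, and the GPU function simply reuses the CPU math so the sketch stays runnable. The point is only the dispatch: check the capability once, then route to a GPU path or to a CPU path that computes the same result.

```python
def box_blur_cpu(samples):
    """CPU reference path: 3-tap box blur with clamped edges."""
    n = len(samples)
    result = []
    for i in range(n):
        left = samples[max(i - 1, 0)]
        right = samples[min(i + 1, n - 1)]
        result.append((left + samples[i] + right) / 3.0)
    return result


def box_blur_gpu(samples):
    """Hypothetical stand-in for a compute-shader dispatch.

    A real application would submit an HLSL compute shader here; this
    placeholder reuses the CPU math so the sketch stays runnable.
    """
    return box_blur_cpu(samples)


def box_blur(samples, gpu_supports_dx11):
    """Pick a code path; returns (result, name of the path taken)."""
    if gpu_supports_dx11:
        return box_blur_gpu(samples), "gpu"
    # No CPU fallback is provided for us, so we supply our own.
    return box_blur_cpu(samples), "cpu"


data = [3.0, 6.0, 9.0, 12.0]
gpu_result, gpu_path = box_blur(data, gpu_supports_dx11=True)
cpu_result, cpu_path = box_blur(data, gpu_supports_dx11=False)
```

Keeping a CPU reference implementation around also gives developers something to validate the GPU output against.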
OpenCL
OpenCL was initially developed by Apple, which turned the framework over to the Khronos Group, an open standards consortium. Apple retained the name as a trademark, but granted free rights to use it.
OpenCL runs on just about any hardware platform available, including traditional PC CPUs and GPUs as well as the GPUs inside mobile devices like smartphones and tablets. Intel has even released an OpenCL SDK for its current Sandy Bridge processors. Care must be taken with code designed for multiplatform use, though, as a cell-phone GPU may not be able to handle the same number of threads as gracefully as an Nvidia GTX 580.
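The caution about thread counts can be made concrete: size the work to each device instead of hard-coding one number. The sketch below is plain Python with made-up device limits. `plan_launch`, the device names, and the limit figures are illustrative assumptions, not values reported by any real driver. In actual OpenCL code the limit would come from querying the device, for example `CL_DEVICE_MAX_WORK_GROUP_SIZE` via `clGetDeviceInfo`.

```python
def plan_launch(num_items, preferred_group_size, max_work_group_size):
    """Return (local_size, global_size) that fits the device limit."""
    # Never ask for a bigger work-group than the device allows.
    local_size = min(preferred_group_size, max_work_group_size)
    # Round the global size up to a whole number of work-groups; the kernel
    # is expected to ignore indices >= num_items.
    num_groups = (num_items + local_size - 1) // local_size
    return local_size, num_groups * local_size


# Hypothetical limits for illustration only.
devices = {"desktop_gpu": 1024, "phone_gpu": 64}

plans = {
    name: plan_launch(num_items=1000, preferred_group_size=256,
                      max_work_group_size=limit)
    for name, limit in devices.items()
}
```

Here the same 1,000-item job is split into 4 groups of 256 on the larger device and 16 groups of 64 on the smaller one, so neither launch exceeds what its hardware accepts.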
Support for OpenCL has been quite strong. AMD is so enamored of OpenCL that it dropped its ATI Stream SDK in favor of a new Accelerated Parallel Processing SDK, which exclusively supports OpenCL. OpenCL has also come to the web. A variant of OpenCL called WebCL, which allows JavaScript to call OpenCL code, is in the prototype stage for web browsers. This means you may one day run GPU compute code inside your browser.
On the other hand, OpenCL is still in its infancy. Supporting tools and middleware are still emerging, and for the time being developers may need to create their own custom libraries, instead of relying on commercially available or free middleware to ease programming chores. There’s no integration yet with popular dev tools like Microsoft’s Visual Studio.