The State of GPU Computing: Is the CPU Dead Yet?
The API Wars
The GPU compute API situation today resembles the consumer 3D graphics API wars of the late 1990s. The leading development platform is CUDA. Despite Nvidia’s protestations to the contrary, CUDA remains a proprietary platform. It has a rich ecosystem of developers and applications at this stage, but history hasn’t been kind to single-platform standards over the long haul.

This chart sums up the state of the GPU compute APIs in a nutshell.
You could argue that DirectCompute is also proprietary, since it’s Windows-only and isn’t even supported on pre-Vista versions of Windows. Still, Windows is by far the leading PC operating system, and DirectCompute supports all existing DirectX 11–capable hardware. That’s where the support ends, though: there’s no version for mobile hardware, although we may see that change with Windows 8.
OpenCL offers the most promise in the long run, with its support for multiple operating systems, a wide array of hardware platforms, and strong industry support. OpenCL is the native GPU compute API for Mac OS X, which is gaining ground in the PC space, particularly on laptops. But OpenCL is still pretty immature at this stage of the game. There’s a strong need for integration with popular development platforms, more powerful debugging tools and more robust third-party middleware.
The Applications Story
To see what kind of strides GPU compute has made, we’re going to focus on consumer applications, not scientific or highly vertical applications. GPUs should do well in applications where the code and data are highly parallel. Examples include some photography apps, video transcoding, and certain tasks in games that aren’t purely graphical.
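To make "highly parallel" a bit more concrete, here is a minimal Python sketch of our own; none of it comes from the applications covered below. The function names and the 1.5x gain are hypothetical. Brightening a pixel depends only on that pixel, so the image can be split into chunks and handed to independent workers. The second function needs the previous result at every step, so it cannot be split up that way. The thread pool only demonstrates that the chunks are independent; it is not meant to show a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_pixel(value, gain=1.5):
    """Scale one 8-bit pixel value and clamp it to 255 (hypothetical filter)."""
    return min(255, int(value * gain))


def brighten_serial(pixels):
    """Process the pixels one after another on a single worker."""
    return [brighten_pixel(p) for p in pixels]


def brighten_chunked(pixels, workers=4):
    """Split the pixels into chunks and process each chunk independently.

    This works because no pixel's result depends on any other pixel,
    which is the property that lets a GPU spread the work over many cores.
    """
    size = max(1, len(pixels) // workers)
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(brighten_serial, chunks)  # map keeps chunk order
    return [p for chunk in results for p in chunk]


def chain_digest(values):
    """A dependency chain: every step needs the result of the previous step."""
    state = 1
    for v in values:
        state = (state * state + v) % 65521
    return state
```

Calling `brighten_chunked(list(range(256)))` gives the same list as `brighten_serial(list(range(256)))`, whichever order the workers finish in. `chain_digest`, by contrast, changes its answer if the input order changes, which is why work like this tends to stay on the CPU.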
Musemage
Musemage is a complete photo editing application available from Chinese developer Paraken. When running on systems with Nvidia GPUs, Musemage is fully GPU accelerated. Musemage uses the CUDA software layer to accelerate the full range of photographic operations.

Musemage is the first photo editing application to be fully GPU accelerated.
Musemage lacks a lot of the automated functions built into more mature tools like Photoshop, but if you’re willing to manually tweak your images, most of the filters and tools act almost instantly, even on very large raw files—provided you’ve got Nvidia hardware.
Adobe Premiere Pro CS5/5.5
Adobe’s Premiere Pro is a professional-level video editing tool. One of the tasks necessary for any video editor is previewing projects as you assemble clips, titles, transitions and filters into a coherent whole. Adobe’s Mercury Playback Engine uses CUDA to accelerate the preview. This is incredibly useful as projects grow in size: you can scrub back and forth on the timeline in real time, even after making changes.
In addition, a number of effects and filters are GPU accelerated, including color correction, various blurs, and more. A complete list can be found at the Adobe website.
Adobe is investigating porting the Mercury engine and other GPU-accelerated portions of Premiere Pro to OpenCL, but we haven’t heard whether a final decision has been made. Given the relative immaturity of the tool sets and drivers, OpenCL may need a little more time before major software companies like Adobe commit to the new standard.
Interestingly, Intel has recently delivered a plugin for Premiere Pro CS5.5 that can speed up HD encoding if you use Adobe Media Encoder. The plugin requires an H67 or Z68 chipset. With a Z68 system, you can use an Nvidia-based GPU to accelerate the Mercury Playback Engine and QuickSync to perform the final render.
Video Conversion
A number of GPU-accelerated video transcoding apps exist. One of the earliest was CyberLink’s MediaEspresso, which initially used Nvidia’s CUDA framework and later added OpenCL. The latest version of MediaEspresso also takes advantage of Intel’s QuickSync. Transcoding with QuickSync can be faster than using a GPU, but only if you use a QuickSync-supported codec.
Higher-end tools, like MainConcept, also use GPU encoding. MainConcept offers separate H.264/AVC encoders for Nvidia hardware, running on CUDA, and for AMD hardware, running on OpenCL.
Games
When we think of games and GPUs, it’s natural to think about graphics. But games are increasingly using the GPU for elements that aren’t purely graphical. Physics is the first thing that comes to mind. Usually when we think of physics, we think of collisions and rigid body dynamics.
But physics isn’t just about stuff bouncing off other stuff. Film-style effects such as motion blur, lens effects such as bokeh, and volumetric smoke are handled via GPU compute techniques rather than on the CPU. GPU compute also handles cloth simulations, better-looking water, and even some audio processing. In the future, we might see some of the AI calculations offloaded to the GPU; AMD already demonstrated GPU-controlled AI in an RTS-like setting.
As more GPU compute capability is integrated into the CPU die, it’s possible for the on-die GPU to handle some of these compute tasks while the discrete graphics card takes care of graphics chores. The ability for the on-die GPU and CPU to share data more quickly—without having to move data over the PCI Express bus—may make up for the fewer shader cores available on-die.
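The trade-off described here can be sketched as a back-of-the-envelope model. This is our own simplification, not a benchmark: the function names are hypothetical, and the bus bandwidth, data size, and slowdown factor in the usage note are made-up round numbers chosen only to show the shape of the argument. The model assumes a discrete card pays for a copy in each direction over the bus, while the on-die GPU pays nothing for transfers but computes more slowly.

```python
def discrete_gpu_ms(data_mb, compute_ms, bus_gb_per_s):
    """Estimated time on a discrete card: copy in, compute, copy back.

    bus_gb_per_s is an assumed effective bus bandwidth, not a measured one.
    """
    one_way_ms = data_mb / 1024 / bus_gb_per_s * 1000
    return 2 * one_way_ms + compute_ms


def on_die_gpu_ms(compute_ms, slowdown):
    """Estimated time on an on-die GPU: no bus copy, but fewer shader cores.

    slowdown is an assumed factor by which the smaller GPU is slower.
    """
    return compute_ms * slowdown


def faster_option(data_mb, compute_ms, bus_gb_per_s, slowdown):
    """Return which GPU the simple model favors for a given task."""
    discrete = discrete_gpu_ms(data_mb, compute_ms, bus_gb_per_s)
    on_die = on_die_gpu_ms(compute_ms, slowdown)
    return "on-die" if on_die < discrete else "discrete"
```

With these illustrative inputs, a light task such as `faster_option(64, 2, 8, 4)` favors the on-die GPU, because the bus copies dominate the total. A heavy task such as `faster_option(64, 20, 8, 4)` favors the discrete card, because raw compute time dominates instead.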
Parallelism is the Future
CPUs will never go out of fashion. There will always be a need for linear computation, and some applications don’t lend themselves to parallel computation. However, the future of the Internet and PCs is a highly visual one. Digital video, photography, and games may be the initial drivers for this, but the visual Internet, through standards like WebCL and HTML5 Canvas, will create more immersive experiences over the web. And much of the underlying programming for creating these experiences will be parallel in nature. GPUs, whether discrete or integrated on the CPU die, are naturals for this highly visual, parallel future. GPU computing is still in its infancy.