ATI to Nvidia: You're a Dinosaur


Just last week, on the eve of the GeForce GTX 280 launch, ATI unveiled a bombshell: a brand-new GPU architecture that uses better process technology and a more power-efficient design to challenge Nvidia's gargantuan new GPU at a fraction of the cost. ATI eschewed the huge, hot monolithic GPU in favor of a more compact, modular core. With twin goals of lower power consumption and more performance per square millimeter of die, ATI looks poised to dethrone Nvidia, and all without building a videocard with an aural footprint roughly equivalent to a Dyson vacuum cleaner.

The new RV770 GPU powers two products: the $200 Radeon 4850 and the $300 Radeon 4870. Although their prices differ by $100, both cards use the same GPU. Let’s find out what makes it tick.

Under the Heat Spreader

ATI says the day of the giant monolithic GPU is over. Instead of building ever-larger, power-hungry chips, ATI plans to design smaller, more efficient GPUs that can work together to handle big workloads.

We’ve walked this path before. When Intel’s Netburst architecture reached the end of its life, we were seeing the largest, hottest, most power-hungry CPUs ever, but performance wasn’t scaling up as fast as power and heat were. To gain 10% in performance, a new CPU would generate 30% more heat and require 30% more power. This was an untenable situation, so Intel and AMD quickly moved away from single monolithic cores to more efficient multi-core designs. If your applications can take advantage of every CPU core in your system, a slower, cooler multi-core chip should deliver significantly better performance than a similarly sized single-core chip running at twice the clock speed.
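The efficiency penalty in that scenario is easy to quantify. The short Python sketch below uses only the 10% performance and 30% power figures quoted above; the helper name `perf_per_watt_change` is our own, purely for illustration.

```python
def perf_per_watt_change(perf_gain: float, power_gain: float) -> float:
    """Relative change in performance per watt.

    Both arguments are fractional increases, e.g. 0.10 for +10%.
    """
    return (1 + perf_gain) / (1 + power_gain) - 1


# The scenario described above: +10% performance for +30% power.
change = perf_per_watt_change(0.10, 0.30)
print(f"Performance per watt changes by {change:.1%}")  # -15.4%
```

A 10% speedup bought with 30% more power leaves the chip about 15% less efficient per watt, which is exactly the trend that pushed CPU makers toward multiple cores.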

The two main GPU manufacturers are at a similar crossroads, and each chose a different direction with this generation. Nvidia has launched its GTX 280 boards, which sport a massive monolithic GPU. These are among the largest chips ever put into mass production: a single GTX 280 chip measures 576mm^2 and features a 512-bit memory interface, and a GTX 280 board draws 236W at full bore. By contrast, the RV770 chip at the heart of ATI’s new line of videocards measures just 260mm^2 and features a 256-bit memory bus, and a Radeon 4870 board draws about 170W at full bore. Yet despite a much smaller die, lower power draw, and a memory bus half the width of the GTX 280’s, the Radeon 4870 delivers about 75% of the faster card’s speed in most of our benchmarks.
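Those numbers are easier to weigh as ratios. This rough Python sketch uses only the figures quoted in this article (576mm^2 and 236W for the GTX 280, 260mm^2 and about 170W for the 4870, and the 4870's roughly 75% relative performance). Because that 75% is an average across our benchmarks, treat the output as a ballpark estimate, not a measurement. The dictionary layout and the `ratio` helper are illustrative choices, not anything from ATI or Nvidia.

```python
# Figures quoted in this article: die area (mm^2), full-load power (W),
# and relative performance (GTX 280 = 1.0; the 4870 at about 75%).
gtx280 = {"die_mm2": 576, "power_w": 236, "perf": 1.00}
hd4870 = {"die_mm2": 260, "power_w": 170, "perf": 0.75}


def ratio(a: dict, b: dict, key: str) -> float:
    """How card a compares with card b on one metric (a / b)."""
    return a[key] / b[key]


perf = ratio(hd4870, gtx280, "perf")
area = ratio(hd4870, gtx280, "die_mm2")
power = ratio(hd4870, gtx280, "power_w")

print(f"Die area:      {area:.0%} of the GTX 280")
print(f"Power draw:    {power:.0%} of the GTX 280")
print(f"Perf per mm^2: {perf / area:.2f}x the GTX 280")
print(f"Perf per watt: {perf / power:.2f}x the GTX 280")
```

By this estimate, the 4870 delivers roughly 1.66 times the GTX 280's performance per square millimeter of silicon and slightly better performance per watt, which is the heart of ATI's argument for smaller chips.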

GPU Core Competencies

With this generation, ATI is beginning to see the payoff from its early move to a 55nm manufacturing process last generation. While Nvidia languishes at 65nm, ATI is packing more transistors into a smaller space and improving efficiency at the same time. But that’s not all ATI has done. The new RV770 series GPUs feature a redesigned core with an astounding 800 stream processors, the little silicon dynamos that handle everything from rendering soft shadows and bump maps to processing Blu-ray video.

The RV770 groups its stream processors into 10 so-called SIMD cores. Each one bundles 80 stream processors (16 five-wide shader units), a 16KB local data share, and four dedicated texture units, and this arrangement lets ATI squeeze much better shader performance out of the overall package. The stream processors within a SIMD core can exchange data through that shared memory, which makes the new shader core much more efficient than previous designs. And because each SIMD core feeds its own dedicated texture units, very little time is lost shuttling data between the shaders and the texture hardware.

Arranging the stream processors in modular units around the texture processing cores minimizes latency and improves performance. Each SIMD core is connected to its four dedicated texture units with 480GB/sec of bandwidth. That link was crucial for performance: without it, the texture units, which fetch and filter the texture data applied to each pixel, would once again become the bottleneck.

Under the Hood: RV770 Unveiled

ATI placed the latency-sensitive silicon, such as the stream processors and the basic texturing units, in the center of the RV770 die. Surrounding that are the memory controllers and L2 cache, and on the periphery of the chip sit the memory interface (GDDR5 on the 4870 and GDDR3 on the 4850), the PCI Express connection, the Crossfire controller, and the various display controllers for DVI, HDMI, VGA, and DisplayPort. ATI packed all of that onto a 260mm^2 die built on a 55nm process.

Sometimes Narrower is Better

There are two basic ways to increase memory bandwidth: raise the memory clock speed, or move more data on every clock cycle by widening the memory bus. Like some of ATI’s earlier GPUs, Nvidia’s GTX 280 uses a 512-bit memory bus. The RV770 uses a narrower 256-bit bus, but pairs it with new GDDR5 memory, which makes twice as many transfers per clock cycle as GDDR3. That gives ATI most of the GTX 280’s memory bandwidth on a board with a cheaper 256-bit bus, while running its memory at lower clock speeds.

GDDR5 also needs fewer pins to connect the memory to the board, which reduces board complexity compared with GDDR3. That matters all the more because GPUs built on smaller process technology have much less room for connector pins. By pairing a simpler 256-bit bus with fast GDDR5 memory, ATI should be able to deliver strong memory performance without hurting GPU yields.

The high-end Nvidia GeForce GTX 280 runs its GDDR3 memory at a punishing 1107MHz across a 512-bit bus, pushing an impressive 141.7GB/sec of bandwidth. Meanwhile, ATI’s 4870 ticks along at just 900MHz, yet its 256-bit GDDR5 bus still delivers 115.2GB/sec, about 80 percent as much. The net result is that the ATI card’s memory draws less power and generates less heat while delivering most of the bandwidth of the more expensive card.
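The bandwidth figures follow from a simple formula: memory clock times transfers per clock times bus width in bytes. This Python sketch applies it to the memory clocks and bus widths listed in this article, assuming 2 transfers per clock for GDDR3 and 4 for GDDR5 (the article's own description of GDDR5). The results are theoretical peaks in decimal gigabytes per second, not measured throughput, and `memory_bandwidth_gbps` is a hypothetical helper name.

```python
def memory_bandwidth_gbps(clock_mhz: float, transfers_per_clock: int,
                          bus_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    bandwidth = clock * transfers per clock * bus width in bytes
    """
    bytes_per_transfer = bus_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# Memory clock (MHz), transfers per clock, and bus width (bits) per card.
# Assumption: GDDR3 makes 2 transfers per clock, GDDR5 makes 4.
cards = {
    "Radeon 4850 (GDDR3)": (993, 2, 256),
    "Radeon 4870 (GDDR5)": (900, 4, 256),
    "GeForce GTX 280 (GDDR3)": (1107, 2, 512),
}

for name, (clock, transfers, width) in cards.items():
    print(f"{name}: {memory_bandwidth_gbps(clock, transfers, width):.1f} GB/s")
```

This works out to about 63.6GB/sec for the 4850, 115.2GB/sec for the 4870, and 141.7GB/sec for the GTX 280. In other words, the 4870 gets roughly 80 percent of the GTX 280's peak bandwidth from half the bus width, and nearly twice the 4850's from the same bus width.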

Running GDDR5 memory at lower clock speeds than GDDR3 while delivering comparable bandwidth is great, but the new ATI boards ship with only 512MB of memory (the GeForce GTX 260 ships with 896MB on a 448-bit interface, and the GTX 280 ships with a full gigabyte). For the most part, performance doesn’t seem to suffer from the smaller frame buffer, but that could change as graphically intensive games like Far Cry 2 and Fallout 3 arrive later this year.

Video Playback and Encoding

As we’ve covered in the past, video decode acceleration is a crucial feature for modern GPUs. The new RV770-series GPU handles advanced Blu-ray features, such as picture-in-picture, in hardware, which allows much lower CPU utilization with supported players. In our testing, CPU utilization rose about 5% when we turned on picture-in-picture playback, compared with about a 20% increase on an older ATI card in the same system.

Like Nvidia, ATI has demonstrated GPU-accelerated video transcoding from MPEG-2 to H.264. While the demos run at an impressive clip, we have no way to compare performance between the two companies’ cards: neither encoder works on both Nvidia and ATI GPUs, and Elemental’s Badaboom encoder (which Nvidia uses) and CyberLink’s PowerDirector 7 don’t encode with settings similar enough for us to feel comfortable comparing them.

This illustrates the fundamental problem with GPU-based computing today, which we’ll talk about next.

Stream Processing

GPU-based computing promises huge speedups for any task that involves massive numbers of parallel computations, and early apps, such as the Folding@Home clients, are extremely encouraging. The problem is that there’s one GPU computing API for Nvidia’s cards and a second one for ATI’s.

That means anyone who writes software and wants to harness the power of GPUs needs to write not one but two programs: one for ATI and one for Nvidia. If the last 13 years of DirectX have taught us anything, it’s that for hardware acceleration of anything to succeed, developers need common APIs that let them write code once and run it on both platforms.

We don’t know whether ATI’s Stream or Nvidia’s CUDA is the better API, and because we’re not programmers, we don’t care. But we do know that developers need a common API that will run on every supported GPU. To make that happen, ATI and Nvidia need to put aside their differences and work together on a common API that runs on all hardware. If the two companies need a place to start, Apple has proposed OpenCL, which aims to do just that.

The Speeds and Feeds

And now, it’s time to talk about the hardware. By the time you read this, ATI will have launched both the Radeon 4850 and the Radeon 4870. Priced at $200 and $300 respectively, these cards are set to compete squarely in the mid-range.

The 4850 ships with 512MB of GDDR3 running at 993MHz on a 256-bit bus. The board we tested runs its core at 625MHz and sports the same 800 stream processors as the more expensive 4870. The card will sell for $200 to $250, depending on configuration.

The Radeon 4870 is ATI’s new mid-range part, slotting into the $300 price range. Its GPU core runs at 750MHz, and the board’s 512MB of GDDR5 memory runs at 900MHz on a 256-bit bus. Remember, though, that GDDR5 memory transfers four chunks of data per clock, giving the 4870 an effective memory bandwidth almost double that of the 4850. For $50-$100 more, this is a Good Thing.
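One common way to compare the two RV770 boards is theoretical peak shader throughput: stream processors times core clock times floating-point operations per clock. The Python sketch below assumes each stream processor completes one multiply-add (counted as two floating-point operations) per clock; that per-clock figure is our assumption rather than a number stated in this article, and `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(stream_processors: int, core_mhz: float,
                flops_per_clock: int = 2) -> float:
    """Theoretical peak single-precision GFLOPS.

    Assumes each stream processor completes one multiply-add
    (counted as 2 floating-point operations) every clock cycle.
    """
    return stream_processors * core_mhz * 1e6 * flops_per_clock / 1e9


# Both cards use the same 800-stream-processor RV770; only the core clock differs.
for name, core_mhz in [("Radeon 4850", 625), ("Radeon 4870", 750)]:
    print(f"{name}: {peak_gflops(800, core_mhz):,.0f} GFLOPS peak")
```

Because both boards use the same GPU, the gap in peak shader throughput (1,000 vs. 1,200 GFLOPS under this assumption) is simply the 20% difference in core clock; the much larger gap in memory bandwidth comes from GDDR5.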


Specification | Radeon 4850 | Radeon 4870 | GeForce GTX 280
GPU core | 55nm RV770 | 55nm RV770 | 65nm GT200
GPU clock speed | 625MHz | 750MHz | 602MHz
Memory type/interface | 256-bit GDDR3 | 256-bit GDDR5 | 512-bit GDDR3
Memory speed | 993MHz | 900MHz | 1107MHz
Die size | 260mm^2 | 260mm^2 | 576mm^2

The Performance Story

Do ATI's new graphics cards deliver 75% of a GeForce GTX 280’s power for a fraction of the price? We went into the lab to find out.

The short answer is "yes". The Radeon 4870 delivers most of a GTX 280's performance in our benchmarks for about half the cost. Running two 4870 boards together in Crossfire beats a single GTX 280 board for about the same cash outlay. The performance you get from a single 4870 is quite impressive for the money. When you look at the scores the Radeon 4870s chalked up in Crossfire mode, you may even be tempted to pony up for a pair, but think before you leap.

Dual-card solutions are all well and good in theory, but before you make the jump to a dual-GPU setup, you need to be aware of the pitfalls. First, adding a second card to your rig completely erases the power and noise advantages the 4870 has over the GTX 280. Second, features you may take for granted, like multiple-monitor support, don't work with dual-card setups from either ATI or Nvidia. Third, new games frequently require a driver update or even a patch before they'll properly take advantage of your second card. Multi-card rigs are great for power users, but you need to understand the sacrifices they entail, preferably before you whip out your credit card. We can't wholeheartedly recommend SLI and Crossfire as more than niche products until these problems are solved.

During our testing, we also discovered that many of these new cards were bottlenecked by the CPU on our testbeds in all but the most demanding games. That means even adding a second (or faster) videocard to your system yields very little performance improvement, because the CPU can't finish its work fast enough to keep multiple GPUs busy. We'll be updating our testbed before the next round of GPU reviews. In the meantime, if your current CPU is slower than an Intel Core 2 Extreme X6800 (a 2.93GHz dual-core Conroe), you probably won't see much benefit in games other than Crysis from running more than one graphics card, whether it's a GTX 280, a Radeon 4870, or even a Radeon 4850.
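A simplified way to picture the CPU bottleneck: each frame needs both CPU work and GPU work, so the frame rate can't exceed whichever side is slower. The Python sketch below is a hypothetical two-stage model, not a measurement from our testbed; the `effective_fps` helper, the frame rates, and the `scaling` parameter (how much of each extra GPU's throughput Crossfire or SLI actually delivers) are all illustrative assumptions.

```python
def effective_fps(cpu_fps: float, gpu_fps: float, num_gpus: int = 1,
                  scaling: float = 1.0) -> float:
    """Frame rate in a simple two-stage bottleneck model.

    cpu_fps:  frames per second the CPU alone could prepare (hypothetical)
    gpu_fps:  frames per second a single GPU could render (hypothetical)
    scaling:  fraction of each extra GPU's throughput that multi-GPU
              rendering actually delivers (1.0 = perfect scaling)
    """
    # Combined rendering throughput of all GPUs.
    gpu_total = gpu_fps * (1 + (num_gpus - 1) * scaling)
    # The slower of the two stages sets the frame rate.
    return min(cpu_fps, gpu_total)


# Hypothetical game: the CPU tops out at 60 fps; one GPU renders 50 fps.
print(effective_fps(60.0, 50.0, num_gpus=1))  # 50.0 (GPU-bound)
print(effective_fps(60.0, 50.0, num_gpus=2))  # 60.0 (now CPU-bound)
```

Even with perfect scaling, doubling GPU throughput lifts the frame rate only from 50 to 60 fps here, because the CPU becomes the limit; only a faster CPU raises that ceiling.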

But we digress. The short, short verdict is that ATI's new Radeon 4850 and Radeon 4870 deliver stunning performance at an extremely compelling price. If you've been waiting to upgrade to a DirectX 10-compatible graphics card, now is the time. For less than the price of an Xbox 360, you can upgrade your GPU and get kick-ass gaming performance on most modern PCs.

Radeon 4870 Benchmarks

GeForce GTX 280
Radeon 4870 Radeon 4870 Crossfire
Crysis (fps)
15.9 9.3 19.9
World in Conflict (fps)
32.0 28.0 34.0
Company of Heroes (fps) 32.0 39.6
3DMark'06 Game 1 (fps)
34.0 47.1
3DMark'06 Game 2 (fps) 45.5 36.7
3DMark Vantage Game 1 (fps) 15.5
3DMark Vantage Game 2 (fps)
11.9 9.0
Best scores are bolded. All benchmarks run at 1920x1200 with 4x AA enabled, unless otherwise specified.

Radeon 4850 Benchmarks

GeForce GTX 280
Radeon 4850 Radeon 4850 Crossfire
Crysis (fps)
15.9 8.1
World in Conflict (fps)
31.0 34.0
Company of Heroes (fps) 32.0 32.7
3DMark'06 Game 1 (fps)
24.4 45.5
3DMark'06 Game 2 (fps) 45.5
29.7 48.1
3DMark Vantage Game 1 (fps) 15.5
3DMark Vantage Game 2 (fps)
11.9 7.0
Best scores are bolded. All benchmarks run at 1920x1200 with 4x AA enabled, unless otherwise specified.
