
ATI’s Excellent Adventure

How the graphics underdog regained its mojo and nearly ate Nvidia’s lunch 

There are but a few great underdog stories in any era, and to the list of today’s finest, we add ATI’s RV770 series GPU. ATI, much like the Red Sox, had been in a bad funk—the only way the company could compete with archrival Nvidia was by cutting the price of wannabe high-end cards to the level of Nvidia’s midrange offerings. Clearly, this was not a good business model.

Then, on the eve of the GeForce GTX 280 launch, ATI unveiled a bombshell: a brand-new GPU architecture that uses better process technology and a more power-efficient design to challenge Nvidia's gargantuan new GPU. ATI eschewed the huge, hot, monolithic GPU for a more compact but modular core. With the twin goals of decreased power consumption and more efficiency per die area, ATI now looks poised to dethrone Nvidia, and all without building a videocard that sports an aural footprint equivalent to that of a Dyson vacuum cleaner.

With this new GPU come three products: the $200 Radeon 4850, the $300 Radeon 4870, and the $600 Radeon 4870 X2. While their prices vary wildly, all three are built on the same GPU (the X2 carries two of them). Let's find out what makes it tick.

ATI is kissing the giant GPU goodbye, preferring smaller, more efficient GPUs that can work in tandem on big workloads

We’ve walked this path before. When Intel’s NetBurst architecture reached the end of its life, we were seeing the largest, hottest, most power-hungry CPUs ever, but performance wasn’t scaling in line with the power and heat increases. In order to see a 10 percent performance boost, the new CPU would generate 30 percent more heat and require 30 percent more power.

This was an untenable situation, so Intel and AMD quickly moved away from monolithic cores to more efficient multicore designs. If your applications can take advantage of all the CPU cores in your system, you should see significantly better performance with a much slower, cooler multicore design than you would with a similar-size single-core design running at twice the speed.
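The two claims above can be put in numbers with a little arithmetic. This Python sketch first shows how the 10 percent/30 percent trade-off erodes performance per watt, then uses Amdahl's law (a standard model we are substituting here, not one the article invokes) to compare a hypothetical quad-core chip against a single core running at twice the clock. The core count and parallel fractions are illustrative assumptions, and the fast single core is assumed to scale perfectly with frequency.

```python
"""Back-of-the-envelope numbers for the NetBurst and multicore arguments.

The 10%/30% figures come from the article; the quad-core chip, the 2x
single core, and the parallel fractions are hypothetical.
"""


def perf_per_watt_change(perf_gain: float, power_gain: float) -> float:
    """Relative change in performance per watt (-0.15 means 15% worse)."""
    return (1 + perf_gain) / (1 + power_gain) - 1


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup of `cores` identical cores over just one of them."""
    serial = 1 - parallel_fraction
    return 1 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # NetBurst-style scaling: +10% performance for +30% power.
    print(f"perf/W change: {perf_per_watt_change(0.10, 0.30):+.1%}")

    # Hypothetical single core at twice the clock, assumed to scale perfectly.
    single_core_2x = 2.0
    for p in (0.5, 0.9):
        quad = amdahl_speedup(p, cores=4)
        winner = "quad-core" if quad > single_core_2x else "2x single core"
        print(f"parallel fraction {p:.0%}: quad-core {quad:.2f}x vs 2.00x -> {winner}")
```

The first line works out to roughly 15 percent worse performance per watt. With half the work parallelizable, the quad-core chip manages only 1.6x and the fast single core wins; at 90 percent it reaches about 3.1x. That is the "if your applications can take advantage" caveat in numbers.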

The two main GPU manufacturers are at a similar crossroads, but each chose a different direction with its next-gen GPU. Nvidia has launched its GTX 280 boards, which sport a massive, monolithic GPU design. These are among the largest chips ever put into mass production: a single GTX 280 chip is 576mm², features a 512-bit memory interface, and draws 236W when running full bore. By contrast, the RV770 chip that ATI is using in its new line of GPUs is just 260mm², features a 256-bit memory bus, and draws about 170W when running at full power. Yet despite a much smaller die, a lower power draw, and a memory bus that's half the width of the GTX 280's, the Radeon 4870 delivers about 75 percent of the speed of the GTX 280 in most of our benchmarks.
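Using only the figures in the paragraph above, a short Python sketch turns the efficiency argument into ratios. It treats the GTX 280 as the baseline and assumes the "about 75 percent" figure can stand in as a single averaged performance ratio, which is a simplification of a benchmark suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    die_mm2: float   # die area in square millimeters
    power_w: float   # full-load power draw in watts
    rel_perf: float  # performance relative to the GTX 280 (baseline = 1.0)


# Figures taken from the article; rel_perf = 0.75 is an averaged assumption.
GTX_280 = Gpu("GeForce GTX 280", die_mm2=576, power_w=236, rel_perf=1.0)
HD_4870 = Gpu("Radeon 4870 (RV770)", die_mm2=260, power_w=170, rel_perf=0.75)


def advantage(a: Gpu, b: Gpu) -> tuple[float, float]:
    """Return a's (performance per mm^2, performance per watt) as multiples of b's."""
    per_area = (a.rel_perf / a.die_mm2) / (b.rel_perf / b.die_mm2)
    per_watt = (a.rel_perf / a.power_w) / (b.rel_perf / b.power_w)
    return per_area, per_watt


if __name__ == "__main__":
    area, watt = advantage(HD_4870, GTX_280)
    print(f"perf per mm^2: {area:.2f}x, perf per watt: {watt:.2f}x")
```

By these numbers the 4870 delivers about 1.66 times the performance per square millimeter of the GTX 280 but only about 4 percent more performance per watt, so the smaller chip pulls furthest ahead on die area, which is what drives manufacturing cost.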

The RV770 Unveiled

ATI packed the latency-sensitive silicon, such as the stream processors and the basic texturing units, in the center of the RV770 GPU. Surrounding it are the memory controllers and the L2 cache, and on the periphery of the chip rest the memory interface (GDDR5 on the 4870 and GDDR3 on the 4850), the PCI Express connection, the CrossFire controller, and the various display controllers for DVI, HDMI, VGA, and DisplayPort. And all of that is packed on a 260mm² 55nm die.

The GPU Core

With this generation of GPU, ATI's beginning to see the payoff from its early move to a 55nm manufacturing process last generation. While Nvidia languishes at 65nm, ATI is packing more silicon into a smaller space and increasing efficiency at the same time. But that's not all ATI's done. The new RV770 series GPU features a complete redesign with an astounding 800 stream processors, the little silicon dynamos that handle everything from rendering soft shadows and bump maps to post-processing H.264 video from Blu-ray movies.

By integrating 80 stream processors, 16KB of shared memory, and dedicated texture units into each of 10 so-called SIMD units (which together make up the complete shader core), ATI has juiced much better shader performance out of the overall package. The stream processors in each SIMD unit can exchange information through that shared memory, which makes the new shader core more efficient than previous designs. And because the stream processors feed their output directly into dedicated texture units, very little time is lost writing the output to texture memory.

The SIMD units themselves are each integrated with four texture processors in modular units, which minimizes latency and improves the performance of the RV770 design. Each SIMD connects to its four dedicated texture processors with 480GB/s of bandwidth between them. This was absolutely crucial to maintaining performance; otherwise, the texture processors, which fetch and filter the textures applied to the pixels on screen, would remain a bottleneck.
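To see how the pieces add up, the sketch below models the shader core as a stack of identical SIMD units. The per-SIMD counts (80 stream processors, four texture units, 16KB of shared memory, 10 SIMDs in all) reflect RV770's published layout. The peak-throughput function is an assumption-laden upper bound: it assumes every stream processor retires one multiply-add (two floating-point operations) per clock, and it uses the core clocks quoted for the two boards in The Cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderCore:
    simd_units: int = 10             # SIMD units in a full RV770
    sps_per_simd: int = 80           # 16 five-wide stream processing units
    tex_units_per_simd: int = 4      # dedicated texture processors per SIMD
    local_mem_kb_per_simd: int = 16  # shared memory per SIMD

    @property
    def stream_processors(self) -> int:
        return self.simd_units * self.sps_per_simd

    @property
    def texture_units(self) -> int:
        return self.simd_units * self.tex_units_per_simd

    @property
    def local_mem_kb(self) -> int:
        return self.simd_units * self.local_mem_kb_per_simd

    def peak_gflops(self, core_mhz: float) -> float:
        # Assumption: one multiply-add (2 FLOPs) per stream processor per clock.
        return self.stream_processors * 2 * core_mhz / 1000


if __name__ == "__main__":
    rv770 = ShaderCore()
    print(rv770.stream_processors, rv770.texture_units, rv770.local_mem_kb)
    for card, mhz in (("Radeon 4850", 625), ("Radeon 4870", 750)):
        print(f"{card}: {rv770.peak_gflops(mhz):.0f} GFLOPS theoretical peak")
```

Multiplying out gives the 800 stream processors quoted above, 40 texture units, and about 1.0 and 1.2 TFLOPS of theoretical peak for the 4850 and 4870. Games never sustain these peaks; the figures are mainly useful for comparing two boards that share an identical shader core and differ in clock speed.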

The Memory

There are two basic ways to increase memory bandwidth. You can increase the clock speed of the memory, or you can transfer more data with every clock cycle by widening the memory bus. Like ATI's older Radeon HD 2900 XT, Nvidia's GTX 280 uses a 512-bit-wide memory bus. The RV770 GPU uses a narrower 256-bit bus, but it also supports new GDDR5 memory, which is capable of twice as many transfers per clock cycle as GDDR3. This puts ATI's GPU within striking distance of the GTX 280's memory bandwidth on a board with a less-expensive 256-bit bus, while moving more data at lower clock speeds.

What's more, GDDR5 also uses fewer pins than GDDR3 to connect the memory to the board. This reduces board complexity, which is very important given the reduced space available with smaller process technology. By pairing a less-complex 256-bit bus with high-data-rate GDDR5 memory, ATI should be able to achieve decent memory performance without harming yields for the GPU, all while spending less per board.

While Nvidia's GTX 280 runs its memory at a punishing 1100MHz to push more than 140GB/s of bandwidth, ATI's 4870 ticks along at just 900MHz yet still delivers about 115GB/s. The net result is that the ATI card's memory draws less power and generates less heat while delivering most of the bandwidth of the more expensive card.
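The bandwidth figures in this section follow from one formula: bus width in bytes times memory clock times transfers per clock. This Python sketch plugs in the bus widths and clocks quoted here, treating GDDR3 as two transfers per clock and GDDR5 as four (consistent with GDDR5 doing twice the work of GDDR3 per clock). The 1100MHz GTX 280 memory clock is the rounded figure used in the text.

```python
def bandwidth_gbs(bus_bits: int, mem_clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    # GTX 280: wide 512-bit bus with GDDR3 (2 transfers per clock).
    gtx280 = bandwidth_gbs(512, 1100, 2)
    # Radeon 4870: narrow 256-bit bus with GDDR5 (4 transfers per clock).
    hd4870 = bandwidth_gbs(256, 900, 4)
    print(f"GTX 280: {gtx280:.1f} GB/s, Radeon 4870: {hd4870:.1f} GB/s")
```

The 4870 ends up with about 115GB/s against roughly 141GB/s, or a bit over 80 percent of the GTX 280's bandwidth, from a bus half as wide and a memory clock about 18 percent lower.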

Running GDDR5 memory at lower clock speeds than GDDR3 while delivering comparable bandwidth is great, but the current low-end and midrange ATI boards feature only 512MB of total card memory, roughly half the amount Nvidia's new cards offer (the GeForce GTX 260 ships with 896MB of memory on a 448-bit interface, and the GTX 280 ships with a full gigabyte).

For the most part, performance doesn’t seem to suffer from this shortcoming, but that could change as graphically intensive games like Far Cry 2 and Fallout 3 are released later this year.

Video Playback and Encoding

Video decode acceleration is a crucial feature for modern GPUs. The new RV770-series GPU handles advanced features required by Blu-ray, such as picture-in-picture, in hardware, which allows for much lower CPU utilization with supported players. In our testing, CPU utilization went up about 5 percent when we flipped on picture-in-picture playback, compared with about a 20 percent increase when using an older ATI card on the same system.

Like Nvidia, ATI has demonstrated GPU-accelerated video transcodes from MPEG-2 to H.264 video. While the demos run at an impressive clip, there's no way for us to compare the performance of the two cards. The Elemental Badaboom encoder that Nvidia uses is not compatible with ATI cards, and the CyberLink PowerDirector 7 encoder used by ATI is not compatible with Nvidia cards. Nor are the two apps' settings similar enough to allow a meaningful comparison. This illustrates the fundamental problem with GPU-based computing today, which we'll talk about next.

Stream Processing

GPU-based computing is expected to be the answer for tasks that entail massive numbers of parallel computations, and the early apps that take advantage of GPUs, such as the Folding@Home clients, make the prospect seem quite promising. The problem, however, is that there’s one GPU computing API for Nvidia’s cards and a separate one for ATI’s cards.

That means anyone who writes software to harness the power of GPUs needs to write not one but two programs—one for ATI and one for Nvidia. If the last 12 years of DirectX have taught us anything, it’s that in order for hardware-accelerated anything to succeed, you need a common API that allows developers to write code once that works on both platforms.

We don’t know whether ATI’s Stream or Nvidia’s CUDA is the better API. Because we’re not programmers, we don’t care. But we do know that the continuance of two competing standards will only hamper development of GPU-leveraged applications. ATI and Nvidia need to put aside their differences and work together to build a common API that works on all hardware. If the two companies need a place to start, Apple pitched OpenCL, which does just that.
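What a common API would buy developers is easiest to see in the programming model these APIs broadly share: you write a small kernel that describes the work for one data element, and a runtime launches it across thousands of elements in parallel. The Python sketch below only mimics that model on the CPU; `saxpy_kernel` and `run_kernel` are hypothetical names for illustration, not functions from CUDA, ATI Stream, or OpenCL.

```python
from typing import Callable, List


def saxpy_kernel(gid: int, a: float, x: List[float], y: List[float], out: List[float]) -> None:
    """Work for a single work item: out[gid] = a * x[gid] + y[gid]."""
    out[gid] = a * x[gid] + y[gid]


def run_kernel(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Hypothetical stand-in for a vendor runtime.

    A real GPU runtime would launch `global_size` work items in parallel on
    its own hardware; this sketch runs them one after another on the CPU so
    the example stays self-contained.
    """
    for gid in range(global_size):
        kernel(gid, *args)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]
    out = [0.0] * len(x)
    run_kernel(saxpy_kernel, len(x), 2.0, x, y, out)
    print(out)  # [12.0, 24.0, 36.0, 48.0]
```

With a single cross-vendor standard, only the runtime underneath something like `run_kernel` would differ between ATI and Nvidia hardware; the kernel itself would be written once.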

The Cards

ATI has launched the Radeon 4850 and the Radeon 4870. Priced at $200 and $300, respectively, these cards compete squarely in the midrange.

The 4850 ships with 512MB of (less-expensive) GDDR3 running at 993MHz on a 256-bit bus. The board we tested runs a 625MHz core and sports the same 800 stream processors as the more expensive 4870. The card will sell for between $200 and $250, depending on configuration and specs.

The Radeon 4870's core runs at 750MHz, and the board's 512MB of GDDR5 memory runs at 900MHz on a 256-bit bus. Remember, though, that GDDR5 memory transfers four chunks of data per clock, giving it an effective memory bandwidth that's almost double that of the 4850. That's a worthwhile upgrade for $50 to $100 more.
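Because both boards use the same 256-bit bus, the "almost double" claim comes down to memory data rate alone. This Python sketch compares the two cards using only the clocks given above, assuming GDDR3 moves two chunks of data per memory clock, half of GDDR5's four.

```python
# Both boards use a 256-bit bus, so bandwidth scales with data rate alone.
BUS_BYTES = 256 // 8

boards = {
    # name: (core MHz, memory MHz, transfers per memory clock)
    "Radeon 4850": (625, 993, 2),  # GDDR3 (assumed 2 transfers per clock)
    "Radeon 4870": (750, 900, 4),  # GDDR5 (4 transfers per clock)
}


def data_rate_mts(mem_mhz: int, transfers: int) -> int:
    """Effective memory data rate in megatransfers per second."""
    return mem_mhz * transfers


if __name__ == "__main__":
    rates = {name: data_rate_mts(mem, t) for name, (_, mem, t) in boards.items()}
    for name, rate in rates.items():
        print(f"{name}: {rate} MT/s, {rate * BUS_BYTES / 1000:.1f} GB/s")
    ratio = rates["Radeon 4870"] / rates["Radeon 4850"]
    core_ratio = boards["Radeon 4870"][0] / boards["Radeon 4850"][0]
    print(f"4870 vs 4850: {ratio:.2f}x bandwidth, {core_ratio:.2f}x core clock")
```

The result is about 1.81 times the bandwidth (roughly 115GB/s versus 64GB/s) and a 20 percent higher core clock for the 4870.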
