Build the Perfect PC! Step-by-Step Illustrated How-To Guide
Posted 02/11/09 at 11:00:00 AM by The Maximum PC Staff
Nvidia's Next-Gen GPU
It wouldn’t be fair to say that Nvidia has jumped the shark, but the GTX 200 series isn’t nearly as impressive as many of this company’s previous product rollouts
Watching the ongoing race between AMD and Nvidia to build the ultimate graphics processor reminds us of the tale of the tortoise and the hare. AMD has played the hare, aggressively bounding ahead of Nvidia in terms of process size, number of stream processors, frame buffer size, memory interface, die size, and even memory type. Yet Nvidia always manages to snag the performance crown. The GeForce 200 series is but the latest example (although AMD’s RV770 is a helluva comeback).
Earlier this year, we convinced Nvidia to provide us with a rough-around-the-edges engineering sample of its high-end reference design (the GeForce GTX 280), with very immature drivers, for a first look at the GPU’s performance potential. At the time, the company was a full month away from shipping this product and its lesser cousin, the GeForce GTX 260, so we didn’t issue a formal verdict. We’ve since obtained and benchmarked a shipping unit.
As interesting as those benchmark numbers proved to be, the story behind this new architecture is even more fascinating. We’ll give you all the juicy details.
But first, let’s explain the new naming scheme: Nvidia has sowed a lot of brand confusion in the recent past, especially with the 512MB 8800 GTS. That card was based on a completely different GPU architecture than the 8800 GTS models with 320MB and 640MB frame buffers. The Green Team hopes to change that with this generation.
The letters GTX now represent Nvidia’s “performance” brand, and the three digits following those letters indicate the degree of performance scaling: The higher the number, the more performance you should expect.
Using 260 as a starting line should give the company plenty of headroom for future products (as well as leave a few slots open below for budget parts).
Manufacturing Process
AMD jumped ahead to a 55nm manufacturing process with the RV670 (the foundation for the company’s flagship Radeon HD 3870); it uses the same process to fabricate the RV770. Nvidia has stuck with the tried-and-true 65nm process for the GeForce 200 series.
Nvidia cites the new part’s long development cycle and sensible risk management as justification; but with the benefit of hindsight, we think Nvidia’s play-it-safe decision was a major strategic mistake.
The GTX 280 is an absolute beast of a GPU: Packing 1.4 billion transistors (the 8800 GTX got by with a mere 681 million, and a quad-core Penryn has 820 million), it’s capable of bringing a staggering 930 gigaFLOPs of processing power to any given application (a Radeon HD 3870 delivers 496 gigaFLOPs, and the quad-core Penryn just 96).
Considering the transistor count and the 65nm process size, the GeForce 200 die must be absolutely huge (and Nvidia’s manufacturing yields hideously low). Nvidia declined to provide numbers on either of those fronts.
Processor Cores
The GeForce GTX 280 has 240 stream processors onboard (Nvidia has taken to calling them “processing cores”). This being Nvidia’s second-generation unified architecture (a required feature for compatibility with DirectX 10), each core can handle vertex-shader, pixel-shader, or geometry-shader instructions as needed.
The cores can handle other types of highly parallel, data-intensive computations, too—including physics, a topic we’ll explore in more depth shortly. The less-expensive GeForce GTX 260 is equipped with 192 stream processors. Several months after launch, Nvidia introduced a second-generation GeForce GTX 260 that bumped the processor count to 216.
Although the GeForce GTX 280 has nearly twice as many stream processors as Nvidia’s previous best GPU, it’s still 80 shy of the 320 in AMD’s Radeon HD 3870; what’s more, AMD’s Radeon HD 4870 boasts an astounding 800 stream processors.
Nvidia’s asymmetric clock trick, however, enables its stream processors to run at clock speeds more than double that of the core. And this speed trick has so far overcome AMD’s numerical advantage.
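As a back-of-the-envelope check, the 930-gigaFLOP figure quoted earlier can be roughly reproduced from the core count and the stream-processor clock. The sketch below is ours, not Nvidia's. It uses the stock stream-processor clocks listed under Memory and Clock Speeds, and the factor of three floating-point operations per core per clock (one multiply-add plus one multiply) is an assumption that does not appear in this article.

```python
# Rough peak-throughput estimate. FLOPS_PER_CLOCK is an assumption
# (one multiply-add plus one multiply per core per clock), not a
# figure taken from the article.
FLOPS_PER_CLOCK = 3

def peak_gigaflops(stream_processors, shader_clock_ghz,
                   flops_per_clock=FLOPS_PER_CLOCK):
    """Peak single-precision throughput in gigaFLOPs."""
    return stream_processors * shader_clock_ghz * flops_per_clock

# Core counts and stock stream-processor clocks as quoted in the article.
gtx_280 = peak_gigaflops(240, 1.296)
gtx_260 = peak_gigaflops(192, 1.242)

print(f"GTX 280: {gtx_280:.0f} gigaFLOPs")
print(f"GTX 260: {gtx_260:.0f} gigaFLOPs")
```

Under that assumption the GTX 280 lands within a few gigaFLOPs of the quoted figure, which suggests the number is a theoretical peak rather than a measured result.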
A significant increase in the number of raster-operation processors (ROPs) and the speed at which they operate likely contributes to the new chip’s impressive performance. The 8800 GTX has 24 ROPs and the 9800 GTX has 16, but if the resulting pixels need to be blended as they’re written to the frame buffer, those two GPUs require two clock cycles to complete the operation. The 9800 GTX, therefore, is capable of blending only eight pixels per clock cycle.
The GTX 280 not only has 32 ROPs but also is capable of blending pixels at full speed, so its 32 ROPs can blend 32 pixels per clock cycle. The GTX 260, which is also capable of full-speed blending, is outfitted with 28 ROPs. The absurdly named GTX 260 Core 216, as we mentioned earlier, has more processing cores than the standard GTX 260, but it has the same number of ROPs as the lesser part.
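The blending arithmetic above is simple enough to tabulate. This is a minimal sketch using only the ROP counts and cycle costs quoted in the text; the function and dictionary names are ours and purely illustrative.

```python
# ROP counts and blend costs as quoted in the text above.
# Each entry: (number of ROPs, clock cycles needed per blended pixel).
GPUS = {
    "8800 GTX": (24, 2),
    "9800 GTX": (16, 2),
    "GTX 260": (28, 1),
    "GTX 280": (32, 1),
}

def blended_pixels_per_clock(rops, cycles_per_blend):
    """Pixels the ROPs can blend into the frame buffer each clock cycle."""
    return rops // cycles_per_blend

blend_rates = {name: blended_pixels_per_clock(*spec)
               for name, spec in GPUS.items()}

for name, rate in blend_rates.items():
    print(f"{name}: {rate} blended pixels per clock")
```

By this arithmetic the GTX 280 blends four times as many pixels per clock as the 9800 GTX, before any difference in clock speed is taken into account.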
Memory and Clock Speeds
GeForce GTX 280 cards will feature a 1GB frame buffer, and the GPU will access that memory over an interface that’s a full 512 bits wide. AMD’s Radeon 2900 XT, you might recall, also had a 512-bit memory interface, but the company dialed back to a 256-bit interface for the Radeon 3800 series, claiming that the wider alternative didn’t offer much of a performance advantage. That was before Crysis hit the market.
Cards based on both versions of the GTX 260 have 896MB of memory with 448-bit interfaces. Despite the news that AMD had moved to GDDR5 for its next-generation GPUs, Nvidia is sticking with GDDR3, claiming that the technology “still has plenty of life in it.” Judging by the performance of the GTX 280 compared to the Radeon 3870 X2, which uses GDDR4 memory (albeit half as much and with an interface half as wide as the GTX 280’s), we’d have to agree. Nvidia is taking a similar approach to Direct3D 10.1 and Shader Model 4.1: The GTX 280 and GTX 260 don’t support either.
A stock GTX 280 runs its core at 602MHz while its stream processors hum along at 1.296GHz. Memory is clocked at 1.107GHz. Both versions of the GTX 260 have stock core, stream processor, and memory clock speeds of 576MHz, 1.242GHz, and 999MHz, respectively (what, they couldn’t squeeze out an extra MHz to reach an even gig?).
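Those clocks and bus widths can be turned into two derived numbers: peak memory bandwidth and the ratio of stream-processor clock to core clock. The sketch below assumes GDDR3 transfers data twice per memory clock (double data rate), a factor the article does not state, and counts a gigabyte as 10^9 bytes. The function names are ours.

```python
# DDR_TRANSFERS_PER_CLOCK is an assumption (GDDR3 is double data rate);
# the bus widths and clocks are the stock figures quoted in the article.
DDR_TRANSFERS_PER_CLOCK = 2

def memory_bandwidth_gb_s(bus_width_bits, memory_clock_ghz):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_ghz * DDR_TRANSFERS_PER_CLOCK

def shader_to_core_ratio(shader_clock_mhz, core_clock_mhz):
    """How many times faster the stream processors run than the core."""
    return shader_clock_mhz / core_clock_mhz

bw_280 = memory_bandwidth_gb_s(512, 1.107)
bw_260 = memory_bandwidth_gb_s(448, 0.999)
ratio_280 = shader_to_core_ratio(1296, 602)
ratio_260 = shader_to_core_ratio(1242, 576)

print(f"GTX 280: {bw_280:.1f} GB/s, shader/core ratio {ratio_280:.2f}")
print(f"GTX 260: {bw_260:.1f} GB/s, shader/core ratio {ratio_260:.2f}")
```

Both ratios come out a little above 2.15, consistent with the claim that the stream processors run at more than double the core clock.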
PhysX Connection
When Nvidia acquired the struggling Ageia, we were disappointed—but not surprised—to learn that Nvidia was interested only in the PhysX software. While it wouldn’t be accurate to say that Nvidia has orphaned the hardware, the company has no plans to continue developing the PhysX silicon. What’s more, there is absolutely no Ageia intellectual property to be found in the GTX 200 series silicon—the new GPU had already been taped out when the acquisition was finalized in February.
But Nvidia didn’t acquire Ageia just to put the company out of its misery. Nvidia’s engineers quickly set about porting the PhysX software to the GeForce 8-, 9-, and 200-series GPUs. When Ageia first introduced the PhysX silicon, the company maintained that it was a superior solution to existing CPUs and GPUs because those architectures weren’t specifically optimized for accelerating complex physics calculations. In reality, the PhysX architecture wasn’t as radically different from modern GPU architectures as we’d been told.
The first PhysX part, for example, had 30 parallel cores; the mobile version that ships in Dell’s XPS 1730 notebook PC has 40 cores. Nvidia tells us it took only three months to get PhysX software running on GeForce, and the software will soon be running on every CUDA platform.
SLI and Display Considerations
The GeForce GTX 280 and both versions of the GTX 260 have two SLI edge connectors, so they will support three-way SLI configurations. Nvidia hasn’t commented on the possibility of a future single-board, dual-GPU product that would allow quad SLI, but reps have told us they expect the current dual-GPU GeForce 9800 GX2 to fade away.
Nvidia’s reference-design board features two DVI ports and one analog video output on the mounting bracket, with HDMI support available via dongle. The somewhat kludgy solution of bringing digital audio to the board via SPDIF cable remains (we much prefer AMD’s over-the-bus solution). Add-in board partners can choose to offer DisplayPort SKUs for customers who want support for displays with 10-bit color and 120Hz refresh rates.
More Architectural Details
Nvidia’s new GPUs are capable of managing three times as many threads in flight at a given time as the previous architecture. Improved dual-issue performance enables each stream processor to execute multiple instructions simultaneously, and the new processors have twice as many registers as the previous generation.
These performance-oriented improvements allow for faster shader performance and increasingly complex shader effects, according to Nvidia. In the new Medusa demo, a geometry shader enables the mythical creature to turn a warrior to stone with a single touch. This isn’t a simple texture change or skinning operation—the stone slowly creeps up the warrior’s leg, torso, and face until he is completely transformed.
Nvidia still perceives gaming as a critically important market for its GPUs, but the company is also looking well beyond that large niche market. Through its CUDA (Compute Unified Device Architecture) initiative, the company is taking on an increasing number of apps that have traditionally been the responsibility of the host CPU. Nvidia isn’t looking to replace the CPU with a GPU; it’s just trying to convince consumers that the GPU is at least as important as the CPU.
CUDA applications will run on any GeForce 8- or 9-series GPU, but the GeForce 200 series delivers an important advantage over those architectures: support for the IEEE-754R double-precision floating-point standard.
This should make the new GPUs—and CUDA in general—even more attractive to users who develop or run applications that rely heavily on floating-point math. Such applications are common not only in the scientific, engineering, and financial markets, but also in the mainstream consumer marketplace (for everything from video transcoding to digital photo and video editing).
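To see why double precision matters for math-heavy work, consider adding the same small value many times over. The sketch below is a CPU-side illustration in Python, not CUDA code and not anything Nvidia supplied: it emulates 32-bit arithmetic by rounding through the standard struct module after every addition and compares the accumulated error against ordinary 64-bit floats. The step size and iteration count are arbitrary choices of ours.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def repeated_sum(step, count, single_precision):
    """Add `step` to a running total `count` times.

    With single_precision=True, the step and every intermediate
    total are rounded to 32 bits, emulating single-precision math.
    """
    if single_precision:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_float32(total)
    return total

COUNT = 100_000           # arbitrary iteration count
EXACT = 10_000.0          # 0.1 * COUNT in exact arithmetic

err32 = abs(repeated_sum(0.1, COUNT, single_precision=True) - EXACT)
err64 = abs(repeated_sum(0.1, COUNT, single_precision=False) - EXACT)

print(f"single-precision error: {err32:.6g}")
print(f"double-precision error: {err64:.6g}")
```

The single-precision total drifts much further from the exact answer than the double-precision one, which is the kind of error that long-running scientific or financial calculations cannot tolerate.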
Power Considerations
Nvidia has made great strides in reducing the power consumption of its GPUs, and the GeForce 200 series promises to be no exception. In addition to supporting Hybrid Power (a feature that can shut down a relatively power-thirsty add-in GPU when a more economical integrated GPU can handle the workload instead), these new chips will have performance modes optimized for times when Vista is idle or the host PC is running a 2D application, when the user is watching a movie on Blu-ray or DVD, and when full 3D performance is called for. Nvidia promises the GeForce device driver will switch between these modes based on GPU utilization in a fashion that’s entirely transparent to the user.
One Chip, 1.4 Billion Transistors
You could fit nearly six Penryns onto a single GeForce GTX 280 die, although a portion of the latter part’s massive size can be attributed to the fact that it’s manufactured using a 65nm process, compared to the Penryn’s more advanced 45nm process.
Nvidia packs 240 tiny processing cores into this space, plus 32 raster-operation processors, a host of memory controllers, and a set of texture processors. Thread schedulers, the host interface, and other components reside in the center of the die. With technologies like CUDA, Nvidia is increasingly targeting general-purpose computing as a primary application for its hardware, reducing its reliance on PC gaming as the raison d’être for such high-end GPUs.
Price/performance comparison?
Submitted by Zazubovich on Wed, 02/11/2009 - 7:18pm
I'd like to see a price performance ratio with NewEgg or Pricewatch or some other source to get at performance. AMD usually wins price/performance but the cheap quads Intel put out challenged that notion.
I also wanna know, just for kicks, when is Microsoft or Apple going to start optimizing their software (OS and applications) for 64 bit and for multi-core processors? Isn't a lot of what the chip makers are putting out simply being wasted because only Adobe and a few others have bothered to optimize their software for multicore? Games might be a lot faster if they used more than one core, and it's been a few years now, right? Shouldn't something in the development pipeline be able to take advantage of 2+ cores and rock out with its code out?
I totally agree
Submitted by da_samman on Thu, 02/12/2009 - 7:31am
I totally agree. Are there ANY games out there that can take advantage of 2 cores or more? If not, then why should I buy anything more than a high end Core 2 Duo and put the money towards a kickass videocard, maybe 2, and/or maybe even a PCI-Express SoundBlaster? Just some food for thought.
Sincerely yours, from Fort Campbell, KY,
SGT Samuel E. McClard II
Life's a journey, enjoy the ride!!
.
Submitted by sasquatch42 on Wed, 02/11/2009 - 3:11pm
you should have used Ph2 720. Stick a better Video card in the rig with the saved money. Much better frame rate improvement.
zalman 9900?
Submitted by Captain on Wed, 02/11/2009 - 3:10pm
wheres the new zalman cooler at? its your best tested cooler, so its kinda funny you dont recommend it for building a pc. plus the zalman 9900 is getting unfairly beaten over the head on newegg by stupid reviewers, i hate people sometimes. great article though, very informative and lengthy, ill definitely recommend people new to building pc's to this.
Memory Boo Boo
Submitted by AaronDaub on Wed, 02/11/2009 - 2:56pm
I spy an error! When talking about the official supported memory speed of the Core i7, it should read DDR3/1333 which is PC3 10666 not 1066.
I agree with da_saman...I
Submitted by dmstr23 on Wed, 02/11/2009 - 2:12pm
I agree with da_saman...I believe the build-your-own pc guide should have been revised with the new parts which present a different build experience altogether. I also noticed a lot of the writing about "why we chose the parts" was also from the article in an old issue. I do, however, commend you guys for a great overview of the parts out today and how to get the maximum potential out of your pc.
New guide, old info
Submitted by da_samman on Wed, 02/11/2009 - 12:45pm
The actual section where you build the rig looks like it is utilizing the old guide where they used the Stacker case. Shouldn't the pictures and the writing reflect the new parts?
Sincerely yours, from Ft. Campbell, KY,
SGT Samuel E. McClard II
Life's a journey, enjoy the ride!!
MicroCenter....
Submitted by maniacm0nk3y on Wed, 02/11/2009 - 8:11am
I just learned about them, love their deals-saving a lot on my case and the Core i7 920....but they never get new stock. I have been waiting almost a week, going on 2 for them to get more Core i7s because they are out of stock right now. Many places get new stuff on Tuesdays....doesn't seem to be the deal here.
NCIX
Submitted by mlauzon on Wed, 02/11/2009 - 11:26am
Give NCIX a try, if you're in the US here is the URL:
If in Canada (which is where I am):
Their prices are a bit more expensive, but they do price matching...so you're able to get cheaper prices; also they tend to do surprise sales, etc.
Michael
Looks like they took it off.
Submitted by maniacm0nk3y on Wed, 02/11/2009 - 11:32am
Looks like they took it off. They don't have the banner ad for it, and looking at the "processors" section shows nothing.
Microcenter has i7 for $229
Submitted by unitymind on Thu, 02/12/2009 - 2:15am
Microcenter has i7 920 for $229 right now...you cant go wrong! 02/11/2009
2/12/09 - Looks like it is off the website search...odd. I bought mine about 3 weeks ago when I got the ad in an email. Paid $229 for it - couldnt believe it!
I found the link http://www.microcenter.com/single_product_results.phtml?product_id=0300438
Also to their ad this month is BYOPC: http://microcenter.com/specials/catalogs/broadsheet.html
Some things I felt were
Submitted by MAXPCreader07 on Tue, 02/10/2009 - 9:52pm
Some things I felt were missing:
1. AMD's Phenom II (Deneb)
2. More AMD boards
3. A Thermalright HSF
4. One of Antec's gaming cases (like the 902 or 1200)
This should have been
Submitted by probablecause on Wed, 02/11/2009 - 8:34pm
This should have been called, "Build the Perfect Intel Based PC"