Hands-On with Nvidia's New GeForce GTX 280!

The Details Disclosed
The whole truth and nothing but the truth (as far as we know it)
Watching the ongoing race between AMD and Nvidia to build the ultimate graphics processor reminds us of the tale of the tortoise and the hare. AMD has played the hare, aggressively bounding ahead of Nvidia in terms of process size, number of stream processors, frame buffer size, memory interface, die size, and even memory type. Yet Nvidia always manages to snag the performance crown. The GeForce 200 series is but the latest example.
We convinced Nvidia to provide us with an early engineering sample of its high-end reference design (the GeForce GTX 280), with very immature drivers, for a first look at the GPU’s performance potential. At the time of this writing [Ed note: late May], the company was still a full month away from shipping this product and its lesser cousin, the GeForce GTX 260, so we won’t issue a formal verdict here (our full hands-on review should be online by the time this issue reaches you).
As interesting as the benchmark numbers are, the story behind this new architecture is even more fascinating. We’ll give you all the juicy details, but first, let’s explain the new naming scheme: Nvidia has sowed a lot of brand confusion in the recent past, especially with the 512MB 8800 GTS. That card was based on a completely different GPU architecture than the 8800 GTS models with 320MB and 640MB frame buffers. The Green Team hopes to change that with this generation.
The letters GTX now represent Nvidia’s “performance” brand, and the three digits following those letters will indicate the degree of performance scaling: The higher the number, the more performance you should expect. Using 260 as a starting line should give the company plenty of headroom for future products (as well as leave a few slots open below for budget parts).
Manufacturing Process
AMD jumped ahead to a 55nm manufacturing process with the RV670 (the foundation for the company’s flagship Radeon HD 3870), but Nvidia stuck with the tried-and-true 65nm process for the GeForce 200 series. Nvidia cites the new part’s long development cycle and sensible risk management as justification.
The GTX 280 is an absolute beast of a GPU: Packing 1.4 billion transistors (the 8800 GTX got by with a mere 681 million, and a quad-core Penryn has 820 million), it’s capable of bringing a staggering 930 gigaFLOPs of processing power to any given application (a Radeon HD 3870 delivers 496 gigaFLOPs, while the quad-core Penryn musters just 96).
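For a sense of scale, the ratios implied by those figures can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: the table and function names are ours, and the numbers are simply the ones quoted above.

```python
# Transistor counts (in millions) and peak throughput (in gigaFLOPs),
# exactly as quoted in the article text.
transistors_m = {"GTX 280": 1400, "8800 GTX": 681, "quad-core Penryn": 820}
gflops = {"GTX 280": 930, "Radeon HD 3870": 496, "quad-core Penryn": 96}


def ratio_vs_gtx280(table):
    """Return how many times larger the GTX 280 figure is than each other part's."""
    base = table["GTX 280"]
    return {
        name: round(base / value, 2)
        for name, value in table.items()
        if name != "GTX 280"
    }


transistor_ratios = ratio_vs_gtx280(transistors_m)
gflops_ratios = ratio_vs_gtx280(gflops)

print("Transistor ratio vs. GTX 280:", transistor_ratios)
print("gigaFLOPs ratio vs. GTX 280:", gflops_ratios)
```

By these numbers, the GTX 280 carries roughly twice the transistors of the 8800 GTX and offers nearly ten times the quoted floating-point throughput of the quad-core Penryn.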
Considering the transistor count and the 65nm process size, the GeForce 200 die must be absolutely huge (and Nvidia’s manufacturing yields hideously low). Although Nvidia declined to provide numbers on either of those fronts, those two questions will remain academic in the absence of fresh and considerable competition from AMD. (And for the record, all AMD would tell us about its new part is that we can expect it “real soon.”)
You're Staring at 1.4 Billion Transistors
You could fit nearly six Penryns onto a single GeForce GTX 280 die, although a portion of the latter part’s massive size can be attributed to the fact that it’s manufactured using a 65nm process, compared to the Penryn’s more advanced 45nm process.
Nvidia packs 240 tiny processing cores into this space, plus 32 raster-operation processors, a host of memory controllers, and a set of texture processors. Thread schedulers, the host interface, and other components reside in the center of the die.
With technologies like CUDA, Nvidia is increasingly targeting general-purpose computing as a primary application for its hardware, reducing its reliance on PC gaming as the raison d’être for such high-end GPUs.
Processor Cores
The GeForce GTX 280 has 240 stream processors onboard (Nvidia has taken to calling them “processing cores”). This being Nvidia’s second-generation unified architecture, each core can handle vertex-shader, pixel-shader, or geometry-shader instructions as needed. The cores can handle other types of highly parallel, data-intensive computations, too—including physics, a topic we’ll explore in more depth shortly. The GeForce GTX 260 is equipped with 192 stream processors.
Although the GeForce GTX 280 has nearly twice as many stream processors as Nvidia’s previous best GPU, it’s still 80 shy of the 320 in AMD’s Radeon HD 3870. But Nvidia’s asymmetric clock trick, which enables its stream processors to run at clock speeds more than double that of the core, has so far obliterated AMD’s numerical advantage. In fact, a single GeForce GTX 280 proved to be an average of 28 percent faster than the dual-GPU Radeon HD 3870 X2 with real-world games running on Windows XP, and it was 24 percent faster running Vista.
We didn’t have an opportunity to benchmark the GTX 280 in SLI mode (or the GTX 260 at all), but a single GTX 280 beat two GeForce 9800 GTX cards running in SLI by a 9-percent margin, thanks in large measure to significantly improved performance with Crysis. (Turn to page 60 for complete benchmark results.)
A significant increase in the number of raster-operation processors (ROPs) and the speed at which they operate likely contributes to the new chip’s impressive performance. The 8800 GTX has 24 ROPs and the 9800 GTX has 16, but if the resulting pixels need to be blended as they’re written to the frame buffer, those two GPUs require two clock cycles to complete the operation. The 9800 GTX, therefore, is capable of blending only eight pixels per clock cycle.
The GTX 280 not only has 32 ROPs but is also capable of blending pixels at full speed—so its 32 ROPs can blend 32 pixels per clock cycle. The GTX 260, which is also capable of full-speed blending, is outfitted with 28 ROPs.
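The blending arithmetic is easy to check for yourself. The sketch below divides each GPU's ROP count by the number of clock cycles a blended write takes, using the figures given above. The `cycles_per_blend` field and the function name are our own shorthand, not Nvidia terminology.

```python
# ROP counts as quoted in the text. cycles_per_blend is 2 for the older GPUs
# (half-speed blending) and 1 for the GeForce 200 series (full-speed blending).
GPUS = {
    "8800 GTX": {"rops": 24, "cycles_per_blend": 2},
    "9800 GTX": {"rops": 16, "cycles_per_blend": 2},
    "GTX 260": {"rops": 28, "cycles_per_blend": 1},
    "GTX 280": {"rops": 32, "cycles_per_blend": 1},
}


def blended_pixels_per_clock(rops, cycles_per_blend):
    """Pixels the ROPs can blend into the frame buffer per core clock cycle."""
    return rops // cycles_per_blend


blend_rates = {
    name: blended_pixels_per_clock(**spec) for name, spec in GPUS.items()
}

for name, rate in blend_rates.items():
    print(f"{name}: {rate} blended pixels per clock")
```

Per clock cycle, then, the GTX 280 blends four times as many pixels as the 9800 GTX, even though it has only twice as many ROPs.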
Memory and Clock Speeds
GeForce GTX 280 cards will feature a 1GB frame buffer, and the GPU will access that memory over an interface that’s a full 512 bits wide. AMD’s Radeon HD 2900 XT, you might recall, also had a 512-bit memory interface, but the company dialed back to a 256-bit interface for the Radeon HD 3800 series, claiming that the wider alternative didn’t offer much of a performance advantage. That was before Crysis hit the market.
Cards based on the GTX 260 will have 896MB of memory with a 448-bit interface. Despite the news that AMD will move to GDDR5 with its next-generation GPUs, Nvidia is sticking with GDDR3, claiming that the technology “still has plenty of life in it.” Judging by the performance of the GTX 280 compared to the Radeon HD 3870 X2, which uses GDDR4 memory (albeit half as much and with an interface half as wide as the GTX 280’s), we’d have to agree. Nvidia is taking a similar approach to Direct3D 10.1 and Shader Model 4.1: The GTX 280 and GTX 260 don’t support either.
A stock GTX 280 will run its core at 602MHz while its stream processors hum along at 1.296GHz. Memory will be clocked at 1.107GHz. The GTX 260 will have stock core, stream processor, and memory clock speeds of 576MHz, 1.242GHz, and 999MHz, respectively (what, they couldn’t squeeze out an extra MHz to reach an even gig?).
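Those stock clocks, combined with the stream-processor counts and bus widths quoted earlier, are enough to estimate three figures: the shader-to-core clock ratio, peak floating-point throughput, and memory bandwidth. Treat the sketch below as an estimate under two assumptions of ours that the article does not state: each stream processor retires three floating-point operations per shader clock, and the GDDR3 memory transfers data twice per memory clock. The class and function names are also ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockSpec:
    stream_processors: int
    core_mhz: float
    shader_mhz: float
    memory_mhz: float
    bus_bits: int


# Stock figures as quoted in the article.
GTX_280 = StockSpec(240, 602, 1296, 1107, 512)
GTX_260 = StockSpec(192, 576, 1242, 999, 448)

# Assumption (not from the article): 3 floating-point ops per stream
# processor per shader clock.
FLOPS_PER_SP_PER_CLOCK = 3
# Assumption (not from the article): GDDR3 is double data rate, so it
# performs 2 transfers per memory clock.
TRANSFERS_PER_CLOCK = 2


def shader_to_core_ratio(spec):
    """How much faster the stream processors are clocked than the core."""
    return spec.shader_mhz / spec.core_mhz


def peak_gflops(spec):
    """Theoretical peak throughput in gigaFLOPs."""
    return spec.stream_processors * spec.shader_mhz * FLOPS_PER_SP_PER_CLOCK / 1000


def memory_bandwidth_gbs(spec):
    """Theoretical memory bandwidth in gigabytes per second."""
    bytes_per_transfer = spec.bus_bits / 8
    return spec.memory_mhz * 1e6 * TRANSFERS_PER_CLOCK * bytes_per_transfer / 1e9


for name, spec in (("GTX 280", GTX_280), ("GTX 260", GTX_260)):
    print(
        f"{name}: shader/core {shader_to_core_ratio(spec):.2f}x, "
        f"{peak_gflops(spec):.0f} gigaFLOPs, "
        f"{memory_bandwidth_gbs(spec):.1f} GB/s"
    )
```

Under those assumptions, the GTX 280 lands at about 933 gigaFLOPs, which lines up with the 930-gigaFLOP figure cited earlier, and at roughly 142GB/s of memory bandwidth. Both cards run their stream processors at a little more than twice the core clock.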