Johannes Kepler once wrote, “Nature uses as little as possible of anything.”
Nvidia’s latest GPU, code-named Kepler after the German mathematician, looks to be inspired by that quote as much as by the original Kepler’s mathematical prowess. The new GPU, the GTX 680, offers superb graphics horsepower but requires only two 6-pin PCI Express power connectors. It’s a big departure from the last-generation GTX 580, which was fast but power hungry.
We’ll talk about performance shortly, but let’s first look at Kepler’s underlying architecture.
Kepler GPUs are built using a 28nm manufacturing process, allowing Nvidia to build in more circuits in less die area.
Like Fermi, Kepler is a modular architecture, allowing Nvidia to scale the design up or down by adding or subtracting functional blocks. In Fermi, Streaming Multiprocessors, or SMs for short, were the basic building blocks from which the GTX 500 line of GPUs was built. The CUDA core counts inside the SMs could vary. For example, each SM block in the GTX 560 Ti contained 48 CUDA cores, while the GTX 580 SM was built with 32 cores. The GTX 580 had 16 SMs of 32 cores each, for a total of 512 CUDA cores.
Kepler’s functional block is the SMX. The move to 28nm allowed Nvidia’s architects to scale things a bit differently, so Nvidia increased the number of cores inside each Kepler SMX to a stunning 192 CUDA cores.
The GTX 680 GPU is built from eight SMX blocks, arranged in paired groups called GPCs (graphics processing clusters). This gives the GTX 680 a whopping 1,536 CUDA cores.
The SMX doesn’t just house the CUDA cores, however. Built into each SMX is the new Polymorph engine, which contains the hardware-tessellation engine, setup, and related features. Also included are 16 texture units. This gives the GTX 680 a total of 128 texture units (compared with the 64 texture units built into the GTX 580). Interestingly, the cache has changed a bit. Each SMX still has 64KB of L1 cache, part of which can be used as shared memory for GPU compute, but the total L1 cache has shrunk, since there are only eight SMX units in the GTX 680, not 16 SMs as in the GTX 580. The L2 cache is also smaller, at 512KB rather than the 768KB of Fermi.
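The chip-wide totals quoted above are just per-block figures multiplied by the block count. Here is a minimal Python sketch of that arithmetic, using only numbers from this article. The dictionary layout and the helper name chip_total are our own, for illustration, and the 64KB-per-SM figure for the GTX 580 is implied by the text ("still has 64KB") rather than stated outright.

```python
# Per-block figures as quoted in the article. The dictionary layout and the
# helper name are illustrative only; they are not part of any Nvidia tool.
GTX_680 = {"blocks": 8, "cores": 192, "texture_units": 16, "l1_kb": 64}
# The 64KB L1 per Fermi SM is implied by "each SMX still has 64KB".
GTX_580 = {"blocks": 16, "cores": 32, "l1_kb": 64}


def chip_total(gpu, per_block_key):
    """Multiply a per-block figure by the number of SM/SMX blocks."""
    return gpu["blocks"] * gpu[per_block_key]


if __name__ == "__main__":
    print("GTX 680 CUDA cores:", chip_total(GTX_680, "cores"))
    print("GTX 580 CUDA cores:", chip_total(GTX_580, "cores"))
    print("GTX 680 texture units:", chip_total(GTX_680, "texture_units"))
    print("GTX 680 total L1 (KB):", chip_total(GTX_680, "l1_kb"))
    print("GTX 580 total L1 (KB):", chip_total(GTX_580, "l1_kb"))
```

Under those assumptions, the GTX 680 ends up with three times the CUDA cores of the GTX 580 but half the aggregate L1 cache.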
Another interesting change is that pre-decoding and dependency checking have been offloaded to software, whereas Fermi handled them in hardware. What Nvidia got in return was better instruction efficiency and more die space. The transistor count of the GTX 680 GPU is 3.5 billion, up only a little from the 3 billion of the GTX 580. The die size has shrunk, however, to a much more manageable 294mm². By contrast, Intel’s Sandy Bridge 32nm quad-core CPU die is 216mm².
One of the cooler new features from an actual application perspective is bindless textures. Prior to Kepler, Nvidia GPUs were limited to 128 simultaneous textures; Kepler boosts that by allowing textures to be allocated as needed within the shader program, with up to 1 million simultaneous textures available. It’s doubtful whether games will use that many textures, but certain types of architectural rendering might benefit.
Nvidia continues to incorporate its proprietary FXAA antialiasing mode, but has added a new mode that it’s calling TXAA. The “T” stands for “temporal.” TXAA in its standard mode is actually a variant of 2x multisampling AA, but it varies the sampling pattern over time (i.e., over multiple frames). The result is better edge quality than even 8x MSAA, but the performance hit is more like that of 2x multisampling.
Another cool new feature that will also eventually be supported in older Nvidia GPUs is Adaptive Vsync. Currently, if you lock vertical sync to your monitor’s refresh rate (typically 60Hz, but as high as 120Hz on some displays), you’ll get smoother gameplay. However, you might see a stutter as the frame rate drops to 30fps or below, due to the output frames being locked to vsync. On the other hand, if you run with vsync off, you may see frame tearing, as a new frame is sent to the display before the old one is complete.
Adaptive Vsync locks the frame rate to the vertical refresh rate, until the driver detects the frame rate dropping below the refresh rate. Vsync is then disabled temporarily, until the frame rate climbs above the monitor refresh rate. The overall result is much smoother performance from the user’s point of view.
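The behavior described above can be sketched as a simple decision rule. The Python below is a toy model, not Nvidia’s driver logic, which isn’t public. The function names are ours, and the always-on vsync case assumes plain double buffering, where a frame that misses a refresh waits for the next one.

```python
import math


def vsync_fps(render_fps, refresh_hz):
    """Displayed rate with vsync always on (simplified double-buffered model).

    A frame that misses a refresh waits for the next one, so the displayed
    rate snaps to refresh_hz / n for a whole number n.
    """
    if render_fps <= 0:
        raise ValueError("render_fps must be positive")
    if render_fps >= refresh_hz:
        return float(refresh_hz)
    return refresh_hz / math.ceil(refresh_hz / render_fps)


def adaptive_vsync_fps(render_fps, refresh_hz):
    """Displayed rate under the adaptive rule described in the article.

    Vsync stays on while the GPU keeps up with the display, and is switched
    off (tearing becomes possible) once the frame rate falls below refresh.
    """
    if render_fps <= 0:
        raise ValueError("render_fps must be positive")
    if render_fps >= refresh_hz:
        return float(refresh_hz)  # vsync on: capped at the refresh rate
    return float(render_fps)      # vsync off: frames shown as rendered


if __name__ == "__main__":
    for fps in (75, 60, 50, 25):
        print(fps, vsync_fps(fps, 60), adaptive_vsync_fps(fps, 60))
```

In this model, a game rendering at 50fps on a 60Hz display falls to 30fps with vsync locked but stays at 50fps with the adaptive rule, which is the stutter the feature is meant to avoid.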
Finally, Nvidia has beefed up the video engine, building in a dedicated encode engine capable of encoding H.264 high-profile video at 4x to 8x real time. Power usage is low in this mode, consuming single-digit watts, rather than the shader-driven tens of watts of past GPUs.
Nvidia built an improved circuit board to host the GTX 680 GPU. The board will ship with 2GB of GDDR5, with the default memory clock running at 6008MHz, making it the first board to ship with 6GHz GDDR5. The GTX 680 also introduces GPU Boost, an idea borrowed from the world of x86 CPUs. GPU Boost increases the core clock speed if the internal thermal environment permits. This allows games that present a lighter overall load to get additional performance as needed. In another departure, the GTX 680 uses a single clock: the shader clock now runs at the same frequency as the core clock. Product boxes will likely show both the base and boost clocks. As with recently released AMD products, the GTX 680 is fully PCI Express 3.0 compliant.
A few notable things spring to mind when examining the specs. First, this is a 256-bit wide memory interface, as opposed to the 384-bit interface of AMD’s Radeon HD 7970. Nvidia makes up for this with both improved memory-controller efficiency and higher-clocked GDDR5. The frame buffer is “only” 2GB, but that was enough to run our most demanding benchmarks at 2560x1600 with all detail levels maxed out and 4x MSAA enabled.
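To see how a higher memory clock offsets a narrower bus, here is a rough Python sketch of the usual peak-bandwidth formula: bus width in bytes times transfers per second. It assumes the 6008MHz figure quoted earlier is the effective per-pin data rate, and the function name is ours. The 384-bit line is a hypothetical what-if at the same data rate, not a spec for any shipping card.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_mt_s):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits: width of the memory interface in bits.
    data_rate_mt_s: effective transfers per second per pin, in millions
                    (assumed here to equal the quoted "6008MHz").
    """
    bytes_per_transfer = bus_width_bits / 8
    # bytes * millions of transfers/s gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * data_rate_mt_s / 1000


if __name__ == "__main__":
    gtx_680 = memory_bandwidth_gb_s(256, 6008)
    # Hypothetical: the same memory on a 384-bit bus, for comparison only.
    what_if_384 = memory_bandwidth_gb_s(384, 6008)
    print(f"256-bit at 6008 MT/s: {gtx_680:.1f} GB/s")
    print(f"384-bit at 6008 MT/s: {what_if_384:.1f} GB/s (hypothetical)")
```

By this estimate the GTX 680 peaks at roughly 192GB/s. At equal data rates a 384-bit bus would carry 50 percent more, which is the gap Nvidia has to close with clock speed and controller efficiency.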
Also worth calling out is Nvidia’s new devotion to power efficiency. The GTX 680 is substantially more power efficient than its predecessor, with a maximum TDP of just 195W. Idle power is about 15W. We saw the power savings in our benchmarking.
The GTX 680 is also the first single-GPU card from Nvidia to support more than two displays. Users can add up to four displays using all four ports. Nvidia was strangely reticent about discussing its DisplayPort 1.2 implementation, which should allow for even more monitors once 1.2 capable monitors and hubs arrive on the scene later this year.
The GTX 680 cooling system is a complete redesign, using a tapered fin stack, acoustic damping, and a high-efficiency heat pipe. The card was very quiet under load, though perceptually about the same as the XFX Radeon HD 7970’s twin-cooling-fan design. Of course, having a more power-efficient GPU design is a big help. The GTX 680 is no DustBuster.
We pitted the GTX 680 against two previous GTX 580 designs, the slightly overclocked EVGA GTX 580 SC and the more heavily overclocked EVGA GTX 580 Classified. The XFX Radeon HD 7970 Black Edition was also included. We ran our usual benchmark suite at 2560x1600 with 4x MSAA enabled, along with the Futuremark and Unigine synthetic tests.
The GTX 680 clearly takes most of the benchmarks, though the XFX HD 7970 eked out a couple of wins. Note that it’s possible some of these benchmarks are actually becoming CPU limited, even with 4x MSAA, but it’s hard to say for certain. That’s very likely the case with HAWX 2, where the older GTX 580 Classified—albeit a heavily overclocked GTX 580—manages a 1fps advantage.
The GTX 680’s idle power ratings are impressive, too. The total system power at idle was just 116W, 8W better than the XFX card. However, Nvidia doesn’t incorporate anything like AMD’s ZeroCore technology, which reduces power to a bare 3W when the display is turned off (as when Windows 7 blanks the screen). Even better is the power under load: the GTX 680 is the only GPU in this roundup to run at under 300W at full load.
The GTX 680 we tested is Nvidia’s reference card, and it’s likely that some manufacturers will ship retail cards at higher core clock speeds. Retail cards will be available at launch (March 22). Nvidia is pricing the card at $500, but prices may vary a bit depending on manufacturer. That $500 price tag undercuts AMD’s Radeon HD 7970 pricing by as much as $100, which makes the GTX 680 look even better for high-end gamers.
The GTX 680 looks to regain for Nvidia the performance crown briefly held by AMD, and it’s priced lower, to boot. What’s most intriguing, however, is that Kepler likely has some headroom for even greater power consumption, which may allow Nvidia to ship an even higher-end GPU when needed. The performance horse race continues, and while the top spot now belongs to Nvidia, the company also needs to deliver midrange GPUs to compete with AMD’s more recent product moves. In the long run, gamers will benefit from more choices and competition. It’s a win all around.