Everything You Need to Know About Nvidia's GF100 (Fermi) GPU
Texturing
Each SM also contains four texture units, each of which computes texture addresses and can fetch four texture samples per clock; those samples can then be filtered. This is a departure from Nvidia’s past generation, where texture units were shared among several SMs.
Overall texture performance has also been improved by the dedicated L1 texture cache, plus the unified L2 cache. The GF100 incorporates new texture modes required by DirectX 11. Perhaps the most interesting outcome of the increased performance of the new texture units is improved shadowing performance, due to implementing DX11’s four-offset Gather4 directly in hardware. Four texels can be fetched from a 128x128 grid with one instruction. This, plus jittered sampling (available in previous generations), can substantially improve soft shadowing performance, which was a weak spot in previous generations.
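To make the idea concrete, here is a minimal CPU-side sketch in Python of what a four-offset gather buys you for soft shadows. It is an illustration only, not Nvidia's or DirectX's implementation: the function names (`gather4_offsets`, `soft_shadow`), the edge clamping, the jitter radius and the exact offset range within the 128x128 window are all assumptions made for the example.

```python
import random

# Window size quoted in the article; the exact legal offset range is an assumption.
WINDOW = 128


def gather4_offsets(depth_map, x, y, offsets):
    """Emulate one four-offset gather: return the four texels at (x+dx, y+dy).

    Coordinates that fall outside the map are clamped to the nearest edge.
    """
    assert len(offsets) == 4
    height, width = len(depth_map), len(depth_map[0])
    texels = []
    for dx, dy in offsets:
        # Each offset must stay inside the assumed 128x128 window.
        assert -WINDOW // 2 <= dx < WINDOW // 2
        assert -WINDOW // 2 <= dy < WINDOW // 2
        sx = min(max(x + dx, 0), width - 1)
        sy = min(max(y + dy, 0), height - 1)
        texels.append(depth_map[sy][sx])
    return texels


def soft_shadow(depth_map, x, y, receiver_depth, gathers=4, radius=3, rng=None):
    """Return the lit fraction at (x, y): 1.0 is fully lit, 0.0 fully shadowed.

    Each loop iteration stands in for a single gather instruction that
    returns four jittered shadow-map samples at once.
    """
    rng = rng or random.Random(0)
    lit = 0
    for _ in range(gathers):
        offsets = [
            (rng.randint(-radius, radius), rng.randint(-radius, radius))
            for _ in range(4)
        ]
        for occluder_depth in gather4_offsets(depth_map, x, y, offsets):
            # A sample is lit when nothing sits closer to the light than the receiver.
            if receiver_depth <= occluder_depth:
                lit += 1
    return lit / (4 * gathers)
```

With the default of four gathers, the sketch takes 16 jittered shadow-map samples using four fetch operations rather than 16, which is the kind of saving the hardware Gather4 path is meant to deliver.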

Improved texturing performance allows for more effective use of soft shadows in PC games.
Render Output
The GF100’s six ROP partitions are laid out around the L2 cache. Each partition contains eight ROP units, for a total of 48 ROPs on the initial chip. This is double the number of ROPs per partition in Nvidia’s last-generation GT200 GPU. Each ROP unit itself has been streamlined and enhanced to improve performance. The net result is a big step up in anti-aliasing performance. Nvidia claims that 8x AA averages only 9% slower than 4x AA on the GF100, and that the GF100 is 230% faster than the GeForce GTX 285.
Nvidia is also extending its proprietary coverage sampling AA (CSAA) modes to include a 32x mode, which will improve image quality in scenes that use billboarded transparency, such as foliage, railings, fencing and similar types of scenery items.

Using 32x CSAA will improve AA in areas with heavy use of small objects plus transparency.
GPU Compute for Gaming
Nvidia also spent a great deal of time talking up the use of GPU compute in gaming. We’re all familiar with Nvidia’s aggressive promotion of its own PhysX physics engine, though there’s no reason other physics middleware couldn’t be implemented. The soon-to-be-released game Dark Void will implement some creative particle and fluid dynamics effects, improving the overall “gee-whiz” factor. This will differentiate Dark Void on Nvidia hardware from the same game running on PCs with competing hardware and, perhaps more importantly, from the console version.
Other potential uses for GPU compute in gaming include improved post-processing effects such as better motion blur, along with fluid dynamics, cloth effects, realistic hair and more.

Will we finally see realistic hair in games? Maybe…
Nvidia’s Emmett Kilgariff also spent some time extolling the virtues of ray tracing, but the demo shown ran at less than one frame per second on two GF100 cards. Note that the car rendering demo involved a high-resolution, fully ray-traced scene. Kilgariff did note that games could use mixed-mode rendering, applying ray tracing only to small portions of the scenery that require more realistic reflections, for example.
One very cool demo Nvidia had on tap was titled Supersonic Sled. This demo was a big exercise in physics, with lots of fluid dynamic effects, particle effects and many thousands of objects blowing up, colliding and otherwise interacting. It’s a great example of something that could be done in a true DirectX 11 title, given enough GPU horsepower.
The Hardware
We’ve covered a lot of ground regarding the architecture of the GPU. What about the hardware itself? When will we see GF100 cards?
Nvidia’s technical marketing director, Nick Stam, emphatically noted that GF100 would be a “Q1 product,” implying we’ll see first cards before the end of March. However, Nvidia refused to disclose any details on pricing, quantity, die size or yield. We do know that the GPU is being manufactured on TSMC’s 40nm process technology, and uses over 3 billion transistors.
Given that AMD’s Radeon HD 5870 is built on the same 40nm process, has 2.15 billion transistors, and is a 334mm² die, it’s very likely the GF100 chip approaches 500mm². That’s one big chip, and high yields will be necessary to ensure the cost of boards isn’t ridiculously high.
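The die-size guess above is simple proportional scaling, and a short Python sketch shows the arithmetic. It assumes both chips achieve the same transistor density on TSMC’s 40nm process, which is a simplification since density varies with the mix of logic and cache. The function name `estimate_die_area` is made up for this example, and 3.0 billion is used as a floor because Nvidia only said “over 3 billion.”

```python
def estimate_die_area(ref_area_mm2, ref_transistors, target_transistors):
    """Scale a reference die's area by transistor count.

    Assumes the target chip has the same transistor density as the reference.
    """
    density = ref_transistors / ref_area_mm2  # transistors per square millimeter
    return target_transistors / density


# Reference: Radeon HD 5870 figures quoted in the article.
# Target: the 3.0 billion floor for GF100 (the real count is "over 3 billion").
area = estimate_die_area(334.0, 2.15e9, 3.0e9)
print(f"Estimated GF100 die area: at least {area:.0f} mm^2")
# prints "Estimated GF100 die area: at least 466 mm^2"
```

Since 3 billion transistors is a lower bound, the roughly 466 mm^2 result is also a lower bound, which is consistent with a die that approaches 500 mm^2.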
Nvidia representatives also noted that the GPU cooler wasn’t final. We certainly hope it wasn’t. One demo, which featured a dual-card SLI configuration, was noticeably loud. When asked about power usage, Nvidia said that the GF100 would “… use more power than our current high end.”
Given that statement, Nvidia will be hard pressed to match the current AMD Radeon HD 5870’s 27W at idle and 188W at full throttle. However, if the final GF100 hardware is substantially faster than the HD 5870, that would justify higher power consumption. We really hope, however, that the card isn’t as noisy as the reference hardware on display at the briefing.
What about multimonitor support? AMD has been generating lots of press over its Eyefinity multi-display technology, though how many people actually take advantage of more than two displays with a single card isn’t known. Shipping GF100 cards will support two displays out of the box; if you want more than two, you’ll need a second card.