Core i7 Dissected and Benchmarked! Does Intel’s Next-Generation Chip Live Up to the Hype? Hell Yeah!
Tomorrow’s Performance Today
You can’t recompile the world. That’s the lesson Intel learned with the Pentium 4, which kicked ass with optimized code but ran like a Yugo with legacy apps. And even with Intel’s nearly limitless resources, it couldn’t get every developer to update software for the P4.
Intel took those lessons to heart with the stellar Core 2 and continues in that vein with Core i7, which is designed to run even existing code faster. That’s largely due to the Hyper-Threading, massive bandwidth, and low latency in the new chip, but other touches also help.
Loops are a common programming technique in which the CPU repeats the same set of instructions over and over. With Core i7, an improved loop detector will save power and boost performance by detecting larger loops and caching the instructions the program keeps asking for. Intel also polished its branch prediction algorithms. Branches are the yes/no decisions a program presents to the CPU, which guesses the answer ahead of time so it can keep working. If the CPU guesses wrong on what the program wants, the assembly-line-like pipeline inside the CPU must be cleared and the process started anew. New SSE4.2 instructions also make their way into Core i7, but they will be of little benefit to desktop users. Since Intel is designing the chip for server use as well, the new instructions are mainly there to help speed up supercomputing and server-oriented workloads.
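To make the cost of a wrong guess concrete, here is a toy sketch of a textbook two-bit saturating-counter branch predictor. This is an assumption for illustration only: Intel does not publish Core i7's actual prediction algorithm, and the function name and the sample branch pattern are hypothetical.

```python
def simulate_two_bit_predictor(outcomes, start_state=2):
    """Count mispredictions for a sequence of branch outcomes (True = taken).

    Textbook scheme, not Intel's: states 0-1 predict "not taken",
    states 2-3 predict "taken".
    """
    state = start_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            # In real hardware, this is where the pipeline gets flushed.
            mispredictions += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


# Hypothetical loop branch: taken 9 times, then falls through once,
# with the whole loop run 100 times.
loop_branch = ([True] * 9 + [False]) * 100
loop_misses = simulate_two_bit_predictor(loop_branch)

# Hypothetical worst case for this scheme: a strictly alternating branch.
alternating_misses = simulate_two_bit_predictor([True, False] * 500)

print(f"loop branch: {loop_misses} misses in {len(loop_branch)} branches")
print(f"alternating branch: {alternating_misses} misses in 1000 branches")
```

In this simple model the loop's exit is guessed wrong once per pass, which is the kind of predictable pattern a dedicated loop detector is meant to handle better.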
The main takeaway is that while some of the changes are radical, Intel is being pragmatic with its chip design—you won’t have to go out and buy new software to experience the CPU’s performance potential.
Making Better Connections
With a Hyper-Threaded quad core, even enthusiasts are unlikely to see the need for a multi-processor machine; nevertheless, one of the new features in Core i7 directly addresses a weakness in Intel’s current lineup when it comes to multi-CPU machines. Intel currently uses a front-side-bus technology to tie its multiprocessor machines together. As you might imagine, problems arise when a single front-side bus is shared by two quad-core CPUs.
With so many cores churning so much data, the front-side bus can become gridlocked. Intel “fixed” this issue by building chipsets with two front-side buses. But what happens when you have a machine with four or eight CPUs? Since Intel couldn’t keep adding front-side buses, it took another page from AMD’s playbook by building in direct point-to-point connections over what it calls the QuickPath Interconnect, or QPI. Server versions of Core i7 feature two QPI connections (desktop versions get just one), each of which can move about 25GB/s, or double what a 1,600MHz front-side bus can achieve. AMD fans, of course, will point out that the fastest iteration of AMD’s chip-to-chip conduit, dubbed HyperTransport 3.1, is twice as fast as the current QPI.
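The “about 25GB/s” and “double” figures can be checked with quick back-of-the-envelope arithmetic. The sketch below assumes the 6.4GT/s QPI speed of the top-end part, a QPI link carrying 2 bytes of data per transfer in each direction, and a 64-bit (8-byte) front-side bus; those widths come from Intel’s published specs rather than this article, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(transfers_per_sec, bytes_per_transfer, directions=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfers_per_sec * bytes_per_transfer * directions / 1e9


# QPI at 6.4GT/s: 2 data bytes per transfer, counted in both directions.
qpi_gb_s = peak_bandwidth_gb_s(6.4e9, bytes_per_transfer=2, directions=2)

# 1,600MHz front-side bus: 8 bytes per transfer, one shared path.
fsb_gb_s = peak_bandwidth_gb_s(1.6e9, bytes_per_transfer=8)

print(f"QPI: {qpi_gb_s:.1f} GB/s")
print(f"FSB: {fsb_gb_s:.1f} GB/s")
print(f"Ratio: {qpi_gb_s / fsb_gb_s:.1f}x")
```

Note that the QPI figure counts traffic in both directions at once, so these are theoretical peaks, not what an application will actually see.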
QPI combined with the on-die memory controller will also make an Intel server or workstation a NUMA, or non-uniform memory access, design. Each CPU has a direct link to its own bank of memory, so what happens if CPU 1 needs something that’s stored in the RAM controlled by CPU 2? In that case, CPU 1 must send the request over the QPI link to CPU 2’s memory controller, which fetches the data from its RAM and sends it back. This will slow things down a bit, but Intel says its tests indicate that even in this scenario, memory access is still faster than what is possible with the current front-side-bus multiprocessor design.
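One rough way to picture the NUMA penalty is as a weighted average of local and remote memory trips. The sketch below is only a model: the latency numbers are made-up placeholders, not Intel measurements, and the function name is hypothetical.

```python
def average_latency_ns(local_ns, remote_ns, remote_fraction):
    """Blend local and remote access latency by how often each occurs."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Placeholder latencies for illustration only (NOT measured values).
LOCAL_NS = 60.0    # CPU 1 reading its own RAM
REMOTE_NS = 100.0  # CPU 1 reading CPU 2's RAM over the QPI link

for fraction in (0.0, 0.25, 0.5, 1.0):
    blended = average_latency_ns(LOCAL_NS, REMOTE_NS, fraction)
    print(f"{fraction:.0%} remote accesses -> {blended:.0f} ns average")
```

The takeaway is that the more data a CPU can keep in its own RAM, the less often it pays for the extra hop.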
The Power Within
It’s a known fact that overclocking can decrease the life of your CPU; thus, Intel has always discouraged end-users from overclocking its CPUs. With Core i7, Intel reverses its stance and actually overclocks the CPU for you! Of course, Intel would not describe its Turbo mode as overclocking, and, technically, it isn’t. While pushing your 2.66GHz Core 2 Quad to 3.2GHz would likely strain its thermal and voltage specs, the new Core i7 CPUs feature an internal power control unit that closely monitors the power and thermals of the individual cores.
This wouldn’t help by itself, though. Intel designed the Core i7 to be very aggressive in power management. With the previous Core 2, power to the CPU could be lowered only so far before the chip would crash. That’s because while you can cut power to large sections of the execution core, the cache can tolerate only so much of a decrease before it fails. With Core i7, Intel separates the power circuits, so the cache can be powered independently of the cores. This lets Intel cut power consumption and thermal output even further than before. Furthermore, while the Core 2 CPUs required all the cores to be idle before voltage could be reduced, with Core i7, individual cores can be turned off if they’re not in use.
Turbo mode exploits the power savings by letting an individual core run at increased frequencies if needed. This again follows Intel’s mantra of improving performance on today’s applications. Since a majority of today’s applications are not threaded to take full advantage of a quad core with Hyper-Threading, Turbo mode’s “overclocking” will make these applications run faster. For more information on how you’ll set up Turbo mode, read our sidebar below.
Intel’s Turbo Mode Technology
Turbo mode might sound like a feature left over from the TV series Knight Rider, but it’s more neat than cheesy. You already know that Core i7 CPUs closely monitor the power and thermals of the chip and use any leftover headroom to overclock the individual cores as needed. But just how does it work?
From what we’ve surmised by examining an early BIOS, you will be able to set each type of core scenario based on how far you want to overclock, given the load. For example, with applications that push one thread, you could set the BIOS to overclock, or rather, turbo that single core by perhaps three multipliers over stock. You would do the same for two-, three-, and four-core scenarios.
The good news is that you’ll get fine-grain control over the Turbo mode in the upcoming Core i7 CPUs.
The BIOS will also take into account the thermal rating, or TDP, of the cooling system you’re using. If you’re using, say, a heatsink rated for a 150W TDP, the BIOS will overclock to higher levels than it would with a 130W unit. You would manually set the heatsink’s rating in the BIOS, as there’s no way for the heatsink to communicate with the motherboard directly.
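The per-core-count settings described in this sidebar boil down to simple multiplier arithmetic. Here is a minimal sketch, assuming a base clock derived from the Core i7-920’s published numbers (2.66GHz at a default multiplier of 20). Only the “three multipliers” single-core figure comes from our BIOS example; the other values in the table, and all of the names, are hypothetical.

```python
# Base clock derived from the Core i7-920: 2,660MHz / multiplier of 20.
BASE_CLOCK_MHZ = 2660 / 20

# Hypothetical BIOS settings: extra multipliers allowed for each number
# of busy cores. Only the single-core value of 3 comes from the article.
TURBO_BINS = {1: 3, 2: 2, 3: 1, 4: 1}


def turbo_frequency_mhz(default_multiplier, active_cores, bins=TURBO_BINS):
    """Return the turbo clock for a given number of busy cores."""
    if active_cores not in bins:
        raise ValueError(f"no turbo setting for {active_cores} active cores")
    return (default_multiplier + bins[active_cores]) * BASE_CLOCK_MHZ


for cores in sorted(TURBO_BINS):
    mhz = turbo_frequency_mhz(20, cores)
    print(f"{cores} busy core(s): {mhz:.0f}MHz")
```

This is our guess at the arithmetic, not Intel’s documented behavior; the real chip also weighs power and thermal headroom before it raises any multiplier.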
New Socket on the Block
So all this CPU goodness and performance will drop right into that $450 LGA775 board you just bought, right? Of course not. Ung’s Law dictates that the minute you buy expensive hardware, something better will arrive that makes what you just bought obsolete.
Intel isn’t doing this just to piss people off (although a history of such behavior has had that result). Since Core i7 moves the memory controller directly into the CPU, Intel added a load of pins that go directly to the memory modules. The new standard bearer for performance boxes is the LGA1366 socket. It looks functionally similar to the LGA775, with the obvious addition of more pins. More pins also means a bigger socket, which means your fancy heatsink is also likely headed to the recycle bin. LGA1366 boards space the heatsink mounts just a tad bit wider, just enough to make your current heatsink incompatible. There’s a chance that some third-party heatsink makers will offer updated mounts to make your current heatsink work, but that’s not known yet.
What will be interesting to heatsink aficionados is Intel’s push for vendors to rate their heatsinks using a unified thermal rating that will be tied to the Turbo mode settings. For more information, see the Turbo mode sidebar.
The Second Coming
Intel is adopting more than just AMD’s integrated memory controller with its new Core i7 chips; it’s also adopting AMD’s abandoned Socket 940/754 two-socket philosophy. For the high end, the LGA1366 socket will offer tri-channel RAM and a high-performance QPI interface. For mainstream users, Intel will offer a dual-channel DDR3 design built around a new LGA1066 socket late next year. LGA1066 isn’t just about shedding one channel of DDR3 though; LGA1066-based CPUs will also bring direct-attach PCI Express to the table.
Instead of PCI Express running through the chipset, as it does with existing Core 2 and the new performance Core i7, PCI-E will reside on the die of LGA1066 CPUs. With the PCI-E in the CPU itself, Intel will reuse its fairly slow DMI interface to connect the CPU to a single-chip south bridge. The two chips Intel will introduce are the quad-core Lynnfield and the dual-core Havendale. Havendale CPUs will actually feature a newly designed graphics core inside the heat spreader that will talk to the CPU core via a high-speed QPI interface. Both chips will feature Hyper-Threading on all cores.
Many AMD users got a royal screwing when the company abandoned both Socket 940 and Socket 754 for a unified Socket 939; could Intel do something similar? We asked Intel point blank whether LGA1366 would eventually be abandoned for LGA1066; the company told us it fully intends to support both platforms.
The Core i7 Family
| | Core i7-965 Extreme Edition | Core i7-940 | Core i7-920 |
|---|---|---|---|
| Clock Speed | 3.2GHz | 2.93GHz | 2.66GHz |
| L2 Cache | 1MB | 1MB | 1MB |
| L3 Cache | 8MB | 8MB | 8MB |
| Process | 45nm | 45nm | 45nm |
| Transistors | 731 million | 731 million | 731 million |
| QPI Speed | 6.4GT/s | 4.8GT/s | 4.8GT/s |
| Multiplier Lock | No | Yes | Yes |
| Default Multiplier | 24 | 22 | 20 |
| Volume Pricing | $999 | $562 | $284 |