We know, you just got your rig right where you want it, complete with a primo CPU, a kick-ass videocard config, and seemingly limitless storage. So forgive us if we dangle the temptation of better, faster hardware in front of your face. We’re just doing our job. Over the last few weeks, we’ve been grilling our industry contacts for news of what computing delights await power users in the months and years to come. And delightful the future is: CPUs with eight cores, GPUs that run games as a pastime, mobos with both SLI and CrossFire support, and hard drives so large your data will feel puny and inadequate. And that’s just part of it.
Look at it this way: Our 2009 technology preview gives you advance warning about the hardware that will soon occupy your dreams, so you can start saving your pennies and plotting your next upgrade path today.
[ Editor's Note: This feature originally ran in our December 2008 issue. The Intel Core i7 section has been expanded and incorporated into our full review. ]
As a buttoned-down company, Intel rarely likes to make sweeping changes, but its upcoming Core i7 CPU is a major break from the past. Gone is the ancient front-side bus that ties current-gen CPUs to the chipset. Instead, cores will communicate via a high-speed crossbar switch, and different CPUs will communicate via a high-speed interconnect.
Also on the outs is the need for an external memory controller. Intel, which has relied on gluing two dual-core chips together under the heat spreader to make its quad-core CPUs, is now placing all four cores on a single die.
Even overclocking, which was once verboten to even talk about within 10 miles of Intel’s HQ, is now automatically supported. Intrigued? You should be. Intel’s Core i7 is the most radical redesign the company has undertaken in decades.
One of Core i7’s most significant changes is the inclusion of an integrated memory controller. Instead of memory accesses going from the CPU across a relatively slow front-side bus to the motherboard chipset and finally to the RAM, an IMC will eliminate the need for a front-side bus and external memory controller. The result is dramatically lower latency than was found in the Core 2 and Pentium 4 CPUs.
Why can’t the memory controller on the motherboard simply be pushed to higher speeds to match an IMC? Remember, when you’re talking about a memory controller residing directly in the core, the signals have to travel mere millimeters across silicon that’s running at several gigahertz. With an external design, the signals have to travel out of the CPU to a memory controller in the chipset an inch or so away. It’s not just distance, either—the data is traveling across a PCB at far, far slower speeds than it would if it were within the CPU. In essence, it’s like having to go from an interstate to an unpaved, bumpy road.
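The distance argument can be put in back-of-envelope numbers. The sketch below assumes a signal speed of roughly half the speed of light on a PCB trace and illustrative distances, not measured Intel figures:

```python
# Why on-die memory signals beat a round trip to the chipset.
# Signals on a copper trace travel at roughly half the speed of
# light (~15cm per nanosecond); distances are illustrative guesses.

SIGNAL_SPEED_CM_PER_NS = 15.0  # ~0.5c on a PCB trace

def propagation_ns(distance_cm):
    """One-way wire delay in nanoseconds."""
    return distance_cm / SIGNAL_SPEED_CM_PER_NS

on_die = propagation_ns(0.5)   # a few millimeters across silicon
off_die = propagation_ns(5.0)  # out to a chipset an inch or two away
```

Even this tenfold gap in wire delay understates the difference, since the real cost of the old design is the slow front-side-bus clock and the extra chipset hop, not signal propagation alone.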
Of course, if you’re an AMD loyalist, you’re probably bristling at the thought of Intel calling an IMC an innovation. After all, AMD did it first. So doesn’t that make AMD the pioneer? We asked Intel the same question. The company’s response: One: An IMC isn’t an AMD invention and, in fact, Intel had both an IMC and graphics core planned for its never-released Timna CPU years before the Athlon 64. Two: If AMD’s IMC design is so great, why does the Core 2 so thoroughly trash it with an external controller design? In short, Intel’s message to the AMD fanboys is nyah, nyah!
Naturally, you’re probably wondering why Intel thinks it needs an IMC now. Intel says the more efficient, faster execution engine of the Core i7 chip benefits from the internal controller more than previous designs. The new design demands boatloads of bandwidth and low latency to keep it from starving as it waits for data.
The Core i7 CPU is designed to be a very wide chip capable of executing instructions with far more parallelism than previous designs. But keeping the chip fed requires tons of bandwidth. To achieve that goal, the top-end Core i7 CPUs will feature an integrated tri-channel DDR3 controller. Just as you had to populate both independent channels in a dual-channel motherboard, you’ll have to run three sticks of memory to give the chip the most bandwidth possible. This does present some problems for board vendors though, as standard consumer mobos have limited real estate. Most performance boards will feature six memory slots jammed onto the PCB, but some will feature only four. On these four-slot boards, you’ll plug in three sticks of RAM and use the fourth only if you absolutely have to, as populating the last slot will actually reduce the bandwidth of the system. Intel, in fact, recommends the fourth slot only for people who need more RAM than bandwidth. With three 2GB DIMMs, though, most enthusiast systems will feature 6GB of RAM as standard.
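The peak-bandwidth arithmetic behind the tri-channel design can be sketched in a few lines. These are theoretical peaks, not benchmark results:

```python
# Peak-bandwidth math for Core i7's tri-channel DDR3/1066
# controller. Each channel is 64 bits (8 bytes) wide, and peak
# rates simply sum across channels.

def channel_gb_per_s(megatransfers, bytes_per_transfer=8):
    return megatransfers * bytes_per_transfer / 1000  # decimal GB/s

one_channel = channel_gb_per_s(1066)       # ~8.5GB/s
tri_channel = 3 * one_channel              # ~25.6GB/s
dual_ddr2_800 = 2 * channel_gb_per_s(800)  # 12.8GB/s, for comparison
```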
Although it may change, Core i7 will support DDR3/1066, with higher unofficial speeds supported through overclocking. Folks hoping to reuse DDR2 RAM with Intel’s budget chips next year can forget about it. Intel has no plans to support DDR2 with a Core i7 chip at this point, and with DDR3 prices getting far friendlier to the wallet, we don’t expect the company to change its mind.
A CPU core can execute only one instruction thread at a time. Since that thread will touch only some portions of the CPU, resources that are not used sit idle. To address that, Intel introduced consumers to Hyper-Threading with its 3.06GHz Pentium 4 chip. Hyper-Threading, Intel’s implementation of simultaneous multi-threading, partitioned the CPU’s resources so that multiple threads could be executed simultaneously. In essence, a single-core Pentium 4 appeared as two CPUs to the OS. Because it was actually just one core dividing its resources, you didn’t get the same performance boost you would receive from adding a second core, but Hyper-Threading did generally smooth out multitasking, and in applications that were optimized for multi-threading, you would see a modest performance advantage. The problem was that very few applications were coded for Hyper-Threading when it was released, and performance could actually be hindered. Hyper-Threading went away with the Core 2 series of CPUs, but Intel has dusted off the concept for the new Core i7 series because the transistor cost is minimal and the performance benefits stand to be far better than what the Pentium 4 could ever achieve.
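The idea can be illustrated with a toy model: one core with a single issue slot, shared by two instruction streams, so that when one stream stalls the other keeps the core busy. This is a deliberately crude sketch of simultaneous multi-threading, not Intel's actual implementation:

```python
# Toy SMT model: one issue slot per cycle, shared by two streams.
# When the active stream stalls (think cache miss), the core issues
# from the other stream instead of sitting idle.

def run(streams, cycles):
    """Return total instructions retired across all streams."""
    done = [0] * len(streams)           # retired per stream
    pos = [0] * len(streams)            # next instruction per stream
    stalled_until = [0] * len(streams)  # cycle when each stream wakes
    for cycle in range(cycles):
        for i, stream in enumerate(streams):
            if cycle < stalled_until[i] or pos[i] >= len(stream):
                continue                # can't issue; try next stream
            instr = stream[pos[i]]
            pos[i] += 1
            if instr == "stall":
                stalled_until[i] = cycle + 4  # block this stream a while
            else:
                done[i] += 1
            break                       # only one issue slot per cycle
    return sum(done)

# A stall-heavy stream alone wastes most of the core's cycles...
heavy = ["op", "stall"] * 16
light = ["op"] * 32
alone = run([heavy], 64)          # one thread: lots of dead cycles
paired = run([heavy, light], 64)  # SMT: the second thread fills them
```

The paired run retires far more work in the same 64 cycles, which is the whole pitch: the second thread rides along in execution slots the first thread would have wasted.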
Intel toyed with the idea of redubbing the feature Hyper-Threading 2 but decided against it, as the essential technology is unchanged. So why should we expect Hyper-Threading to be more successful this go around? Intel says it’s due to Core i7’s huge advantage over the Pentium 4 in bandwidth, parallelism, cache sizes, and performance. Depending on the application, the company says you can expect from 10 to 30 percent more performance with Hyper-Threading enabled. Still, Intel doesn’t force it down your throat because it knows many people still have mixed feelings about the feature. The company recommends that you give it a spin with your apps. If you don’t like it, you can just switch it off in the BIOS. Intel’s pretty confident, however, that you’ll leave it on.
You can’t recompile the world. That’s the lesson Intel learned with the Pentium 4, which kicked ass with optimized code but ran like a Yugo with legacy apps. And even with Intel’s nearly limitless resources, it couldn’t get every developer to update software for the P4.
Intel took those lessons to heart with the stellar Core 2 and continues in that vein with Core i7, which is designed to run even existing code faster. That’s largely due to the Hyper-Threading, massive bandwidth, and low latency in the new chip, but other touches also help.
Loops are among the most common constructs in software, repeating the same task in the CPU over and over. With Core i7, an improved loop detector will save power and boost performance by recognizing larger loops and caching the instructions they execute. Intel also polished its branch prediction algorithms. Branches are the yes/no decisions a CPU faces; if the CPU guesses wrong about which way a branch will go, the assembly-line-like pipeline inside the CPU must be cleared and the process started anew. New SSE4.2 instructions also make their way into Core i7, but they will be of little benefit to desktop users. Since Intel is designing the chip for server use as well, the new instructions are mainly intended to speed up supercomputing and server-oriented workloads.
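To see why branch prediction matters, here is a toy two-bit saturating-counter predictor, a textbook scheme far simpler than whatever Intel actually ships, run against an easy loop branch and a hard data-dependent one:

```python
# Toy branch predictor: a 2-bit saturating counter. States 0-1
# predict "not taken," states 2-3 predict "taken"; each actual
# outcome nudges the counter toward itself.
import random

def correct_predictions(outcomes):
    state, correct = 2, 0  # start out weakly predicting "taken"
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # saturating update toward the actual outcome
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct

# A loop branch: taken 99 times, then falls through once. Easy.
loop_branch = [True] * 99 + [False]
# A data-dependent coin-flip branch. Hard for any predictor.
random.seed(0)
coin_flips = [random.random() < 0.5 for _ in range(100)]

easy = correct_predictions(loop_branch)  # 99 of 100 correct
hard = correct_predictions(coin_flips)   # roughly a coin toss
```

Every miss on the hard branch is a pipeline flush, which is exactly the cost Intel's improved predictors are chasing.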
The main takeaway is that while some of the changes are radical, Intel is being pragmatic with its chip design—you won’t have to go out and buy new software to experience the CPU’s performance potential.
With a Hyper-Threaded quad core, even enthusiasts are unlikely to see the need for a multi-processor machine; nevertheless, one of the new features in Core i7 directly addresses a weakness in Intel’s current lineup when it comes to multi-CPU machines. As you know, Intel currently uses front-side-bus technology to tie its multiprocessor machines together. As you might imagine, problems arise when two quad-core CPUs share a single front-side bus. With so many cores churning so much data, the front-side bus can become gridlocked. Intel “fixed” this issue by building chipsets with two front-side buses. But what happens when you have a machine with four or eight CPUs? Since Intel couldn’t keep adding front-side buses, it took another page from AMD’s playbook by building in direct point-to-point connections over what it calls the Quick Path Interconnect. Server versions of Core i7 feature two QPI connections (desktop versions get just one), which can each talk at about 25GB/s, or double what a 1,600MHz front-side bus can achieve. AMD fans, of course, will point out that the fastest iteration of AMD’s chip-to-chip conduit, dubbed HyperTransport 3.1, is twice as fast as the current QPI.
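The "double" claim is simple arithmetic, using the article's round figures. Peak rates only:

```python
# Interconnect math: a 1,600MHz front-side bus moves 1,600 million
# transfers/s over a 64-bit (8-byte) data path; QPI is quoted at
# roughly 25GB/s per link.

fsb_gb_per_s = 1600 * 8 / 1000  # 12.8GB/s
qpi_gb_per_s = 25.6             # the ~25GB/s figure, per link
ratio = qpi_gb_per_s / fsb_gb_per_s
```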
QPI combined with the on-die memory controller will also make an Intel server or workstation a NUMA, or non-uniform memory access, design. Since each CPU has a direct link to its own individual memory DIMMs, what happens if CPU 1 needs to access something that’s stored in the RAM being controlled by CPU 2? In this case, it must hop across the QPI link to the second CPU’s memory controller to fetch the data. This will slow things down a bit, but Intel says its tests indicate that even given this scenario, the memory access is still faster than what is possible with the current front-side-bus multiprocessor design.
It’s a known fact that overclocking can decrease the life of your CPU; thus, Intel has always discouraged end-users from overclocking its CPUs. With Core i7, Intel reverses its stance and actually overclocks the CPU for you! Of course, Intel would not describe its Turbo mode as overclocking, and, technically, it isn’t. While pushing your 2.66GHz Core 2 Quad to 3.2GHz would likely strain its thermal and voltage specs, the new Core i7 CPUs feature an internal power control unit that closely monitors the power and thermals of the individual cores.
This wouldn’t help by itself, though. Intel designed the Core i7 to be very aggressive in power management. With the previous Core 2, power to the CPU could be lowered only so far before the chip would crash. That’s because while you can cut power to large sections of the execution core, the cache can tolerate only so much of a decrease before losing its contents. With Core i7, Intel puts the cache on its own power circuit, so it can be run independently. This lets Intel cut power consumption and thermal output even further than before. Furthermore, while the Core 2 CPUs required all the cores to be idle before voltage could be reduced, with Core i7, individual cores can be turned off if they’re not in use.
Turbo mode exploits the power savings by letting an individual core run at increased frequencies if needed. This again follows Intel’s mantra of improving performance on today’s applications. Since a majority of today’s applications are not threaded to take full advantage of a quad core with Hyper-Threading, Turbo mode’s “overclocking” will make these applications run faster. For more information on how you’ll set up Turbo mode, read our sidebar below.
So all this CPU goodness and performance will drop right into that $450 LGA775 board you just bought, right? Of course not. Ung’s Law dictates that the minute you buy expensive hardware, something better will arrive that makes what you just bought obsolete.
Intel isn’t doing this just to piss people off (although a history of such behavior has had that result). Since Core i7 moves the memory controller directly into the CPU, Intel added a load of pins that go directly to the memory modules. The new standard bearer for performance boxes is the LGA1366 socket. It’s functionally similar to LGA775, with the obvious addition of more pins. More pins also means a bigger socket, which means your fancy heatsink is also likely headed to the recycle bin. LGA1366 boards space the heatsink mounts a tad wider, just enough to make your current heatsink incompatible. There’s a chance that some third-party heatsink makers will offer updated mounts to make your current heatsink work, but nothing has been announced yet.
What will be interesting to heatsink aficionados is Intel’s encouragement that vendors rate the heatsinks using a unified thermal rating that will be tied to the Turbo mode settings. For more information, see the Turbo mode sidebar below.
Intel is adopting more than just AMD’s integrated memory controller with its new Core i7 chips; it’s also adopting AMD’s abandoned Socket 940/754 two-socket philosophy. For the high end, the LGA1366 socket will offer tri-channel RAM and a high-performance QPI interface. For mainstream users, Intel will offer a dual-channel DDR3 design built around a new LGA1066 socket late next year. LGA1066 isn’t just about shedding one channel of DDR3 though; LGA1066-based CPUs will also bring direct-attach PCI Express to the table.
Instead of PCI Express running through the chipset, as it does with existing Core 2 and the new performance Core i7, PCI-E will reside on the die of LGA1066 CPUs. With the PCI-E in the CPU itself, Intel will reuse its fairly slow DMI interface to connect the CPU to a single-chip south bridge. The two chips Intel will introduce are the quad-core Lynnfield and the dual-core Havendale. Havendale CPUs will actually feature a newly designed graphics core inside the heat spreader that will talk to the CPU core via a high-speed QPI interface. Both chips will feature Hyper-Threading on all cores.
Many AMD users got a royal screwing when the company abandoned both Socket 940 and Socket 754 for a unified Socket 939; could Intel do something similar? We asked Intel point blank whether LGA1366 would eventually be abandoned for LGA1066; the company told us it fully intends to support both platforms.
Turbo mode might sound like a feature left over from the TV series Knight Rider, but it’s more neat than cheesy. You already know that Core i7 CPUs closely monitor the power and thermals of the chip and use any leftover headroom to overclock the individual cores as needed. But just how does it work?
From what we’ve surmised by examining an early BIOS, you will be able to set how far you want to overclock for each core-load scenario. For example, with applications that push one thread, you could set the BIOS to overclock, or rather, turbo that single core by perhaps three multipliers over stock. You would do the same for two-, three-, and four-core scenarios.
The BIOS will also take into account the thermal rating, or TDP, of the cooling system you’re using. If you’re using, say, a heatsink rated for a 150-watt TDP, the BIOS will overclock to higher levels than it would with a 130-watt unit. You would manually set the heatsink’s rating in the BIOS, as there’s no way for the heatsink to communicate with the motherboard directly.
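Based on that early BIOS, the multiplier scheme might look something like the sketch below. The 133MHz base clock, the 20x stock multiplier, and the per-load bins are all illustrative assumptions, not Intel's shipping defaults:

```python
# Hypothetical turbo-bin math. Core i7 clock speed is set as
# multiplier x base clock; Turbo mode adds extra multiplier steps
# depending on how many cores are active. All values are assumed
# for illustration only.

BCLK_MHZ = 133
STOCK_MULT = 20  # 20 x 133MHz = 2,660MHz stock

# Extra multiplier steps allowed per count of active cores.
turbo_bins = {1: 3, 2: 2, 3: 1, 4: 1}

def core_speed_mhz(active_cores):
    return (STOCK_MULT + turbo_bins[active_cores]) * BCLK_MHZ

one_core_busy = core_speed_mhz(1)   # 23 x 133 = 3,059MHz
all_cores_busy = core_speed_mhz(4)  # 21 x 133 = 2,793MHz
```

The takeaway: a lightly threaded app could see a healthy single-core bump, while a fully loaded chip would stay closer to stock to respect its thermal budget.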
It’s been a long two years for AMD. After a ton of trash-talking, its Phenom CPUs failed to impress anyone. The integrated memory controller, the chip-to-chip interconnects, and the native quad-core design all added up to a high-end CPU that was maybe on par with Intel’s slowest quad-core chip. It didn’t help that an esoteric bug plagued Phenom’s launch and left lingering doubts about the CPU.
Next year will be different, the company pledges.
AMD has long acknowledged that one of its mistakes was trying to make a native quad-core design using a 65nm process. The chips proved to be too big and the yields too low. The yields were initially poor enough that the company began taking defective quad-core dies and selling them as tri cores.
With that in mind, AMD has been feverishly working to get its 45nm process online. The good news is that it’s ahead of schedule. AMD says it expects to have 45nm-based CPUs by the end of this year, not well into next year as expected. All indications are that AMD will release 45nm-based Phenoms by late this year with clock speeds finally ramping up to the 3GHz range—a speed Intel crossed more than a year ago.
The bad news is that even a 3GHz die-shrunk Phenom may not be enough to go head-to-head with Intel’s Core i7 in performance. For that, it’ll likely take the company’s Shanghai core, which is on tap for 2009. AMD is playing it much closer to the vest, but the quad-core Shanghai will be followed by a six-core Istanbul CPU at the end of next year. By 2010, AMD expects to have its Magny-Cours chip out with 12 cores in the CPU. Right now, AMD is mainly concentrating on the one bright spot on its roadmap: multi-CPU systems, where its chip-to-chip design makes these configs competitive with Intel’s CPUs.
One thing is clear: with rumors continuing to swirl that AMD is short on cash and may sell off its fabs, the company, which has already seen its fair share of adversity, is facing one of its most trying times.
It’s been a long time since a new vendor entered the 3D graphics market, but that’s exactly what Intel plans to do late in 2009 with Larrabee. Unlike previous Intel graphics parts, which used traditional 3D pipelines, Larrabee will be powered by fairly standard x86 cores.
Larrabee will include many x86 cores, but the cores in Larrabee processors will be greatly simplified compared to a modern Core 2 proc. Larrabee CPUs will be based on the Pentium P54C design, updated to include modern features, such as 64-bit support and the inclusion of traditional GPU hardware in the form of texture filtering units. Additionally, Larrabee will feature cache coherency between the many x86 cores, which means that all of the cores will have access to the same high-speed cache, and thus, memory. This is a common feature in monolithic CPU cores, like the AMD Phenom and upcoming Intel Core i7, but it isn’t typically a GPU feature. Cache coherency should give Larrabee a significant advantage over more traditional architectures when it comes to running general-purpose computing applications on the GPU.
So should you start saving your pennies for a Larrabee-powered GPU in 2009? Not yet. We expect that Larrabee will launch in late ’09 with parts targeted at the server community for render farms and scientific applications, followed by mainstream parts designed to upgrade low-end machines that would typically sport integrated graphics. For 2009, at least, Nvidia’s GT 200 and ATI’s RV770 cores (which power the existing GeForce GTX 280 and Radeon HD 4870, respectively) will remain the top dogs in graphics.
So what exactly is upcoming from ATI and Nvidia? ATI will roll out a slew of parts across all prices based on modified versions of the RV770. The current rumor is that Nvidia will launch a modified version of the GT 200 sometime next year, tweaked to reduce power consumption and die size, that’s more suitable for lower-end parts, as well as dual-GPU cards similar to ATI’s Radeon 4870 X2 boards.
We’ve heard a lot of buzz from Nvidia and ATI about GPUs being used for general-purpose computing, but to date, only a small number of applications actually harness this power: a couple of Folding@Home clients, a video encoder or two, and a whole host of scientific and video-rendering apps that don’t really apply to normal users. Right now, GPU-based computing is essentially a promising science fair project—at least as far as Maximum PC readers are concerned.
In 2009, we expect that to start to change. A host of mainstream apps, including Photoshop CS4, are slated to launch that will impact the scene in a big way. By treating the photos you’re editing as 3D textures, Photoshop is able to take advantage of the astounding performance packed into a modern GPU. What’s the end-user benefit? Lightning-fast zooms, resizes, and scrolling, and that’s just the beginning. And although the first round of video encoders failed to deliver acceptable visual quality at better-than-CPU speeds, we expect to see rapid improvement in visual quality as the GPU-powered encoders mature.
However, we don’t expect to see any massive increase in these GPU-accelerated apps until there’s a common API that lets software vendors write GPU-accelerated programs for Nvidia, ATI, and Intel GPUs. (Right now, apps must be specifically coded for either ATI or Nvidia GPUs.) Both Microsoft and Apple have APIs in the works that will compete to become the final unified standard, but today there’s no way of knowing which will win.
It used to be that the CPU north bridge was the star of the core-logic chipset world. With its jurisdiction over RAM, the north bridge’s speed had a significant impact on performance.
But with AMD and now Intel integrating the memory controller into the CPU, the north bridge’s importance has shrunk considerably. That’s not to say it doesn’t still have some use. The north bridge contains the circuitry connecting the CPU to the graphics cards and to the south bridge; on integrated-graphics boards, the north bridge also contains a GPU core.
Next year, however, the north bridge’s role will be further diminished. AMD is expected to integrate a GPU into the core of its CPUs due next year, and Intel will move graphics and direct-attach PCI-E into the CPU. That pretty much means the end of the north bridge as we’ve known it all these years.
Next year, we’ll get something we’ve long pined for: the ability to run both CrossFire and SLI on a motherboard without the need for extra hardware. The change comes from Nvidia’s flip-flop regarding SLI support on motherboards that use Intel’s X58 chipset. Originally, Nvidia said it would allow SLI only if motherboard vendors integrated a pricey and hot nForce 200 chip into the PCB to “enable” SLI. When board vendors balked, Nvidia decided to enable SLI on X58 boards the company has “certified” for SLI use. The company says that an nForce 200 chip is still recommended for best performance in configs consisting of more than two cards, but not required.
With VIA officially calling it quits, Nvidia is the only third-party chipset vendor still shooting live rounds. But what about in 2009? On Intel, it’s open for debate. Nvidia has said it believes it has a license to build chipsets for Core i7, but Intel has said that’s not quite true. One thing is certain: Nvidia will not have a chipset for the LGA1366-based Core i7 CPU at all, but the company is planning one for LGA1066 CPUs when they’re released later next year. Unfortunately, it’s not clear whether Nvidia can do this without a lawsuit from Intel—and with graphics and PCI-E integration slated for Intel’s LGA1066 CPUs, what would even be the point?
Perpendicular recording has allowed industry giants to push the bounds of drive capacities, with Seagate now leading the pack at 1.5 terabytes. But that trend won’t last forever. Hitachi officials believe they can take perpendicular recording all the way up to an areal density of one terabit per square inch—modern drives hover around 300 to 400Gb per square inch. But by 2010, new storage technologies could take hold.
Most promising is patterned media recording. It allows drive makers to overcome the thermal stability issues that plague perpendicular recording. When a manufacturer wants to increase the areal density of a drive, it shrinks the small bits of magnetic material, or grains, on the drive’s platter. The smaller these grains get in a perpendicular-recording format, the more likely they are to become thermally unstable—switching their magnetization spontaneously and, thus, scrambling the data they store.
Patterned media recording carves actual grooves onto the platter in the form of tracks or individual bits. The latter can be thought of as a swarm of magnetic islands. Each island stores a single bit of information that’s represented by a number of grains magnetized in a particular direction. The size of these grains can be reduced, and the overall areal density of the drive increased, because the magnetic noise from each island of grains is unable to affect others.
Hard drive manufacturers are considering two other storage technologies that would similarly reduce the thermal instabilities between grains. Thermally assisted recording, or heat-assisted magnetic recording, uses grains with greater magnetic stiffness to prevent unintended magnetization changes. The drive head uses a minuscule near-field aperture laser to heat the grains, which allows their magnetization to be switched, representing a change in the data bit. The same goes for the second technology, microwave-assisted magnetic recording; in this case, the laser is replaced by a small device that emits a radio-frequency magnetic field.
USB 2.0 increased the original data rate of USB from 12Mb/s to 480Mb/s, and now USB 3.0 is set to multiply that bandwidth tenfold. The new 3.0 connectors and cables will be physically and functionally compatible with older hardware—of course, you won’t get maximum bandwidth unless you’re using a USB 3.0 cable with Superspeed devices and ports.
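The tenfold claim works out like this. These are raw signaling rates; usable throughput is always lower once protocol overhead is counted:

```python
# USB generation arithmetic, raw signaling rates in megabits/s.

usb1_full_speed_mb_s = 12
usb2_high_speed_mb_s = 480
usb3_superspeed_mb_s = usb2_high_speed_mb_s * 10  # 4,800Mb/s (~4.8Gb/s)
```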
USB 3.0 will use nine lines; five new lines will sit parallel to the original four on a different plane, making it easy to differentiate between USB 2.0 and USB 3.0 cables. Two of the new lines will transmit data, another pair will receive data, and the fifth provides an additional ground.
A new interrupt-driven protocol keeps nonactive or idle devices (which aren’t being charged by the USB port) from having their power drained by the host controller as it polls for active data traffic. Active devices will instead signal the host to begin data transfer. This feature will also be backward compatible with USB 2.0-certified devices. Hardware partners should have USB 3.0 controllers designed by mid-2009, but consumers won’t see products until early 2010.
Given the choice between pre-pre-draft specs, a political standards war, or a boring, uneventful rollout, we’ll take option C any day. It doesn’t get any more boring than PCI Express, which launched and axed AGP almost overnight with very little drama. For the most part, it worked and worked well. The rollout to PCI-E 2.0 went even better than the original’s launch.
PCI-E 3.0 is just around the corner, and we’re confident in its abilities. PCI-E 1.0 spit out 2.5 gigatransfers per second, and PCI-E 2.0 doubled that to 5GT/s. PCI-E 3.0 goes to just 8GT/s yet actually doubles the data rate by improving the encoding efficiency by 25 percent.
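The encoding arithmetic explains how 8GT/s can double 5GT/s. PCI-E 1.x and 2.0 use 8b/10b encoding (8 payload bits for every 10 bits on the wire); the 3.0 figures below assume the more efficient 128b/130b scheme that has been discussed for the spec:

```python
# Per-lane, one-direction usable bandwidth under each encoding.

def lane_gbit_s(gigatransfers, payload_bits, line_bits):
    return gigatransfers * payload_bits / line_bits

pcie2 = lane_gbit_s(5.0, 8, 10)     # 4.0Gb/s usable per lane
pcie3 = lane_gbit_s(8.0, 128, 130)  # ~7.9Gb/s usable per lane
```

Shedding the 20 percent encoding tax is what lets the spec nearly double throughput while raising the raw signaling rate by only 60 percent, which in turn keeps power and signal-integrity demands in check.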
The PCI Special Interest Group said it took this route to save power. The PCI-E 3.0 spec is expected to be fully backward-compatible when it’s introduced in 2010. A final spec is expected late next year with testing to take place soon after. If it’s anything like PCI-E 1.0 and 2.0, it’ll just work.
While numerous sources (us included) have pegged DisplayPort as the next logical progression in PC-to-monitor connectivity, the connection has yet to make a dent in the marketplace. It’s a chicken-and-egg story: Monitor manufacturers need to support DisplayPort every bit as much as videocard vendors do, and each seems to be waiting for the other to make a big move.
OK, the format war is over. Blu-ray is the optical storage king. Still, how many people do you know who have a Blu-ray burner? And of those, how many use it regularly? High hardware and media prices, along with perfectly acceptable alternatives, keep this tech firmly planted on the fringe.
10GbE might make Richie Rich happy, but hopes that the superfast interface standard will trickle down to regular-Joe consumers anytime soon seem fanciful. 10GbE over copper is considered to be the poor man’s fiber, but it’s still a mighty pricey commodity. How pricey? An add-in NIC will set you back more than $1,000, so you can imagine how much a four-port switch will cost you.
Unified Extensible Firmware Interface, the ballyhooed replacement for the BIOS, was supposed to have made its big splash this year. Instead, its debut has been more of a dribble. Of the mobo vendors, only MSI seems remotely interested in incorporating UEFI—and the boards aren’t even out yet. Most others seem to think it’s just not worth the time and effort when the current BIOS works just fine.
For years now we’ve been hearing that organic-light-emitting-diode displays are coming to the desktop, but we’ve given up on waiting. While the screen technology, which requires no backlight to produce its vibrant colors, can be found in small devices, we just don’t see manufacturers ramping up large-scale production at prices that can compete with LCDs anytime soon.
When we first heard of WiMAX back in 2002, we hoped that one day it would empower users with cheap, high-speed wireless broadband. While it’s not quite right to call WiMAX a failure—there are hundreds of networks deployed around the world—it’s not ubiquitous enough to compete with cellular providers. We fear that by the time it is, the tech will be irrelevant.
As we near the end of 2008, we’re entering what seems like the second eon of 802.11n development, and unfortunately, there’s no end in sight. The involved parties are deadlocked, so it’s entirely possible that the draft N 2.0 version is the last update we’ll see to 802.11n.