Analysis: Inside the Sandy Bridge Integrated HD Graphics Core

Alan Fackler

Sandy Bridge will feature an on-die, high clock speed graphics core. Will it be fast enough for most users?
Maximum PC readers know we’ve never been big fans of integrated graphics. Even when we’ve needed them – in small form-factor home theater PCs, for example – we’ve tended to go for AMD or Nvidia integrated solutions. More often, though, we’ll spec out an entry-level discrete graphics card for a compact HTPC.

The new Intel HD Graphics built into the Sandy Bridge CPU may shift that decision point a bit. While any gaming experience with the new graphics is still fairly entry level, it’s far less anemic than past Intel efforts. StarCraft II, for example, runs at medium settings and keeps up pretty well with entry-level discrete solutions from Nvidia.
Let’s take a quick look at the internals of the latest Intel graphics core, rebuilt from the ground up for 32nm, and what kind of power it brings to the table.

It’s All About Power

The first thing to note about Sandy Bridge in general, and the new graphics core in particular, is that it’s all about power. When Tom Piazza, Intel’s chief graphics architect, talks about Sandy Bridge, it’s in the context of 17W, 25W and 35W TDP – those are squarely in the realm of laptop CPUs. Sure, there will be Sandy Bridge desktop solutions as well. But the chief goal was to get moderate levels of 3D performance and excellent high definition video playback into a power envelope suitable for thin-and-light laptops.

Another key feature of the graphics core is that it’s modular, like the rest of Sandy Bridge. That means Intel will be able to deploy CPUs with differing levels of 3D performance by building parts with fewer than the current maximum of 12 execution units (EUs, as Intel calls its programmable shader engines). This will mostly affect 3D performance, not media playback (video).

It’s also worth noting that Sandy Bridge graphics will not support DirectX 11 – it’s a DirectX 10 part. Note that Microsoft does allow certain DX11 features, like elements of DirectCompute, to run on DX10 GPUs, and Intel HD Graphics will be no exception. But the graphics core itself will not support DX11 graphics features like hardware tessellation. Since all current and foreseeable PC games will have, at a minimum, a DX10 code path, that’s not likely to be a big handicap. In the end, though, it’s worth remembering that Intel GPUs have always been aimed at a “mainstream” gaming experience. If you want high frame rates in Just Cause 2 or Metro 2033, you’ll need a discrete graphics solution.

On the technical side, the most interesting design decision was to build the graphics core behind the “last level cache” or LLC. (Typically, this would be the L3 cache in current CPUs.) That means the L3 cache will be shared between the CPU cores and the GPU. It’s up to the graphics driver to make decisions about what will and won’t be cached.
The potential bandwidth and performance gains are substantial. If the graphics core can find the data in the cache, that means it won’t have to go to main memory, which saves substantial time and power. It’s like the embedded memory in a console graphics chip.
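To see why a shared LLC matters, a quick back-of-the-envelope model helps: average memory access time is just a hit-rate-weighted blend of cache latency and DRAM latency. The latencies below are hypothetical round numbers chosen for illustration, not Intel specifications.

```python
# Illustrative sketch of how an LLC hit rate shortens the GPU's average
# trip to memory. LLC_LATENCY_NS and DRAM_LATENCY_NS are assumed round
# figures, not measured Sandy Bridge numbers.

LLC_LATENCY_NS = 10    # assumed last-level cache access latency
DRAM_LATENCY_NS = 70   # assumed main-memory access latency

def average_access_ns(hit_rate):
    """Average memory access time for a given LLC hit rate (0.0 to 1.0)."""
    return hit_rate * LLC_LATENCY_NS + (1 - hit_rate) * DRAM_LATENCY_NS

for hit_rate in (0.0, 0.5, 0.9):
    print(f"hit rate {hit_rate:.0%}: {average_access_ns(hit_rate):.0f} ns average")
```

Even a modest hit rate cuts the average dramatically, and every avoided DRAM trip also saves power – which is why the driver, not the hardware alone, decides which graphics data is worth caching.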

The EUs (execution units) have been rebuilt, and now offer twice the compute capability of the GPU built into the current 32nm Arrandale CPU. In fact, Piazza noted that performance of the overall GPU is 20 to 25 times faster than the GMA45 series – fulfilling a promise Intel made two years ago to improve graphics performance tenfold by 2010.
The graphics core also supports Turbo Boost, allowing clock frequencies to ramp up quickly for short periods of time to handle heavy workloads. It’s unlikely we’ll see graphics Turbo Boost kick in at the same time as CPU boost, since in typical workloads either the graphics core is waiting on the CPU or vice versa, so the two rarely peak together.

Intel also added a large register file for more complex shader execution and increased parallelism. Dedicated transcendental functions were added, improving throughput in transcendental math 4 to 20 times.
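How much that 4x–20x transcendental speedup helps a whole shader depends on how much of the shader is transcendental math – a straightforward Amdahl’s-law calculation. The 20% instruction mix below is a hypothetical workload, not a figure from Intel; only the 4x–20x range comes from the article.

```python
# Illustrative sketch: Amdahl's law applied to the dedicated transcendental
# hardware. Only the fraction of work that is transcendental math speeds up;
# the rest of the shader runs at the old rate.

def overall_speedup(transcendental_fraction, transcendental_speedup):
    """Whole-shader speedup when only the transcendental share gets faster."""
    remaining = 1 - transcendental_fraction
    return 1 / (remaining + transcendental_fraction / transcendental_speedup)

# Suppose 20% of a shader's work is transcendental (an assumption):
for s in (4, 20):
    print(f"{s}x transcendental throughput -> "
          f"{overall_speedup(0.2, s):.2f}x overall")
```

The takeaway: dedicated transcendental units pay off most in shaders that lean heavily on sin/cos/rsq-style math, and only modestly elsewhere.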

Interestingly, high definition decode is handled entirely in a dedicated, fixed function unit. The fixed function unit can run in parallel to the EUs. This media decode unit natively understands H.264, VC-1 and MPEG-2, and can handle hardware decode of two simultaneous streams. This means users of a Sandy Bridge system can watch 3D stereoscopic HDTV without performance hiccups – and without bringing the rest of the system to a crawl.
Video encode is partially built into fixed function units and partially programmable, though Intel didn’t divulge which portions of the pipeline require use of the programmable units.

Overall, Sandy Bridge graphics looks like it will be a highly capable media engine with modest, but reasonably effective 3D capability. Gamers who need better 3D will still want to add a discrete graphics chip to their laptop or desktop Sandy Bridge system. But if all you want is a compact HTPC, or a laptop with strong media playback capabilities, Sandy Bridge graphics may be enough.

