There’s a joke in the hardware community that the only thing a performance computer is good for is running benchmarks. This dis at benchmarking suggests that such performance measures are pointless.
We disagree. We honestly think that benchmarks keep the hardware world honest. They give you a real metric with which to measure one piece of hardware against another, or one system against another. Yes, there are times when politics get injected into benchmarks and they can be misapplied, cooked, or even cheated on. But think of what the world would be like without benchmarks. A vendor could make claims that his gadget is faster than the competitor’s. An Internet declaration claiming a PowerPC Mac was 10 times faster than a Pentium II would stand as truth. A good benchmark run well and analyzed correctly can tell you more about a piece of hardware than any marketing flyer.
Since Maximum PC’s system benchmarks haven’t been updated since the last decade, we’re rolling out newer, more punishing tests that push today’s hardware. We’re also using real-world workloads such as gigapixel imaging and multiple 1080p streams to closely match what people are doing today.
With new benchmarks also comes a new zero-point system to give you a reference point for how today’s fastest PCs perform. And after we’ve given you a tour of our official system tests, we’ll point you to some benchmarks you can run at home on your own rig.
We didn’t need benchmarks to tell us our Nehalem-based test bed was dragging ass
If we tell you how fast some new $5,000 PC is, it doesn’t mean much without a reference point. That’s why we build standard zero-point PCs to compare machines to. It’s hard to believe, but our previous zero-point is now several generations old. It’s still serviceable for many folks, but when it’s meant to be our measuring stick for some of the fastest production computers in the world, it better have some chutzpah.
In choosing our parts, we spent some time pondering whether to go LGA1155 or LGA2011. Quad-core or hexa-core? Single GPU or dual? In the end, we decided that more cores still matter, so Intel’s Core i7-3930K would be the basis of our new ZP. Yes, it’s based on the older Sandy Bridge microarchitecture, but it still has plenty of speed, and LGA2011 gives us an upgrade path to Ivy Bridge-E and perhaps an eight-core chip in the future. The 3930K is stock-clocked at 3.2GHz with Turbo Boost to 3.8GHz. We decided to override the stock clock and run it at 3.8GHz full time, with Turbo taking it to 3.9GHz.
For storage, the X79 chipset, via an Asus Sabertooth X79 board, finally unshackles us from SATA 3Gb/s speeds. A 120GB OCZ Agility 3 gives us zesty SATA 6Gb/s reads and writes and has enough capacity to handle our benchmarks. For graphics, we’ve long used a single-card, dual-GPU setup, and we continue that trend with Nvidia’s benchmark- and wallet-busting GeForce GTX 690. It’s basically the equivalent of two GTX 680 cards, in both performance and cost. The rest of the build is essentially borrowed from the $2,100 Tax Refund PC in our May 2012 issue and includes 8GB of DDR3/1600 (four 2GB DIMMs) in quad-channel mode, a 2TB WD Caviar Green drive, an 850W Corsair HX850 PSU, and an NZXT Phantom 410 case and Havik heatsink.
This wasn’t the only hardware we used to create our benchmarks. We did additional testing with a stock Core i7-3770K and a GeForce GTX 580 card. We also ran our benchmarks on an OCZ Revo 3 X2 card to see how much of an impact I/O would have on the individual tests. It was negligible.
|Component||Previous zero-point||New zero-point|
|CPU||2.66GHz Core i7-920 (at 3.5GHz)||3.2GHz Core i7-3930K (at 3.8GHz)|
|Cooler|| ||NZXT Havik 120|
|Motherboard|| ||Asus X79 Sabertooth|
|RAM||6GB DDR3/1750 in tri-channel mode||8GB DDR3/1600 in quad-channel mode|
|GPU||AMD Radeon HD 5970||Nvidia GeForce GTX 690|
|SSD||160GB Intel X25-M||120GB OCZ Agility 3|
|HDD||1TB WD Caviar Black||2TB WD Caviar Green|
We chose our tests with an eye toward real-world workloads
ADOBE PREMIERE PRO CS6
We’ve been envious of the Mercury Playback engine since Adobe introduced it in Premiere Pro CS5. In Premiere Pro CS6, Adobe has tucked in even more enhancements to make it one of the fastest nonlinear editors on the planet, if not the fastest. That presented a few problems for us, though: Do we render using the wickedly fast GPU or the CPU? Using the GPU could cut our times several-fold, but not all machines support GPU encoding. In the end, our problem was solved for us, as the GTX 690 is not currently supported by the Mercury Playback engine, so it’s CPU all the way. That doesn’t mean the benchmark is a wimp. We find the multithreading in CS6 to be impressive. All 12 threads on our Core i7-3930K are hammered during the export. For the workload, we take 1080p video previously shot on a Canon EOS 5D Mark II, add transitions and moving picture-in-picture frames with additional 1080p footage, and export it to H.264 formatted for Blu-ray. The six cores in our 3930K pay dividends, as our render took about 33 minutes. A stock Ivy Bridge setup took about 53 minutes.
GIGAPAN STITCH.EFX 2.0
The GigaPan Epic Pro uses a motor to pan your DSLR to create gigapixel images.
New to our stable is Stitch.Efx 2.0. Let’s face it, applying a sepia filter and scratch effects can be done on a $50 smartphone. Since PCs are about going big, we went as big as we could get. We used a motorized GigaPan Epic Pro head, a Canon EOS 7D, and a 300mm lens with 1.4x teleconverter to shoot a panorama of 287 images totaling 1.63GB. Using Stitch.Efx, we stitch the shots into a single continuous 1.1-gigapixel panorama. Yes, that’s 1.1 billion pixels, or 1,100 megapixels. (That might sound like a lot, but it’s nowhere near the current record of 272 gigapixels, also shot with an Epic Pro head and 7D.)
Stitch.Efx is one-third single-threaded and two-thirds multithreaded. We use it to stitch together 1.6GB of JPEGs into one single 1.1-billion-pixel image.
Roughly the first third of the process, where the app aligns the images, is single-threaded and sensitive to clock speed and microarchitecture. Ivy Bridge cores give the Sandy Bridge cores a good run for the money in this section. The blend section, however, is all about core count, and this is where SB’s greater number of cores pulls ahead of the Ivy Bridge chip. As we stitched “only” 287 images together, it’s mostly a CPU test, but the process created no fewer than 24,339 files during the stitch, so small-file read and write performance should matter. With its mix of single- and multithreaded performance, Stitch.Efx 2.0 is a good representation of today’s software.
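To get a feel for what a one-third/two-thirds split means for core scaling, here is a back-of-the-envelope sketch using Amdahl's law. This is our own illustration, not how Stitch.Efx works internally: it assumes the split is exact, that the parallel part scales perfectly, and that every core runs at the same speed, so it ignores the clock and microarchitecture differences mentioned above. The function name `amdahl_speedup` is ours.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core when only the parallel part scales.

    Amdahl's law: speedup = 1 / (s + (1 - s) / n), where s is the
    single-threaded fraction of the work and n is the number of cores.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumption: the rough split described in the text (one-third single-threaded).
SERIAL_FRACTION = 1.0 / 3.0

quad = amdahl_speedup(SERIAL_FRACTION, 4)
hexa = amdahl_speedup(SERIAL_FRACTION, 6)
print(f"4 cores: {quad:.2f}x over one core")
print(f"6 cores: {hexa:.2f}x over one core")
print(f"Best-case gain from 4 to 6 cores: {(hexa / quad - 1) * 100:.1f}%")
```

Under these idealized assumptions, two extra cores buy far less than a 50 percent improvement, which is why a chip with faster individual cores can stay competitive in a mixed workload like this one.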
TECHARP X264 HD 5.0
Since our Premiere Pro CS6 test actually features MainConcept’s popular encoding engine, we cast about for another publicly available encoding test and found one in the newly released x264 HD 5.0. Created by tech website TechARP.com, the test uses the x264 library to encode a 1080p video stream multiple times. The benchmark is multithreaded and loves cores. It performs two passes, with the second pass compressing the compressed material even further to save space. We run in 64-bit mode and report the average frame rate for the second pass. In our testing, the hexa-core Core i7 smashes the newer Core i7 Ivy Bridge in the nose by a significant margin. We’ve found that encoders can be sensitive to memory bandwidth, so we reconfigured our machine from quad-channel to dual-channel mode (using larger DIMMs so the total amount of RAM would remain the same) and found a negligible difference.
PROSHOW PRODUCER 5.0
Favored by professional photographers, ProShow Producer 5.0 is a popular slideshow creator that we’ve long used as a benchmark. For our new benchmarks, we update to the latest version of the app, which adds GPU acceleration, but only for video playback. When we started using ProShow Producer five years ago, it was one of the few apps that could push quad-core chips to their limit. Unfortunately, the app seems to top out with four cores, but that’s fine. We intentionally picked ProShow Producer 5.0 knowing full well that it doesn’t scale with cores. As with Stitch.Efx 2.0, we wanted something that’s closer to most apps in performance instead of simply scaling as you add more cores. Why pick something that won’t push an eight-core chip to its limits? The sad truth is that the vast majority of apps can’t exploit that many threads.
Our ProShow Producer benchmark also has a "cute" element...
BATMAN: ARKHAM CITY
Arkham City is based on a heavily modified version of the Unreal Engine 3 and adds the latest DX11 bells and whistles. We run the test at 2560x1600 with 8x AA, tessellation on High, and detail on Extreme. Why not use some of the more advanced AA settings available from Nvidia or AMD? Since this test will be used on systems with cards from either vendor, it can be difficult to compare a proprietary antialiasing technique from one vendor against another vendor that doesn’t support it. Even at 8x AA and everything cranked up, the GeForce GTX 690 makes mincemeat of the benchmark.
FUTUREMARK 3DMARK 11
Our last benchmark is Futuremark’s 3DMark 11. We normally eschew synthetic benchmarks in favor of real-world benchmarks, but we have relied on the various iterations of 3DMark over the years. We’re choosing it here because it scales well with multiple GPUs, and this version doesn’t seem to represent the typical game of political football between rival graphics companies that previous versions have. For our test, we run the default benchmark for the Extreme preset.
|Benchmark||Zero-point||Comparison system|
|Premiere Pro CS6 (sec)||2,000|| |
|Stitch.Efx 2.0 (sec)||831|| |
|ProShow Producer 5.0 (sec)||1,446|| |
|x264 HD 5.0 (fps)|| || |
|Batman: Arkham City (fps)|| || |
|3DMark 11||X5,847.0||X2,115 (-64%)|
For comparison, we ran our benchmarks on a stock quad-core 3.5GHz Core i7-3770K on an MSI Z77A-GD65, with 8GB of RAM, a GeForce GTX 580, a WD Raptor 150 drive, and 64-bit Windows 7.
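The percentage next to a score is just its distance from the zero-point. Here is a minimal sketch of that arithmetic. The function name `percent_vs_zero_point` is ours, and the handling of timed tests (converting seconds to a rate so that a slower machine shows a negative number) is an assumption on our part, not a documented formula.

```python
def percent_vs_zero_point(result: float, zero_point: float,
                          lower_is_better: bool = False) -> float:
    """Return how far a result sits above (+) or below (-) the zero-point, in percent.

    For scores and frame rates, higher is better, so we compare directly.
    For timed tests (seconds), lower is better, so we compare rates instead.
    That second convention is an assumption made for this sketch.
    """
    if result <= 0 or zero_point <= 0:
        raise ValueError("benchmark results must be positive")
    if lower_is_better:
        return (zero_point / result - 1.0) * 100.0
    return (result / zero_point - 1.0) * 100.0


# 3DMark 11 Extreme scores from the results table.
delta = percent_vs_zero_point(2115, 5847)
print(f"3DMark 11: {delta:.0f}%")

# Premiere Pro CS6: 2,000 seconds on the zero-point versus roughly
# 53 minutes (an approximate figure from the text) on the comparison system.
delta = percent_vs_zero_point(53 * 60, 2000, lower_is_better=True)
print(f"Premiere Pro CS6: {delta:.0f}%")
```

Treating timed results as rates keeps the sign consistent across the chart: a negative number always means slower than the zero-point, whether the test reports seconds or frames per second.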
Lessons from the Lab that you can apply to your own testing methods
Before you begin your benchmarking, there are a few basic rules that every techie has learned through blood, sweat, and tears. First, record all your settings. From bclock, to RAM timing, to GPU clocks, drivers, and BIOS settings, you should keep a written record that you can refer back to. Second, you’re human and make mistakes. If the result from B outrageously exceeds A, assume you made a mistake and retest. Third, double-check your system. Are you in the correct SATA port? Is the RAM fully inserted and in the correct memory mode? Is the CPU overheating and throttling? Fourth, triple-check your settings. Yeah, this is the second tip again, but more often than not, user error is the cause of errors in tests. Finally, benchmarking doesn’t have to cost money. Here are a few free and reliable benchmarks and how to interpret their results.
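If you script your own tests, the first two rules are easy to automate. The sketch below is one way to do it, not our Lab's actual tooling: it records the settings alongside the timings, repeats the run, and flags any run that strays too far from the median so you know to retest. The function names, the 10 percent tolerance, the settings values, and the stand-in workload are all placeholders we made up for illustration.

```python
import json
import statistics
import time
from typing import Callable, List


def time_runs(workload: Callable[[], object], runs: int = 5) -> List[float]:
    """Run the workload several times and return each wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def suspicious_runs(timings: List[float], tolerance: float = 0.10) -> List[int]:
    """Return the indices of runs that differ from the median by more than tolerance."""
    median = statistics.median(timings)
    return [i for i, t in enumerate(timings) if abs(t - median) > tolerance * median]


def sample_workload() -> int:
    """Stand-in CPU-bound task; swap in the real benchmark you want to time."""
    return sum(i * i for i in range(200_000))


# Rule one: write the settings down. These values are placeholders.
settings = {"cpu_clock_ghz": 3.8, "memory_mode": "quad-channel", "gpu_driver": "unknown"}

timings = time_runs(sample_workload, runs=5)
record = {"settings": settings, "timings_sec": timings,
          "median_sec": statistics.median(timings)}
print(json.dumps(record, indent=2))

# Rule two: if a run looks out of line, assume a mistake and retest.
flagged = suspicious_runs(timings)
if flagged:
    print(f"Runs {flagged} are more than 10% off the median; retest before trusting them.")
```

We compare against the median instead of the mean so that a single bad run, say one where a background task kicked in, doesn't drag the reference point along with it.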
MAXON CINEBENCH 11.5
Cinebench 11.5 is best used as a pure CPU-performance benchmark and applications outside of that should be carefully weighed.
Cinebench 11.5 is a great test of pure CPU performance. The benchmark is based on Maxon’s Cinema 4D rendering engine and is heavily multithreaded. The test also features an OpenGL rating. So what’s the catch? Cinebench’s rendering test is best used to test CPU performance only. As a system level test, any variances you see between system A and system B will be due to the CPU and not the hard drive, SSD, or memory bandwidth. It’s virtually worthless to try to use it as, say, a motherboard test using the same chip, because any variances will be due to how much the vendor tweaks the board’s bclock settings. OpenGL also has little value for mainstream users, as very few games even use OpenGL anymore. www.maxon.net.
TECHARP X264 HD 5.0
We’ve just started using TechARP’s x264 HD 5.0 benchmark, but we like it already. It gives you an easy, repeatable way to test the encoding prowess of a machine. Be advised that, like Cinebench, it seems to be almost completely compute-bound. We tested it in dual-channel mode and quad-channel mode on a hexa-core chip and found a very minor difference resulting from memory bandwidth. Testing from a single SSD to a RAIDed PCIe SSD also yielded very little difference. www.techarp.com
UNIGINE HEAVEN 3.0
For graphics, Unigine’s Heaven 3.0 is a great way to measure tessellation performance and will push even the fastest cards. It even has a Mac version, but without tessellation. Multiple GPUs help this benchmark, which is purely focused on the GPU. Quad-core, hexa-core, low clock, or high clock hardly make a difference in this test. The free version of Futuremark’s 3DMark 11 will also work, but only for the Performance preset. www.unigine.com
CRYSTALDISKMARK 3
We’ve also been happy with CrystalDiskMark, which is easy to run and gives you a good feel for your disk subsystem’s performance. Keep in mind, one limit with the test is that the workload is limited to 4GB, so even a hybrid drive could perform like an SSD. www.crystalmark.info
CrystalDiskMark 3.0 is a reliable way to measure disk performance but not across an entire disk or SSD.
Folks interested in measuring their memory bandwidth should check out SiSoft Sandra 2012. The free version offers a host of benchmarks, including a synthetic memory benchmark. Keep in mind, though, with the large-cache CPUs today, it’s very difficult to see an impact from memory bandwidth unless you are running integrated graphics.
Note: This article appeared in the August 2012 issue of the magazine.