To casual observers, PC builders who fixate on benchmarks are geeks unable to see the forest for the trees. “Why,” they ask, “can’t you just enjoy your new computer and let it be?” Our answer: the difference between a person who cares about benchmarking and one who doesn’t is how much that person values their free time.
Case in point: we recently did something as simple as downloading two large zip files at the end of the workday. Instead of strolling out at 6 p.m., we ended up waiting 15 minutes for the files to decompress on our work-issued PC. To care about benchmarks is to care about performance. And to care about performance is to care about having more free time on your hands.
But you shouldn’t just download any benchmarking tool to run--there’s a right and wrong way to benchmark your machine if you want to get meaningful results. We’ll teach you proper benchmarking techniques and how to interpret your results. Read on to learn how to benchmark the Maximum PC way.
Getting repeatable, reliable benchmark results isn’t just about picking the right benchmark; it’s also about configuring your PC properly. Here are some basic steps every armchair benchmarker should take before that first benchmark run:
Turn off any screen saver: Even though the screen saver is supposed to stay inactive while the machine is in use, you should completely disable it anyway.
Turn off power saving modes: Unless you’re interested in measuring power consumption of the machine using a Watt meter, all benchmark runs should be conducted with the machine set to high performance mode in the OS.
Disconnect from Internet: Remove any Ethernet cable or disconnect any Wi-Fi connection unless it’s needed for your benchmarking run.
Disable antivirus apps: Unless you want to see the impact of having AV overhead on a machine, disable any antivirus tools for your benchmarking run.
Turn off autoupdate: Windows Update should be switched off to prevent it from downloading a massive patch (you did disconnect the network connection, right?) or from eating CPU cycles looking for one. Other apps that autoupdate should be turned off as well.
Defrag your hard drive: If the drive is heavily fragmented, we recommend that you invoke a defrag of the disk. Those with SSDs, obviously, need not perform this step.
Disable System Restore: Turning off System Restore will prevent Windows from creating those restore points.
Reboot: Self explanatory.
Wait for the machine to fully boot: As we all know, it can take a few minutes for the OS to load all of the files it needs – even after you’re presented with the desktop. Wait until disk activity has subsided.
Run ProcessIdleTasks: Open a command prompt by typing CMD into the Run dialog, then enter “Rundll32.exe advapi32.dll,ProcessIdleTasks” (without the quotes). This orders Windows to perform all of the tasks it would normally do while the system is idle.
Repeat your benchmark: We recommend that you run your benchmark three to five times and take the median score.
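If you log your scores as you go, taking the median is a one-liner. Here’s a quick Python sketch; the scores are made up for illustration:

```python
from statistics import median

# Hypothetical scores from five runs of the same benchmark
scores = [5412, 5388, 5460, 5397, 5405]

print("Median score:", median(scores))  # Median score: 5405
```

The median, unlike the mean, shrugs off a single outlier run where a background task happened to kick in.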
Real-world benchmarking wasn’t always en vogue. Years ago, the enthusiast community mostly relied on synthetic benchmarks (some prefer the term ‘artificial benchmarks’). That trend broke when people realized that vendors were skewing their drivers to boost scores in the synthetic tests, which actually hurt real-world gaming performance. This pushed benchmarkers toward real-world apps and games, with the thought that any performance enhancement would deliver real benefit.
Just like you wouldn’t bring a Klingon d’k tahg to a phaser fight, you shouldn’t use a CPU benchmark to test a hard drive. As easy as that is to understand, you wouldn’t believe how many times we’ve seen people cite a benchmark intended as a GPU test to illustrate CPU performance. For every benchmark you run, you’ll want to understand which component most influences it: the CPU, GPU, RAM, or hard drive.
So you’ve found a benchmark that actually works for your needs. Great! But is it repeatable? Can you run it five times on the same machine and have it produce the same results within a tolerable level of variance of, say, three percent?
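If you want to sanity-check that variance yourself, a few lines of Python will do it. This is just a sketch, and the frame-rate numbers are invented:

```python
from statistics import mean

def within_tolerance(runs, pct=3.0):
    """True if every run falls within pct percent of the mean."""
    m = mean(runs)
    return all(abs(r - m) / m * 100 <= pct for r in runs)

consistent = [101.2, 99.8, 100.5, 100.1, 98.9]  # hypothetical fps results
noisy = [101.2, 99.8, 100.5, 111.3, 98.9]       # one run spiked

print(within_tolerance(consistent))  # True
print(within_tolerance(noisy))       # False
```

If a run falls outside the tolerance, toss the whole set, hunt down whatever background task interfered, and benchmark again.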
As we mentioned, real-world applications have been established as the preferred benchmarking tools for quite some time, but that doesn’t mean synthetic benchmarks are irrelevant. In fact, synthetic benchmarks can be quite useful in evaluating a focused set of components such as RAM, the CPU or hard drive. Some synthetic tests can even be considered partially real-world.
The classic complaint against synthetic tests is that they use tests or engines optimized solely for the benefit of the benchmark’s results. But many synthetic tests today are based on real-world engines or use algorithms developed from popular applications. PC Mark’s hard drive tests, for example, use traces of what apps and the OS actually do. It then runs these traces against the hard drive to measure its performance.
You can see how the line between synthetic and real-world benchmarks gets easily blurred today. In some cases, actually finding real-world benchmarks that stress a particular component is difficult. RAM is probably the best example of this. It’s very difficult to find real-world benchmarks that will exploit either the low latency or high bandwidth of modern RAM. It’s only through synthetic benchmarks that you can actually see whether you’re benefiting from any additional bandwidth at all. Hard drive characteristics are also fairly difficult to discern without the use of at least some synthetic benchmarks.
Next: Let's get on to the actual benchmarking tools!
There’s a common mistake rookie benchmarkers often make when starting out. Many people think that running one benchmark is enough to tell you everything about one type of component. But all a single benchmark will tell you is how a particular component performs in that one benchmark. And of all the parts in a PC, the one that’s the most difficult to judge is the CPU. Even with the GPU encroaching on its territory, the CPU remains king, as the vast majority of apps still rely on it for the heavy lifting. From photo editing to video editing, 3D modeling, antivirus scans, and decompressing files, the CPU continues to be the go-to part that most applications seek out. How you go about testing the CPU really depends on what kind of performance you want to test for. Floating-point performance? Integer performance? How fast it encodes video or plays certain games?
One other key element to consider before you benchmark your CPU is multi-threading. Just as very few applications exploit all of the threads available in a processor, very few benchmarks do, either. That is changing, but you’d be surprised at how many benchmarks fail to measure the full performance of a modern quad core.
FutureMark’s PC Mark Vantage is one of those benchmarks that has one foot in the real world and the other in the synthetic. The test uses workloads derived from the apps that come with Windows and traces of common hard drive loads to discern computer performance. The upside is that it uses test scenarios based on “real apps” and is actually a pretty fair estimate of computer performance. Many of the tests blend single-threaded apps with multi-threaded ones, or run multiple applications to gauge multitasking performance.
The downside is that not very many people actually use those freebie apps that Microsoft bundles with its OS. Another downside is the abstracted score, which frustrates people who want to see a pure “hard drive” or “CPU” performance number. Don’t take all that as a negative, though. While Windows Photo Gallery performance doesn’t necessarily translate into direct performance in Photoshop CS4, we haven’t seen it come down on the wrong side of a CPU test. Generally, we’ve been pretty pleased with PC Mark Vantage. It produces fairly reliable numbers that seem to jibe with other benchmarks. The fact that it’s multi-faceted also gives you a nice way to quickly gauge your machine’s performance.
One more thing to be aware of: the overall score that PC Mark Vantage produces is based on a set of test criteria unique to the PC Mark Vantage tests. The individual test suites for Memory, TV and Movies, Music, Gaming, Communications, and Productivity actually run different tests than the overall PC Mark Vantage score.
Futuremark has a free version of PC Mark Vantage that lets you run it once on your machine. It’s a bit of a pain, since you can only view the results online and you have to request a trial key to run the utility. For additional runs, you have to pay a modest fee of $6.95 to run the main PC Mark Vantage suite. The other suites will cost you $20 and allow you to change some benchmark settings. For the most part, the main PC Mark Vantage suite score is the one that most people care about. Keep in mind that there are 64-bit and 32-bit versions, and you should only compare 64-bit to 64-bit when comparing one machine to another.
Running the test is simple--select the 64-bit or 32-bit icon from your desktop after downloading and installing it, and click run benchmark. You’ll be asked to request a trial key and will have to supply an email address. Once you’re done, the results will take you to a web page where you can see how fast your machine is and also get a reality check by comparing your score to that of the fastest machine.
One of the heaviest workloads you can put on a CPU today is 3D modeling. In 3D modeling, performance is the difference between getting the project done on time or not at all. There are benchmarks for Autodesk’s 3ds Max, NewTek’s LightWave, and other pricey applications, but you need licensed copies of these applications, which can run into the thousands of dollars. Fortunately, there’s a cheaper way to gauge how a particular system may perform at 3D rendering. Maxon’s Cinebench R10 is based on the company’s rendering engine used in its Cinema 4D modeler. Most 3D modeling is floating-point intensive. Again, keep in mind that Cinebench has a 64-bit mode and a 32-bit mode. The benchmark also lets you test in single-threaded or multi-threaded mode.
To run it, simply install it and launch it. Select Rendering 1 CPU for a single-threaded run or Rendering X CPU for a multi-threaded test. Like PC Mark Vantage, the results are expressed as a numerical score--the higher, the better. Cinebench R10 is nicely multi-threaded and takes full advantage of today’s many-core CPUs.
POV-Ray is another popular 3D rendering benchmark that’s available for free from povray.org. One thing you need to know: make sure you download the 3.7 version from the beta page--it’s the only version that is multi-threaded. Again, make sure you download the correct version for your OS: 32-bit for 32-bit and 64-bit for 64-bit. To run it, simply click on the Render menu item and select the option to run the benchmark on all CPUs. The score you’re most interested in is the CPU time, which is expressed in seconds in the purplish part of the window.
For a different view of computer performance, we also rely on Fritz Chess Benchmark and ScienceMark 2.0. Fritz Chess Benchmark is based on the popular Chessbase engine, so it’s considered real-world (although the free benchmark is slightly out of date). Running it is straightforward: simply fire up the app and click start. Note that Fritz will indicate how many “processors” it’s going to use; the number includes all of the physical cores in your chip as well as any virtual cores. The result gives you the performance of your machine versus a 1GHz Pentium III, as well as how many kilonodes per second it can compute--that is, how many thousands of moves per second are being calculated. The benchmark ships free with copies of Fritz 9 or can be found online.
ScienceMark 2.0 is another synthetic that’s rooted in a real-world engine. It uses mathematical algorithms common in scientific and engineering applications and also stresses memory performance and latency. The caveat to this benchmark is that it doesn’t seem to be particularly multi-threaded. And back in the days when Intel’s Pentium 4 would get soundly splattered by the Athlon 64 in ScienceMark 2.0, the company would grouse that the authors of ScienceMark 2.0 weren’t interested in working with Intel on optimizations for its CPUs. But with Intel taking the lead with Core 2 and Core i7, Intel doesn’t seem to object to this test as much anymore. Running it is easy. Download the installation file and decompress it. Execute the file and click on File, Run All Benchmarks. The results will give you an overall ScienceMark score as well as subscores for molecular dynamics, cryptography, and memory, among other benchmarks.
Two popular benchmarks, Prime95 and SuperPi, are great for calculating the math prowess of a CPU, although both are single-threaded. Both have long been favored by overclockers as stress tests, but both will also give you overall scores as a performance indicator. The weakness of both tests, obviously, is the lack of multi-threading. The preferred version of SuperPi is 1.5, which has been modified by XtremeSystems.org to make it more amenable to stress testing and is available from that site. To run it, execute the app, select Calculate, and select 1M; the program will calculate pi to one million digits.
Like SuperPi, Prime95 is considered more of a stress test than a benchmark. In fact, we use a custom blend of Prime95 developed by an OEM PC builder to stress test many of the overclocked PCs we review. Prime95 is a distributed project used to search for Mersenne prime numbers. To run the benchmark, first download it from www.mersenne.org. Start the application, dismiss the stress-testing screen, and go to Options, then Benchmark. When it’s complete, the results are dumped into a file named results.txt in the same folder where the executable resides. Open the file and you should find results for each separate run you conducted, reported in milliseconds. You can compare the results to others at: http://v5www.mersenne.org/report_benchmarks/.
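If you benchmark several machines, pulling the timings out of results.txt with a script beats eyeballing the file. Here’s a minimal Python sketch; note that the sample text below is only our approximation of the file’s wording, so check your own results.txt before trusting the pattern:

```python
import re

# Hypothetical excerpt of a Prime95 results.txt; the real file's
# wording may differ from version to version.
sample = """\
Best time for 1024K FFT length: 21.342 ms.
Best time for 2048K FFT length: 45.108 ms.
"""

# Grab every timing reported in milliseconds
times = [float(t) for t in re.findall(r"([\d.]+) ms", sample)]
print(times)  # [21.342, 45.108]
```

Lower is better here, so once the timings are in a list, comparing two machines is just a matter of lining up the numbers.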
The hard drive is perhaps the most difficult component to gauge. For many years, users relied solely on straight file-copy tests: take a few gigabytes of files, dump them to a target hard drive three or four times, and time it with a stopwatch. Sounds real-world and accurate, doesn’t it? Unfortunately, it isn’t. Straight file copies have often proven to be unreliable. That’s mainly because you have no control over where the data is dumped on the drive, and since data placement on a mechanical hard drive affects transfer speed, you can easily get wacky results.
We don’t mean to say that file copying is completely invalid as a measurement of performance, but this is an area where synthetics can be more reliable than real-world tests. Two of the most popular tests are HD Tach 3 (simplisoftware.com) and HD Tune 4 (hdtune.com). For Vista and Windows 7 users, HD Tach requires a bit of tweaking, since it was designed for Windows XP. To enable its use in Vista and 7, right-click on the icon, click Properties, and set the compatibility mode to Windows XP SP2. On startup, choose the target drive to test and choose the long bench. Click Run Test.
You will be given four results: the average read performance, random access time, CPU utilization, and burst speed. The two that matter most are average read performance and random access time. The relevance of burst speed is up for debate. The general consensus is that it doesn’t matter much because the caches in hard drives (even today’s supersized 64MB caches) are so small that they can’t really help much. On the other hand, some feel that burst performance can be quite significant if seen as an indicator of how well the drive’s caching and read-ahead algorithms perform. CPU utilization is also fairly meaningless, since it’s usually below 5 percent. This figure should only concern you if it hits double digits, which might indicate a problem with the storage subsystem.
HD Tach’s one weakness is its inability to perform write tests (at least in the free version). Fortunately, that’s one thing you can do with HD Tune (hdtune.com) for 14 days; the trial version lets you run write tests on drives for the duration of the trial period. Starting it is simple: launch the app, select your target drive, and click start. If you plan on running the write tests, you’ll have to delete the partition on the target drive first. Obviously, don’t do this on the primary partition you’re using.
Gauging RAM performance with real-world applications is probably even more difficult than hard drive benchmarking. Like the Great White Whale, we’ve long hunted for the application that would instantly show just how much more performance you get from running ultra-tight RAM timing tolerances or clocking the modules past the 2GHz mark. In all our years of system testing, we’ve never found it. We don’t mean to say that it doesn’t exist. Valve’s non-public multi-threaded particle benchmark typically favors lower-latency RAM setups. But even there you don’t see magical results.
To actually see whether your overclocked RAM even gives you more bandwidth, you’ll have to turn to the synthetic tests. We favor SiSoft Sandra Lite and Everest Ultimate. For Sandra Lite, launch the app and click the benchmarks tab. Select Memory Bandwidth and press the F5 key to run the benchmark. The app will give you a score along with four other chipset/RAM/CPU combinations you can compare your results to. You can do the same with Memory Latency.
The free version of Everest Ultimate gives you a 14-day trial period. That’s plenty of time to run all the benchmarks you want. To test your RAM with Everest, install the app, launch it, and click on the Benchmark icon. Select Memory Read and click on the refresh icon on top. You can do the same for Memory Write, Memory Copy, and Memory Latency. Like Sandra, Everest Ultimate will also give you a lengthy comparative list of motherboards/CPUs/chipsets so you can gloat (if you happen to have a triple-channel board and proc) or turn sullen (if you happen to still be pushing a Pentium 4 on an 848 chipset and getting 2.6GB/s in bandwidth).
So you’ve run your benchmarks--now what? Besides giving you bragging rights for your rig, these two tools can help you tune your RAM for higher bandwidth or lower latency. Just note your score before rebooting into the BIOS, where you can clock your RAM higher or tighten its latency.
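When you compare the before-and-after numbers, percentage change is the fairest yardstick. A trivial sketch, with made-up bandwidth figures:

```python
def percent_gain(before, after):
    """Percentage change from a before score to an after score."""
    return (after - before) / before * 100

# Hypothetical memory bandwidth scores (GB/s) before and after tuning
print(f"{percent_gain(16.2, 17.1):.1f}%")  # 5.6%
```

If the gain from a tweak is smaller than the run-to-run variance you measured earlier, it isn’t really a gain at all.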
Benchmarking graphics cards gets more difficult every year. In the old days, most people had 4:3 CRT displays, and usually ran games at relatively low resolutions. Today’s gaming environment is considerably more complex. We’ve now got wide screen displays running in a variety of resolutions and two different aspect ratios (16:10 and 16:9).
Then there’s the confusing issue of APIs. It’s true that OpenGL for PC gaming is less relevant than it used to be, but now we have multiple versions of DirectX: DirectX 9, 10 and 11. Each offers different capabilities and feature sets. Currently, only AMD offers DX11 class GPUs, though that may change by the time you read this.
The next layer of complexity is figuring out which benchmarks are relevant. FutureMark’s 3DMark series has been popular for doing quick tests, but it’s also understood that a 3DMark score doesn’t always reflect how a particular GPU might perform in real games.
Then again, game benchmarks don’t always reflect reality, either. Take, for example, Far Cry 2. This Ubisoft title has one of the best built-in benchmarking tools we’ve seen. The problem is that it’s actually multiple benchmarks.
Far Cry 2 offers rich benchmarking opportunities – almost too rich.
So which benchmark is more useful? The longer “Ranch Long”, which is almost purely a graphics test? The “Playback (Action Scene)”, in which AI and physics have a major role? Or the “Ranch Small”, which is more balanced between CPU-heavy and GPU-heavy elements?
If what you really want is to test pure graphics performance, you’d not only choose the “Ranch Long”, you’d also take pains to disable AI and other CPU elements in the Game Settings tab. If what you want is to check out performance during actual gameplay, the Action Scene might be better – but CPU performance would have a large effect. On top of all this is the fact that a particular graphics card’s performance in Far Cry 2 might not reflect how it performs in a completely different game.
Then there’s the resolution question. Again, if you’re looking to just hammer on the GPU, run at very high resolutions. Then, add to the GPU’s pain by pumping up anti-aliasing and maxing out game detail and effects. So the right answer would be to use a big display – say, a 30-inch monster running at 2560x1600, right?
Well, not necessarily – not everyone has a 30-inch, 2560x1600 display. For example, if you take a look at the Steam user survey, you’ll see that the largest single group of users – over 20% -- is running at 1280x1024. That’s a 5:4 aspect ratio, and a pretty undemanding resolution. In addition, a $100 graphics card will completely tank at very high resolutions, while delivering very playable frame rates at something like 1280x1024.
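If you’re ever unsure what aspect ratio a resolution works out to, the math is just a greatest-common-divisor reduction. A quick Python sketch (note that 16:10 fully reduces to 8:5; they’re the same shape):

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce a resolution to its simplest aspect ratio."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"

print(aspect_ratio(1280, 1024))  # 5:4
print(aspect_ratio(1920, 1080))  # 16:9
print(aspect_ratio(2560, 1600))  # 8:5 (the fully reduced form of 16:10)
```

This matters for benchmarking because a game rendered at a different aspect ratio shows a different field of view, so scores across aspect ratios aren’t strictly comparable.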
The right answer, of course, is to run multiple benchmarks at multiple resolutions. But you do have to take a stand and minimize the number of variables, or you’d spend all your time benchmarking games and not actually playing them. If you’re testing your own personal setup, you’ll of course be limited by your CPU and your display. In our labs, we take a middle ground. We do test at multiple resolutions and with many games, but we also try to find the sweet spot for each card. When we benchmark a dual-GPU card like the Radeon HD 5970, we’ll run it on a 30-inch display. On the other hand, we might max out at 1680x1050 for a sub-$100 card, but also test at lower resolutions, like 1280x720.
We’ve already discussed how to set up your system for more reliable benchmarking. What works for CPUs and systems also works for graphics cards, with one additional wrinkle: make sure you’re running the latest graphics driver. Performance in some titles can go up by over 10% just with a driver update (though in rare cases, you may actually see performance decreases.)
The good news is that you can find a pretty good set of graphics benchmarks that cost you nothing but download time and some time to learn. If you want to spend $50 for a particular game to run as a benchmark, feel free. But here’s our handy guide to a few of the good, free graphics tests.
The developers at GSC Game World have released two games based on their excellent S.T.A.L.K.E.R. series. The latest is the Call of Pripyat test, which supports DirectX 9, DirectX 10, and DirectX 11.
Call of Pripyat supports the latest APIs and is fairly simple to run.
Call of Pripyat is a first person shooter, but there’s no actual action in the benchmark, though AI is active. You’ll see characters walking around, but not actually engaged in combat. Since it tests multiple APIs, it’s good for finding the sweet spot for your particular GPU. CoP, as it’s often called, offers a fairly simple set of parameters: pick an API, pick a detail level and, if you like, enable AA and other features.
DiRT2 is the latest in the series of racing titles that started with the original Colin McRae game, and its free, downloadable demo includes a built-in benchmark. DiRT2 is a recent game that supports DirectX 11, 10, and 9, so it’s about as current as you can get.
The problem with DiRT2 is that it’s somewhat cumbersome to run as a benchmark. First, you need to run the demo. Then you have to navigate the menu to find the options screen. You start inside your in-game RV trailer, have to leave the trailer to go outside, find the options table, and then select graphics. After all that, you still need to scroll down to the bottom of the graphics options to find the benchmark mode.
BattleForge is an online, real-time strategy game from EA that uses a free-to-play model, so you can download and run it at no cost. Since it’s a real-time strategy game, it uses graphics a little differently than a first person shooter. In the scripted benchmark, for example, there’s about a hundred units running around doing stuff – engaging in combat, throwing spells and otherwise creating mayhem.
The benchmark, as with many games, is buried in the graphics option menu. The benchmark itself is only about a minute long. However, actually running the test is even more tedious than running the DiRT2 benchmark. First, you’ll need to download the installer. When you run the installer, it proceeds to download about 1.3GB of content. Be sure to include the option for high resolution textures, or you won’t be able to benchmark in the high detail mode.
After you install the game, you’ll also need to create a BattleForge account. Finally, you can log in, run the game, and go into the graphics options. Even then, it’s something of a pain, because every time you make a major change to a graphics option, you have to exit the game and restart – which means re-logging in. It’s a good thing that this is a free game.
Still, it’s one of the few RTS-based tests, and it is free, so it’s worth checking out if you’re into these types of titles.
The Resident Evil 5 demo supports DirectX 9 and 10, but also has support for Nvidia’s 3D Vision stereoscopic 3D, so you can even benchmark your card while wearing shutter glasses, if you think that’s interesting.
As game demos go, it’s not a huge download, at about 580MB. It’s dead simple to run, too. When you run the launcher, choose either DX9 or DX10 mode. When in the demo, you press a key, select “System Settings”, then set the resolution and features. Once you ESC back to the main screen, you hit a key again and select “Benchmarks.” The “Fixed Benchmark” is shorter, and generates more repeatable results. At the end, you’ll get a summary screen, showing you an average frame rate and a chart of frame rate over time.
Resident Evil 5, with all the eye candy turned up, can still stress a graphic card, even though it’s almost a pure port of the console title.
Although based on an actual game engine, Unigine’s Heaven is a synthetic test designed to check out DirectX 11 performance on the latest generation of graphics cards. It even offers manual settings for hardware tessellation, a feature available only on DirectX 11-capable GPUs. It will also run in DX10, DX9, and OpenGL, so you can test a variety of APIs, but remember that performance will vary by API and by enabled API-specific features, like hardware tessellation.
Unigine’s Heaven benchmark runs all currently supported graphics APIs, and supports hardware tessellation under DX11.
If you’re a graphics artist or professional CAD user, you may want to test performance of your card in that context. The problem here is that professional graphics apps vary in performance on specific GPUs even more than games, and the CPU can often be a big factor.
If you do want to see how your graphics card performs, there are a couple of good, free benchmarks that can assist you, though we won’t go into details as to how to run them. In addition, there are several application-specific tests, but those often require you to own the app. If you are, for example, a 3ds Max 9 user, you may want to see how your card (and system) performs with benchmarks for that particular app.
One quick and easy source is SPEC, the Standard Performance Evaluation Corporation, a standards group that develops a variety of benchmarks. One free benchmark that’s widely used is SPECViewperf, now at version 10. SPEC also offers app specific tests for 3dsmax 9, Maya 6.5, SolidEdge and others.
It’s easy to get mired in frame rates, feature sets, and driver versions. Remember, though, a difference of a few frames per second really doesn’t matter (as long as you’re staying above 45 fps in shooters and 30 fps in simulation and RTS titles). Benchmarking graphics cards is a great way to check out the performance of your rig, and maybe help you decide when it’s really time to upgrade. In the end, though, it’s about how well the games you like perform on the hardware you own. Benchmarking should be a tool to help you enjoy your gaming experience, not a competition unto itself.
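Those frame-rate floors translate into a per-frame time budget, which can be easier to reason about when you’re weighing quality settings:

```python
def frame_budget_ms(fps):
    """Milliseconds available to render each frame at a given frame rate."""
    return 1000.0 / fps

for target in (30, 45, 60):
    print(f"{target} fps -> {frame_budget_ms(target):.1f} ms per frame")
# 30 fps -> 33.3 ms per frame
# 45 fps -> 22.2 ms per frame
# 60 fps -> 16.7 ms per frame
```

Seen this way, chasing an extra few fps at the top end buys you only a millisecond or two per frame, while dips below the floor cost far more.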