Date: July 19, 2013
Author(s): Rob Williams
Intel’s latest processor architecture brings a lot to the table, with the usual suspects dominating the feature list: improved CPU and GPU performance, better power efficiency, and new instruction sets. We’re taking a look at the desktop line’s highest-end offering here, so let’s see how it stacks up against last-gen’s champ.
Hmm, is that the doorbell? Hi, come in and sit a spell, as we take the hassle out of understanding Haswell. Will we dwell on its performance and liken it to stepping barefoot on a seashell, and bid it farewell? Or will we exalt it for packing a punch like a dropped barbell and say it is sure to outsell any other CPU as well?
That intro, ladies and gentlemen, is the reason that our look at Intel’s latest and greatest mainstream chip has taken so long. Was it worth it?
Admittedly, I could be lying, and in reality other factors could have contributed to the tardiness. I’ll let that remain a mystery, but given the fact that we are rather late on things (apologies, Intel!), let’s get right to it.
Why the name “Haswell” gives me an intense urge to begin rhyming, I’m unsure, but what I am sure of is that this is a rather interesting “Tock” release from Intel. Haswell CPUs are built upon a 22nm process like the 3rd-gen Ivy Bridge chips were, but under the hood, Intel has retooled a lot.
Before moving on, behold, the 4th-gen Core processor:
In terms of new features, it’d be difficult to call Haswell the most interesting Tock release from Intel since the company began its Tick/Tock cadence in 2006, but in terms of actual architecture design, it’s the best the company has ever released. However, it’s important to not expect major performance gains here over last-gen; effectively, if you’re already equipped with an Ivy Bridge PC, there’s no reason to upgrade unless you’re planning to move up to a higher-end model. It’d be ridiculous to move from a Core i7-3770K to an i7-4770K, for example. If you’re rocking an older rig, especially older than the i7-2xxx series, Haswell is worth looking into.
Those hoping to get a reprieve from the socket roulette game that Intel plays with any new architecture release haven’t been thrown a bone with Haswell. Despite the similar size, Intel has adopted a new LGA1150 socket for use with Haswell. While that sucks for those who own LGA1155 and were hoping to upgrade, the change is for a legitimate reason: there have been a ton of power-related changes made here, so it simply wasn’t possible to retain last-gen’s socket.
Intel’s current Core i5 and i7 lineup:
|Model||Cores||Threads||Clock||Turbo||Cache||IGP||TDP||Price|
|Core i7-4770K||4||8||3.5GHz||3.9GHz||8MB||HD 4600||84W||$339|
|Core i7-4770S||4||8||3.1GHz||3.9GHz||8MB||HD 4600||65W||$303|
|Core i7-4770||4||8||3.4GHz||3.9GHz||8MB||HD 4600||84W||$303|
|Core i5-4670K||4||4||3.4GHz||3.8GHz||6MB||HD 4600||84W||$242|
|Core i5-4670S||4||4||3.1GHz||3.8GHz||6MB||HD 4600||65W||$213|
|Core i5-4670||4||4||3.4GHz||3.8GHz||6MB||HD 4600||84W||$213|
|Core i5-4570S||4||4||2.9GHz||3.6GHz||6MB||HD 4600||65W||$192|
|Core i5-4570||4||4||3.2GHz||3.6GHz||6MB||HD 4600||84W||$192|
|Core i5-4430S||4||4||2.7GHz||3.2GHz||6MB||HD 4600||65W||$182|
|Core i5-4430||4||4||3.0GHz||3.2GHz||6MB||HD 4600||84W||$182|
|All 4th-gen Core processors are built on a 22nm process, utilize 3D tri-gate transistors and work in the LGA1150 socket.|
Outside of IGPs and TDPs, the trio of i7-4770 models are spec’d identically to their i7-3770 predecessors. Interestingly, despite the power-related overhauling Intel has done with Haswell, the K and non-K models carry a TDP that is 7W higher than before (84W versus 77W). We’d assume that it’s the IGP upgrade that has caused this boost; as we’ll see later, both the i7-3770K and i7-4770K draw about the same power under load in our tests (with the IGP out of the picture).
Intel might have made an awful lot of changes to the Haswell architecture over Ivy Bridge, but the general function layout remains identical:
Haswell’s Architectural Layout
As with the previous couple of generations, Intel packs graphics, a memory controller and PCIe lanes into the CPU, which negates the need for an entirely separate chip on the motherboard. The biggest 4th-gen Core models include 8MB of L3 cache, while the middle-of-the-road i5’s include 6MB (i3’s include 4MB; Pentiums, 3MB).
Regarding efficiency, there are a handful of reasons why Haswell is Intel’s best-ever architecture. As seen in the diagram below, the Unified Reservation Station (task scheduling) has had two ports added; one to improve integer performance, the other for improved store address calculation. In addition, the L2 TLB has been increased in size, and wide vector units have had their bandwidth doubled. Virtualization latencies have been reduced as well.
The overarching goal with these improvements is to increase the number of operations completed per clock cycle – ideally, that should be double what it was on Sandy Bridge for some operations.
A new CPU isn’t a new CPU without a new instruction set or two, and Haswell has got us covered. AVX2 has been added to the mix, which enables support for 256-bit integer vectors, and FMA3 (fused multiply-add) support has also been added. Any media-related workload that supports either of these two instruction sets should see a great performance improvement.
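To put a rough number on that “double for some operations” goal, here is a back-of-the-envelope sketch of peak single-precision throughput per core, per clock. The issue widths are our assumptions rather than figures from Intel’s slides: one 256-bit multiply plus one 256-bit add per cycle on Sandy Bridge and Ivy Bridge, and two 256-bit FMAs per cycle on Haswell.

```python
# Back-of-the-envelope peak single-precision throughput per core, per clock.
# Assumption (not from the article): Sandy Bridge / Ivy Bridge can issue one
# 256-bit multiply and one 256-bit add per cycle, while Haswell can issue two
# 256-bit FMAs per cycle, each FMA counting as a multiply plus an add.

VECTOR_BITS = 256
FLOAT_BITS = 32
LANES = VECTOR_BITS // FLOAT_BITS  # single-precision values per vector


def peak_flops_per_cycle(vector_ops_per_cycle: int, flops_per_lane: int) -> int:
    """Vector instructions per cycle x FLOPs per lane x lanes per vector."""
    return vector_ops_per_cycle * flops_per_lane * LANES


avx_peak = peak_flops_per_cycle(vector_ops_per_cycle=2, flops_per_lane=1)  # mul + add
fma_peak = peak_flops_per_cycle(vector_ops_per_cycle=2, flops_per_lane=2)  # two FMAs

print(avx_peak, fma_peak, fma_peak / avx_peak)
```

These are theoretical ceilings only; real software has to be compiled for FMA3 and keep both units fed to get anywhere near them, which is why the gains in our benchmarks are far more modest.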
General performance improvements haven’t been a major focal-point of Intel’s for a while, as it aims to get its processors into every device imaginable, from servers to smartphones. Thus, Haswell’s power system has been overhauled to keep power usage low at home, and battery-life long while on the go.
The bulk of the power improvements have FIVR to thank, or Fully Integrated Voltage Regulator. As its name suggests, this is a voltage regulator built right into the CPU – a rather significant change. With a VRM built-in, efficiency is of course improved, with Intel gaining even finer control on how power is distributed throughout the chip.
The greatest advantage to these changes, and the addition of two new power states, is mobile. Intel claims that Haswell has delivered the greatest increase in battery-life it’s ever accomplished from one generation to the next. In some cases, battery-life could be improved by as much as 50% when in use, and 250% longer when in standby. Clearly, these are some noteworthy gains.
You can’t have a brand-new CPU architecture without brand-new chipsets, so to fill the void is Z87, H87, H81, Q87, Q85 and *catches breath* B85. The contents of the flagship Z87 chipset can be seen in this diagram:
With Haswell, Intel continues to let Sandy Bridge-E (2nd-gen Core) have the exclusive benefit of being able to offer 2x x16 PCIe lanes when using two graphics cards in SLI or CrossFire. Admittedly, this is going to potentially affect only the highest-end of graphics configurations – unless you’re running a $1,000+ configuration, you’re likely safe. PCIe 3.0 offers a ton of bandwidth, after all.
In addition to those PCIe lanes driven by the CPU, the Z87 chipset adds support for 8x x1 PCIe lanes of its own. Other functionality includes 6x USB 3.0, 14x USB 2.0, Gigabit Ethernet, Intel Audio and support for up to DDR3-1600 (faster RAM can of course be used).
As has become somewhat of an unfortunate trend, Intel differentiates its CPU lineup in ways that are very easy to overlook. You’d imagine, for example, that the highest-end Core i7-4770K wouldn’t lack a single feature – but it does. As the CPU comparison below (taken from Intel ARK) shows, it lacks vPro Technology, VT-d (an I/O virtualization accelerator) and also TSX-NI support – aka: transactional memory.
According to Intel, the K models lack some of these features because their target audience has no need for them. You’d also imagine, though, that a lot of people purchasing a non-K or non-S wouldn’t have much interest in something like VT-d, so it seems likely that there’s more at play here.
If I had to wager a bet, I think it’s because the K is an unlocked chip, and Intel might prefer enterprise customers purchase their higher-end Xeon offerings rather than buy an inexpensive chip and just overclock it (and obviously, it’s not hard to overclock an Intel CPU to bring it from one tier to the next). The fortunate thing in all of this is that those lacking features are going to affect a small number of people, but you could be one of them, so it’s worth noting.
I think it’s safe to say that overall, Intel brings a lot to the table with Haswell. Performance aside, it’s simply an impressive architecture to behold, with a large number of benefits for all-around computing. As enthusiasts, though, we tend to care about performance gains even if they’re not supposed to be huge. So with that, let’s take a quick look at our test system and methodology, and then get right into our benchmark results.
At Techgage, we strive to make sure our results are as accurate as possible. Our testing is rigorous and time-consuming, but we feel the effort is worth it. In an attempt to leave no question unanswered, this page contains not only our testbed specifications, but also a detailed look at how we conduct our testing.
If there is a bit of information that we’ve omitted, or you wish to offer thoughts or suggest changes, please feel free to shoot us an e-mail or post in our forums.
The tables below list all of the hardware we use in our current CPU-testing machines.
|Intel X79 Test Machine|
|Processor||Intel Core i7-3960X (Six-core, 3.3GHz, 3.9GHz Turbo)|
|Motherboard||GIGABYTE G1.Assassin 2|
|Memory||Kingston Beast 4x8GB DDR3-2133 11-12-11|
|Graphics||NVIDIA GeForce GTX 660 Ti|
|Storage||Kingston HyperX 240GB SSD|
|Power Supply||Corsair AX1200|
|Chassis||Cooler Master HAF X Full-Tower|
|Cooling||Corsair H70 Liquid Cooler|
|Et cetera||Windows 7 Ultimate 64-bit|
|Intel Z87 Test Machine|
|Processor||Intel Core i7-4770K (Four-core, 3.5GHz, 3.9GHz Turbo)|
|Memory||Kingston Beast 2x8GB DDR3-2133 11-12-11|
|Graphics||NVIDIA GeForce GTX 660 Ti|
|Storage||Kingston HyperX 240GB SSD|
|Power Supply||Corsair HX850|
|Chassis||Corsair Obsidian 700D Full-Tower|
|Cooling||Noctua NH-U14S Air Cooler|
|Et cetera||Windows 7 Ultimate 64-bit|
|Intel Z77 Test Machine|
|Processor||Intel Core i7-3770K (Four-core, 3.5GHz, 3.9GHz Turbo)|
|Motherboard||GIGABYTE Z77X-UP4 TH|
|Memory||Kingston Beast 2x8GB DDR3-2133 11-12-11|
|Graphics||NVIDIA GeForce GTX 660 Ti|
|Storage||Kingston HyperX 240GB SSD|
|Power Supply||Corsair HX850|
|Chassis||Corsair Obsidian 700D Full-Tower|
|Cooling||Noctua NH-U14S Air Cooler|
|Et cetera||Windows 7 Ultimate 64-bit|
When preparing our testbeds for any type of performance testing, we follow these guidelines:
To aid in the goal of achieving accurate and repeatable results, we stop certain services in Windows 7 from starting up at boot. This is because these services have a tendency to start up in the background without notice, potentially causing inaccurate test results. For example, disabling “Windows Search” turns off the OS’s indexing, which can at random times utilize the hard drive and memory.
The most important services we disable are:
To ease the tedium of setting up an OS for a round of benchmarking, we rely on Acronis True Image to restore an install that we previously set up. These images include most of our benchmarks, a minimal number of drivers (LAN, graphics), an up-to-date OS and all of our above-mentioned tweaks. We create a total of two OS images; one for AMD, and one for Intel.
To help us deliver a well-rounded set of test results for each processor we evaluate, we use a variety of real-world applications and synthetic benchmarks.
Our current test suite consists of:
Most tests are run twice over with the results averaged. If there is an unnatural variance between the first two runs, then we continue to run the test until we receive a result we believe to be accurate.
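For those who like to see a policy spelled out, the run-twice-and-average approach could be sketched roughly like this. The 3% spread threshold and the six-run cap are hypothetical values picked for illustration; in practice we judge “unnatural variance” by eye rather than by a fixed number.

```python
from statistics import mean


def benchmark(run_once, max_spread=0.03, max_runs=6):
    """Run a test twice and average; rerun while the latest two runs disagree.

    run_once   -- callable returning one result (a time or a score)
    max_spread -- hypothetical tolerance: relative gap allowed between runs
    max_runs   -- hypothetical cap so a noisy test cannot loop forever
    """
    results = [run_once(), run_once()]
    while len(results) < max_runs:
        a, b = results[-2], results[-1]
        if abs(a - b) / max(a, b) <= max_spread:
            break  # the two most recent runs agree closely enough
        results.append(run_once())
    return mean(results[-2:])


# Usage with canned results standing in for a real benchmark run.
canned = iter([100.0, 120.0, 101.0, 100.0])
score = benchmark(lambda: next(canned))
print(score)
```

Averaging only the last two agreeing runs is one design choice among several; discarding the outlier and averaging everything else would be just as defensible.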
If there’s design work that needs to be done, then Autodesk is sure to have the right tool. From 3D modeling to architectural design, Autodesk’s selection of highly-regarded tools is almost mind-numbing, and because both its 3ds Max and Maya applications have long been considered to be some of the best in their respective class, we opt to use them for our benchmarking here.
For the sake of all-around testing, we perform most of our benchmarking on this page with the help of SPEC’s SPECapc 3ds Max 2011 and SPECapc Maya 2012, although we also render an in-depth model/scene in the former. We’ll explain each benchmark as we go along.
We kick off our testing with one of the most comprehensive benchmarks in our test suite: SPECapc 3ds Max 2011. The overarching goal of those responsible for producing SPEC’s benchmarks is to deliver as well-rounded a test suite as possible for a respective field, such as 3D rendering and modeling, to produce accurate results that those responsible for purchasing hardware can take advantage of.
Designed to utilize both the CPU and GPU, SPECapc 3ds Max 2011 comes in both free and professional flavors, with the latter being the version we use. It comprises 58 individual tests and takes about 6 hours to complete on a machine equipped with 12GB of RAM and a six-core Intel Core i7-990X.
Right off the bat, we have some interesting results to analyze. Being the “enthusiast” part, it’s a little disappointing to see the i7-3960X fall behind where GPU performance is concerned, especially since it’s the only platform that boasts dual x16 PCIe slots when going the multi-GPU route. CPU-wise, it cleans house, as you’d expect given the higher number of cores.
Comparing the 3770K to the 4770K also yields some interesting results. In each test, Intel’s latest performs better, but it really makes that evident in the overall CPU test, which proves it to be 15% faster.
For our second 3ds Max 2011 test, we render a scene commissioned from Bulgarian artist Nikola Bechev, entitled “Naomi: The Black Pearl”. The woman is comprised of over 7,000 polys with the entire scene totaling just over 106,000 vertices. Three light sources are used, with the entire scene being enhanced with HDR and ray tracing techniques, and subsurface scattering applied to certain objects. The scene is rendered at 1800×3600 as a production release, with HQ detail levels being used all-around.
SPECapc 3ds Max showed us that the i7-3960X’s overall brawn could at times help it dominate, but that’s not the case here with our real-world test. Instead, Intel’s i7-4770K is the part that’s dominant; a great showing overall.
Like its 3ds Max 2011 variant, SPECapc Maya 2012 is designed to stress various aspects of the tool, such as rendering with standard and HQ methods, working in wireframe mode and so forth across numerous models and one overarching scene titled “Toy Store”.
3ds Max user? The i7-3960X looks to be the better option. Maya user? The 4770K gets the nod. Again, we see Intel’s second-generation Core i7-3960X score lower overall in the GPU tests, to a much greater degree than we would have expected. We’ll see later if that actually carries itself over to gaming.
Like Autodesk’s 3ds Max and Maya 3D tools, Maxon’s Cinema 4D is a popular cross-platform 3D design tool that’s used by new users and experts alike. Maxon is well-aware that its users are in need of some rather beefy PC hardware to help speed up rendering times, which is one of the reasons the company itself releases its own benchmark, Cinebench.
There are a couple of reasons we like to use Cinebench in our testing. For one, it’s freely available for anyone to download, unlike our Autodesk-based tests. Second, it has the capability to scale up to 64 threads, which means we’ll easily be able to rely on it until the next version hits. As a faster CPU can also help improve the GPU computational pipeline, we also like that it includes an OpenGL benchmark as well. The fact that the benchmark completes in mere minutes is another perk.
The results here almost perfectly mimic those we saw with SPECapc 3ds Max. The i7-3960X scores highest in overall performance, but falls behind the others in the graphics department. Overall, the i7-4770K sees modest gains in the CPU, but huge gains in the OpenGL tests (the differences were so large, we had to retest each configuration; the results stand).
The “Persistence of Vision Ray Tracer” is a multi-platform ray tracing tool that allows you to take your previously-created environments and models and apply a ray tracing algorithm based on a script you either created yourself or borrowed from others. The tool is free and has become a standard in the ray tracing community, with some of the ‘Hall of Fame’ results able to be found here.
For our testing, we run the built-in benchmark in both single-threaded and multi-threaded mode. The results are presented in “pixels-per-second” – a simple metric, but one that’s easy to understand.
This graph is a little bizarre to look at, as one CPU excelled in the single-thread test, and another, the multi-thread. Do you want to take a guess as to the multi-thread winner? It’s of course the six-core i7-3960X. The single-threaded champ is the i7-4770K – an excellent showing overall, and reassuring given single-threaded operations are more common in regular desktop use than multi-threaded ones.
With our 3D modeling and rendering tests out of the way, let’s dive right into another popular use for high-end machines: video editing and encoding. Scenarios here could include encoding a large movie into a mobile format, ripping a Blu-ray to your PC and encoding it for HTPC use, or encoding a family video you painstakingly edited.
Adobe’s Premiere Pro likely needs no introduction. It’s a tool used by the amateur and professional video content creator alike due to the extreme control it provides along with all of the important codecs, presets, filters and tweaking options. Premiere Pro can be used for any sort of video, be it real-life, animated, 3D or even game footage.
For our benchmarking, we encode a project that consists of 35GB worth of game footage from Payday: The Heist, which we encode to MPEG2 Blu-ray 1080p/30. The resulting video can be seen here.
To ensure an encode delivers the best possible video quality, we enable the “Maximum Quality Render”, which results in nearly 100% CPU utilization on up to 12 threads (we have not tested on CPUs that have more than 12 threads).
Core count matters a lot with a test like this, so the i7-3960X wins rather handily. It’s the i7-4770K vs. i7-3770K matchup that’s most interesting though, given one replaces the other. There, Intel’s latest proves to be quite the performer, shaving almost 11% off of the total time.
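A quick note on the math, since “time saved” and “speed-up” are easy to mix up. The sketch below uses made-up encode times (1,000 and 890 seconds) purely to show the relationship; they are not our measured results.

```python
def time_saved(old_seconds: float, new_seconds: float) -> float:
    """Fraction of the original run time that was shaved off."""
    return (old_seconds - new_seconds) / old_seconds


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times more work gets done per unit of time."""
    return old_seconds / new_seconds


# Hypothetical encode times, chosen only to illustrate an 11% reduction.
old, new = 1000.0, 890.0
print(f"{time_saved(old, new):.1%} less time, {speedup(old, new):.3f}x throughput")
```

In other words, an 11% shorter encode works out to a bit more than 12% extra throughput, so the two figures shouldn’t be used interchangeably when comparing reviews.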
Premiere Pro is meant to be used as a professional tool for editing and encoding, while HandBrake acts strictly as an encoder, able to take one video format and encode it to another according to your specifications. While there are many presets available from the get-go, you’re able to customize whatever’s available, or create your own. It’s a simple tool with complex capabilities.
Here, we have a project that makes use of a Blu-ray rip of Pixies: Live at the Paradise in Boston. With it, we encode the first 10 minutes of the concert to an archival-quality 720p MKV. The archival-quality encode is time-consuming, but it can take full advantage of a 12 threaded processor. For those interested, our H.264 options are:
The results here scale almost perfectly with what we saw in our Adobe Premiere Pro test. The i7-4770K does eke out a bit more performance against the i7-3770K in this test than in the other, but it still averages out to about 11%.
CyberLink is a company that’s quick to jump on new technologies, and it’s for that reason that CPU vendors – namely Intel – like to promote its products for use in benchmarking. In MediaExpresso’s case, this converter app can take advantage not only of basic CPU accelerators, but QuickSync and also AMD Radeon and NVIDIA GeForce.
We test a total of five configurations here:
Because it’s a little hard to follow in a graph alone, we also include the same results in a table. This allows us to show you the fastest run overall, and then look at how each CPU fared in individual tests without having to squint through the results.
|CPU (BQ)||CPU (FC)||QS (BQ)||QS (FC)||GTX 660|
|CPU = CPU only; QS = QuickSync; BQ = Better Quality; FC = Faster Conversion|
Within these results lies one that’s just a bit too odd: QuickSync + Better Quality on the i7-4770K. For some reason, that configuration required more than 3x the amount of time to complete than the i7-3770K did. We’ve shot an email off to Intel to see if we can’t get to the bottom of this, because it’s the only test that behaved like this (the others remained consistent; we retested from the beginning to verify).
That aside, in both of the tests that used only the CPU, the six-core i7-3960X sits comfortably in front. As expected, and outside of that bizarre QuickSync run, Intel’s i7-4770K gives us the overall performance boosts we’d expect. Interestingly, virtually no difference is seen between the 3rd and 4th gen Core processors when running the GeForce test, although it did prove slower on the 2nd gen i7-3960X.
Photo manipulation benchmarks are more relevant than ever given the proliferation of high-end digital photography hardware. For this benchmark, we test the system’s handling of RAW photo data using Adobe Lightroom 4.4, an excellent RAW photo editor and organizer that’s easy to use and looks fantastic. You can check out our full review of the tool here.
For our testing, we take a total of 500 RAW files spread across 250 .NEFs captured with a Nikon D80 and 250 .CR2 captured across a Canon 40D and 5D Mark II. We export all of these files to a matte-sharpened quality 90 JPEG resized to a resolution of 1000×660 – similar to a lot of photos we use here on the website. The test is timed indirectly using a stopwatch as the program doesn’t record the duration itself.
Lightroom isn’t the most multi-threaded tool out there, but it does prove to execute its batch jobs better on a CPU with a higher number of cores – such as the i7-3960X. Between the i7-4770K and i7-3770K, Intel’s latest is faster, but not by much.
You own hundreds, thousands, or even tens of thousands of songs, all encoded to a pristine lossless format such as FLAC. Your mobile device on the other hand, supports either MP3 or AAC. What’s the solution? There are several, but the one I’ve relied on for almost ten years has been dBpoweramp. It’s both flexible and powerful, which happen to be two important factors for those who take their music seriously.
Recent versions of dBpoweramp have opened up the ability to encode more than one track at once, up to a limit of one per thread. With twelve-thread CPUs on the market, that ability can greatly improve overall times. For our testing here, we take 500 unique FLAC files that average about 30MB each and encode them using the “high-quality” setting to 320Kbit/s MP3.
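The one-job-per-thread idea is simple enough to sketch. This is not how dBpoweramp is implemented; it’s a minimal illustration with a placeholder `encode` function (hypothetical) standing in for the real encoder.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def encode(track: str) -> str:
    # Placeholder for the real work, e.g. launching an external MP3 encoder.
    return track.rsplit(".", 1)[0] + ".mp3"


def encode_all(tracks, workers=None):
    """Encode tracks concurrently, one job per hardware thread by default."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode, tracks))  # results keep input order


outputs = encode_all([f"track{i:03d}.flac" for i in range(500)])
print(len(outputs), outputs[0], outputs[-1])
```

A thread pool suits this case when each job hands the heavy lifting to an external process; pure-Python number crunching would want a process pool instead.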
Snap, so that’s what’s possible with a six-core? While the i7-3960X doesn’t exhibit its overall brawn too well in most of our tests, it sure does here. It’ll be interesting to see just how much of an improvement Ivy Bridge-E’s six cores will bring – if any; after all, we’re seeing only minor improvements between Ivy Bridge and Haswell here.
SiSoftware’s Sandra is a piece of software that needs no introduction. It’s been around as long as the Internet, and has long provided both diagnostic and benchmarking features to its users. The folks who develop Sandra take things very seriously, and are often the first ones to add support to the program long before consumers can even get their hands on the product.
As a synthetic tool, Sandra can give us the best possible look at the top-end performance from the hardware it can benchmark, which is the reason we use it to test much of our PC’s hardware. The fact that a free version exists so that you can also benchmark against our results is something we greatly appreciate.
The more threads a CPU has, coupled with its frequency and architecture refinements, the faster it should be able to calculate complex math. We’re not talking about simple math that can be done on a calculator, but rather advanced calculation that is often used behind the scenes. Sandra’s Arithmetic test stresses the popular Dhrystone integer and Whetstone floating-point algorithms that have acted as a base for countless benchmarks dating as far back as the 1970s.
The winner of most multi-threaded benchmarks is the CPU with the highest core count, and then the highest frequency. It’s no surprise, then, to see the i7-3960X top the list. In our quad-core matchup, the i7-4770K improves upon the i7-3770K’s performance only slightly.
One of the best reasons for upgrading or building a new PC is to increase the performance for multi-media work, whether it be editing or encoding. As we saw earlier in our results, faster CPUs can save minutes or even hours of time. To test such capabilities here, Sandra renders the famous Mandelbrot set in a total of 255 iterations and in 32 colors.
This is a test that’s been around for close to forever, but it still scales extremely well with thread counts and can benefit from new media-centric instruction sets, including AVX.
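For the curious, the core of a Mandelbrot render is only a few lines. The sketch below is a plain, unoptimized escape-time loop using the same 255-iteration cap and 32-color palette size mentioned above; the image dimensions and the modulo color mapping are our assumptions, and Sandra’s actual implementation is vectorized and multi-threaded.

```python
MAX_ITER = 255  # iteration cap, as in the test description
COLORS = 32     # palette size, as in the test description


def escape_count(c: complex, max_iter: int = MAX_ITER) -> int:
    """Number of iterations of z = z*z + c before |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width: int = 64, height: int = 32):
    """Map each pixel to a palette index (assumed mapping: count modulo COLORS)."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            c = complex(-2.0 + 3.0 * x / (width - 1),
                        -1.0 + 2.0 * y / (height - 1))
            row.append(escape_count(c) % COLORS)
        rows.append(row)
    return rows


image = render()
print(len(image), len(image[0]))
```

Every pixel is independent of its neighbors, which is exactly why this workload scales so well with thread count and wide vector units.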
The six-core i7-3960X has proven itself to reign supreme where heavily multi-threaded benchmarks are concerned, and while that’s still the case overall here, it doesn’t excel quite as much. In fact, the i7-4770K proves faster in the integer test, undoubtedly the result of fine-tuning done since the second generation.
You might not be aware of it, but cryptography plays a major role in computing. With some algorithms proving more complex than others, having a faster processor can dramatically improve performance – especially important on the server front. In Sandra’s benchmark, the mega-popular AES and SHA algorithms are computed, the former with a 256-bit key and the latter with a 256-bit digest.
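If you’d like a feel for what a hashing-throughput figure means, here’s a minimal single-threaded sketch using Python’s standard `hashlib`. It only covers SHA-256 (AES isn’t in the standard library), and the buffer sizes are arbitrary choices of ours, so don’t expect the numbers to line up with Sandra’s.

```python
import hashlib
import time


def sha256_throughput(total_mb: int = 64, chunk_kb: int = 64):
    """Hash total_mb of zero bytes in chunks; return (MB/s, hex digest)."""
    chunk = bytes(chunk_kb * 1024)
    rounds = (total_mb * 1024) // chunk_kb
    hasher = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(rounds):
        hasher.update(chunk)
    elapsed = time.perf_counter() - start
    return total_mb / elapsed, hasher.hexdigest()


rate, digest = sha256_throughput(total_mb=16)
print(f"{rate:.0f} MB/s, digest {digest[:16]}...")
```

Because each block of a single hash depends on the one before it, one stream can’t be split across cores; a multi-threaded score comes from hashing several independent buffers at once.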
On the hashing side of things, multi-threadedness doesn’t seem to affect too much, although performance does still scale as we’d expect across these three processors. Encryption by way of AES, however, can take good advantage of additional cores along with general architecture tweaks. The six-core i7-3960X wins hands-down here, but what we really care about is that the i7-4770K shows a noticeable gain over the i7-3770K.
There’s little that can stress a CPU’s worth quite as much as number-crunching, and for that reason, we take full advantage of Sandra’s financial analysis benchmark. In the past, we ran similar tests using an Excel spreadsheet that allowed us to run a macro based on the Monte Carlo pricing, but here, Sandra allows us to also test Black-Scholes and Binomial.
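To show what kind of number-crunching is involved, here’s a small sketch that prices the same European call option two ways: with the closed-form Black-Scholes formula, and with a Monte Carlo simulation. The option parameters, path count and seed are illustrative values of ours, and this is in no way Sandra’s code.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s, k, r, sigma, t):
    """Closed-form price of a European call."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def monte_carlo_call(s, k, r, sigma, t, paths=200_000, seed=1):
    """Average discounted payoff over simulated terminal prices."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    payoff = 0.0
    for _ in range(paths):
        terminal = s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff += max(terminal - k, 0.0)
    return math.exp(-r * t) * payoff / paths


# Illustrative option: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(round(exact, 4), round(estimate, 2))
```

The Monte Carlo route needs hundreds of thousands of independent paths to approach the closed-form answer, which is what makes it such a good fit for lots of cores.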
Once again, the i7-3960X proves what six cores can do, while the i7-4770K continues to take a few steps ahead of the i7-3770K.
When hard drive capacities were measured in megabytes or single-digit gigabytes, data compression was something that even the layman computer user took advantage of. In fact, entire hard drives could be run in compressed mode to help increase the overall storage. Today, such methods aren’t required thanks to hard drives ranging in the thousands of gigabytes, but compression is still used on a regular basis by many people, whether for storing a folder for backup, encoding music, converting a photo and so on. On servers, compression is often used to shrink very large log files.
For our compression testing, we enlist the help of 7-zip 9.20. We take a 772MB folder that consists of 39,236 highly-compressible files and archive it using an ‘Ultra’ level of compression using the LZMA2 algorithm. This results in an archive weighing in at about 137MB.
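For a sense of scale, 137MB out of 772MB means the archive is under a fifth of the original size. The sketch below works that out and then runs a toy LZMA2 compression with Python’s standard `lzma` module; the log-style sample data is made up, and the xz “extreme” preset is our stand-in for 7-zip’s ‘Ultra’ level, not an equivalent of it.

```python
import lzma

# Ratio reported for our test folder: 137MB archive from a 772MB source.
article_ratio = 137 / 772


def compress_ratio(data: bytes, preset: int = 9):
    """Compress with LZMA2 (the .xz container) and return (ratio, packed bytes)."""
    packed = lzma.compress(
        data,
        format=lzma.FORMAT_XZ,
        preset=preset | lzma.PRESET_EXTREME,
    )
    return len(packed) / len(data), packed


# Hypothetical, highly repetitive sample standing in for compressible files.
sample = b"2013-07-19 12:00:00 INFO request served in 12ms\n" * 20_000
ratio, packed = compress_ratio(sample)
print(f"article: {article_ratio:.1%} of original, sample: {ratio:.3%} of original")
```

Real folders never compress as well as a single repeated line, so treat the sample figure as a ceiling rather than a prediction.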
The differences between the i7-3770K and i7-4770K are about as minor as it could get, but the six-core CPU once again shows that in some tests, it can truly excel.
In terms of complexity, Euler3D is one of our most advanced benchmarks, and also one of the quickest to run. It calculates the fluid dynamics properties of the AGARD 445.6 aeroelastic test wing as it was tested in-house at NASA’s Langley Research center. It’s calculated using Euler equations, with results printed out as Hz and time-to-complete (seconds). A benchmark such as this is useful to those who work designing products where physics has to be considered, whether it be a wing, a car, a ship and so on.
The scaling exhibited here is just about what we’d expect to see at this point.
SPEC’s CPU2006 is the most comprehensive benchmark in our test suite. Its goal is to test both the general execution performance of a machine and also the chosen compiler, and as such, it makes great use of all available threads across one or more CPUs along with the memory sub-system.
You might not have heard of SPEC before, and if so, it’s likely because the non-profit group creates benchmarks targeted at the enterprise rather than the desktop. The folks responsible for each one of its benchmarks take things extremely seriously, and nothing gets released without extensive review. Many companies belong to SPEC as members, offering input and other insight. Some of these include AMD, Intel, Apple, ASUS, HP, Fujitsu, IBM, Lenovo, Microsoft, NEC, NVIDIA, Novell, Red Hat, Super Micro, VMware, Dell and EMC.
The CPU2006 suite is about as complicated to explain as it is to run. We’ve prepared what we feel to be the best possible configuration for use with the tool, and as the result of much testing, we use Intel Compiler version 12 coupled with Microsoft Visual Studio 2008 for our testing. This is one of the few current configurations that can deliver submittable results, as Intel Compiler supports the C99 standard, whereas most compilers do not (in Linux, gcc would be a good replacement).
Due to its inherent design to run each test three times over, we do not run the entire CPU2006 more than once, as it would be redundant. At the same time, a full run on an Intel Core i7-2600K takes just over 13 hours to complete, so it’s not feasible to run the entire suite multiple times over. Because all current CPUs include AVX acceleration, we enable that here in our testing.
More information on the suite and how we use it can be read about in this forum post.
Unfortunately, we were unable to execute the test on the Core i7-3960X (something we’re looking into, and hoping to rectify for an upcoming look at another processor). However, the most important match-up here is i7-3770K vs. i7-4770K, and what we see is once again the scaling we’d expect. Nothing major, but gains nonetheless.
The great thing about CPU2006 is that it tests the CPU’s execution performance against a large number of different scenarios, all seen above. This gives us a well-rounded look at how one CPU compares to another, and in this particular case, that’s important, given we’re looking at a direct successor.
Most of the tests exhibit a modest nod to the i7-4770K, which is to be expected. In other tests, the 4770K actually manages to deliver performance quite a bit better; notable examples include 483.xalancbmk in the integer tests, and 436.cactusADM in the floating-point tests. Interestingly, the i7-4770K actually fell to the i7-3770K in one test: 434.zeusmp, a fluid dynamics test. The difference is minimal, but it stands out as it’s the only test to exhibit a lesser result on Intel’s latest chip.
Overall though, Intel’s Core i7-4770K continues to give us the speed boosts we expected.
Futuremark’s no stranger to most enthusiasts, as its benchmarking software has been considered a de facto standard for about as long as it’s been fun to benchmark. While its 3DMark software is undoubtedly the company’s most popular offering, PCMark is a great tool for summing up the performance of a PC with gaming being a minor focus rather than a major one.
Futuremark’s latest PCMark, 8, consists of five main test suites: Home, Creative, Work, Storage, and Applications. The goal of each is to show how a system will perform overall in a given scenario, and their titles sum up each respective goal nicely. The Applications suite consists of two sub-suites; one for Adobe’s Creative Suite (or Creative Cloud), and the other for Microsoft Office. We run all of these suites except Storage, as it’s not that relevant here.
For fun, we also include the overall test results with PCMark 7 (just can’t bear to let it go!).
Things don’t look so good for the Core i7-3960X here. Despite boasting six cores, versus the 3770K + 4770K’s four, it only excels in a single test, and which one isn’t much of a surprise: Creative.
Prior to our testing here, we had no idea that Adobe’s Creative Suite took advantage of Intel’s QuickSync, but apparently it does. The best performance comes from using the IGP on either the 3770K or 4770K; both even beat out NVIDIA’s GeForce GTX 660 Ti, which also helps accelerate select Adobe apps.
Overall, a great showing for the i7-4770K; it’s a clear winner among this trio of CPUs. It falls a bit short in the Creative test against the i7-3960X, but it comes close.
The faster the processor, the better its bandwidth and latencies are. Where memory is concerned, however, there are many more factors at play. While frequency plays a major role in overall memory performance, the memory controller can make an even greater improvement, based on its implementation and also its capabilities.
With Intel’s Sandy Bridge-E, we were given a quad-channel controller, while Intel’s (and AMD’s) other platforms stick to a dual-channel design. A quad-channel controller could in theory provide twice as much bandwidth as a dual-channel one. How the controller is integrated into its chip along with the memory’s frequency determines the latency.
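The “twice as much bandwidth” claim is simple arithmetic, and a rough sketch makes it concrete. Theoretical peak bandwidth is the number of channels, times the transfer rate, times the bytes moved per transfer (a DDR3 channel is 64 bits wide). The DDR3-1600 speed used below is an assumption for illustration only, and the function name is hypothetical; real-world throughput always lands below these peaks.

```python
def peak_bandwidth_gbs(channels, megatransfers_per_sec, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_sec = channels * megatransfers_per_sec * 1e6 * bytes_per_transfer
    return bytes_per_sec / 1e9


if __name__ == "__main__":
    # Assumed memory speed: DDR3-1600 (1600 MT/s) on both platforms.
    dual = peak_bandwidth_gbs(channels=2, megatransfers_per_sec=1600)
    quad = peak_bandwidth_gbs(channels=4, megatransfers_per_sec=1600)
    print(f"dual-channel: {dual:.1f} GB/s")
    print(f"quad-channel: {quad:.1f} GB/s")
    print(f"quad/dual ratio: {quad / dual:.1f}x")
```

Note that the channel count scales bandwidth only; it does nothing for latency, which depends on the controller and the memory’s timings.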
While higher memory bandwidth and lower latencies can improve overall computer performance, two other factors round out the picture: how quickly the cores can communicate with one another, and how much bandwidth each cache can handle. The results of all of these are tackled on this page.
Quad vs. Dual channel is summed up quite nicely here.
Here are some pretty interesting results, and a testament to the improvements Intel has made to its architectures over the years. The L1 performance on the i7-4770K is better than that of both the i7-3960X and i7-3770K, which isn’t too much of a surprise. L3 performance is on par between the two quad-cores, and L2-wise, the 4770K exhibits a nice lead.
Given the much-improved cache bandwidth that we saw on the i7-4770K above, it’s a little surprising to see it fall behind the older i7-3770K in the latency department. Inter-core bandwidth does prove a bit better on Intel’s Haswell, though.
Game benchmarks stand to see the least amount of gain in comparison to our other tests, but they’re necessary for the sake of completeness. Also, while we benchmark hands-on for our graphics card content, we opt for synthetic testing here, as we’re utilizing the same GPU across each setup.
First up is the ever-popular 3DMark benchmark, and for the sake of completeness, we run all three tests (Ice Storm, Cloud Gate and Fire Strike).
Throughout most of our testing, Intel’s Core i7-3960X, despite having six cores, hasn’t outperformed the other CPUs as much as we would have thought. Apparently where gaming is concerned, or at least 3DMark, extra cores do make a difference, as evidenced here. Overall though, the differences are quite minimal, except in the DX9 Ice Storm test.
Real-time and turn-based strategy games tend to be the most stressful on both the GPU and CPU, and Total War: SHOGUN 2 does well to live up to that stereotype. The game is so stressful on a PC, in fact, that the developers included built-in benchmarks that are meant to test a PC in a worst-case scenario sort of way. For our testing here, we use both the 720p GPU and CPU benchmarks.
When the CPU is singled out, Intel’s Core i7-4770K leads the pack. GPU-wise, all of the CPUs might as well be considered the same.
To help consumers understand what sort of power will be sucked from the wall thanks to their CPU, companies like AMD and Intel give us a “TDP” rating, which is in effect the realistic top-end wattage the CPU will hit. CPUs can sometimes go above these wattages, but that’s typically only when tools are used to specifically stress-test the processor.
To help keep our CPU at 100% load during a realistic scenario, we make use of our CyberLink MediaExpresso benchmark; specifically, the CPU Only + Best Quality configuration. But please note: the load results below are just that, (max) “load”. Most of the time, your CPU is not going to be at full load (and depending on your usage, it may never be), making this a worst-case scenario. Also, different motherboards can introduce a variance of up to 10 watts, so there’s no true apples-to-apples comparison unless more than one CPU is benchmarked on the same platform.
Our methodology here is simple: boot the machine up, let it sit idle for 10 minutes, capture the idle wattage, and then begin the benchmark. Because the encode keeps the CPU at 100% load for most of its duration, we only monitor the last couple of minutes’ worth.
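For anyone wanting to reproduce this at home with a logging power meter, the bookkeeping can be sketched as below. This is only an illustration: it assumes one wattage reading per second, and it assumes the load figure is the average over the final window (the peak is reported too). The function names and sample readings are hypothetical, not our data.

```python
from statistics import mean


def tail_window(samples, window_seconds, interval_seconds=1.0):
    """Return the readings taken during the final window of a run."""
    if not samples:
        raise ValueError("no samples recorded")
    count = max(1, int(window_seconds / interval_seconds))
    return samples[-count:]


def summarize_power(idle_watts, load_samples, window_seconds=120):
    """Compare an idle reading against the tail end of a load run."""
    tail = tail_window(load_samples, window_seconds)
    load_avg = mean(tail)
    return {
        "idle": idle_watts,
        "load_avg": load_avg,
        "load_peak": max(tail),
        "delta": load_avg - idle_watts,  # extra draw under load
    }


if __name__ == "__main__":
    # Hypothetical readings in watts (NOT measured results).
    load_log = [90, 120, 150, 152, 151, 153]
    print(summarize_power(idle_watts=60, load_samples=load_log, window_seconds=4))
```

Looking only at the tail of the run skips the ramp-up at the start of the encode, which would otherwise drag the average down.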
Well, one of these CPUs is much hungrier than the others. Both the 3770K and 4770K level out, although we might have seen Intel’s latest get the nod if not for the fact that we had to use different motherboards for each build. If only Intel could stick to a single socket for more than one generation!
At this article’s outset, we established all of what makes Intel’s Haswell architecture so noteworthy. We always expect performance to be improved from one architecture to the next, and of course, we expect the same for power-efficiency. Here, though, Intel has retooled a lot in the quest for the ultimate of both worlds. As we’ve been able to see throughout all of our testing, it’s succeeded. Our power results above are an exception, but it’d take a notebook to prove the enhancements there to us, given that’s where power consumption is really important.
Power stuffs aside, Haswell isn’t going to be a “must-have” architecture for those who rock a PC a generation or two old. The reason to upgrade in that particular case might be to get the most feature-rich motherboards the planet has ever seen, and support for things like PCIe 3.0 and improved SATA 3.0 / USB 3.0 performance. Over Ivy Bridge, Intel estimates users will see a 5~10% performance increase. GPU-wise (IGP), that figure can jump to 2x. Battery-life? +50% for most workloads.
Close-up of a Haswell wafer
One area we didn’t tackle in this article is the IGP – or at least not to the degree I’d like to. In the future, I’d like to test Intel’s latest against AMD’s, and see where things really stack up overall. We did get some non-gaming testing in, however, thanks to PCMark 8 and MediaExpresso. With PCMark, we found that Intel’s IGP accelerated Adobe’s suite of products better than NVIDIA’s GeForce (something we didn’t expect), and when using QuickSync, media encodes happen faaaaaaaast.
To sum up, Intel’s Haswell is a great architecture for a number of reasons, and for those building a new system, it’s an ideal choice for the ultimate in performance and efficiency. Its performance improvements over Ivy Bridge don’t warrant even a minor consideration of an upgrade, but for those with older platforms, Haswell is well worth considering.
Copyright © 2005-2019 Techgage Networks Inc. - All Rights Reserved.