With Core i7’s launch due in just a few weeks, there’s no better time to take a hard look at its performance, and that’s exactly what we’re doing today. In addition to our usual performance comparisons with last-gen CPUs, we’re also taking an in-depth look at both QPI and HyperThreading performance, and some of our results may surprise you.
Recapping the Nehalem Architecture
We’ve been talking about Nehalem since last spring, so I think most of us by now have a good grasp of its new features and the things we should be excited about. This page isn’t going to be an exhaustive look at the architecture as a whole, but rather a quick look at some of the features that really make Nehalem unique when compared to Core 2.
If you need a simple way to remember all of what’s new, just remember this string: IMC3CQPIHTBCLKL3TURBO. See? Don’t I make things easier?!
Dynamic Speed Technology (AKA: Turbo)
One of the biggest new features on Nehalem, and one that stunned us a few months ago during IDF, is Turbo mode… a feature that actually overclocks your processor to some small degree under load, something you’d never expect to see Intel ship as an official feature. What this means is that the clock speed printed on any Core i7 processor isn’t the whole story, because at full load, it’s increased by at least 133MHz.
At any given time during use, if the processor hits full load on any of its cores, Turbo kicks in and raises the CPU multiplier by one step, which adds one Base Clock (133MHz) to the core frequency. For example, on the 920, which features a CPU clock of 2.66GHz, full load essentially turns it into a 2.79GHz chip. What’s even cooler is that if only a single core needs an extra boost, e.g. for a single-threaded application, the multiplier is raised by two steps. So a single-threaded application on that same 920 could run on a clock boosted to 2.93GHz.
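The arithmetic above is simple enough to sketch out. Here’s a minimal, illustrative calculation assuming the 920’s stock 20x multiplier and a 133MHz Base Clock (the function name and structure are ours, not Intel’s):

```python
# Hedged sketch of the Turbo math described above.
# Assumptions: 133MHz Base Clock, stock multiplier of 20 (20 x 133MHz = 2.66GHz).
BCLK_MHZ = 133
STOCK_MULT = 20

def turbo_clock_ghz(active_cores: int) -> float:
    """Effective clock under Turbo: +1 multiplier step at full load,
    +2 steps when only a single core is active (per the text above)."""
    steps = 2 if active_cores == 1 else 1
    return (STOCK_MULT + steps) * BCLK_MHZ / 1000

print(turbo_clock_ghz(4))  # ~2.79GHz with all four cores loaded
print(turbo_clock_ghz(1))  # ~2.93GHz for a single-threaded boost
```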
On an Extreme Edition processor, these individual multiplier steps can be adjusted manually depending on the motherboard, but on all other models, they are locked within the processor and cannot be altered. If you turn Turbo mode off (though I’m not sure why you’d want to), your CPU clock won’t budge an inch above stock.
It’s definitely an interesting feature, and one that’s going to be appreciated by pretty much everyone, even if they don’t realize it. Whether or not low-end Core i7 models will boost all four cores at full load is unknown, but Intel may very well adjust such things to keep those models “budget”.
QuickPath Interconnect (QPI)

One of the most important new features is the QuickPath Interconnect, or QPI for short. “Interconnect” explains its purpose quite well: it’s a direct point-to-point link between the processor and the X58 chipset, and while memory now hangs off the on-die controller rather than the chipset, QPI effectively replaces the traditional front-side bus. Also like the FSB, different i7 models carry different QPI ratings, with the top-end 965 running at 6.4GT/s and the two models below it running at 4.8GT/s.
The term “GT/s” probably doesn’t mean much to you, but it stands for gigatransfers per second, with 1GT/s being equal to one billion transfers per second. Because data moves on both the rising and falling edges of the clock, the effective rating is double the raw frequency, which means 1MHz is equal to 2MT/s, and likewise, 3200MHz is equal to 6400MT/s, or 6.4GT/s.
How the GT/s is really calculated on the CPU is a little more complex, but we’ll be talking more about it in an upcoming overclocking article. Generally speaking, each QPI setting in the BIOS maps to a separate multiplier, and depending on your Base Clock, the QPI frequency will adjust accordingly. One example I can give is that the 4.8GT/s setting in the BIOS is equal to an 18x multiplier, so with a 133MHz Base Clock, the raw QPI frequency works out to roughly 2400MHz. Just how much that particular frequency matters is yet to be seen.
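To tie the last two paragraphs together, here’s a small sketch of the arithmetic; the 18x multiplier for the 4.8GT/s setting is taken from the example in the text, and the function names are ours:

```python
# Sketch of the QPI arithmetic described above.
BCLK_MHZ = 133

def qpi_raw_mhz(multiplier: int) -> int:
    # Raw QPI clock: BIOS multiplier x Base Clock.
    return multiplier * BCLK_MHZ

def qpi_gt_per_s(raw_mhz: float) -> float:
    # Double-pumped link: data moves on both clock edges,
    # so 1MHz of raw clock equals 2MT/s.
    return raw_mhz * 2 / 1000

raw = qpi_raw_mhz(18)        # 2394MHz, i.e. roughly the 2400MHz cited
print(qpi_gt_per_s(raw))     # ~4.8GT/s
print(qpi_gt_per_s(3200))    # 6.4GT/s, the top-end 965's rating
```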
The Return of HyperThreading
Another notable feature is HyperThreading, which was first tried years ago with limited success. It’s making a comeback, though, and as we’ll see, that’s a good thing. When HyperThreading first made an appearance, two things were lacking: multi-core processors and multi-threaded applications. Now that we have both in some quantity, we can actually begin to appreciate the benefits HT can bring.
In the simplest of terms, HyperThreading presents each physical core to the operating system as two logical processors, which means two threads can be scheduled per core and more jobs can be handled at any given time. Applications that can utilize more than four threads are sure to see benefits, while single-threaded applications will see little to none.
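To make the “more than four threads” point concrete, here’s a hedged sketch: on a quad-core i7 with HT enabled, the OS sees eight logical processors, so a thread pool sized to match can keep every hardware thread occupied. The worker function here is purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: quad-core i7 with HyperThreading, as described above.
LOGICAL_CPUS = 4 * 2  # four physical cores x two threads each = 8

def work(n: int) -> int:
    return sum(range(n))  # stand-in for a real workload

# Size the pool to the logical processor count so all eight
# hardware threads can be kept busy at once.
with ThreadPoolExecutor(max_workers=LOGICAL_CPUS) as pool:
    results = list(pool.map(work, [10_000] * LOGICAL_CPUS))

print(len(results))  # 8 tasks completed
```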
Triple-Channel Memory Controller
Intel was long criticized for refusing to admit that AMD had it right: a memory controller belongs on the CPU die. Five years after the launch of the Athlon 64, it’s finally happening on i7. Not ones to simply join a camp, Intel took things one step further by giving it triple-channel functionality… something that promises to offer uncompromising levels of bandwidth.
As we’ll see later, that’s definitely the case. In fact, even while using a slower kit on Core i7, the bandwidth will be more than twice what we’d see on Core 2 with an even faster kit. With an extreme kit, we might come close to threefold.
What difference will this make to the majority of people? It’s hard to say, but there’s a good chance it won’t make an ounce of difference for most users, no matter how much hardcore gaming or hardcore multi-tasking they do. Such a thing is a little difficult to measure, but the real benefits will be seen in the server market, where memory is constantly being taxed.
A New Cache Hierarchy

Another aspect of Core i7 that has been altered is the cache hierarchy, which now features not only L1 and L2 caches, but also an L3, which is where most of the on-die memory is held. Like Core 2, each core has a 32KB instruction cache and a 32KB data cache at the L1 level, while the L2 has been reworked into a private 256KB per core, or 1MB in total across the quad-core die.
The L3 cache features up to 8MB of memory, a figure that will likely shrink once more budget-oriented i7 models hit the market. As it stands today, though, all three models share an identical cache hierarchy. Just like Nehalem in general, the cache system is completely modular and scalable, so if 8MB doesn’t prove to be enough for certain applications later on, Intel can increase the size as the need arises.
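The per-core and total figures above can be summed up in a few lines; this is just the arithmetic from the text, assuming the launch quad-core configuration:

```python
# Totals for the launch Core i7 cache hierarchy described above.
CORES = 4
L1_PER_CORE_KB = 32 + 32   # 32KB instruction + 32KB data per core
L2_PER_CORE_KB = 256       # private L2 per core
L3_SHARED_MB = 8           # shared L3 across all cores

total_l1_kb = CORES * L1_PER_CORE_KB
total_l2_mb = CORES * L2_PER_CORE_KB / 1024

print(total_l1_kb)   # 256KB of L1 across the die
print(total_l2_mb)   # 1.0 -> the "1MB in total" figure in the text
```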
Adding an L3 cache might seem like a needless way to add latency, but that isn’t really the case. The L3 is inclusive: whatever data is hiding in a core’s L1 and L2 is also present in the L3, so when one core misses in the shared cache, it knows the data isn’t sitting in another core’s private caches and can skip the expensive snooping. As a result, latency isn’t supposed to be affected much at all.
Time to Tackle the Fun Stuff
That about covers the most important features of Nehalem, though the improvements certainly don’t stop there. Beyond what’s mentioned above, the power efficiency on i7 is far better than anything we’ve seen before, with the ability to turn cores on and off on a whim, underclock them when they’re not needed, and of course, overclock as well.
Overall, Nehalem is an incredible upgrade underneath the hood when compared to previous generations, and Intel themselves state that it’s the single biggest architectural upgrade since the launch of the original Pentium Pro in 1995. Now that’s a statement.
In short, Core i7 is leaner, meaner and packs more tricks than Penn & Teller. That’s all that needs to be said. So let’s get right into what you should consider when building a Core i7 PC.