Date: November 3, 2008
Author(s): Rob Williams
With Core i7’s launch due in just a few weeks, there’s no better time than right now to take a hard look at its performance, which is what we’re taking care of today. In addition to our usual performance comparisons with last-gen CPUs, we’re also taking an in-depth look at both QPI and HyperThreading performance, and some of our results may surprise you.
Two summers ago, Intel changed much of what we thought we knew about processors with the launch of its Core 2 micro-architecture. At a time when processor development seemed to be slowing down, Core 2 forced us to think again, as the improvements over the previous NetBurst products were significant.
Since that time, the CPU landscape has changed dramatically. At the time Core 2 first launched, Dual-Cores were still considered to be more of a novelty, and some still were unsure why they were even necessary. For those lucky enough to own one, however, the benefits became clear, and the multi-core revolution was quickly born.
At the same time, the thought of Quad-Core processors did little more than spawn laughter. After all, if most people couldn’t take advantage of a Dual-Core, where did a Quad-Core fit in? Well, thinking about it didn’t last too long, as Intel quickly followed up their initial Core 2 launch with a Quad-Core model in the fall of the same year.
As the number of developers writing multi-threaded applications grows, the benefits of a Quad-Core CPU are better seen now than ever before. In fact, we posted a list just earlier this week that proves it. Whether you are a multi-tasker, a media buff or someone who simply loves having a lot of headroom, a Quad-Core makes an excellent addition to any new PC build.
Core i7 is almost here, but that will come as a surprise to no one, as potential release dates have been hovering around rumor-ville for months. The official response came last month, during IDF Taipei. There, Intel told the world that we would see Core i7 before the end of November, although no definitive street date was given.
Today’s article will serve as a preview into what to expect from Core i7 from a performance perspective. This will become the first of a few different articles that we’ll be posting in the weeks to come, which will target more specific areas of Nehalem and its platform. So, consider today’s look as a good way to whet your appetite. There’ll be more good stuff en route.
Core i7, or Nehalem as we’ve been calling it for the past year, becomes part of Intel’s “Tock” step, which denotes a brand-new micro-architecture built on the current process node. “Tick” will come next year in the form of Westmere, a 32-nm shrink of Nehalem. If you are not up to speed on everything that the new micro-architecture brings to the table, the next page in this article was made for you.
When Core i7 hits the street, three models will become immediately available. This is a little different than most other Intel launches, which normally see the highest-end part released first. Instead, this launch will also see the release of both the mainstream and mid-range parts. This is a great thing for obvious reasons, so now the only thing to worry about is stock.
| Model | Cache |
|---|---|
| Intel Core i7 Extreme 965 | 8MB L3 (shared) |
| Intel Core i7 940 | 8MB L3 (shared) |
| Intel Core i7 920 | 8MB L3 (shared) |
| Intel Core 2 Extreme QX9775 | 2 x 6MB L2 |
| Intel Core 2 Extreme QX9770 | 2 x 6MB L2 |
| Intel Core 2 Quad Q9650 | 2 x 6MB L2 |
| Intel Core 2 Quad Q9550 | 2 x 6MB L2 |
| Intel Core 2 Quad Q9450 | 2 x 6MB L2 |
| Intel Core 2 Quad Q9400 | 2 x 3MB L2 |
| Intel Core 2 Quad Q9300 | 2 x 3MB L2 |
| Intel Core 2 Quad Q8200 | 2 x 2MB L2 |
The top-of-the-line i7 processor will be the Extreme 965, at 3.20GHz. As is typical of all newly-launched Intel Extreme editions, this one will be sold at a price of $999 in quantities of 1,000. This means that you can expect a price of closer to $1,100 if you wish to own one. Moving downwards, the 2.93GHz model will sell for $562, while the mainstream 920 will be sold at $284.
Like previous Extreme products, the 965 will be a fully-unlocked chip with a Turbo multiplier capable of hitting 40x, the default being 24x. The 940 and 920 are capped at their stock multipliers (22x and 20x, respectively) and can only be overclocked by increasing the Base Clock, or BCLK for short. That tells us right away that these processors are going to be more of a challenge to overclock than anything from the Core 2 line-up, and we’ll get into the specifics of why later.
We won’t be covering overclocking to a great extent in this initial article, but stay tuned as we’re preparing a dedicated article about it which will be posted at some point this week. Without getting too far off-track, let’s take a look at Intel’s latest baby, shall we?
Comprised of 731 million transistors on a surface area of 263mm^2, Intel’s latest processor is a bit of a strange beast: it has a larger die area than its predecessor (214mm^2), yet uses fewer transistors. Why that’s the case exactly, I’m unsure.
In the above image, you can see a direct comparison of the QX9770 (left) sitting next to the Core i7 965 (right), but contrary to what you may believe, the reason for the larger overall CPU isn’t entirely attributed to the physically larger die. Rather, because of all of Nehalem’s enhancements, additional pins were required, and by additional, I mean many additional, as you can see in the below photo:
Each of the i7 processors to be launched later this month has identical pin and filter cap layouts on the back, which leads us to believe that each is identical inside, aside from the obvious multiplier and model code changes. In previous architectures, some of the filter caps would be laid out differently, or would be missing on the smaller models, but not here. We can assume that smaller models, when eventually released, will look slightly different on the back.
As mentioned above, today’s article is a preview, not a review, as there is a lot more testing to be conducted than time has allowed so far. So, we’ll be following up with more specific content over the course of the next few weeks, including a deeper look at gaming performance and overclocking. We’ll finish it all off with a proper “review” nearer to the official launch.
Today’s article will focus primarily on two things. First will be simple performance scaling between the three new processors and three of the top processors from Intel’s Core 2 line-up, including the QX9770 and Q9450. The second will be a performance look at Core i7’s new features, including Turbo and HyperThreading. I can assure you… these are results you won’t want to skip over.
For a recap of the most important features of Nehalem, turn to the next page. Afterwards, we’ll cover all considerations you should bear in mind if you plan to build a new machine with a Core i7 processor at the… uhh, core.
We’ve been talking about Nehalem since last spring, so I think most of us by now have a good grasp of its new features and the things we should be excited about. This page isn’t going to be an exhaustive look at the architecture as a whole, but rather a quick look at some of the features that really make Nehalem unique when compared to Core 2.
If you need a simple way to remember all of what’s new, just remember this string: IMC3CQPIHTBCLKL3TURBO. See? Don’t I make things easier?!
One of the biggest new features on Nehalem, and one that stunned us during IDF a few months ago, is Turbo mode… a feature that actually overclocks your processor to some small degree under load, something you’d never expect to see Intel ship as an official feature. What this means is that the clock speed given for any Core i7 processor isn’t entirely accurate, because at full load, it’s increased by at least 133MHz.
At any given time during use, if the processor hits full load on any of its cores, Turbo will kick in and raise the clock multiplier by 1x. For example, the 920 features a stock clock of 2.66GHz, so if full load is achieved, it essentially becomes a 2.79GHz chip. What’s even cooler is that if only a single core needs the extra boost, e.g. for a single-threaded application, the increase becomes 2x. So running a single-threaded application on the same 920 could see its clock boosted to 2.93GHz.
On an Extreme edition processor, these individual figures can be adjusted manually depending on the motherboard, but for all the others, they are locked within the processor and cannot be altered. If you turn Turbo mode off (I’m not sure why you’d want to), then your CPU clock won’t budge an inch above stock.
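To make that arithmetic concrete, here’s a quick sketch in Python. The `turbo_clock` helper is purely our own illustration, and the flat 133MHz Base Clock plus the +1x/+2x boost bins are simply the figures described above (the real BCLK is closer to 133.33MHz, so treat the results as approximate):

```python
# Effective Core i7 clocks under Turbo mode, as a back-of-the-envelope
# calculation. Assumes a flat 133MHz Base Clock for simplicity.
BCLK_MHZ = 133

def turbo_clock(stock_multiplier: int, boost_bins: int) -> float:
    """Effective clock in GHz after adding Turbo multiplier bins."""
    return (stock_multiplier + boost_bins) * BCLK_MHZ / 1000

# Core i7 920 (20x stock multiplier):
stock       = turbo_clock(20, 0)  # 2.66GHz at stock
all_cores   = turbo_clock(20, 1)  # ~2.79GHz with all cores loaded
single_core = turbo_clock(20, 2)  # ~2.93GHz for a single-threaded load
print(stock, all_cores, single_core)
```

The same arithmetic gives the Extreme 965’s 24x default and 40x ceiling their 3.20GHz and ~5.3GHz endpoints.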
It’s definitely an interesting feature, and one that’s going to be appreciated by pretty much everyone, even if they don’t realize it. Whether or not low-end Core i7 models will boost all four ratios at full load is unknown, but Intel may very well adjust such things to keep those models “budget”.
One of the most important new features is the QuickPath Interconnect, or QPI for short. “Interconnect” explains its purpose quite well. It offers a direct link to other system components, most notably the memory and X58 chipset, and though it replaces the typical front-side bus, it serves a similar purpose. Also like the FSB, different i7 models will have different QPI ratings, with the top-end 965 running at 6.4GT/s, and the two below it running at 4.8GT/s.
The term “GT/s” probably doesn’t mean much to you, but it’s the proper term for gigatransfers per second, with 1GT/s being equal to one billion transfers per second. Because data is transferred on both the rising and falling edges of the clock, the effective rating is doubled, which means 1MHz is equal to 2MT/s, and likewise, 3,200MHz is equal to 6,400MT/s, or 6.4GT/s.
How the GT/s figure is really calculated on the CPU is a little more complex, but we’ll be talking more about it in an upcoming overclocking article. Generally speaking, each QPI setting in the BIOS has a separate multiplier, and depending on your Base Clock, the QPI frequency will adjust accordingly. One example I can give is that the 4.8GT/s setting in the BIOS equals an 18x multiplier, so if the Base Clock is 133MHz, the raw QPI frequency will be ~2400MHz. Just how much does that particular frequency matter? That’s yet to be seen.
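As a rough sketch of that math in Python (the 18x multiplier and 133MHz Base Clock are the figures from the example above; the 24x value for 6.4GT/s is our own inference from the same arithmetic, and the function is purely illustrative):

```python
# QPI transfer rate from the BIOS multiplier and the Base Clock.
# QPI moves data on both clock edges, so transfers/sec is twice the
# raw clock. The true BCLK is ~133.33MHz, hence the "roughly" figures.
def qpi_gt_per_s(qpi_multiplier: int, bclk_mhz: float) -> float:
    raw_clock_mhz = qpi_multiplier * bclk_mhz  # e.g. 18 x 133 = ~2400MHz
    transfers_mt_s = raw_clock_mhz * 2         # two transfers per cycle
    return transfers_mt_s / 1000               # MT/s -> GT/s

print(qpi_gt_per_s(18, 133.33))  # roughly 4.8GT/s (940/920 rating)
print(qpi_gt_per_s(24, 133.33))  # roughly 6.4GT/s (965 rating)
```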
Another notable feature is HyperThreading, which was first tried years ago, but failed to some degree. It’s making a comeback though, and as we’ll see, that’s a good thing. When HyperThreading first made an appearance, there was a lack of two things: multi-core processors and multi-threaded applications. Since we have both now in some quantity, we can actually begin to appreciate what benefits HT can bring.
In the simplest of explanations, HyperThreading allows each physical core to be presented to the OS as two logical threads, which means more than one job can be handled per core at any given time. With applications that can utilize more than four threads, benefits are sure to be seen, although that’s far less likely to be the case with single-threaded applications.
Intel has long been criticized for not admitting that AMD was right that a memory controller belongs on the CPU die, but five years after the launch of the Athlon 64, we’re finally seeing it happen on i7. Not ones to simply join a camp, Intel took things one step further by giving it triple-channel functionality… something that promises to offer uncompromising levels of bandwidth.
As we’ll see later, that’s definitely the case. In fact, what we’ll see later is that even while using a slower kit on Core i7, the bandwidth will be more than twice what we’d see on Core 2 with an even faster kit. With an extreme kit, we might almost reach levels of threefold.
What difference will this make to the majority of people? It’s hard to say, but there’s a good chance it won’t make an ounce of difference for most users, no matter how much hardcore gaming or multi-tasking you do. Such a thing is a little difficult to measure, but where benefits would be seen is in the server market, where memory is constantly being taxed.
Another aspect of Core i7 that has been altered is the cache hierarchy, which now features not only L1 and L2 caches, but also an L3, which is where most of the on-die memory is held. Like Core 2, the L1 cache consists of a 32kB instruction cache and a 32kB data cache per core, while the L2 has been reworked to hold 256kB per core, or 1MB in total.
The L3 cache features up to 8MB of memory, which will likely decrease once more budget-oriented i7 models hit the market. As it stands today though, all three models share an identical cache hierarchy. Just like Nehalem in general, the Cache system is completely modular and scalable, so if 8MB doesn’t prove to be enough later for certain applications, Intel can increase the size as the need arises.
Adding an L3 cache might seem like a needless way to add latency, but that isn’t really the case. To increase performance, whatever data is hiding in the L1 and L2 will be present in the L3, and thanks to the QPI adding incredible bandwidth between the cache and the cores, latency isn’t supposed to be affected much at all.
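Tallying the hierarchy described above takes only a few lines of Python; the per-core L1/L2 sizes and the shared 8MB L3 are the figures from this section, and the totals are just our own bookkeeping:

```python
# Core i7 cache totals: per-core L1 and L2, plus the shared 8MB L3.
CORES = 4
L1_PER_CORE_KB = 32 + 32   # 32kB instruction + 32kB data
L2_PER_CORE_KB = 256       # private L2 per core
L3_SHARED_KB = 8 * 1024    # 8MB shared, holding copies of L1/L2 data

total_l2_kb = CORES * L2_PER_CORE_KB  # 1024kB, i.e. 1MB across the chip
total_on_die_kb = CORES * (L1_PER_CORE_KB + L2_PER_CORE_KB) + L3_SHARED_KB
print(total_l2_kb, total_on_die_kb)   # 1024, 9472
```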
That about covers all of the important features of Nehalem, though it certainly doesn’t stop there. Past what’s mentioned above, the power efficiency on i7 is far better than anything we’ve ever seen before, with the ability to turn cores on and off on a whim, underclock when not completely needed, and of course, overclock as well.
Overall, Nehalem is an incredible upgrade underneath the hood when compared to previous generations, and Intel themselves state that it’s the single biggest architectural upgrade since the launch of the original Pentium Pro in 1995. Now that’s a statement.
In short, Core i7 is leaner, meaner and packs more tricks than Penn & Teller. That’s all that needs to be said. So let’s get right into the considerations you should keep in mind when planning to build a Core i7 PC.
When Intel launched Core 2 in the summer of 2006, there were some upgrading considerations to keep in mind, but nothing to the extent of things with Nehalem. Core 2 retained the same socket design, so CPU coolers could be carried over. Likewise, if you were using DDR2 with your previous setup, that could easily be carried over as well.
Well, things aren’t so peachy with Nehalem, and you’re going to have to be prepared to spend a little bit of money if you want to upgrade. The extra cost of entry won’t be so noticeable if you are building an entirely new rig, but prices will be a little higher than what you could get away with in current-gen.
The three main considerations are the motherboard, RAM and CPU cooler. Because Nehalem utilizes the new LGA1366 socket, an X58-based motherboard must be used. We won’t know their cost for sure until they hit the street, but rumored prices place almost every single launch board above $300, so that’s the figure I’d keep in mind.
As with any launch, components will become increasingly less expensive as time passes, and I’d expect prices to drop on everything except the processors themselves almost immediately.
Where money might not have to be spent is on your CPU cooler, if you happen to own a current model that’s still well-supported by the company that manufactures it. Because the CPU die itself didn’t increase much in size since Core 2, the base of any current CPU cooler can still completely cover Core i7’s IHS. This means all that’s needed to use your current CPU cooler is an updated bracket.
The brackets required for LGA1366 are not that much larger, surprisingly, as you can see in the photo below. These particular brackets are designed for Thermalright’s Ultra-120, with the one on the left being darker in color as it’s meant to be equipped with the TRUE Black version of the cooler. As you can see, the size differences are rather minute overall, but just large enough to force you into spending some cash.
Almost all of the CPU cooler manufacturers I’ve talked to over the past month have told me that they’ll be releasing these updated brackets in time for launch. I warn you, however, that if you don’t happen to own what’s considered to be a “popular” model, an updated bracket might not become available. Since the changes between one company’s coolers may be minimal, though, one of their LGA1366 brackets may work with your cooler, with minor modifications.
On average, I’d expect to have to shell out $10 for one of these updated brackets, although some companies may decide to reward your patronage and give you the updated bracket for free, with a proof of purchase. That’s exactly how Noctua plans to do things, so if your main cooler is theirs, you’re in luck.
Aside from the motherboard and CPU cooler, your other potential worry is RAM. Both Nehalem’s IMC and the X58 chipset are designed to support only DDR3, and I’m really unsure whether we’ll see a DDR2 board from anyone, or if it’s even possible. What this means to you is that if you don’t already own a DDR3 kit (and maybe even if you do), you should be prepared to pick one up.
Thanks to Intel’s decision to create a triple-channel memory controller, the dual-channel kits on the market right now make little sense with Nehalem. In time for i7’s launch, there will be many different “tri-channel” kits available from all the popular vendors, such as OCZ, Corsair, Patriot and others. They will come in both 3GB and 6GB densities, and range in frequency from DDR3-1066 to DDR3-2000.
If you own a current DDR3 kit, there’s a chance it will not operate properly in an X58 board unless the modules are designed to run at 1.65v VDIMM or lower. If you have high-end modules that require more than 1.65v at their stock speed, it’s really difficult to say whether they will work or not.
Because i7 is so much more effective with three modules, though, you may want to consider an entirely new kit, rather than just add to the one you have, unless you happen to have 2GB modules. At launch, you will be able to find many 6GB tri-channel kits available for around $200, which isn’t too difficult to stomach.
That about wraps everything up. If you are planning to upgrade, be prepared to spend a little bit of money. If you need to get new RAM, the new build at a minimum should run you around $800, and that might not even be including a CPU cooler. It’s expensive right now, but I’d expect prices to drop rather fast, especially with the RAM kits and motherboards.
At Techgage, we strive to make sure our results are as accurate as possible. Our testing is rigorous and time-consuming, but we feel the effort is worth it. In an attempt to leave no question unanswered, this page contains not only our testbed specifications, but also a fully-detailed look at how we conduct our testing.
If there is a bit of information that we’ve omitted, or you wish to offer thoughts or suggest changes, please feel free to shoot us an e-mail or post in our forums.
The table below lists the hardware for our two current machines, which remains unchanged throughout all testing, with the exception of the processor. Each CPU used for the sake of comparison is also listed here, along with the BIOS version of the motherboard used. In addition, each one of the URLs in this table can be clicked to view the respective review of that product, or if a review doesn’t exist, you will be led to the product on the manufacturer’s website.
Core i7 Test System
Intel DX58SO – X58-based, 2624 BIOS (10/23/08)
DDR3: Qimonda 3x1GB – DDR3-1066 7-7-7-20-1T, 1.56v
Palit Radeon HD 4870 512MB (Catalyst 8.9)
Core 2 Test System
ASUS Rampage Extreme – X48-based, 0501 BIOS (08/28/08)
DDR3: Corsair XMS3 DHX 2x2GB – DDR3-1333 7-7-7-15-1T, 1.91v
Palit Radeon HD 4870 512MB (Catalyst 8.9)
When preparing our testbeds for any type of performance testing, we follow these guidelines:
To aid in keeping our results accurate and repeatable, we prevent certain services in Windows Vista from starting up at boot. These services have a tendency to start up in the background without notice, potentially causing slightly inaccurate results. Disabling “Windows Search”, for example, turns off the OS’ indexing, which can at times utilize the hard drive and memory more than we’d like.
To help test out the real performance benefits of a given processor, we run a large collection of both real-world and synthetic benchmarks, including 3ds Max, Adobe Lightroom, TMPGEnc Xpress, Sandra 2009 and many more.
Our ultimate goal is always to find out which processor excels in a given scenario and why. Running all of the applications in our carefully-chosen suite can help better give us answers to those questions. Aside from application data, we also run two common games to see how performance scales there, including Call of Duty 4 and Half-Life 2: Episode Two.
In an attempt to offer “real-world” results, we do not utilize timedemos in any of our reviews. Each game in our test suite is benchmarked manually, with the minimum and average frames-per-second (FPS) captured with the help of FRAPS 2.9.5.
To deliver the best overall results, each title we use is exhaustively explored in order to find the best possible level in terms of intensiveness and replayability. Once a level is chosen, we play through it repeatedly to find the best possible route, and then in our official benchmarking, we stick to that route as closely as possible. Since we are not robots and the game can throw in minor twists with each run, no two runs will be identical to the pixel.
Each game and setting combination is tested twice, and if there is a discrepancy between the initial results, the testing is repeated until we see results we are confident with.
The two games we currently use are listed below, with direct screenshots of each game’s settings screen and explanations of why we chose what we did.
The Call of Duty series of war-shooters is without question one of the most gorgeous on the PC (and consoles), but what’s great is that the games are also highly optimized, so no one has to max out their machine’s specs in order to play them. Since that’s the case, the in-game options are maxed out in all regards, except for Anisotropic Filtering, which is set to the center of the slider bar.
It might have been four years ago that we were able to play the first installment of the Half-Life 2 series, but it’s held up well through its new releases and engine upgrades. This is one title that thrives on both a fast CPU and GPU, and though it’s demanding at times, most any recent computer should be able to play the game with close to maxed-out detail settings, aside from the Anti-Aliasing.
In the case of very-recent mid-range cards, the game will run fine all the way up to 2560×1600 with maxed-out detail, minus Anti-Aliasing. All of our tested resolutions use identical settings, with 4xAA and 8xAF.
Synthetic benchmarks have typically been favored for performance testing, but the results they provide can be fairly abstract, and the methods they use to assign their scores can be dubious at times. By contrast, real-world application benchmarks provide performance metrics that apply directly to real-world usage, and we endeavor to apply both in our performance comparisons.
SYSmark 2007 Preview from BAPCo is a special case, because its synthetic scores are derived from tests in real-world applications. However, we still believe that synthetic benchmarking scores are best used to directly compare the performance of one piece of hardware to another, and not for developing an impression of real-world performance expectations. SYSmark is more useful than most synthetic benchmarking programs in our opinion, because its tests emulate tasks that people actually perform, in actual software programs that they are likely to use.
The benchmark is hands-free, using scripts to execute all of the real-world scenarios identically, such as video editing in Sony Vegas and image manipulation in Adobe Photoshop. At the conclusion of the suite of tests, five scores are delivered: an E-learning score, a Video Creation score, a Productivity score, and a 3D Performance score, as well as an aggregated ‘Overall’ score. These scores can still be fairly abstract, and are most useful for direct comparisons between test systems.
A quick note on methodology: SYSmark 2007 requires a clean install of Windows Vista 32-bit to run optimally. Before any testing is conducted, the hard drive is first wiped clean, then a fresh Windows installation is performed, and lastly, the necessary hardware drivers are installed. The ‘Three Iterations’ test suite is run, with the ‘Conditioning Run’ setting enabled. The results from the three runs are then averaged and rounded to the nearest whole number.
So far, there isn’t really a clear sign of just how much better Core i7 is, except with regards to 3D rendering. There, the differences are rather significant, to say the least. Can we expect similar results with real-world 3D applications? We’re taking care of that next.
Autodesk’s 3ds Max is without question an industry standard when it comes to 3D modeling and animation, with DreamWorks, BioWare and Blizzard Entertainment being a few of its notable users. It’s a multi-threaded application that’s designed to be right at home on multi-core and multi-processor workstations or render farms, so it easily tasks even the biggest system we can currently throw at it.
For our testing, we use two project files designed to run long enough to expose any weakness in our setup, which also allows us to find results that are easily comparable between motherboards and processors. The first project is a dog model included on recent 3ds Max DVDs, which we infused with some Techgage flavor.
Our second project is a Bathroom scene that makes heavy use of ray tracing. Like the dog model, this one is also included on the application’s sample files DVD. The dog is rendered at a 1400×1050 resolution, while the Bathroom is rendered at 1080p (1920×1080).
I prefer to begin our processor reviews with rendering-type jobs, because if there’s one crowd that can benefit from faster CPUs, it’s 3D designers. Most of these applications were multi-processor-capable even before Dual-Core processors were available, so they were well ahead of the game, and have a good grasp of how to maximize all available threads, as you can see above.
Interesting results to compare would be our QX9770 and 920. Even though the QX9770 has a much faster clock speed of 3.2GHz, the 2.66GHz 920 proved 21.8% faster overall in our Bathroom render, and 9.6% faster in our Techgage Dog render. These results are incredible, but are made more incredible by the fact that these are not clock-for-clock comparisons.
So let’s tackle that, then. In comparing the QX9770 at 3.20GHz to the new Extreme 965, also at 3.20GHz, the latter delivered results 45.9% faster in our Bathroom render and 31.6% faster in our Techgage Dog render. Judging by these results… if you’re a 3D artist, you need Core i7.
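For clarity on how we derive those percentages: with time-based benchmarks like these renders, “X% faster” comes from the ratio of the two completion times. A tiny Python sketch (the times below are placeholders for illustration, not our actual measurements):

```python
# "Percent faster" from two render times: if chip B finishes in less
# time than chip A, then B is (A/B - 1) x 100 percent faster.
def percent_faster(slower_time_s: float, faster_time_s: float) -> float:
    return (slower_time_s / faster_time_s - 1) * 100

# Hypothetical example: a render dropping from 438s to 300s
print(round(percent_faster(438, 300), 1))  # 46.0% faster
```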
As we’ll see later in the article, Core i7 is faster clock-for-clock than Core 2 Quad (even with Turbo turned off), but what helps speed things up further is the addition of HyperThreading. These applications know how to make good use of all available threads, and the graph above proves it.
Like 3DS Max, Cinema 4D is another popular cross-platform 3D graphics application that’s used by new users and experts alike. Its creators, Maxon, are well aware that their users are interested in huge computers to speed up rendering times, which is one reason why they released Cinebench to the public.
Cinebench R10 is based on the Cinema 4D engine and the test consists of rendering a high-resolution model of a motorcycle and gives a score at the end. Like most other 3D applications on the market, Cinebench will take advantage of as many cores as you can throw at it.
Reaffirming our faith that i7 is all-around better for rendering, the results here show just how much potential there is in the new chip. Compare once again the QX9770 to the Extreme 965. In our multi-threaded test, the latter proved to be 33.9% faster, while in the single-threaded render, we saw an improvement of 12.3%. So, while it’s still faster either way, it goes to show just how much rendering jobs thrive on having more threads to work with.
Similar to Cinebench, the “Persistence of Vision Ray Tracer” is, as you’d expect, a ray tracing application that also happens to be cross-platform. It allows you to take your environment and models and apply a ray tracing algorithm, based on a script you either write yourself or borrow from others. It’s a free application that has become a standard in the ray tracing community, and some of the results it can produce are completely mind-blowing.
The official version of POV-Ray is 3.6, but the 3.7 beta unlocks the ability to take full advantage of a multi-core processor, which is why we use it in our testing. Applying ray tracing algorithms can be extremely system intensive, so this is one area where multi-core processors will be of true benefit.
For our test, we run the built-in benchmark, which delivers a simple score (pixels per second) at the end. The higher, the better, and if one score is twice another, it literally means the scene rendered twice as fast.
Although 3ds Max and Cinebench’s results blew our expectations to pieces, the largest performance gains are seen with POV-Ray, with an incredible 55.4% increase between a Core 2 Quad and a Core i7 of the same clock speed.
What we can garner from our results is that Core i7 in general is going to decrease rendering times for most any type of 3D job, but that’s especially the case if ray tracing is used. Take for example our Bathroom model in 3ds Max. It uses ray tracing to a heavy degree, and that’s where our largest gains were seen. POV-Ray is no different, as its sole focus is ray tracing, so it’s no surprise to see equally-impressive gains.
Not a 3D designer? I wouldn’t stress too much over it, because 3D applications aren’t the only place where ray tracing can be used. Where else? Games! If Core i7 can increase performance with ray tracing-specific render jobs like those shown above, then it would imply that the same increases in performance would be seen in games that also use it.
This raises another question, or thought. If Intel increased ray tracing performance this much with Core i7, what does that mean for Larrabee? Good things, hopefully.
Photo manipulation benchmarks are more relevant than ever, given the proliferation of high-end digital photography hardware. For this benchmark, we test the system’s handling of RAW photo data using Adobe Lightroom, an excellent RAW photo editor and organizer that’s easy to use and looks fantastic.
For our testing, we take 100 RAW files (in Nikon’s .NEF file format) which have a 10-megapixel resolution, and export them as JPEG files in 1000×669 resolution, similar to most of the photos we use here on the website. Such a result could also be easily distributed online or saved as a low-resolution backup. This test involves not only scaling of the image itself, but encoding in a different image format. The test is timed indirectly using a stopwatch, and times are accurate to within +/- 0.25 seconds.
Fortunately for us, rendering jobs are not the only type to feel the benefits of Core i7. Comparing both our 2.66GHz and 3.20GHz chips together, an overall gain of close to 18% is seen. Not nearly as impressive as what we saw on the previous page, but a gain nonetheless, and a noticeable one if you do a lot of photographic work.
When it comes to video transcoding, one of the best offerings on the market is TMPGEnc Xpress. Although a bit pricey, the software offers an incredible amount of flexibility and customization, not to mention superb format support. From the get-go, you can output to DivX, DVD, Video-CD, Super Video-CD, HDV, QuickTime, MPEG, and more. It even includes support for Blu-ray video!
There are a few reasons why we choose TMPGEnc for our tests. The first relates to the points laid out above: the sheer ease of use and flexibility is appreciated. Beyond that, the application does us a huge favor by tracking the encoding time, so we can look away while an encode is taking place without worrying that we’ll miss the final time. Believe it or not, not all transcoding applications work like this.
For our test, we take a 0.99GB high-quality DivX H.264 AVI video of Half-Life 2: Episode Two gameplay with stereo audio and transcode it to the same resolution of 720p (1280×720), but lower the bitrate in order to attain a modest filesize. Since the QX9770 we are using for testing supports the SSE4 instruction set, we enable it in the DivX control panel, which improves both the encoding time and quality.
We seem to be right back on track with the impressive increases here. Once again comparing our equally-clocked processors of different architectures, we see a minimum gain of 25% where our HD video is concerned, and a much lower increase of ~7% for our mobile video.
What we can see so far is that noticeable performance gains are going to be seen more so on larger projects than smaller ones. We saw it with our 3ds Max test, and now this one. Gains of any kind are nice to see, but it’s the larger ones that are more exciting to talk about.
While TMPGEnc Xpress’ purpose is to convert video formats, ProShow from Photodex helps turn your collection of photos into a fantastic-looking slide show. I can’t call myself a slide show buff, but this tool is as full-featured as they come. It offers many editing abilities and can export in a variety of formats, including a standard video file, DVD video and even HD video.
Like TMPGEnc and many other video encoders, ProShow can take full advantage of a multi-core processor. It doesn’t support SSE4, however, though hopefully it will in the future, as that would improve encoding times considerably. Still, when a slide show application handles a multi-core processor effectively, it has to make you wonder why there is such a delay in seeing a wider range of such applications on the market.
Up to now, we’ve seen increases in performance with Core i7 in every single test we’ve conducted, but exactly what kind of increase you’ll find is hard to predict. For example, on the last page, we tackled a mobile encode with TMPGEnc Xpress, which saw a measly 7% increase in performance, but here, our DVD recode saw much larger increases of ~24%.
Like our TMPGEnc Xpress test though, it’s the HD video encode where the benefits of Core i7 begin to shine. Compare once again our 3.20GHz processors to each other. The Extreme 965 performed the encode job 38.9% faster, while where the 2.66GHz chips are concerned, the 920 saw a boost of 41.0%.
To help show a more “raw” view of the potential Nehalem offers, we ran the Multi-Media test built into Sandra. This test stresses the CPU’s ability to handle multi-media instructions and data, using MMX and SSE2/3/4 as the instruction sets of choice. The results are divided into integer, floating-point and double-precision, three number formats commonly used in multi-media work.
The domination continues here, with overall performance increases ranging between 15–35%.
With each new processor launch, one thing that’s bound to prove faster is mathematical computation, which, when all is said and done, plays a massive role in much of our computing today. The faster an equation can be completed, the faster a math-heavy process can finish.
Sandra includes applications designed to specifically test the mathematical performance of processors, with the main one being the arithmetic test.
This is one particular area where Core i7 seems to shine. Comparing once again our equally-clocked processors, the Extreme 965 delivered a Dhrystone result 37.6% better than our QX9770, while the most stark increase is with Whetstone, where a 59.9% boost was exhibited. Almost identical gains are seen when comparing the 2.66GHz models as well, so the scaling here is quite good.
Crypto is a major part of computing, whether you know it or not, and certain processes can prove slower than others, depending on their algorithms. User passwords on your home PC are encrypted, as are user passwords on web servers (like in our forums). Past that, crypto is used in other areas as well, such as creating strong locks on files or assigning a hash to a particular file (like MD5).
In Sandra’s Cryptography test, the results are output as MB/s, higher being better. Although this is somewhat of an odd metric to go by, generally speaking, the higher the number, the faster the CPU tears through the respective algorithm, which comes down to how fast a password is encrypted, decrypted, signed, et cetera.
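To make the MB/s metric concrete, here’s a minimal sketch of how such a number can be produced: feed a hash algorithm a known amount of data, time it, and divide. This is not how Sandra implements its test, just an illustration of the metric, using Python’s standard hashlib:

```python
import hashlib
import time

def hash_throughput_mb_s(algorithm="sha256", total_mb=64, chunk_mb=1):
    """Measure how many MB/s the CPU pushes through a hash algorithm."""
    chunk = b"\x00" * (chunk_mb * 1024 * 1024)
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(total_mb // chunk_mb):
        h.update(chunk)
    elapsed = time.perf_counter() - start
    return total_mb / elapsed

print(f"SHA-256: {hash_throughput_mb_s():.1f} MB/s")
```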
Interestingly enough, this is one test that doesn’t totally glorify Core i7, and from our testing, we found the reason comes down to HyperThreading. As we’ll show later in the article, with HT turned off, the Core i7 results completely shift around. AES loves HT, while SHA doesn’t… it’s that simple.
Most, if not all, businesses have to crack open a spreadsheet at some point. Though simple in concept, spreadsheets are an ideal way to track information or perform large calculations in real time. That’s important when you run a business that deals with a large amount of expenses.
However important the speed of a calculation in an Excel file may be to you, we include results here because they heavily test the mathematical capabilities of each processor. Because Excel 2007 is completely multi-threaded (it can even take advantage of an 8-Core Skulltrail), it makes for a great benchmark to show the scaling between all of our CPUs.
I’ll let Intel explain the two files we use:
Monte Carlo – This workload calculates the European Put and Call option valuation for Black-Scholes option pricing using Monte Carlo simulation. It simulates the calculations performed when a spreadsheet with input parameters is updated and must recalculate the option valuation. In this scenario we execute approximately 300,000 iterations of Monte Carlo simulation. In addition, the workload uses Excel lookup functions to compare the put price from the model with the historical market price for 50,000 rows to understand the convergence. The input file is a 70.1 MB spreadsheet.
Calculations – This workload executes approximately 28,000 sets of calculations using the most common calculations and functions found in Excel. These include common arithmetic operations like addition, subtraction, division, rounding and square root. It also includes common statistical analysis functions such as Max, Min, Median and Average. The calculations are performed after a spreadsheet with a large dataset is updated with new values and must re-calculate many data points. The input file is a 6.2 MB spreadsheet.
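For the curious, the core of a Monte Carlo valuation like the one Intel describes can be sketched in a few lines: simulate many terminal stock prices under geometric Brownian motion, average the option payoffs, and discount back to today. This is a generic sketch of the technique, not Intel’s actual workload, and the parameters below are placeholders:

```python
import math
import random
import statistics

def monte_carlo_option(spot, strike, rate, vol, years, paths=100_000, seed=42):
    """Price European call and put options by simulating terminal prices
    under geometric Brownian motion, then discounting the average payoff."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * years
    diffusion = vol * math.sqrt(years)
    calls, puts = [], []
    for _ in range(paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0, 1))
        calls.append(max(terminal - strike, 0.0))   # call pays off above strike
        puts.append(max(strike - terminal, 0.0))    # put pays off below strike
    discount = math.exp(-rate * years)
    return discount * statistics.fmean(calls), discount * statistics.fmean(puts)

call, put = monte_carlo_option(spot=100, strike=100, rate=0.05, vol=0.2, years=1)
print(f"call: {call:.2f}, put: {put:.2f}")
```

Excel performs essentially this loop cell-by-cell, which is why the workload threads so well: each simulated path is independent of every other.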
Continuing along the same line, the results we see here line up pretty well with what we’ve seen so far, at around 20% across the board when comparing clock-for-clock. So, we know i7 is great for math, but what about the sub-systems, like memory and inter-core speed? We’ll tackle that on the next page.
Generally speaking, the faster the processor, the higher the system-wide bandwidth and the lower the latency. As is always the case, faster is better when it comes to processors, as we’ll see below. But with Core i7, the game changes up a bit.
Whereas previous memory controllers utilized a dual-channel operation, Intel threw that out the window to introduce triple-channel, which we talked a lot about at August’s IDF. Further, since Intel integrates the IMC onto the die of the new CPUs, benefits are going to be seen all-around.
Before jumping into the results, we already had an idea of what to expect, and just as we did, the results seen are nothing short of staggering.
Because we are dealing with a dual-channel vs. tri-channel comparison, it’s virtually impossible to have apples-to-apples-type results. The main issue is density, because no matter the RAM configuration you have, you can’t match the same density on both platforms. On Core 2, RAM configs were generally 2GB, 4GB or 8GB. With i7, we have 3GB, 6GB and 12GB (12GB!).
Also, we didn’t receive a proper 6GB kit prior to this article, so improvements to be seen are likely to be a bit better than what’s shown above. This is something we’ll tackle for the final review later this month. As it stands, though, the slower 3GB kit proved over 2x faster in terms of bandwidth than the faster 4GB kit on our Core 2 platform.
In terms of latency, not much has changed, even with the move to an integrated memory controller. These results are probably why when I asked about latency at IDF, I wasn’t given a direct answer. This isn’t a problem, per se, but I definitely expected to see much better results than this. Maybe there will still be a good reason to pick up “performance” kits.
How fast can one core swap data with another? It might not seem that important, but it definitely is if you are dealing with a true multi-threaded application. The faster data can be swapped around, the faster it’s going to be finished, so overall, inter-core speeds are important in every regard.
Even without looking at the data, we know that Core i7 is going to excel here, for a few different reasons. The main one is that this is Intel’s first native Quad-Core. Rather than having two Dual-Core dies placed beside each other, i7 was built with four cores on a single die, so that in itself improves things. Past that, the ultra-fast QPI bus likely also has something to do with the speed increases.
As we expected, Core i7 can swap data between its cores much faster than previous processors, and also manages to cut down significantly on latency. This is another feature to thank HyperThreading for, because without it, believe it or not, the bandwidth and latencies are actually a bit worse, clock-for-clock, as we’ll see soon.
Crysis Warhead might have the ability to bring any system to its knees even with what we consider to be reasonable settings, but Call of Duty 4 manages to look great regardless of your hardware, as long as it’s reasonably current. It’s also one of the few games on the market that will actually benefit from having a multi-core processor, although Quad-Cores offer no performance gain over a Dual-Core of the same frequency.
For our testing, we use a level called The Bog. The reason is simple… it looks great, plays well and happens to be incredibly demanding on the system. It takes place at night, but there is more gunfire, explosions, smoke, specular lighting and flying corpses than you can shake an assault rifle at.
Because the game runs well on all current mid-range GPUs at reasonable graphic settings, we max out what’s available to us, which includes enabling 4xAA and 8xAF, along with choosing the highest available options for everything else.
A few weeks ago, performance reports were leaked regarding gaming on i7, and it was found that two particular titles suffered a bit here when compared to Core 2 processors. Ironically enough, those exact two titles are the same ones we’ve been using in our motherboard and processor reviews for some time, so sadly, things don’t look so good today.
As you can see, Core 2 is a better processor for gaming with CoD4, and I’m not exactly sure why. I do have to stress that all Core i7 testing was done using 3GB of RAM, while the Core 2 machine had 4GB, so I’m afraid I can’t conclude anything quite yet. A few factors that came into play during testing lead me to believe the lack of 4GB of RAM did play a role, but I won’t discuss it until I can test it out more thoroughly. I’d rather state fact than FUD if at all possible.
The original Half-Life 2 might have first seen the light of day close to four years ago, but it’s still arguably one of the greatest-looking games ever seen on the PC. Follow-up versions, including Episode One and Episode Two, do well to put the Source Engine upgrades to full use. While playing, it’s hard to believe that the game is based on a four+ year old engine, but it still looks great and runs well on almost any GPU purchased over the past few years.
Like Call of Duty 4, Half-Life 2: Episode Two runs well on modest hardware, but a recent mid-range graphics card is recommended if you wish to play at higher than 1680×1050 or would like to top out the available options, including anti-aliasing and very high texture settings.
This game benefits from both the CPU and GPU, and the sky’s the limit. In order to fully top out the available settings and run at the highest resolution possible, you need a very fast GPU (or GPUs) along with a fast processor. Though the in-game options go much higher, we run our tests with 4xAA and 8xAF to allow the game to remain playable on smaller mid-range cards.
The same performance hits are seen with HL2, which happens to be a very CPU-bound game. Again though, I can’t conclude on anything quite yet, and as it seems right now, both CoD4 and HL2 are two titles that specifically have issues on Core i7, and HyperThreading really doesn’t seem to have much to do with things.
Although we generally shun automated gaming benchmarks, we do like to run at least one to see how our GPUs scale when used in a ‘timedemo’-type scenario. Futuremark’s 3DMark Vantage is without question the best such test on the market, and it’s a joy to use, and watch. The folks at Futuremark are experts in what they do, and they really know how to push that hardware of yours to its limit.
The company first started out as MadOnion and released a GPU-benchmarking tool called XLR8R, which was soon replaced with 3DMark 99. Since that time, we’ve seen seven different versions of the software, including two major updates (3DMark 99 Max, 3DMark 2001 SE). With each new release, the graphics get better, the capabilities get better and the sudden hit of ambition to get down and dirty with overclocking comes at you fast.
Similar to a real game, 3DMark Vantage offers many configuration options, although many (including us) prefer to stick to the profiles which include Performance, High and Extreme. Depending on which one you choose, the graphic options are tweaked accordingly, as well as the resolution. As you’d expect, the better the profile, the more intensive the test.
Performance is the stock mode that most use when benchmarking, but it only uses a resolution of 1280×1024, which isn’t representative of today’s gamers. Extreme is more appropriate, as it runs at 1920×1200 and does well to push any single or multi-GPU configuration currently on the market – and will do so for some time to come.
Similar to our games above, Vantage doesn’t put i7 in a favorable light. CPU scores are high, but the GPU scores are not, despite using the exact same GPU, drivers and Windows configuration. As mentioned earlier, gaming is one area we’ll be tackling a lot more this coming week, so please stay tuned as we plan on releasing an article that focuses solely on it.
Two of the major new features on Core i7 are the Quick Path Interconnect and the return of HyperThreading. Both technologies are explained in some depth on page two of this article, so to study up, I recommend checking it out.
Our goals on this page and the next are to find out just what sort of benefit can be had from a) using a higher QPI speed and b) enabling HyperThreading. To help with that, we re-ran almost every single one of our benchmarks, but this time with a slightly different focus, and different configurations.
We’re using four configurations in total, but each one uses the same Extreme 965 processor. The first configuration is all-around stock, with Turbo and HT enabled. The second is the same setting, but with the QPI frequency lowered from 6.4GT/s (3200MHz) to 4.8GT/s (2400MHz).
Our final two settings are the most important. Both run with a default QPI frequency, but the first has Turbo disabled, meaning that the CPU will run at a constant 3.20GHz, no higher or lower, while the final setting has Turbo enabled, but HyperThreading disabled. That essentially cuts down the number of threads from 8 to 4, similar to last-gen processors.
Although we’ve concluded that Core i7 is faster than Core 2 in pretty much every single test we ran, the biggest increases in performance were seen with our 3D rendering applications. There, we attributed a big part of the huge boost to HyperThreading, and these next few graphs solidify that fact.
Turning HT off caused very noticeable changes… a full 25% decrease in performance. Minor changes were seen with the Techgage Dog model, which leads us to believe that ray tracing-based tasks can take better advantage of multi-threaded processors than most anything else.
Like earlier, we once again see similar results between 3ds Max and POV-Ray. Here, we see a 29% performance boost with HT enabled. By contrast, Cinebench saw an 18.4% increase in performance. That’s far from being lackluster, but it’s not quite as impressive as the ray tracing performance.
Our video-related tests showed some nice increases as well with Core i7, but nothing near as stark as with our rendering results. Still, as it appears below, HyperThreading wasn’t really the reason here. Although increases are seen, those alone don’t make up for the general speed increases we saw earlier.
While HyperThreading can be thanked for the main performance increase with rendering, video encoding is a different story. Although I’m unsure of the exact reasons we see increases here, one likely contributor is SSE4.2. When we first experimented with SSE4.1 last fall with the launch of the 45nm processors, the differences were huge, so I’d expect the enhancements here to have helped even further.
Where photo manipulation is concerned, at least with Lightroom, HyperThreading makes no difference whatsoever. Enabled or disabled, the results are going to be pretty much identical, which does strike me as a little odd.
Although it’s a multi-threaded application, I don’t believe it tackles more than one photo at a time. Rather, its algorithm is designed to take advantage of multi-core processors, but within a single photo. I think greater increases would be seen if the application were redesigned to process multiple photos at once.
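To illustrate the distinction, here’s a minimal sketch of the batch-level approach: rather than splitting one photo’s work across cores, keep several photos in flight at once. The `export_photo` function is a hypothetical stand-in; a real exporter would do the actual decode, resize and encode:

```python
from concurrent.futures import ThreadPoolExecutor

def export_photo(photo):
    """Hypothetical stand-in for a per-photo resize-and-encode job."""
    return f"{photo}.jpg"

def export_batch(photos, workers=4):
    """Process several photos concurrently, one worker per photo,
    instead of parallelizing inside a single photo."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in the same order as the input list
        return list(pool.map(export_photo, photos))

print(export_batch(["dsc_0001", "dsc_0002", "dsc_0003"]))
```

With eight hardware threads available, a scheme like this can keep every thread busy even when a single photo’s pipeline doesn’t scale past a core or two.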
When picturing a “hardcore” application in your mind, Office probably isn’t the first to come to mind. However, at least on the surface, Microsoft takes multi-threading very seriously, and many other software developers could learn a lesson. Math in general seems simple, but as you can see in the graph below, the application was able to take advantage of all eight threads, which is, to me, quite impressive.
The differences aren’t mind-blowing, but an 18% increase in performance simply by enabling HyperThreading is rather substantial. It’s not so important with Excel, but it gives us a general idea of just how much more efficient other software applications could be, if they were truly multi-threaded.
Sandra has been a favorite of ours for quite a while, for more than one reason. One of the better reasons is because it allows us to test out a variety of tests from within one application, which is nice, especially when you want to quickly test various sub-systems on the PC. Below, we offer five different graphs from Sandra, tackling memory, math and multi-core performance.
The results are quite interesting so far. In our Arithmetic test, HyperThreading made almost no difference where the Whetstone floating-point benchmark was concerned, but it made an incredible difference with the Dhrystone result… 59%!
Similar enhancements are seen with our Multi-Media tests, with each one of the tests being substantially enhanced with HyperThreading enabled.
Earlier in the article, I pointed out that HyperThreading changes just how efficient the Cryptography test is, and here’s proof. Oddly enough, AES256 thrives on HT, but SHA256 absolutely doesn’t. Disable HT entirely, and the results swap places, with SHA outperforming AES. Luckily, the differences in the grand scheme of things are minor, but it’s interesting nonetheless.
If there is one area that neither the QPI speed (at least, within our current limits) nor HyperThreading affected whatsoever, it’s memory bandwidth and latency. Although I’ve only included the bandwidth results above, you can rest assured that the latencies were absolutely identical as well.
Finally, our most interesting graph might be the one seen above. With HyperThreading enabled, the inter-core performance is truly incredible… far exceeding the performance of Core 2. With it off though, things change, with the latency and bandwidth actually performing slightly worse than what we saw with the QX9770.
So it seems that when it comes to inter-core performance, Core 2 Quad is actually a little more efficient, which could be due to a few factors, one being tighter integration of components, and fewer components in general. You’d imagine this is one thing QPI should improve, but as seen above, adjusting the QPI from 4.8GT/s to 6.4GT/s makes virtually no difference. Luckily, this lack of efficiency doesn’t mean much, since HyperThreading is designed to be turned on all the time, as it really should be.
Due to a couple different factors, I’m not going to spend a lot of time in this article talking about overclocking, and I’ll explain all of the reasons why by the end of this page. Rumors have had it for a while that Core i7 wasn’t going to be as overclockable as Core 2, and for the most part, that’s true, but as usual, it depends on a few different things.
Like previous generations, the Extreme 965 processor has an unlocked multiplier that’s capable of hitting 40x. That in itself tells us that the sky’s the limit, because with a stock Base Clock and that multiplier, the result would be a 5.32GHz overclock. Something tells me that’s not going to be too common. Bump the BCLK to a reasonable 166MHz, and things get even more unlikely, with a core clock of 6.64GHz.
The same cannot be said for non-Extreme models, though. Both the 920 and 940 have capped multipliers, equal to their stock multiplier. For example, the 940 is a 22 x 133MHz processor, so 22x is the limit as far as multipliers go. The only way to increase the CPU clock is to increase the Base Clock, e.g. to 166MHz.
From that standpoint alone, Core i7 processors are much more difficult to overclock, because the Base Clock is far harder to push up than the Front-Side Bus ever was. To put it into perspective, imagine that a 166MHz Base Clock is possible on your particular motherboard. Using a 920 with its stock multiplier of 20x, the best possible overclock would be 3.32GHz.
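The arithmetic behind all of these figures is the same: core clock equals Base Clock times multiplier. A tiny sketch, using the stock and hypothetical-overclock numbers discussed above:

```python
def core_clock_mhz(bclk_mhz, multiplier):
    """On Core i7, the core clock is simply Base Clock x multiplier."""
    return bclk_mhz * multiplier

stock_920 = core_clock_mhz(133, 20)  # 2660 MHz: the 920 at stock
oc_920 = core_clock_mhz(166, 20)     # 3320 MHz: best case with the locked 20x
max_965 = core_clock_mhz(133, 40)    # 5320 MHz: the Extreme 965's 40x ceiling
print(stock_920, oc_920, max_965)
```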
That fact can be a little upsetting at a time when we are so used to seeing current-gen Quad-Cores hit 3.6GHz stable with modest voltages. On Core i7, that’s not going to happen with a budget CPU too easily, unless you manage to bump the Base Clock to a nice level and have it remain stable. Due to my current lack of overclocking experience on i7/X58, I cannot make a reasonable guess as to what the max BCLK will be for most motherboards.
To make matters a little worse, I’m not even sure what to expect in terms of overclocking ability on the Extreme edition, although there are a few things holding me back from concluding anything. Throughout all overclocking and testing in general, Intel’s own DX58SO motherboard was used, and if history repeats itself, other enthusiast motherboards will offer greater potential.
However, 4.0GHz wasn’t supposed to be too much of an issue on the DX58SO board, but I found that to be the furthest thing from my own experience. While I did manage to hit 4.0GHz (133×30), it in no way was stable, and most times I couldn’t even pass a single Cinebench R10 run.
That’s one issue I was speaking about, because I know for a fact that 4.0GHz is possible on this CPU and that it can run Cinebench runs over and over, so I’m led to believe either our copy of the CPU is less-than-stellar, or I need to toss it into another motherboard and spend a lot more time there.
The reason I have doubts about my own overclocks is that I’ve been seeing 4.0GHz on this processor since earlier this year, and that was on even earlier silicon than this. In addition, during a visit with Intel this past June, they had this exact model of CPU running at 4.0GHz on an air cooler, and it was running just fine.
It’s for those reasons that I know more time needs to be spent with overclocking, and hopefully my eventual outcome will be far better than my outlook right now. Likewise, I look forward to seeing what other editors have accomplished with their own overclocks, as that will all but prove whether or not our particular CPU has issues.
It goes without saying though, that overclocking on Core i7 is a lot more complicated than overclocking on Core 2, and that’s too bad, but is understandable given the major architectural changes. Overclocking will be a subject in particular that I will be tackling more this coming week, so please keep an eye on the site as I plan to talk about it in far greater detail, and hopefully at that time, I’ll be able to report my stable overclocks as well.
Hopefully by now you have a good idea of what Core i7 brings to the table, and what it doesn’t. I think it’s safe to say that Intel successfully brought back the same spirit that came with the original Core 2 launch. That particular launch unveiled products that were substantially faster than the previous-generation, and the same thing can be said again today.
Nehalem wasn’t merely a rehashed Core 2, but rather a micro-architecture built almost from the ground up. In just two years, we went from seeing fast Dual-Cores to fast Quad-Cores and now even faster native Quad-Cores, plus we get many more perks on top of all of that. You might just have a reason to be excited.
When Core i7 first landed in our lab, the first thing I wanted to check out was Turbo, and after all my experience with it, I can say it’s probably one of the biggest new additions to Core i7, and something that will actually benefit everyone. At the lowest level, if you are using an application that tops out a single core, you’re going to receive a clock speed 266MHz faster than what it says on the product box, or 133MHz faster if you are topping out the entire CPU at once.
HyperThreading is without question one of the most important new features of Core i7. It alone is responsible for massive speed increases in various scenarios, including rendering jobs. The performance gains seen in the ray tracing scenarios specifically were jaw-dropping, although the general 10% – 30% gains seen elsewhere are also going to be appreciated by any 3D artist or video guru, professional or not.
At this point, it’s difficult to see the real importance of the QPI speed, as our tests showed virtually no performance difference between 4.8GT/s and 6.4GT/s. As more models become released, we may begin to see a trend, but as it stands, there doesn’t seem to be any real boost in performance of any kind that can be attributed to just the QPI.
Aside from those new features, a few things do still leave me a bit confused, such as gaming performance and overclocking. As I mentioned on the previous page, I’ll be spending a lot more time on overclocking this coming week, so I’m sure I’ll be able to post a lot more information later this week. Still, it goes without saying that Core i7 overclocking doesn’t spoil us like Core 2 did.
Gaming-wise, things are a little more complicated, because as we found out through the two games we normally test with, Core i7 performs a little worse than Core 2. Why exactly this is the case, I’m unsure, but it appears that those two specific titles are some of the few that experience the issue. This is one thing in particular that will be tested in more depth this week as well, and we’ll be able to deliver follow-up gaming results later this week.
All of that aside though, Intel has once again further secured their spot as the CPU leader with the Core i7. With each new processor launch, we expect to see performance increases, but with i7, some of the increases are mind-blowing. For those who use 3D design tools or video-creation tools on a regular basis, Core i7 was built for you, as the performance seen there definitely blew away the predecessor.
Stay tuned to the site as we’ll be bringing you a lot more on Core i7 in the coming weeks. As always, if you have anything to say, or specific requests of things to test out, feel free to post in our thread linked to below and toss your words onto our virtual paper. As a reminder, today is not the official launch date for the new processors, but I’d expect it to happen within the next three weeks.
Have a comment you wish to make on this article? Recommendations? Criticism? Feel free to head over to our related thread and put your words to our virtual paper! There is no requirement to register in order to respond to these threads, but it sure doesn’t hurt!