Intel’s 32nm Clarkdale – Nehalem for Everyone

by Rob Williams on January 3, 2010 in Processors

To help kick 2010 off right, Intel has filled out the rest of its current-gen processor line-up with the help of Westmere. We’re taking a look at the desktop variant here, which brings a lot to the table compared to the previous generation. For those who’ve been holding out for that next affordable PC upgrade, the wait has been worth it.

System: Sandra Memory, Multi-Core Efficiency

Generally speaking, the faster the processor, the higher the system-wide bandwidth and the lower the latency. As is always the case, faster is better when it comes to processors, as we’ll see below. But with Core i7, the game changes up a bit.

Whereas previous memory controllers used dual-channel operation, Intel threw that out the window to introduce triple-channel, which we talked a lot about at August’s IDF. Further, since Intel integrates the IMC onto the die of the new CPUs, benefits are going to be seen all around.
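To put the channel count in perspective, theoretical peak bandwidth scales linearly with the number of channels. Below is a minimal sketch of that arithmetic, assuming standard 64-bit (8-byte) DDR3 channels. The function name is hypothetical and the DDR3-1333 speed is chosen purely for illustration; it is not a statement of the memory used in our test systems.

```python
def peak_bandwidth_gbs(channels, megatransfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumes each channel moves `bus_bytes` per transfer; 8 bytes matches
    the 64-bit data bus of a standard DDR3 channel.
    """
    return channels * bus_bytes * megatransfers_per_s * 1e6 / 1e9


# DDR3-1333 is an illustrative assumption, not the tested configuration.
dual = peak_bandwidth_gbs(2, 1333)
triple = peak_bandwidth_gbs(3, 1333)
print(f"dual-channel:   {dual:.1f} GB/s")
print(f"triple-channel: {triple:.1f} GB/s")
```

These are ceilings only. Measured bandwidth in a tool like Sandra will land below them, and how far below depends on the memory controller and where it sits.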

Even before jumping into the results, we had an idea of what to expect, and just as we anticipated, they are nothing short of staggering.

As I mentioned on the front page of this article, the integrated memory controller on Clarkdale is not on the CPU die, but rather on the GPU die. That means that whenever the CPU needs to access memory, it needs to go to another chip to do it. Simple thinking would tell you that this method shouldn’t prove much slower than the design of previous generations, where the memory controller was on the chipset, but that’s not the case.

For whatever reason, with the IMC on the GPU, we see our latencies greatly increased – to the point where the brand-new Clarkdale has the same latency as the low-end Pentium E5200. From the cache performance standpoint, though, nothing at all is lacking, and that feature alone is more important than memory latency.
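For readers curious how a latency figure like this can be probed at all, one common technique is pointer chasing: walk a randomly shuffled chain where every lookup depends on the result of the previous one, so the hardware can’t prefetch ahead. The sketch below illustrates the idea only. It is not how Sandra measures latency (that isn’t documented here), the function names are hypothetical, and Python’s interpreter overhead means the absolute numbers sit far above true hardware latency. What it can show is the per-lookup cost rising as the working set outgrows the caches.

```python
import random
import time


def make_chain(n, seed=0):
    """Build a random single-cycle permutation (Sattolo's algorithm).

    chain[i] holds the index to visit after i, so following the chain
    touches all n slots exactly once before returning to the start.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps):
    """Follow the chain for `steps` dependent lookups.

    Returns (final index, average nanoseconds per lookup).
    """
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = chain[idx]  # each lookup depends on the previous result
    elapsed = time.perf_counter() - start
    return idx, elapsed / steps * 1e9


if __name__ == "__main__":
    # Working-set sizes are arbitrary; pick ones that straddle your caches.
    for n in (1 << 10, 1 << 16, 1 << 20):
        chain = make_chain(n)
        _, ns = chase(chain, 1_000_000)
        print(f"{n:>8} slots: {ns:.1f} ns per lookup")
```

A native-code version of the same loop would give figures much closer to what dedicated benchmarks report, but the structure of the test would be the same.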

I queried Intel to see if it considered the higher latency to be a problem, and the response was:

The memory latency on Clarkdale will be higher than what was seen on Lynnfield due to the memory controller being on the GMCH (Ironlake) and not directly on the CPU. However, we don’t see this memory latency as a significant impact on most client applications + the memory BW that you see vs. a Pentium E5200 (Wolfdale).

In the past, I’ve done a lot of personal testing with both high-bandwidth and low-latency modules, and from what I saw there, I’d have to side with Intel’s thinking. I’m not saying that higher memory latencies won’t affect certain workloads, because they certainly will, but for the majority of people, no real difference will ever be seen. In an article I published last fall, which took a look at memory performance on Core i7, only a single application of all we tested saw a difference with tighter timings (Adobe Lightroom). Clarkdale = Mainstream, so these higher latencies are likely to be of very low concern for most people, and for good reason.

That’s not to say that I wouldn’t like to see even tighter numbers, though, because I certainly would. If the IMC were built into the CPU, we wouldn’t see this issue. Essentially, Clarkdale’s implementation is similar to previous integrated designs, where the GPU is part of the Northbridge. There, the memory controller is also built into the GPU, because for a platform like this, high performance just isn’t needed. When the time comes that we see the GPU and CPU fused together (whenever that happens), we’ll undoubtedly see the latencies decrease. But for now, I can’t view this as a real problem, given that we’ve yet to see throttled performance in our real-world tests due to it.

Sandra 2009 Multi-Core Efficiency

How fast can one core swap data with another? It might not seem that important, but it definitely is if you are dealing with a true multi-threaded application. The faster data can be swapped around, the faster it’s going to be finished, so overall, inter-core speeds are important in every regard.

Even without looking at the data, we know that Core i7 is going to excel here, for a few different reasons. The main one is that this is Intel’s first native Quad-Core. Rather than have two Dual-Core dies placed beside each other, i7 was built to place four cores together, so that in itself improves things. Past that, the ultra-fast QPI bus likely also has something to do with speed increases.

Continuing the domination trend, the Core i5-661 proves superior compared to its lower competition where both cache bandwidth and latencies are concerned. For Nehalem-based processors, there’s no competition in this regard.

Rob Williams

Rob founded Techgage in 2005 to be an 'Advocate of the consumer', focusing on fair reviews and keeping people apprised of news in the tech world. The site caters to enthusiasts and businesses alike, from desktop gaming to professional workstations and all the supporting software.
