Date: March 17, 2008 - Author: Rob Williams
At a press briefing, much more was revealed about the numerous upcoming technologies from Intel. In this article, we will be taking a look at Nehalem, Dunnington, Tukwila and Larrabee, along with a look at the new QuickPath Interconnect.
During an Intel press briefing Monday morning, Pat Gelsinger, Sr. Vice President and General Manager of the Digital Enterprise Group, previewed many products that will be covered in greater depth at the upcoming Intel Developer Forum in Beijing. Though most of the important information was revealed during this briefing, even more may be unearthed at the IDF.
As the title of this article suggests, not too much was left untouched. Pat discussed products that affect almost every market except mobile, including Dunnington, the six-core server-bound processor; Larrabee, the discrete graphics processor; and Nehalem, the chip that many readers of this article are looking forward to. In this brief article, we'll touch on all of these and also delve into a few of the technologies that these upcoming processors support.
We have been hearing so much about Nehalem since last spring that it is great to finally learn more about it. With the new architecture due to hit late this year, more information couldn't have come at a better time. As we've known for a while, Nehalem is a brand-new architecture that will be completely modular, in that many different configurations can be built.
Nehalem will be pushing Quad-Core harder than ever, and the initial offerings look to offer that configuration exclusively. Included in each Nehalem processor will be L3 Cache, shared among all of the cores, along with an integrated memory controller (IMC) and QPI (QuickPath Interconnect). Once the Octal-Core chip is offered, it will include two QPI links to improve speed between both processors - a process that should be faster than ever.
As it appears, the only limit to Nehalem is what can be fit onto the die. Integrated graphics can also be added in, as long as there is enough die space to support it. The above figure is not an accurate representation of the die size and module sizes, so even though the Quad-Core doesn't look like it would support integrated graphics, it will. This is one of the biggest features of Nehalem, after all. However, "iGraphics" may not be available directly at launch.
Aside from what's been mentioned so far, Nehalem will also introduce new uArch enhancements, such as increased parallelism, faster "unaligned" cache accesses, a second level TLB hierarchy and a second level branch predictor.
The increased parallelism was achieved by enlarging the out-of-order window to improve efficiency, while also enlarging the cores' buffers to ensure that they would not become a bottleneck.
Multi-Threading is also making a comeback, but this time, it should prove more effective than in the previous generation. Each core will be able to execute two threads at once, enabling a total of eight on a Quad-Core and sixteen on an Octal-Core. With that many threads in flight, bottlenecks can occur easily and cause application lag, but thanks to other architecture improvements, such as much higher memory bandwidth and lower latencies, Intel expects only improvements over previous generations.
Depending on the workload, performance increases of 20 - 30% could be achieved with the help of the effective multi-threading, although specific scenarios were not supplied. Although the power envelope increases with multi-threading enabled, the performance gains should hopefully outweigh the higher power draw.
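To put the thread counts and the quoted gains in one place, here is a minimal Python sketch of the arithmetic. The function names are hypothetical, and treating the 20 - 30% figure as a simple multiplier on throughput is an assumption for illustration only; Intel did not say which workloads the figure applies to.

```python
# Illustrative arithmetic only; function names are hypothetical.
THREADS_PER_CORE = 2  # from the briefing: each core executes two threads at once


def logical_threads(cores: int, threads_per_core: int = THREADS_PER_CORE) -> int:
    """Total hardware threads exposed by a chip with the given core count."""
    return cores * threads_per_core


def smt_speedup_range(low_gain: float = 0.20, high_gain: float = 0.30) -> tuple:
    """Assumption: treat the quoted 20 - 30% gain as a plain throughput multiplier."""
    return (1.0 + low_gain, 1.0 + high_gain)


if __name__ == "__main__":
    for name, cores in (("Quad-Core", 4), ("Octal-Core", 8)):
        print(f"{name}: {logical_threads(cores)} threads")
    low, high = smt_speedup_range()
    print(f"Quoted multi-threading gain: {low:.2f}x to {high:.2f}x")
```

Real-world results will depend heavily on the workload, which is exactly the detail that was not supplied at the briefing.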
One of the largest benefits of Nehalem will be the integrated memory controller (IMC), which will support DDR3 exclusively. It's unknown at this point whether motherboard manufacturers will be able to opt to include DDR2 support on their boards, but it may not be entirely necessary. By the time Nehalem hits the market, DDR3 prices should have gone down substantially and should only continue to fall as DDR3 adoption grows, thanks in part to this launch.
On the desktop side of things, Nehalem will be offered in both single-socket and dual-socket configurations, with three memory channels per processor. That would allow up to six DIMMs on a single processor and twelve in a dual-processor configuration. Without a doubt, no one will need to go hungry for more memory.
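The DIMM totals follow directly from the channel count. In the short Python sketch below, the two-DIMMs-per-channel figure is inferred from the numbers above (six DIMMs over three channels) rather than stated by Intel, and the names are hypothetical.

```python
# Illustrative arithmetic only; names are hypothetical.
CHANNELS_PER_CPU = 3   # from the briefing: three memory channels per processor
DIMMS_PER_CHANNEL = 2  # inferred: six DIMMs per processor / three channels


def max_dimms(sockets: int) -> int:
    """Maximum DIMM count for a system with the given number of processors."""
    return sockets * CHANNELS_PER_CPU * DIMMS_PER_CHANNEL


if __name__ == "__main__":
    print("Single-socket:", max_dimms(1), "DIMMs")
    print("Dual-socket:", max_dimms(2), "DIMMs")
```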
The next "Tock" after Nehalem, the 32nm Sandy Bridge, will be available in late 2009 or sometime during 2010.
QuickPath Interconnect - The Tech Formerly Known As CSI
Beginning with Nehalem and Tukwila, the common Front-Side Bus will be replaced with QuickPath Interconnect, a high-speed interconnect built into the processor that, alongside the integrated memory controller, connects the CPU(s) with each other and with other components.
One of the main benefits of QPI is that it's integrated right onto the processor itself, and because the IMC is as well, it allows for much faster transactions. This is needed because the improved performance of these chips could otherwise expose bottlenecks, which are less likely to show up with this configuration.
In dual processor situations, each processor will have its own dedicated memory and caches, and because each CPU will include an IMC, memory bandwidth should be increased dramatically - Intel claims up to 4x what we currently see. If for some reason one processor needs to steal memory from the other processor, it can do so at very fast speeds through the QPI.
There are a few main points to take away from QPI. First, it's much more efficient than the typical FSB, and the gains should be huge. With both the IMC and QPI in the processor, the interconnect lanes will be incredibly fast, improving bandwidth all around while reducing latency.
Without a doubt, QPI isn't something that should be taken lightly. It should dramatically increase performance all around, and I cannot wait to test out the performance benefits first-hand. The QPI alone might be one of the biggest things about Nehalem to get excited over.
Dunnington will be the last major upgrade for the Xeon family before we jump into 32nm. The processor will be offered in one configuration that includes six cores (Hexal-Core? Sexal-Core?) and be built using 1.9 billion transistors. Interestingly, unlike the current Xeon line-up, Dunnington will be toning down the amount of L2 Cache and is instead offering 16MB of L3 Cache.
As seen in the figure below, the processor seems to be a native six-core offering, unlike the current Quad-Core offerings, which are essentially two Dual-Core dies slapped beside each other (Nehalem will be native Quad-Core). Despite the seemingly larger die size, Dunnington, unlike Nehalem, will not include an integrated memory controller.
One might ask, "Why six?" and it's a good question. The obvious move would have been to offer an eight-core chip. As we learned at September's IDF, Octal-Core desktop and server chips are in the works, so where a six-core chip came from is interesting. Pat elaborated on this, and explained that Intel tested out different configurations and found this to be the sweet spot. He touched on the fact that some applications may benefit more from eight cores and less cache, and others may benefit from fewer cores and more cache, but Dunnington settles right in the middle.
Pat also touched on the fact that Intel's Xeon line-up offers the most efficient processors on the market, in terms of performance/watt. SPECpower is quickly becoming a standard industry benchmark, as it measures these exact figures. Intel boasted that on the official top list for the benchmark, they hold the top ten spots. It's interesting to note, however, that no AMD processor appears anywhere in the list of 22 system configurations.
Itanium is not a product that any of our regular visitors would be interested in, primarily because it's not a desktop chip, or a normal server chip for that matter. Itanium is designed not to offer great performance alone, but superb performance when stacked together. These chips live their lives in large mission-critical servers and can also reside in super-computers.
Up to this point, Itanium 2 processors have been available in Dual-Core offerings only, but the near future will bring the first Quad-Core offerings. Specific models and frequencies were not mentioned, but these processors will include over 2 billion transistors and will offer up to 30MB of L3 Cache. In addition, they will also utilize Intel's revamped Multi-Threading technology, along with the QuickPath Interconnect.
As is the case most of the time with our desktop processors, simply doubling the number of cores never doubles the actual performance. Rather, it falls to around a 90 - 95% increase. With Tukwila's architecture upgrades, however, Intel believes that over a 2x increase will be seen over the current Itanium 9100 series.
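As a quick illustration of those scaling figures, the Python sketch below converts a percentage gain into a speedup and a per-core scaling efficiency. The function names are hypothetical and the numbers are the ones quoted above; nothing here is measured data.

```python
# Illustrative arithmetic only; function names are hypothetical.
def speedup_from_gain(gain_fraction: float) -> float:
    """A 90% increase is a 1.90x speedup, and so on."""
    return 1.0 + gain_fraction


def scaling_efficiency(speedup: float, core_ratio: float = 2.0) -> float:
    """Share of the ideal speedup actually achieved when cores grow by core_ratio."""
    return speedup / core_ratio


if __name__ == "__main__":
    for gain in (0.90, 0.95):
        s = speedup_from_gain(gain)
        print(f"{gain:.0%} increase -> {s:.2f}x, efficiency {scaling_efficiency(s):.1%}")
    # Intel's Tukwila claim is "over 2x", i.e. above 100% on this measure,
    # which can only come from the architecture changes, not the extra cores alone.
```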
Sure, the title there is one of the lamest I've ever come up with, but luckily, the technology doesn't seem so lame. Larrabee will be a completely scalable GPU that can contain many cores, implements a new cache architecture and utilizes a new vector processing unit.
Not too much was revealed about Larrabee overall, but Pat made it perfectly clear that it would not be developed as a competitor to ATI's or NVIDIA's high-end offerings. Pat explained that in an integrated solution, there is simply not enough surface area to support such a powerful beast, not to mention that it would be impossible to develop one to compete with those cards and retain a reasonable power envelope. This assumption seems fair, as current high-end GPUs can consume as much as 150W.
What it will compete with are the low-end to mid-range GPU offerings. Because Larrabee is scalable to include many cores, though, the end result will be designed with other uses in mind - not only as a GPU.
In addition to regular 3D graphics use, Larrabee can be used to handle audio and video processing as well, which in addition to the SSE instruction sets, could be utilized to increase performance dramatically. How well the architecture will be put to good use, however, is yet to be seen.
Although likely unrelated to Intel's acquisition of Havok in September, Larrabee will also be able to handle physics processing, and given the number of cores available to a Larrabee chip, the potential is enormous. Other benefits would include life-like rendering and global illumination, improved ray-tracing, superb AI... the possibilities are great.
So while Larrabee will not be a "killer" GPU in itself when compared to high-end offerings, it can do much more than spit out spiffy graphics, which can be put to extremely good use when combined with your primary graphics.
Intel Software - Helping to Get The Show On the Road
The last thing Pat touched on was new and improved Intel development software to both aid and encourage developers to support all of Intel's current and upcoming technologies.
When questioned as to how difficult it would be to develop for Larrabee and the IA++, Pat gave a general answer that I didn't quite catch, but it seems that if you are a programmer of any sort, developing around these new technologies should be similar to learning how to develop around any new library or programming language. Before ceasing discussion though, Pat mentioned that Larrabee would be fully compatible with OpenGL and DirectX, so interoperability should be rather straightforward.
There's not much to be said that hasn't already been said, but the future is certainly going to be interesting. Every product discussed here is exciting in its own right, but Nehalem will be the chip that many people will be looking forward to by year's end, although things are sure to get off to a slow start. By the middle of next year, Nehalem should be in full swing and will be in many enthusiasts' rigs.
Larrabee is also something to pay attention to, because the possibilities are great. If game developers and software developers alike take full advantage of what the technology offers, we could be seeing some incredible things. Looking back at the 45nm launch, I remember how much the SSE4 instruction set impressed me. It improved encoding performance by more than 50%, but that's an instruction set for one purpose. Imagine a full chip that can be manipulated in many different ways.
Although we will not be at IDF in early April, we will continue to report on anything newsworthy that comes out of the show. Things should really become interesting at this fall's IDF in August, where demos of all the products discussed here should be on display.
Copyright © 2005-2009 Techgage Networks Inc. - All Rights Reserved.