With Intel’s first ‘Knights Corner’ products expected to hit the channel in late 2012, NVIDIA’s CTO Steve Scott has taken the opportunity to call Intel out on some of its claims. The biggest one being targeted is that code won’t have to be “ported” to operate on Knights Corner, based on the fact that it has an x86 design. Claims like these have long been bounced around by both parties, and this isn’t the first time we’ve seen one company publicly doubt the other.
Knights Corner will be the first product to be released based on some of Intel’s various projects over the years, including Larrabee. While it’d be possible for Intel to release KC as an add-in PCIe card, early samples have been shown in a typical CPU package. If KC and Tesla performed 1:1, NVIDIA would have the clear advantage, as it is much more cost-efficient to install 4 – 6 GPUs into a motherboard than it is to install 4 – 6 CPUs.
Steve Scott admits that code may compile just fine for Knights Corner, although he insists the result won’t be ideal. Much like code that needs to be “ported” for CUDA, it will need tweaks and further optimizations to take advantage of what the architecture makes available. Essentially, if you’re simply porting over code, you’re not going to be exploiting all of what’s being offered to you. He goes on to state that, based on what he knows about KC, he’s not confident it’ll perform well even with optimized code:
“The idea of running flat MPI code (one rank per core) on a multi-node MIC system seems quite problematic. Whatever memory sits on the MIC PCIe card will be shared by more than 50 cores, leading to very small memory per core. From what I know of the MPI communication stack, that won’t leave much memory for the actual data – certainly far below the traditional 1-2 GB/core most HPC apps want. And 50+ cores all trying to send messages through the system interconnect NIC seems like a recipe for a network logjam. The other concern is the Amdahl’s Law bottleneck resulting from executing all the per-rank serial code on a lower-performance, Pentium-class scalar core.”
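To put rough numbers on the two concerns in that quote, here is a minimal Python sketch. Only the "50 cores" figure and the 1-2 GB/core target come from Scott's statement. The 8 GB card memory, the 5% serial fraction, and the 4x slowdown for a Pentium-class scalar core are placeholder assumptions chosen for illustration, not published Knights Corner specifications. The Amdahl model is also simplified: it penalizes only the serial portion and assumes the parallel portion scales perfectly across cores.

```python
def memory_per_core_gb(card_memory_gb: float, cores: int) -> float:
    """Memory each MPI rank gets when one rank runs per core."""
    return card_memory_gb / cores


def amdahl_speedup(serial_fraction: float, cores: int,
                   serial_slowdown: float = 1.0) -> float:
    """Amdahl's Law speedup relative to one fast reference core.

    serial_slowdown > 1 models serial code running on a slower scalar core.
    Simplification: the parallel part is assumed to scale perfectly.
    """
    serial_time = serial_fraction * serial_slowdown
    parallel_time = (1.0 - serial_fraction) / cores
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    CORES = 50              # "more than 50 cores", per the quote
    CARD_MEMORY_GB = 8.0    # assumed value, for illustration only
    SERIAL_FRACTION = 0.05  # assumed value, for illustration only
    SCALAR_SLOWDOWN = 4.0   # assumed value, for illustration only

    per_core = memory_per_core_gb(CARD_MEMORY_GB, CORES)
    print(f"memory per core: {per_core:.2f} GB (HPC apps want 1-2 GB)")

    fast = amdahl_speedup(SERIAL_FRACTION, CORES)
    slow = amdahl_speedup(SERIAL_FRACTION, CORES, SCALAR_SLOWDOWN)
    print(f"speedup, serial code on a fast core: {fast:.1f}x")
    print(f"speedup, serial code on a slow core: {slow:.1f}x")
```

Under these assumed inputs, each rank gets a small fraction of a gigabyte before the MPI stack takes its share, and slowing down only the serial portion cuts the achievable speedup sharply. Different inputs shift the numbers, but the shape of the argument stays the same.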
These concerns are valid, but the article assumes that Knights Corner is going to be released as a PCIe card, when all resources I’ve found point to it being a CPU. As a CPU, it can be built using traditional designs where one large pooled cache is present, and each core has access to its own (possibly also shareable) cache. The real challenge will be achieving cache coherency across all of those cores, an area that NVIDIA is more familiar with than Intel.
It’s a waiting game at this point. We’ll no doubt learn a lot more about Knights Corner at IDF this fall.