Intel released the Nehalem server CPUs in March of this year. The desktop chips based around the Core i7 architecture were released late last year. So now that the CPUs have been out for a few months we’re starting to see them gain a lot of traction in the data center. No surprise there. New servers being sold with the latest Intel CPUs? Insanity! A lot of people have seen some of the differences. For example, my counterpart Brian recently wrote a post on the new way memory is configured and accessed. But what else is new and different? Is this a good direction or are we going down a wrong path?
For as long as most of us can remember, the Intel architecture has used the Front Side Bus (FSB) to reach the North Bridge chip, and therefore RAM and sometimes video, as well as to communicate between CPUs. The Nehalem does away with that and replaces the FSB with the new Quick Path Interconnect (QPI). It’s basically a point-to-point communication channel similar to what AMD has been doing for a little while now with HyperTransport. Each CPU has its own QPI link to the chipset, and in a dual-socket system a second dedicated QPI link to the other processor. Memory now hangs directly off the CPU, which I’ll get to in a moment. The point here is speed. Fast memory access. Fast communication between CPUs. The first Nehalem systems have a QPI speed of up to 25.6GB/s per link. That is double the 12.8GB/s of the 1,600MHz FSB on the previous X48 chipset. Again, speed!
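Here is a quick sketch of where those two bandwidth numbers come from. The 25.6GB/s and 1,600MHz figures are from above; the bus widths are my assumptions from the published specs (an 8-byte-wide FSB, and a 6.4GT/s QPI link carrying 2 bytes of data per transfer in each direction), and the function names are just made up for illustration.

```python
def fsb_bandwidth_gbs(mega_transfers_per_s, bus_width_bytes=8):
    """Peak FSB bandwidth in GB/s (assumed 64-bit, i.e. 8-byte, data bus)."""
    return mega_transfers_per_s * bus_width_bytes / 1000


def qpi_bandwidth_gbs(giga_transfers_per_s, payload_bytes=2, directions=2):
    """Peak QPI link bandwidth in GB/s.

    Assumes 2 bytes of data per transfer and counts both directions,
    which is how the 25.6GB/s headline number is usually quoted.
    """
    return giga_transfers_per_s * payload_bytes * directions


fsb = fsb_bandwidth_gbs(1600)   # 1,600MHz FSB
qpi = qpi_bandwidth_gbs(6.4)    # assumed 6.4GT/s QPI link
print(f"FSB: {fsb:.1f} GB/s, QPI: {qpi:.1f} GB/s, ratio: {qpi / fsb:.1f}x")
```

Keep in mind the QPI figure adds up both directions of the link, while the FSB is a single shared bus, so the comparison is a peak-versus-peak one rather than a measured result.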
As Brian mentioned in his post, memory access and configuration have changed with the Nehalem systems. With the new architecture the memory controller has moved onto the CPU, again similar to AMD’s Opteron. You’d think with AMD being ahead on these features they would have had more of an advantage… Anyway, by moving the memory controller from the North Bridge chip to the microprocessor you reduce latency. The downside is that now the memory controller is part of the CPU! Therefore, changing to a new memory technology requires a new CPU. You can’t just pop an existing Nehalem CPU onto a new board and get a new memory controller like in the past. Again, as Brian mentioned, the Nehalem systems also have three memory channels instead of the traditional two. What this means is that you need to install DIMMs in sets of three. If you install them in pairs the old way, so that the DIMMs don’t spread evenly across all three channels, the system falls back to dual-channel mode and you lose a lot of memory performance. Is it a lot? I’ve seen numbers in the 70% range. Yes, that’s a lot, but that’s only on really, really memory-intensive apps. For the majority of apps you may see better performance by putting in more memory, even if that means going to dual-channel mode.
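To make the "sets of three" rule concrete, here is a small sketch that spreads a DIMM count across the three channels and checks whether every channel ends up with the same number. This is a simplified model of my own, not what any particular BIOS does, and both function names are hypothetical.

```python
def populate_channels(dimm_count, channels=3):
    """Spread DIMMs round-robin across the channels; return per-channel counts."""
    base, extra = divmod(dimm_count, channels)
    return [base + (1 if i < extra else 0) for i in range(channels)]


def is_balanced(dimm_count, channels=3):
    """True when every channel holds the same, non-zero number of DIMMs."""
    counts = populate_channels(dimm_count, channels)
    return dimm_count > 0 and len(set(counts)) == 1


for dimms in (2, 3, 4, 6):
    layout = populate_channels(dimms)
    status = "balanced" if is_balanced(dimms) else "unbalanced"
    print(f"{dimms} DIMMs -> {layout} ({status})")
```

Three and six DIMMs fill all three channels evenly. Two or four leave the channels lopsided, which is the case where you give up the full triple-channel setup.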
Along with those major architectural changes there are other minor, but important, additions as well.
The first one is HyperThreading. Wait... what? We’ve seen this before. Many people will remember that the original Pentium 4 CPUs had HyperThreading. It was, at best, a mixed bag. Some threaded apps were faster but some were, oddly, slower. For those who may not have used it in the past, HyperThreading is a technology that lets a single core work on two threads at once. It’s not like getting two cores. Many elements in the core are shared and can only work on one thread at a time. Intel claimed about a 30% increase in performance on the original P4. The key is how the operating system and apps handle it. The new implementation of HyperThreading in the Nehalem is more advanced and sophisticated than the previous version, and the Nehalem architecture around it is a whole new story as well. All modern operating systems support HT and should therefore know how to handle it.
A lot of PC enthusiasts have been overclocking their CPUs for years. The first time I did it I had to swap physical oscillator crystals on the motherboard. Now it’s about getting the right timings, voltage, and equipment. I just sold my last Windows system, which was an Intel Q6600 2.4GHz quad-core running at 3.3GHz. There is usually headroom to be found in the chips. Intel is now offering ways to use a bit of that room, but within well-defined parameters. This technology is called Turbo Boost. How very Fast & Furious.
The idea here is that, if possible, the Nehalem CPU can boost the speed of one or more cores as long as it stays within its TDP (Thermal Design Power). As long as it doesn’t go outside of its spec’d power draw or cooling, it will increase the speed of the cores. It does this in small steps of 133MHz each. On the current chips the maximum boost ranges from one step to four steps, depending on the model. Each CPU model has a certain number of “available bins”. Each bin is a step. For example, I just mentioned I sold my last Vista machine. I’m planning on replacing it with a Mac Pro (if I can get over the price). These are workstation-class machines that use the Nehalem server chips. The chip in the single-socket Mac Pro is a W3520, a 2.66GHz Nehalem quad-core. Looking at the spec sheet you can see that this chip has available bins of 1/1/1/2. That list corresponds to 4, 3, 2, and 1 active cores. So if possible it will go up one step when 4, 3, or 2 cores are busy. If only one core is busy it will jump that core two steps. Note that the CPU will only take the cores up a step if it can stay within its TDP. If the CPU is already running too hot it won’t move at all. Two steps is only 266MHz, so on my example Mac Pro that’s only a 10% increase on a single core.
If I went crazy and got the dual-socket Mac Pro it would be better. If you look you’ll find that there are different Nehalem server chips for single-socket systems and dual-socket systems. The dual-socket chips have two Quick Path Interconnects (one to the chipset, one to the other CPU) and a lower TDP of 95W, compared to 130W for a single-socket chip. This means they run cooler and have more room to play. For example, the 2.93GHz dual-socket chip has 2/2/3/3 available bins, so it can run one or two cores about 400MHz faster or all four 266MHz faster.
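The bin arithmetic above is easy to sketch in a few lines of Python. The 133MHz step and the two bin lists are the ones quoted above; `turbo_ceiling_mhz` is a hypothetical helper name, and it only computes the best-case ceiling. Whether the CPU actually gets there depends on power and temperature at that moment.

```python
BIN_MHZ = 133  # one step; nominally 133.33MHz, so three steps is about 400MHz


def turbo_ceiling_mhz(base_mhz, bins, active_cores):
    """Best-case clock for a given number of busy cores.

    `bins` is written the way the spec sheets list it: steps available
    with 4, 3, 2, 1 active cores, in that order.
    """
    if not 1 <= active_cores <= len(bins):
        raise ValueError("active_cores must be between 1 and len(bins)")
    steps = bins[len(bins) - active_cores]
    return base_mhz + steps * BIN_MHZ


w3520_bins = (1, 1, 1, 2)   # 2.66GHz single-socket chip
dual_bins = (2, 2, 3, 3)    # 2.93GHz dual-socket chip

for cores in (4, 3, 2, 1):
    print(cores, "busy cores:",
          turbo_ceiling_mhz(2660, w3520_bins, cores), "MHz vs",
          turbo_ceiling_mhz(2930, dual_bins, cores), "MHz")
```

The bins are kept as a plain tuple in spec-sheet order so you can paste them in straight from the data sheet without reshuffling.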
Turbo Boost is an interesting concept. It allows the CPU to shift performance to where it is needed without causing other power and cooling problems. 10% to 15% may not sound like a lot, but if you can get that for free without putting any more strain on your power and cooling, why not? On the workstation side many applications are still single-threaded, so you’ll see a performance increase as the CPU raises the clock speed on that core. You know, that single-socket Mac Pro is nice, but a dual-socket with the 95W chips would be nicer… Just email me for my PayPal address for donations! Worth a shot…
Like all technology, the Nehalem chips will continue to evolve as the market evolves. 6-core CPUs are expected before too much longer. The mobile versions of the chips will be here late this year, though it appears they will be clocked down a bit. Intel is also planning to shrink the die again, moving from the current 45nm process to 32nm. A smaller die means less power, which means less heat… which usually turns into faster clock speed offerings.
While I was at Cisco Live I attended a session where Intel discussed the new Nehalem architecture and its benefits. What I got out of that was not to look at the Nehalem as just another iteration of CPU. The performance these chips bring is amazing because of the architectural changes Intel made. If you still have single- or dual-core systems in your data center you owe it to yourself to do an ROI on moving to these newer processors. An 8-core Nehalem isn’t just 8x faster than an older single-core Xeon; it’s much faster than that. You can consolidate 8 to 12 older systems onto one of these and save GREATLY on power and cooling. An older single-socket system uses a lot more power than you realize, so sit down and do those calculations! Intel was showing ROIs in some data centers of 6 months. Now, that’s probably a bit optimistic, but if you are running out of space, power, or cooling right now it may be very realistic. New data center space isn’t cheap and neither is an upgraded AC system.
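If you want a starting point for those calculations, here is a rough payback sketch. Every number in the example call is a placeholder I made up, not a measurement or anything Intel showed, so plug in your own server prices, wattages, and utility rates. The `cooling_factor` parameter is my own simplification: it assumes you spend some fraction of a watt on cooling for every watt the servers draw.

```python
HOURS_PER_MONTH = 24 * 30


def payback_months(new_server_cost, old_servers, old_watts_each, new_watts,
                   cost_per_kwh, cooling_factor=1.0):
    """Months until power and cooling savings cover the new server's price.

    cooling_factor is the extra cooling power per watt of server power
    (1.0 means one watt of cooling for every watt of server load).
    """
    saved_kw = (old_servers * old_watts_each - new_watts) / 1000
    if saved_kw <= 0:
        raise ValueError("the new server does not save any power")
    monthly_savings = (saved_kw * HOURS_PER_MONTH * cost_per_kwh
                       * (1 + cooling_factor))
    return new_server_cost / monthly_savings


# Placeholder inputs: ten old 300W boxes replaced by one 400W server.
months = payback_months(new_server_cost=6000, old_servers=10,
                        old_watts_each=300, new_watts=400,
                        cost_per_kwh=0.10)
print(f"Payback in roughly {months:.1f} months")
```

This only counts electricity. It leaves out things like rack space, licensing, and admin time, which can pull the payback date in quite a bit if you are already out of room.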