Nehalem Part 3: The Cache Debate, LGA-1156 and the 32nm Future
by Anand Lal Shimpi on November 19, 2008 8:00 PM EST. Posted in CPUs
What’s Next: A Preview of Westmere and Sandy Bridge
Conroe was designed to address a deficiency in the desktop and mobile markets and Nehalem to tidy up the workstation/server space, so what’s next? Westmere and Sandy Bridge are the 32nm follow-ons, and Intel has already hinted at the major changes coming in that generation: power consumption and floating point performance.
Westmere will be little more than a die shrink to 32nm. We may get some more cache, but I wouldn’t expect significant performance improvements other than from clock speeds. Westmere will take Nehalem’s power efficiency and combine it with a pure power reduction to be quite a threat.
Sandy Bridge will add support for AVX.
By the end of 2009 we should have support for both OpenCL and DirectX 11 by GPUs from all vendors, including Intel with Larrabee. These APIs, in combination with the highly parallel nature of the GPUs that will be able to run them, should allow for some incredible speedups on highly data parallel applications. While most of these applications are currently limited to the scientific field, we’ll start to see them appear in the consumer space (we already are with video transcoding and Photoshop).
Not all applications are data parallel enough to run well on a GPU, but they may require more than what present day CPUs can offer in terms of floating point throughput. Intel’s AVX instructions are designed to bridge the gap between the CPU and the GPU, offering an alternative to developers who could stand to gain performance from running some of their code on a GPU but would rather keep the work on the CPU itself to make programming simpler. Developers will move their code off to the GPU if the performance is worthwhile, but if you can get similar performance gains without recoding, that’s the preferred avenue. Make sense?
Eventually I’m guessing we’ll see the Larrabee and Nehalem lines of the x86 ISA merge; AVX is merely the first step in that direction.
Final Words
Another day, another chapter on Nehalem comes to an end. I’m back from 10 days in Texas and California, visiting the usual suspects, and there’s much more to write about. We’re finally getting wind of X58 motherboards at well below $300, we have much more to talk about with overclocking, and there’s still that issue of multi-tasking performance.
Nehalem may have launched, but our work is far from done this year. Stay tuned.
33 Comments
IntelUser2000 - Saturday, November 22, 2008
To: ltcommanderdata
Actually you can't compare to Dothan. You have to compare to Conroe/Penryn. Conroe's L2 latency is at 14 cycles. I think it went up to make up for the complexity of the core (which is more than Dothan). Nehalem makes it even more complex.
The reason individual transistors can run at 200GHz+ within certain research labs but nowhere near with a commercial chip is they have to synchronize every part of the chip with the clock.
The CPU designers seem to take some chances when making a chip. Likely that's the reason for the delays for certain products as if you make a wrong decision then the prototypes might not come up as you wanted and you gotta make up for it.
That's probably the reason that Conroe didn't come with SMT, as the Israeli team managing the chip wasn't as experienced as the team that made the P4. They probably could have, but risking it would not have been a good idea.
The Israeli team clings to proven technologies while the Hillsboro team comes up with more radical ones, like Trace Cache, Out of Order, SMT, etc.
JonnyDough - Friday, November 21, 2008
It should be exactly like Penryn. Die shrink = less heat = higher clocks = performance increase.
ltcommanderdata - Friday, November 21, 2008
The point is that Penryn was not just a dumb shrink of Conroe with added cache as Presler was of Smithfield. Penryn wasn't a major redesign, but it did have architectural tweaks over Conroe including speeding up how the execution units divide numbers and execute shuffles. The FSB was also reworked to allow half multipliers while lower power states were added in mobile versions. VT support was enhanced and of course SSE4.1 was added.
I believe clock-for-clock Penryn is on average 5% faster than Conroe while the difference can be substantially higher for SSE4.1 optimized apps. When I say I hope Westmere is more like Penryn, I'm hoping for similar tweaks to be made to increase performance clock-for-clock, rather than just relying on 32nm to increase clock speeds. I don't believe Intel is releasing another SSE instruction set before AVX in Sandy Bridge, so I guess they'll have to dig deeper for a performance boost.
VaultDweller - Thursday, November 20, 2008
"We’re finally getting wind of X58 motherboards at well below $300"
Oh, please do share! This is what I'm interested in. Without this I would not even consider touching Nehalem with a ten foot pole.
In the past I brushed off X38 and X48 completely, as it was so hard to find reasonable motherboards based on these chipsets. X58 is shaping up to be the same.
The problem is that when I found X38 to be too expensive, I was able to find my peace with a P35 board (a P5K Premium). If I had been building a system when X48 was hot off the press, I could have found comfort knowing that P45 was right around the corner. There is no such comfort with Nehalem - the only lower-priced chip platform on the radar is based on a different socket, like S754 all over again.
I don't want to cripple or limit the options for my next system build by going with LGA1156, but I don't want to pay $300-450 for a motherboard either.
heavyglow - Thursday, November 20, 2008
this is exactly what im thinking. im concerned that intel will abandon LGA1156 and ill be left with nothing.
3DoubleD - Thursday, November 20, 2008
I can think of the reverse scenario where AMD abandoned the 940 platform and released all FX processors on 939. Neither option is safe, just pick one you don't mind sticking with if you have to.
Kiijibari - Thursday, November 20, 2008
It's so small because Nehalem is a 100% Server design.
Because of this Intel went ahead with the inclusive cache design. It comes in quite handy in MP systems, if you just have to probe one L3 only instead of 4 L1/L2 caches.
But there is one drawback, bigger L2 kills the benefit of the L3 size.
Neglecting the L1 Caches, Nehalem has an effective L3 size of 7 MB, as 4x256kb are just copied data from the L2.
Now imagine what would happen if Intel doubled the L2. Effective L3 cache size would shrink to 6MB, 2 MB wasted ... that's a lot of transistors.
To make L2 problems worse, Intel reintroduced Hyperthreading. Great technique, no doubt, but now we even have 2 threads struggling for the tiny, little 256kb cache.
I guess all the decisions pay off in a server environment, but to state that intel designed the small size L2 Caches because of the latency only is just a fine excuse for all the wanna-be gamers, who once heard that CL3 memory is better than CL5.
cheers
Kiiji
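Kiijibari's effective L3 arithmetic can be checked with a few lines of Python. This is only a minimal sketch: it assumes an 8 MB shared L3 (the figure implied by "7 MB effective" plus 4x256kb of duplicates), four cores, and the worst case where every L2 line is duplicated in the inclusive L3. The function name `effective_l3_bytes` is hypothetical and, like the comment, it neglects the L1 caches.

```python
KB = 1024
MB = 1024 * KB

def effective_l3_bytes(l3_bytes, l2_bytes_per_core, cores):
    """L3 capacity left over after holding a copy of every core's L2.

    Assumes a fully inclusive L3 and ignores the L1 caches.
    """
    duplicated = l2_bytes_per_core * cores  # space spent on copies of L2 data
    return l3_bytes - duplicated

# Assumed Nehalem configuration: 8 MB L3, 4 cores, 256 KB L2 per core.
nehalem = effective_l3_bytes(8 * MB, 256 * KB, 4)

# Hypothetical variant from the comment: L2 doubled to 512 KB per core.
doubled_l2 = effective_l3_bytes(8 * MB, 512 * KB, 4)

print(nehalem / MB)     # effective L3 in MB with 256 KB L2
print(doubled_l2 / MB)  # effective L3 in MB with 512 KB L2
```

Under these assumptions the two results come out to 7 MB and 6 MB, matching the figures in the comment. Real occupancy depends on what the cores actually have cached, so this is an upper bound on the duplication, not a measurement.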
plonk420 - Thursday, November 20, 2008
If 8core i7s will work on x58, i'll likely bite sooner rather than doing a "wait and see."
does this seem highly likely? or is it anyone's guess?
Casper42 - Thursday, November 20, 2008
Speaking of which, I ran across this today on accident:
http://www.ecs.com.tw/ECSWebSite/Downloads/Product...
The ECS X58B-A
Contains:
6 DDR3 Slots
2 x16 Slots
1 x4 Slot
2 x1 Slots
1 PCI Slot
The Manual makes mention of SLI as well which was surprising to me.
I can see that with this ECS Board, a 920 proc and 2 x 9800GTX+ cards (currently going for around $150 each) you could have a pretty potent little machine for around $1000.
iwodo - Thursday, November 20, 2008
So we won't see a new Mobile Part till 2010?
That doesn't sound right to me at all. If that is the case then the rumours about it being a 32nm part may be right.
However, the idea of Intel not updating their Mobile Part for 18 months doesn't sound right to me at all.