The Nehalem Preview: Intel Does It Again
by Anand Lal Shimpi on June 5, 2008 12:05 AM EST- Posted in
- CPUs
Two years ago in Taiwan at Computex 2006 Gary Key and I stayed up all night benchmarking the Core 2 Extreme X6800, the first Core micro-architecture (Conroe core) CPU we had laid our hands on. While Intel retroactively applied its tick-tock model to previous CPU generations, it was the Core micro-architecture and the Core 2 Duo in particular that kicked it all off.
At the end of last year we saw the first update to Core, the first post-Conroe "tick" if you will: Penryn. Penryn proved to be a nice upgrade to Conroe, reducing power consumption even further and giving a slight boost to performance. What Penryn didn't do however was shake the world the way Conroe did upon its launch in 2006.
After every tick however, comes a tock. While Penryn was a die shrink of an existing architecture, Nehalem is a brand new architecture built on the same 45nm process as Penryn. It's sort of a big deal, being the first tock after the incredibly successful Core 2 launch.
731M transistors, four cores, eight threads
It's like clockwork with Intel; around six months before the release of a new processor, it's sent over to Intel's partners so they may begin developing motherboards for the chip. It was true with Northwood, Prescott, Conroe, Penryn and now Nehalem. And plus, did you really expect, on the eve of the two year anniversary of our first Core 2 preview, a trip to Taiwan for Computex without benchmarks of Nehalem? In the words of Balki Bartokomous, don't be ridiculous :)
Yep, that's what you think it is
Without Intel's approval, supervision, blessing or even desire - we went ahead and snagged us a Nehalem (actually, two) and spent some time with them.
(Sorry guys, stop making interesting chips and we'll stop trying to get an early look at them :)...)
108 Comments
View All Comments
Anand Lal Shimpi - Thursday, June 5, 2008 - link
I really wish I could've turned off hyperthreading :)DivX doesn't scale well beyond 4 threads so that's the best benchmark I could run to look at how Nehalem performs when you keep clock speeds and number of threads capped. With a 28% improvement that's at the upper end of what we should expect from Nehalem on average.
Take care,
Anand
SiliconDoc - Monday, July 28, 2008 - link
Great answer, expalins it to a tee....However that leaves myself and I'd bet most of the fans here with not much real world use for 4x4HT ...
I don't know should we all steal rented DVD's... by re-encoding - only use I know of that might work for the non-connected enduser.
Not like "folding" is all the rage, they would have to pay me to do their work - especially with all the "power savings" hullaballoo going on in tech.
That's great, 28% increase, ok...
So I want it in a 2 core or a single core HT... since that runs everything I do outside the University.
lol
I guess the all core useage all the time, will hit sometime....
Calin - Thursday, June 5, 2008 - link
First of all, the Hyperthreading in the Pentium 4 line brought at most a 20% or so performance advantage, with a -5% or so at worst. I don't have many reasons just now to think this new Hyperthreading would be vastly different.As for the scaling to 8 cores, maybe the scaling was limited due to other issues (latency, interprocessor communication, cache coherency)? It might be possible that DivX on this new platform to increase performance from 4 to 8 cores?
bcronce - Thursday, June 5, 2008 - link
Intel claims that the new HT is improved and gives 10%-100% increase. The main issue with the P4 is that it had a double pumped alu that could process 2 integers per clock. This was great for HT since you could do 2 instructions per clock. The problem came with competition for the FPU, which there was only 1. This would cause the 2nd thread in the logical cpu to stall and thread swapping has additional overhead.You also run into the issue of L2 cashe thrashing. If you have 2 threads trying to monopolize the FPU and also loading large datasets into cache, you're cache misses go up while each thread is bottlenecked at the fpu.
techkyle - Thursday, June 5, 2008 - link
I'd like to know what AMD fans are thinking. As one myself, I'm starting to wonder if I'm going to give in and become an Intel fan.Intel implementing the IMC:
I can only say two things. One, it's about time. Two, THIEF!
Return of Hyper Threading:
It seems to me that some sort of intelligence must go in to the design of multi-core hyper threading. If two intensive tasks are given to the processor, the fastest solution would be simple, devote one core to one thread, a second core to the other. With Hyper Threading back with a multi-core twist, what's stopping one thread the first core, first virtual and the second thread on the first core, second virtual?
Another nail in the coffin:
AMD can provide no competition to high end Core 2 Quad machines. Even if the K10 line can jump up to par performance with the Core architecture, can they really expect to have a Nehalem competitor ready any time remotely close to Intel's launch? AMD can't afford to keep playing the power efficient and price/performance game.
AMD is going to be in an even worse position than when it was X2 vs Core 2 if they can't pull something out of their sleeves. Barcelona already isn't clock for clock competitive with the Penryn and now we hear that early Nehalems are 20-40% above Core 2?
If AMD's next processor flops, is it possible for them to drop desktop and server processors and still be a functioning (not to forget profitable) company? It's no longer a race for the performance crown. It's becoming a race to simply survive.
bcronce - Thursday, June 5, 2008 - link
"Return of Hyper Threading:It seems to me that some sort of intelligence must go in to the design of multi-core hyper threading. If two intensive tasks are given to the processor, the fastest solution would be simple, devote one core to one thread, a second core to the other. With Hyper Threading back with a multi-core twist, what's stopping one thread the first core, first virtual and the second thread on the first core, second virtual? "
The way Windows lists cpu is first physicals, then logicals. So in task manager the first 4, on a quad core w/ HT, will be your physical cpu and the last 4 will be you logical.
Windows, by default, will put threads on cpu 1-4 first. It will move threads around to different CPUs if it feels that one is under-taxed and another is way over-taxed.
Programmers can also force Windows to use differnt cores for each thread. So, a program can tell Windows to lock all threads to the first 4 cpus, which will keep them off of the logical. You could then allow a thread that manages the worker threads to run on the logical cpus. You would then be keeping all your hard data-crunching threads from competing with themselves and let the UI/etc threads take advantage of HT.
Spoelie - Thursday, June 5, 2008 - link
Supposedly, Shanghai (the 45nm iteration of Phenom) will be around 20% faster clock per clock over Phenom. This is what AMD said itself some time ago and not verified by an independent source. Judging by current benchmarks, this would put Shanghai at the same or slightly higher performance level of Penryn.As such, a very crude estimate is that Shanghai should be as competitive to Nehalem as K8 is to Conroe. Not a very rosy outlook so let's hope this early information is not accurate and AMD can pull something more out of its hat.
BTW, last I heard is that Bulldozer will come at the 32nm node at the earliest, since the design is supposedly too complex for 45nm. So no instant relieve from that corner. AMD will be fighting a harsh battle the coming years.
Calin - Thursday, June 5, 2008 - link
"Two, THIEF"AMD's vector processing is 3DNow!, if I remember correctly. Yet, the Intel's versions of it is are touted on its processors instead (SSE2, SSE3). Now who's the thief?
swaaye - Thursday, June 5, 2008 - link
MMX? Or, MDMX? Who copied who? Nobody, really. SIMD has been around forever.Retratserif - Thursday, June 5, 2008 - link
I really would not try and think of it as a Fan base. A majority of the OC'er and Benchmarkers use what works for what they are doing. I have owned and water cooled AMD CPU's. It was great at the time.Once Conroes came out the door swung the other way. Technology is like the ocean, it comes in waves. All waves die out. Fortunately we as the user are living it up because of Intel's success. At the same time I truly hope that AMD/ATI does something in response to the high power cpu's. If not we get what ever intel wants to give us.
There will always be to sides to each story. Since Intel is on top and unscathed, they have time to perfect chips before they go mainstream. Same way we have seen the delay in Yorkfields. There was something seriously wrong, and they had time to address it before it was in the hands of thousands of users.
Ok, I can say I am a fan.... of what ever works the best for what I do. Price/Performance/Practicallity. You take what you can afford and make it work harder for every penny you put in it.
One thing you have to keep in mind. AMD is selling more budget CPU's and integrated/onboard video PC's to large companies like Dell and HP. They are moving more aggressively into typical home PC and mobile use. Intel just does not do very well there atm. With Ati in the pocket and being pretty green on power consumption, you can get a good mobile AMD that will do everything a typical PC user will ever need for 2 years at a good price.