Final Words
If AMD's Socket-AM2 only offers a minimal performance increase, then why on Earth is AMD moving to it?
AMD has done a tremendous job of making DDR-400 last with their architecture. When Intel first talked about moving to DDR2 there was concern that AMD's delayed move to the new memory technology would result in it being behind the curve, but the absolute opposite held true; Intel showed no benefit from DDR2 initially and AMD did just fine with only DDR-400.
However, times are changing. After a very long hiatus Intel will soon resume increases in FSB frequency, and its new Core architecture is considerably more data hungry than anything we've seen to date. So on the Intel side of the fence, the greater bandwidth offered by DDR2 will finally have a real use. With Intel's DDR2 demand increasing and more manufacturing shifting away from DDR, it now makes sense for AMD to jump on the DDR2 bandwagon as well. If AMD does it early enough, the transition to DDR2 will be complete before any of its products desperately need it, which is always a better route.
It's not the most convincing reason to switch to DDR2 today, but AMD has stayed on DDR1 far longer than anyone expected and it's better to be early than never. The fact of the matter is that CPUs will get more cores, reach higher clock speeds and feature more data-hungry architectural changes, all of which require more memory bandwidth. AMD's options are to either add more memory bus pins to the already staggering 939-pin package, or to embrace a higher bandwidth (and lower voltage) memory standard; the option it chose makes a lot of sense.
There's also the issue of efficiency. Based on our ScienceMark results, AMD was able to build an extremely efficient DDR-400 memory controller into its processors. The Rev E processors are able to deliver over 5GB/s of memory bandwidth, which is extremely close to the 6.4GB/s theoretical maximum offered by a 128-bit DDR-400 memory interface. The Rev F AM2 processors we've tested aren't able to break 7GB/s yet. Although that is an increase of 35% over the best Socket-939 numbers we've seen, it still ends up being only 53% of the 12.8GB/s peak bandwidth offered by a 128-bit DDR2-800 memory controller, compared to the almost 80% we saw on the Rev E.
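As a rough sanity check on those efficiency figures, the arithmetic can be sketched in a few lines of Python. The function names are our own, and the measured values are illustrative assumptions rather than exact ScienceMark readings: 5.0GB/s stands in for the "over 5GB/s" Rev E result, and the Rev F figure is simply that number scaled by the 35% increase quoted above.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec / 1000


def efficiency(measured_gbs: float, peak_gbs: float) -> float:
    """Fraction of the theoretical peak that was actually measured."""
    return measured_gbs / peak_gbs


# 128-bit (dual-channel) interfaces; DDR-400 and DDR2-800 perform
# 400 and 800 million transfers per second respectively.
ddr400_peak = peak_bandwidth_gbs(128, 400)    # 6.4 GB/s
ddr2_800_peak = peak_bandwidth_gbs(128, 800)  # 12.8 GB/s

# Assumed inputs, not exact benchmark readings (see the note above).
rev_e_measured = 5.0                     # stands in for "over 5GB/s"
rev_f_measured = rev_e_measured * 1.35   # the quoted 35% increase

print(f"Rev E: {efficiency(rev_e_measured, ddr400_peak):.0%} of {ddr400_peak:.1f} GB/s")
print(f"Rev F: {efficiency(rev_f_measured, ddr2_800_peak):.0%} of {ddr2_800_peak:.1f} GB/s")
```

With these assumed inputs, Rev E lands just under 80% of its peak and Rev F a little over half of its peak. Doubling the theoretical ceiling only helps if the controller can actually fill it.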
If we use history as our predictor of the future, it may take a few more revisions of AM2 before we see that sort of efficiency, if we ever do. AMD has come a very long way since the performance we saw back in January, and if that's any indication we may just end up seeing better performance out of Rev G and H processors in the future. The verdict is also not yet in on Rev F; although the launch is only two months away, we keep on hearing that availability won't be until July. While that's not enough time for AMD to be making major changes to the silicon, it is quite possible that the changes have already been made and AMD is just waiting to get new chips back from the fab.
Based on what we saw with the Rev E cores and DDR-500, coupled with our results here with DDR2-800, it looks like Socket-AM2 will offer minor performance gains across the board if paired with very low latency DDR2-800, but otherwise it looks like it'll offer performance as good as Socket-939. If you're looking for numbers, with DDR2-800 at 3-3-3 we'd expect to see 2 - 7% gains across the board, with the 7% figure being reserved for applications like Quake 4 or DivX and the 2% figure being far more common.
Why would you move to Socket-AM2? If you're well invested in an up-to-date Socket-939 system, and if these numbers we've seen here today hold true for shipping AM2 platforms, then there's no reason to upgrade immediately. However, if you're buying or building a brand new system, then by all means AM2 makes a lot more sense than Socket-939. Like it or not, DDR2 is the future, and AM2 will be the new socket for AMD's future 65nm parts as well. DDR2 is also competitively priced with DDR memory while generally offering higher bandwidths, and with most manufacturers transitioning to DDR2 now we expect to see further DDR2 price cuts.
With AM2 you are investing in memory that will have a longer lifespan and a motherboard that will have a better upgrade path than Socket-939 today. Beyond a more secure upgrade path, the only other advantage AM2 offers is AMD's upcoming Energy Efficient desktop CPUs. We're particularly intrigued by the 35W Athlon 64 X2 3800+; if you thought AMD's processors were cool and quiet, a 35W X2 should blow you away. (It might overclock really nicely as well!)
The disheartening news for AMD and its fans alike is that if AM2 can't offer significant performance increases over what we have now, then all Intel has to do is execute Conroe on schedule and deliver the performance we've been promised, and 2006 will be painted blue. AMD has been telling us that 2007 is the year we'll see major architectural changes to their processors, so AM2 may very well be as good as it gets for now. That's still very good, of course - the fastest X2 chips still outperform the fastest Pentium D chips - but it looks like after three years K8 may finally get some competition for the performance crown.
107 Comments
mino - Tuesday, April 11, 2006 - link
1) 3-cycle L1 on K7/K8 is the fastest required; it follows from the internal structure of the scheduler and the pipeline that a 2-cycle cache would do almost no good. Also they would have to reduce L1 size to 32k+32k which would hurt. It simply does not make sense to change L1 at all, maybe on K8L but IMHO 128k+128k would help much more than 2-cycle latency.
2) 17-cycle L2 is PRETTY GOOD for 1M L2 with exclusive structure!!! IMHO it is possible to do 16-cycle, maybe 15, but nowhere near Dothan's 10-cycle. Also remember lower-latency L2 has scaling problems (that's why intel made prescott's L2 slower than NW's)
3) Concerning the memory subsystem (caches + memory) (on single-socket K8/K8L) the biggest issue is the robustness (number of on-the-fly accesses to memory) and latency of the memory controller. Solving this is not a trivial thing. IMHO adding 2-4M L3 with random access ~50 cycles would do.
4) In the >4 sockets front all they need is effective caching of MOESI snoops.
You also forgot that K7/K8 is mostly a KISS architecture. It is just very well balanced, so it has good performance in the end. However, make one wrong change and you are screwed.
KISS == Keep It Simple Silly
About "weak" SIMD implementation on AMD, don't fool yourselves guys. Only x86 architecture faster than K8 on SSE/SSE2 is Netburst aka SIMD-by-intel.
About conroe, it has twice as wide ALU's and FPU's as PIII/K7/K8, this means it has huge resources at its disposal to calculate SIMD.
Same goes for K8L 2 quarters later. That said K7/K8 core has far more FP power than P6 architecture. On FP Conroe and K8 are about equal,
but K8L will wipe the floor with K8 and Conroe on FP. Conroe will wipe K8 on INT and be still faster than K8L by decent margin.
Overall we are in for another PIII vs. K7 battle with a single very important change - AMD has a platform it did not have back in the K7 vs. PIII days.
fitten - Thursday, April 13, 2006 - link
I find the K8L a somewhat odd strategy. I guess they are targeting the Itanium market because Opterons already have a good part of the HPC market. Given that the HPC people are the ones that really care about FPU performance and that they are still a fairly small market segment, it seems an odd target. Integer performance rules the roost for servers... web, database, and just about everything else you can think of other than number crunching simulations and the like. Desktop uses for FPU are a few like games and some mathematical stuff. Intel is focusing on integer performance at least as much as FPU with Conroe (Conroe gets a good dose of both), which makes sense to me since so much of the work done on computers, both desktops and servers, is dominated by integer operations. K8L speculation says only FPU horsepower will be added... just doesn't seem like a sound decision to me.
Zoomer - Monday, April 10, 2006 - link
Hey anand, could you take out 1 of the two modules and do a quick test on that?
With doubled (in theory) bandwidth with ddr2, wouldn't the dual channel mem controller be even more redundant? Perhaps we'll see a new 754-ish socket? :)
Furen - Monday, April 10, 2006 - link
I don't believe we will. Even S1 will be dual-channel, and this is what would have benefited the most from being single-channel (since the pincount would be much lower the package could be much smaller).
BaronMatrix - Monday, April 10, 2006 - link
Looking at the intensive timing and bus speed tweaks USING the SAME RAM as the latest XE955 article I would have expected the same kind of thing here. Anand doesn't look at lower speed lower latency for whatever chip he used. That RAM will do 3-2-2 at 667. Obviously AMD is more sensitive to latency.
ChristTheGreat - Monday, April 10, 2006 - link
AMD is sensitive to latencies, cause of the memory controller. I'm sure that 3-2-2-9 DDR2 from OCZ, would give much more performance on AMD.
Again, this is only a CPU that they use to test, so it's not the true CPU. They wouldn't give us the performance it gives before its launch. That's like killing yourself right now if the performance is poor....
I saw an article, that AMD could be working on DDR2 latencies. You think that 4-4-4-12 is good timings? 12 = tRAS
"tRAS is the time required before (or delay needed) between the active and precharge commands. In other words, how long the memory must wait before the next memory access can begin."
In fact, you have better frequencies, but lower timings.... What you need, is higher frequencies, and lower timings.
So we will have to wait till they launch Socket AM2, to know the true performance of AM2.
defter - Monday, April 10, 2006 - link
4-4-4-12 are good timings, even for DDR2-667. It isn't easy to find reasonably priced DDR2-667 that works at those timings with standard voltage.
Some people forget that 99% of consumers won't be using super expensive overvolted 3-3-3-10 DDR2-800 memory just to get few percents of extra performance. And if you compare AMD CPU + super fast DDR2-800 against Intel CPU (which runs fine on DDR2-667 because of FSB limitation) then you need to take into account higher price of memory on AMD system.
Wesley Fink - Monday, April 10, 2006 - link
We are continuing to test the AM2 on different AM2 boards. On another motherboard we could run at 3-3-3 DDR2-800 with the OCZ PC2-8000 memory. Latency was a bit lower and bandwidth a bit higher, but nothing really changed from Anand's conclusions. We have also been running DDR2-667 and DDR2-533 tests with this new super fast OCZ memory and cheaper mainstream DDR2 memory, and we will be sharing those results as soon as testing is complete.
cornfedone - Monday, April 10, 2006 - link
The crap the mobo companies have been shoving out the doors the past couple years is pure garbage as any number of hardware review sites have confirmed. It looks like the AM2 mobos might be more half-baked crap. Until you can test the shipping CPUs on a quality mobo that allows proper memory timing, it's difficult to know what AMD's AM2 CPUs will or won't deliver. If I had a dollar for every bogus claim Intel has made, I'd be a Billionaire so I wouldn't hold my breath that Conroe will perform as Intel claims.