Athlon 64 X2: New Memory Dividers and Multitasking Performance
by Anand Lal Shimpi on August 12, 2005 2:00 PM EST, posted in CPUs
PAR2 + Encoding
Creating and using parity files with archives is another very CPU- and memory-intensive task. We fired up QuickPar and restored a single file from an archive and its set of PAR2 files; alongside the PAR2 process, we ran a couple of other tasks. First, let's look at the archive reconstruction throughput without any other tasks:
PAR2 Archive Reconstruction Rate in MB/s (Higher is Better) | DDR400 | DDR480 | % Improvement |
PAR2 | 931MB/s | 941MB/s | 1% |
DDR480 only offers a 1% performance advantage here, but now let’s add MP3 playback in the background:
PAR2 Archive Reconstruction Rate in MB/s (Higher is Better) | DDR400 | DDR480 | % Improvement |
PAR2 + MP3 Decode | 904MB/s | 918MB/s | 1.5% |
The DDR480 performance advantage grows to 1.5%. Now let's add an H.264 encode on top of that:
PAR2 Archive Reconstruction Rate in MB/s (Higher is Better) | DDR400 | DDR480 | % Improvement |
PAR2 + MP3 + H.264 Encode | 843MB/s | 880MB/s | 4.4% |
And now, we have a 4.4% performance advantage.
You can see how the performance difference grows as you pair more tasks together.
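The "% Improvement" column is the DDR480 rate measured relative to the DDR400 rate. The short sketch below recomputes it from the reconstruction rates in the three tables; the `RESULTS` dictionary and the `percent_improvement` helper are illustrative names, not anything from the article's test setup.

```python
# PAR2 reconstruction rates in MB/s, (DDR400, DDR480), copied from the tables.
RESULTS = {
    "PAR2": (931, 941),
    "PAR2 + MP3 Decode": (904, 918),
    "PAR2 + MP3 + H.264 Encode": (843, 880),
}


def percent_improvement(ddr400: float, ddr480: float) -> float:
    """Relative gain of DDR480 over DDR400, in percent."""
    return (ddr480 - ddr400) / ddr400 * 100.0


for workload, (ddr400, ddr480) in RESULTS.items():
    # Prints 1.07%, 1.55% and 4.39% for the three workloads.
    print(f"{workload}: {percent_improvement(ddr400, ddr480):.2f}%")
```

Rounded, these match the 1%, 1.5% and 4.4% figures in the tables, and the gap widens with each task added to the mix.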
23 Comments
Araemo - Friday, August 12, 2005
I'm curious: does Windows XP support NUMA? A quick Google search on the topic gives me conflicting info.
People seem to think it does if you manually turn on PAE (which has its own performance overhead, right?), but MS's website says "NUMA is supported only on Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition."
What I've read recently suggests that in the A64 X2 CPUs, each core has one memory controller enabled, which suggests that NUMA could be useful for performance reasons. However, what I read originally when the X2s were coming out was that one core simply had both its memory controllers disabled. Does anyone know which of these two is correct?
In any case, it sounds like memory latencies to different memory addresses will be different between the cores.
Either one core will always have a higher latency, or each one will have low latency to some addresses and high latency to others.
Starglider - Friday, August 12, 2005
The Athlon64 die contains a dual-channel DDR memory controller, three HyperTransport transceivers, one or two processor cores and a crossbar switch that links them all together. Adding an extra processor core to the X2 didn't duplicate any of the other parts, so no, there aren't any disabled memory controllers on there. Both cores are connected to the memory controller through the switch, so they have equal access to both channels (which are interleaved anyway when both are active). NUMA would not be relevant because the banks aren't independently addressable by the OS and deliver exactly the same bandwidth and latency to both cores anyway. NUMA is only useful if your system has more than one processor socket, i.e. is an Opteron system.
Araemo - Friday, August 19, 2005
Thanks for clearing that up for me, but the # of sockets really has nothing to do with it. It is the # of independent memory controllers that matters, and AMD could have placed multiple single-channel controllers on the die if they thought performance would improve. But if the memory controller is 'external' to the core (accessible via HT instead of a more direct link... not that HT isn't good), then I guess it doesn't matter. I was thinking the memory controller was part of the same HT node as the CPU core, but the method you described makes more sense anyway. If the memory controller is logically separated from the core, it can serve DMA requests from the northbridge/southbridge without bothering the CPU at all, as DMA should be.
Diasper - Friday, August 12, 2005
It looks to me like future dual-core games will benefit from the extra bandwidth. The logic is that with a highly efficient dual-core engine, both cores should be demanding as much bandwidth as possible, so we might see something more akin to the multitasking-with-Doom 3 performance numbers. Either way, the numbers should be higher than the ones we saw the first time, when dual-core was tested with only a single-threaded game, so say that's a 5%+ improvement at DDR500. Either way, I think this information is pretty significant for those going with dual-core processors.
Now where did my high-speed, low-latency 1GB sticks go...
Oh yeah and first.
Zebo - Friday, August 12, 2005
I don't know about that. Anand didn't mention timings. I can only assume they are the same, since he didn't mention them for DDR400 and DDR480 respectively... Which is faster? Who knows, really... My feeling is that if he ran DDR400 at the low latency it's capable of while DDR480 ran at the higher latency it needs, you would see negligible differences. Again, not enough information...
Diasper - Friday, August 12, 2005
That's probably largely correct. I suspect they'll be running a similar setup to before (http://www.anandtech.com/cpuchipsets/showdoc.aspx?...), where they were running 2 x 512MB sticks that could do 2-2-2 timings all the way up to DDR500 or so. But yeah, can we get any clarification on that please? It's appalling that you didn't include your test system criteria, although we can probably guess and trust it was done correctly.
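The timing question can be made concrete. Absolute CAS delay in nanoseconds is the CAS latency in clock cycles divided by the memory clock, and a DDR part's clock is half its rated transfer rate (200 MHz for DDR400, 240 MHz for DDR480). The sketch below uses the CL2 setting of the 2-2-2 sticks mentioned in the comment; the looser CL3 at DDR480 is a hypothetical example, not a timing reported anywhere in the article, and `cas_delay_ns` is an illustrative helper.

```python
def cas_delay_ns(cas_cycles: float, ddr_rating: int) -> float:
    """Absolute CAS delay in ns for a DDR part rated at `ddr_rating` MT/s.

    DDR transfers twice per clock, so the clock in MHz is half the rating.
    """
    clock_mhz = ddr_rating / 2
    return cas_cycles / clock_mhz * 1000.0


# CL2 at each speed, plus a hypothetical looser CL3 at DDR480.
for cl, rating in [(2, 400), (2, 480), (2, 500), (3, 480)]:
    print(f"DDR{rating} CL{cl}: {cas_delay_ns(cl, rating):.2f} ns")
```

At equal CAS cycles the faster setting has the shorter delay (10.00 ns at DDR400 versus 8.33 ns at DDR480), but loosening DDR480 to CL3 would push it to 12.50 ns, longer than DDR400 at CL2. That is why the timings matter when reading the results.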
Zebo - Friday, August 12, 2005
Yeah, that VX stuff is most excellente... The review I *really* want to see is how well DDR2 667 on M2 competes with, say, DDR 500 with its newfound low latency... I have my money on "old tech" :P
Diasper - Friday, August 12, 2005
Interestingly, looking at the results: for a 20% increase in memory speed we saw up to a 10% (approximately) increase in performance, suggesting that the X2 is bandwidth-constrained by at least 10% when running full tilt, so you'd want to run at least DDR440 or otherwise risk reduced performance.
Of course, given the unevenness of memory requests from both processors, I guess we could presume they would benefit from more memory speed, although the benefits would lessen above a certain speed (e.g. the guesstimated DDR440), as it is unlikely that you'll typically come across a scenario where both processors demand maximum memory bandwidth at the exact same moment.
I guess that's speculation at best, but unless you're an engineer that's about all you can do...
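The 20% figure is simply the ratio of the memory transfer rates. On a dual-channel Socket 939 platform the DDR interface is 128 bits (16 bytes) wide, so theoretical peak bandwidth is the transfer rate times 16 bytes. The sketch below assumes peak figures only (sustained bandwidth is always lower), and `peak_bandwidth_gb_s` is an illustrative helper name.

```python
BUS_WIDTH_BYTES = 16  # two 64-bit DDR channels


def peak_bandwidth_gb_s(ddr_rating: int) -> float:
    """Theoretical peak in GB/s (10**9 bytes/s) for a DDR rating in MT/s."""
    return ddr_rating * 1e6 * BUS_WIDTH_BYTES / 1e9


ddr400 = peak_bandwidth_gb_s(400)
ddr480 = peak_bandwidth_gb_s(480)
print(f"DDR400: {ddr400:.2f} GB/s")  # 6.40 GB/s
print(f"DDR480: {ddr480:.2f} GB/s")  # 7.68 GB/s
print(f"Increase: {(ddr480 / ddr400 - 1) * 100:.0f}%")  # 20%
```

Against this 20% increase in peak bandwidth, the PAR2 gains in the article range from 1% to 4.4%.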
Spacecomber - Friday, August 12, 2005
I think we can assume that it is the same setup as with the first article, as the previous poster suggested. From the article:
Space
Diasper - Friday, August 12, 2005
http://www.anandtech.com/cpuchipsets/showdoc.aspx?...
Ah, well, I probably shouldn't skip over stuff so quickly to get to the results. However, why was the memory run at DDR500 in the previous test but only at DDR480 here?
That rather nullifies the comparative significance of the test as the same test wasn't run. :/