AMD's Phenom X4 9950, 9350e and 9150e: Lower Prices, Voltage Tricks and Strange Behavior
by Anand Lal Shimpi & Gary Key on July 1, 2008 12:00 AM EST - Posted in CPUs
The Story of Phenom's Erratic Performance
A few months ago I called AMD with a problem. In testing for AMD's Phenom re-launch, I encountered a major issue: Phenom performance had actually degraded since I first tested it. Two benchmarks in particular saw performance go down: SYSMark 2007 and Adobe Photoshop CS3. SYSMark gave me scores that were around 10% lower than they were when Phenom launched, despite these being B3-stepping parts. Photoshop was far worse, with performance around half of what it was at the Phenom launch.
I originally attributed the changes to something strange that happened with the move to Vista SP1. I theorized that somehow the TLB fix was being applied to B3 stepping parts and negatively impacting performance, but WinRAR and memory tests didn't support the thought. AMD couldn't figure out what was happening so I chalked it up to a problem with my testbed or testing methodology, something I'd have to revisit at a later time.
The SYSMark issue actually went away on its own. I swapped from my 780G motherboard to a 790FX without reinstalling Windows to see if it was a 780G/integrated graphics issue, but the performance problems remained. Upon swapping back, once again without a Vista reinstall, my SYSMark scores magically jumped around 10%. The "fix" lasted long enough for me to finish the benchmarks for the Phenom re-launch review, but sure enough the problem reappeared when I tried to re-run one test after I'd gotten everything I needed for the review. I never did figure out what was causing my Photoshop performance issues, however.
The Culprit: Cool'n'Quiet
When I started testing for today's review, I ran into the same issue again. I always start by benchmarking SYSMark first, since the suite takes forever to complete on a single CPU. As soon as I got my first results, I realized my problem was back. Determined to find the cause, I tried everything...again. The one thing I hadn't changed before was the Cool'n'Quiet setting in the BIOS, so this time I tried that too.
Cool'n'Quiet is the marketing name for AMD's on-the-fly clock speed/voltage adjustment. Depending on the software load on the CPU, Cool'n'Quiet (CnQ) adjusts the p-state of the CPU cores, which includes adjusting core voltage and clock speed. If you're running a processor-intensive game or application, CnQ will make sure your CPU runs at full speed, but if you're just typing a text document it will underclock/undervolt the CPU.
Phenom's CnQ is a more advanced version of what was in the Athlon 64 X2. It not only allows for individual core clock speed adjustment but is also able to transition between states faster than previous versions of CnQ (at least that's what AMD implies).
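As a rough illustration of the load-driven behavior described above, a p-state governor can be sketched as a simple threshold rule. Everything in this sketch is an assumption made for illustration: the two-entry p-state table, the clock and voltage values, the thresholds and the `next_pstate` helper are hypothetical and do not come from AMD documentation or from the actual CnQ driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PState:
    name: str
    mhz: int
    volts: float


# Hypothetical p-state table: values are illustrative only, not AMD's.
PSTATES = [
    PState("P0", 2000, 1.10),  # full speed
    PState("P1", 1000, 0.95),  # low-power state
]

# Hypothetical thresholds on core utilization (0.0 to 1.0).
UP_THRESHOLD = 0.80
DOWN_THRESHOLD = 0.30


def next_pstate(current: int, utilization: float) -> int:
    """Return the index of the p-state to use for the next interval."""
    if utilization >= UP_THRESHOLD:
        return 0  # busy core: jump straight to full speed
    if utilization <= DOWN_THRESHOLD:
        # mostly idle core: step one state lower, if there is one
        return min(current + 1, len(PSTATES) - 1)
    return current  # in between: stay where we are


if __name__ == "__main__":
    state = 1
    for load in [0.05, 0.95, 0.50, 0.10]:
        state = next_pstate(state, load)
        p = PSTATES[state]
        print(f"load={load:.0%} -> {p.name} ({p.mhz} MHz, {p.volts:.2f} V)")
```

The point of the sketch is only that the decision is made per core from observed utilization, so anything that makes a busy core look idle will keep it in a slow state.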
SYSMark 2007 Preview Overall Score | CnQ/EIST Enabled | CnQ/EIST Disabled | Performance Increase from Disabling CnQ/EIST |
AMD Phenom X4 9350e | 101 | 113 | 11.8% |
AMD Athlon X2 6400+ | 121 | 123 | 1.7% |
Intel Core 2 Quad Q9450 | 153 | 156 | 2.0% |
Disabling CnQ increased my SYSMark scores by around 12% and cut my Photoshop CS3 render time by about 40% (58.7s with CnQ enabled, 35.2s with CnQ disabled); enabling CnQ had the opposite effect. Gary ran similar numbers using PCMark Vantage and found a 5% difference. AMD originally insisted that the problem was because SYSMark introduces unrealistic pauses into its benchmark (so-called "think" times, periods during which the system is waiting for user input), but since we found the same issue in other benchmarks (PCMark Vantage and our Photoshop test), we believe this is more than just a SYSMark issue.
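The figures above can be recomputed with a few lines of Python. This is only a sanity-check sketch: the function names are illustrative, and the inputs are the SYSMark scores from the table and the two Photoshop timings quoted above.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# SYSMark 2007 overall scores: (CnQ/EIST enabled, CnQ/EIST disabled)
sysmark = {
    "AMD Phenom X4 9350e": (101, 113),
    "AMD Athlon X2 6400+": (121, 123),
    "Intel Core 2 Quad Q9450": (153, 156),
}

# Photoshop CS3 render times in seconds: (CnQ enabled, CnQ disabled)
photoshop_enabled_s, photoshop_disabled_s = 58.7, 35.2

if __name__ == "__main__":
    for cpu, (enabled, disabled) in sysmark.items():
        gain = percent_increase(enabled, disabled)
        print(f"{cpu}: {gain:.2f}% higher score with CnQ/EIST disabled")

    drop = percent_reduction(photoshop_enabled_s, photoshop_disabled_s)
    ratio = photoshop_enabled_s / photoshop_disabled_s
    print(f"Photoshop: {drop:.1f}% shorter render time with CnQ disabled")
    print(f"Photoshop: CnQ-enabled run takes {ratio:.2f}x as long")
```

By these numbers the Photoshop render time drops by roughly 40% with CnQ disabled, which is the same as saying the CnQ-enabled run takes about 1.67x as long. The Phenom SYSMark gain works out to 11.88%, which the table lists as 11.8%.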
The SYSMark problem was mostly repeatable: it would consistently produce lower scores with CnQ enabled on the Phenom CPUs. The Photoshop scores were a bit more erratic - the problem went away for a little while but has since returned and won't go away again, even with CnQ disabled. It is worth mentioning that the majority of our benchmarks weren't impacted by the problem, but that doesn't mean that it won't impact your daily usage.
Note that the same problem doesn't plague the Athlon X2; this appears to be a Phenom/K10 issue only. As a reference we ran some numbers with Intel's SpeedStep enabled vs. disabled and didn't see any similar behavior.
I had found the source of my problems, but I didn't understand why it caused them.
36 Comments
MikeODanyurs - Thursday, July 17, 2008 - link
This article got me thinking: why would you purchase a 9350e for an HTPC (which was what I was planning) when for the same or less money you could get a 9850 Black Edition and just set the multiplier to 10 (instead of 12.5)? You'd have a CPU that you could use later on at full speed or OC'd, but for now on an HTPC you could underclock it to the 9350e speed and you would still have the bus at 4000 (not 3600) and the NB and HT speed at 2.0GHz (not 1.8GHz). One thing I would like to know is what the power usage would be if you did this (125w), compared to a 9350e (65w).
Wwhat - Friday, July 11, 2008 - link
In regards to your description of the strange behavior I'd like to point you to this article on microsoft.com: http://support.microsoft.com/?id=896256
An excerpt:
Possible decrease in performance during demand-based switching
Demand-Based Switching (DBS) is the use of ACPI processor performance states (dynamic voltage and frequency scaling) in response to system workloads. Windows XP processor power management implements DBS by using the adaptive processor throttling policy. This policy dynamically and automatically adjusts the processor’s current performance state in response to system CPU use without user intervention.
When single-threaded workloads run on multiprocessor systems that include dual-core configurations, the workloads may migrate across available CPU cores. This behavior is a natural artifact of how Windows schedules work across available CPU resources. However, on systems that have processor performance states that run with the adaptive processor throttling policy, this thread migration may cause the Windows kernel power manager to incorrectly calculate the optimal target performance state for the processor. This behavior occurs because an individual processor core, logical or physical, may appear to be less busy than the whole processor package actually is. On performance benchmarks that use single-threaded workloads, you may see this artifact in decreased performance results or in a high degree of variance between successive runs of identical benchmark tests.
..
It also explains how to change the policy in Windows regarding this behavior, and while this is about XP I would not be surprised if Vista inherited this.
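The effect described in the excerpt can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: time is divided into equal scheduler slices, a single always-busy thread either stays on core 0 or migrates round-robin across four cores, and a hypothetical governor raises a core's clock only when that core's own utilization reaches 80%. The helper names and the threshold are made up for illustration and are not taken from Windows.

```python
from typing import List

NUM_CORES = 4
SLICES = 100
UP_THRESHOLD = 0.80  # hypothetical per-core threshold for raising the clock


def per_core_utilization(schedule: List[int], num_cores: int) -> List[float]:
    """Fraction of slices each core spent running the thread."""
    busy = [0] * num_cores
    for core in schedule:
        busy[core] += 1
    return [b / len(schedule) for b in busy]


def cores_ramped_up(utilization: List[float]) -> List[int]:
    """Cores whose own utilization is high enough to trigger full speed."""
    return [i for i, u in enumerate(utilization) if u >= UP_THRESHOLD]


# One single-threaded workload that is busy in every slice.
pinned = [0] * SLICES                                # stays on core 0
migrating = [i % NUM_CORES for i in range(SLICES)]   # hops to the next core each slice

if __name__ == "__main__":
    for label, schedule in [("pinned", pinned), ("migrating", migrating)]:
        util = per_core_utilization(schedule, NUM_CORES)
        print(label, util, "ramped-up cores:", cores_ramped_up(util))
```

In the pinned case core 0 shows 100% utilization and is ramped up. In the migrating case the same amount of work appears as 25% on each of four cores, so under this per-core rule no core ever reaches the threshold, even though the package as a whole always has one core fully busy.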
Rev1 - Saturday, July 5, 2008 - link
I dunno, I'm gonna wait till Nehalem comes out.
gochichi - Thursday, July 3, 2008 - link
I see that they have a lot of issues still. They need to make sure that their board partners and chipset designs are really stable and ready for the "future"... I thought that was the whole point of AMD's spider platform, that they weren't gonna be playing around with switching sockets on users and such.
Well, anyway... I love having a quad-core (Intel) and I think that these products are all at a nice performance level. I do believe it helps for them to have ATI right now. Because if I were going to buy a cross-fire system, which I think makes more sense now than at any other point of time, it would be tempting (though still ultimately wrong) to get a 2.5 AMD quad-core to have everything match.
Wow, it's a great time to be a computer shopper right now. Today's AMD would have trounced yesterday's Intel, but Intel is just so lean and so efficiently producing better and better stuff that it's a tall order to even compete. AMD does seem to be making their appeals to the big guys pretty effectively, seems like when you go to brick and mortar stores about half of the systems are AMD based.
Looking into the future, AMD has what it takes to remain relevant in the market. They need to switch to a smaller process and save energy, and keep the 3 or 4 core thing going... no dual cores at all is the right thing to do.
I'm thinking that AMD could really grow into a huge beast with their AMD/ATI combo, just look at the beating NVIDIA is getting all of a sudden. The Radeon 4850 is going for $170.00??? And it's in abundant supply... wow, sucks to be lil' NVIDIA right now. AMD beat Intel when Intel was doing all kinds of illegal stuff that made it irrelevant, next time AMD beats Intel (if) it's going to be a very very big deal.
AMD has market share and name recognition, all they need now is a killer product, and I think they're getting really close to having one.
Aries1470 - Thursday, July 3, 2008 - link
Hey, just speculating now...
In a couple of years (maybe some time next year?), we should start seeing CPU's incorporating GPU's, or vice versa.
So the battle would be something like:
AMD/ATI
Intel/Intel
VIA/S3 Hey, they are still on single cores, as far as I know, but they do have the Chrome 440 GTX, which is a DX10.1 product, but that doesn't mean much.. I haven't searched for a review yet ;-)
NVIDIA/??? correct me if I am wrong, but at the moment they have no rights to x86 CPUs??
Well, that is my 2¢ worth.
p.s. the Chrome 440GTX has a 64-bit memory bus? Hmm... not really wide. ahmmm...
Calin - Friday, July 4, 2008 - link
VIA is playing in a league of its own (or played, until the Intel Atom). Even with the Atom, the northbridge is using a lot of power, so VIA could still compete.
VIA is now all about integrated platform with low performance at very low power draws (very low compared to the x86 world of today).
As for the 64-bitness of the S3 Chrome memory controller, this too helps save both costs and power.
garydale - Thursday, July 3, 2008 - link
I realize that the Windows world is still 32bit, but we're talking about processors here. I run a pure 64 bit Linux (Debian/Lenny) with (almost) no 32 bit applications - certainly not the ones I use frequently that need full processor speed. Why are the tests all done with 32 bit software?
Even if you can't run all the tests in 64 bit mode, surely a few benchmarks are available in 64 bit so we can get an idea of how the various processors perform in their native modes?
ohodownload - Wednesday, July 2, 2008 - link
- The AMD Phenom X4 9950 Black Edition 2.6GHz @ 140W; $235
- Energy-efficient AMD Phenom X4 9350e 2.0GHz @ 65W; $195
- Energy-efficient AMD Phenom X4 9150e 1.8GHz @ 65W; $175
The first thing to notice is that AMD is launching a new model set 100 MHz higher for the same price as the previous 9850 at 2.5 GHz. The small boost in frequency should enable this CPU to be equivalent to Intel’s Q6600. Unfortunately, this is to the detriment of power consumption and the TDP gains 15W compared to the previous model for a total of 140W. Therefore, even with its Q6600 Intel holds a clear advantage in terms of power consumption all the while being slightly less expensive (officially $224)....
more : computer-hardware-zone.blogspot.com/2008/07/new-3-amd-phenom-x4s.html
aguilpa1 - Wednesday, July 2, 2008 - link
I was just noting how Anandtech's articles are often a little slow to come by (new ones) but WOW! Chock full of great info, insight and in-depth information. It's like a good book you just can't quit reading till it's done. Great Job GUYS!
Christian Packmann - Wednesday, July 2, 2008 - link
1. Are there no CPU drivers for the Phenom? These should allow changing C&Q configuration with Windows' energy options, better than fiddling in the BIOS. This works on my Athlon64 3200+ on WinXP Pro.
I'm also running AMD Power Monitor, which allows quickly switching the used energy setting from its tray icon's popup menu, so I don't have to open the energy options every time.
This is great for debugging C&Q-related problems, which can happen on single-core CPUs too.
2. I'd guess that the issue with the Phenom is a software bug in the CPU driver or BIOS, but it /might/ be a CPU bug.
CPU frequency changes are always driven in software (doesn't matter if it's a kernel driver or BIOS) by writing to some model-specific CPU registers, see the "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 10h Processors", http://www.amd.com/us-en/assets/content_type/white... .
You can set the VID and Frequency with these registers. A frequency ramp-up or ramp-down usually consists of several steps of changing voltage and frequency until the target frequency is reached. There is a short waiting period after each change until the new voltage step has stabilized.
These waiting periods seem to be contained in some more CPU registers; if these values are wrong, C&Q will obviously not work as planned.
Now I don't know if these waiting values are determined by the CPU or written to it by the BIOS/kernel driver during its respective initialization (I only skimmed this section in the BKDG, not enough time ATM). If they are determined by the CPU itself and C&Q goes wrong, this would indicate a CPU bug. If they are determined by the BIOS however, this would be a BIOS bug and fixable with a BIOS update.
If you find a Phenom motherboard which doesn't exhibit these problems, then the BIOS would obviously be the culprit.
Oh, and a wild-ass-guess: maybe these waiting periods are (in part) influenced by temperature, and vary with changing CPU temperatures - this would of course imply that they aren't determined once at startup, but recomputed at given intervals. This should be testable, too.
3. Even if it's just a BIOS bug, a fix shouldn't completely solve the performance loss with PS, due to the split power planes of the Phenom. As long as the kernel throws threads around the cores willy-nilly, you will get a performance loss even if C&Q does work properly. Frequency ramp-up always takes some time if C&Q actually works. This specific problem cannot occur on CPUs with a single power plane (which the C2Qs still have, I think), as all CPUs will always have full voltage as long as a single core runs at 100%; only the frequency might need to be adjusted. Of course the performance loss you measured is extreme and indicates broken C&Q ramp-up speed.
BTW, this wild thread-changing you observed will always cost some performance for most code, as a core change makes the L2 contents useless. This is only 512K for the K10, but on C2Q a change between cores 0/1 and 2/3 will invalidate up to 6MB, depending on CPU model and cache usage. This doesn't matter much with today's memory bandwidths, but I'd still regard this as a kernel bug.
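The stepped ramp-up described in point 2 can be sketched as a toy model: the frequency is raised in fixed increments, and each increment is followed by a stabilization wait. All numbers and helper names here are hypothetical and chosen only for illustration; they are not taken from the BKDG or from any real BIOS or driver.

```python
from typing import List


def ramp_steps(start_mhz: int, target_mhz: int, step_mhz: int) -> List[int]:
    """Intermediate frequencies visited while ramping up to the target."""
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    steps = []
    freq = start_mhz
    while freq < target_mhz:
        freq = min(freq + step_mhz, target_mhz)  # never overshoot the target
        steps.append(freq)
    return steps


def ramp_latency_us(steps: List[int], wait_us: int) -> int:
    """Total time spent waiting, one stabilization wait per step."""
    return len(steps) * wait_us


if __name__ == "__main__":
    # Hypothetical ramp from 1000 MHz to 2000 MHz in 250 MHz increments.
    steps = ramp_steps(1000, 2000, 250)
    print("steps:", steps)
    for wait_us in (10, 1000):  # a sane wait value vs. a badly wrong one
        total = ramp_latency_us(steps, wait_us)
        print(f"wait={wait_us} us per step -> {total} us total")
```

In this model the total ramp latency scales linearly with the per-step wait, so a wait value that is wrong by a factor of 100 makes every ramp-up 100 times slower. If a thread also hops between cores, it pays that cost again on each core that has to be brought up to speed.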