i5 / P55 Lab Update - Now with more numbers
by Gary Key on September 15, 2009 12:05 AM EST - Posted in Motherboards
We welcomed Anand back into the office with open arms this past weekend. He immediately started working on an in-depth analysis of clock for clock comparisons for several processors as a follow up to our Lynnfield launch article (among many other things). This analysis along with a quick i7/860 performance review will be available in the near future.
In the meantime, I have additional performance results using the P55 motherboard test suite along with some unusual results from our gaming selections. I am not going to dwell on commentary in this short update. We will let the numbers speak for themselves at this point. Let’s get right to the results today, but first, the test setup.
Test Setup-
For our test results we set up each board as closely as possible with regard to memory timings and sub-timings. The P55 and 790FX motherboards utilized 8GB of DDR3, while the X58 platform contained 6GB. The P55 and X58 DDR3 timings were set to 7-7-7-20 1T at DDR3-1600 for the i7/920, i7/870, and i7/860 processors at both stock and overclocked CPU settings.
We used DDR3-1333 6-6-6-18 1T timings for the i5/750 stock setup as DDR3-1600 is not natively supported in current BIOS releases for this processor at a stock Bclk setting of 133. We had early BIOS releases that offered the native 1600 setting but stability was a serious problem and support was pulled for the time being. Performance is essentially the same between the two settings. When we overclocked the i5/750 to 3.8GHz, we utilized the same DDR3-1600 7-7-7-20 1T timings as the i7 setups.
The AMD 790FX setup is slightly different as trying to run DDR3-1600 at CAS 7 timings on the 1:4 divider is extremely difficult. DDR3-1600 is not natively supported on the Phenom II series so this divider is provided with a caveat that you are overclocking the memory bus. The same holds true for the Lynnfield (i7/8xx, i5/7xx) processors as DDR3-1333 is officially the highest memory speed supported and it is DDR3-1066 for the Bloomfield (i7/9xx).
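As a rough illustration of how a divider maps a base clock to a memory speed, the sketch below multiplies a base clock by a memory ratio and doubles it for DDR. The 200MHz reference clock for the AMD platform is an assumption (the article only names the 1:4 divider), and the Lynnfield ratios of 5 and 6 are hypothetical values chosen to reproduce DDR3-1333 and DDR3-1600 at a 133MHz Bclk.

```python
def ddr_data_rate(base_clock_mhz: float, memory_ratio: float) -> float:
    """Return the effective DDR data rate in MT/s.

    memory_ratio is memory clock divided by base clock; DDR memory
    transfers data twice per memory clock, hence the factor of 2.
    """
    return base_clock_mhz * memory_ratio * 2


# Assumed 200 MHz reference clock with the 1:4 divider named in the article.
amd_1_4 = ddr_data_rate(200.0, 4)

# Bclk of 133 MHz (400/3 to avoid rounding); ratios 5 and 6 are assumptions.
lynnfield_1333 = ddr_data_rate(400 / 3, 5)
lynnfield_1600 = ddr_data_rate(400 / 3, 6)

for name, rate in [("AMD 1:4", amd_1_4),
                   ("Lynnfield x5", lynnfield_1333),
                   ("Lynnfield x6", lynnfield_1600)]:
    print(f"{name}: DDR3-{round(rate)}")
```

Under these assumptions, both the AMD 1:4 divider and the higher Lynnfield ratio land on DDR3-1600, which is above the officially supported speed on either platform.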
Without resorting to some serious overvolting and relaxing of sub-timings, we set our AMD board up at DDR3-1600 8-8-8-20 1T timings. The difference in performance between C7 and C8 DDR3-1600 is practically immeasurable in applications and games on this platform. You might pick up an additional few tenths of a second in SuperPi or a couple of extra points in AquaMark or 3DMark 2001SE, but otherwise performance is about equal.
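To put the CAS settings used in this setup on a common scale, the following sketch converts a CAS latency in cycles to nanoseconds using the standard relation (cycles divided by the memory clock, where the memory clock is half the DDR data rate). It only covers the first timing value, not the sub-timings, so treat it as a back-of-the-envelope comparison rather than a performance prediction.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """Convert CAS latency from memory-clock cycles to nanoseconds."""
    memory_clock_mhz = data_rate_mts / 2  # DDR: two transfers per clock
    return cas_cycles / memory_clock_mhz * 1000


# Settings taken from the test setup text; DDR3-1333 is 4000/3 MT/s exactly.
settings = {
    "DDR3-1600 C7": cas_latency_ns(1600, 7),
    "DDR3-1600 C8": cas_latency_ns(1600, 8),
    "DDR3-1333 C6": cas_latency_ns(4000 / 3, 6),
}

for name, ns in settings.items():
    print(f"{name}: {ns:.2f} ns")
```

The three settings work out to 8.75 ns, 10.00 ns, and 9.00 ns respectively, so the absolute CAS delay differs by little more than a nanosecond across all of them.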
However, in order to satisfy some of our more enthusiastic AMD supporters, we also increased our Northbridge speed from 2000MHz to 2200MHz to equalize, if not improve, our memory performance on the AMD system. Yes, we know, further increasing the NB speed will certainly result in additional performance but the focus of this short article is to show clock for clock results at like settings. Personally, I would run DDR3-1333 C6 with 8GB as this platform favors tighter timings over pure bandwidth.
Last, but not least, I only ran the i5/750 without turbo enabled and the P45/C2Q setup is missing. I am still completing those numbers. Anand will be providing additional analysis on the other Lynnfield processors in his update. The image gallery below contains our Everest memory results with each processor overclocked at similar memory settings along with voltage/uncore/subtiming options. I will go into these in more detail once the motherboard roundups start. For the time being, the 860/P55 offers slightly better throughput and latency numbers than the 920/X58 when overclocked. At stock, the numbers favor the Lynnfield, but primarily due to the turbo mode.
Other than that we are in a holding pattern on the P55 roundups at this time trying to figure out some unusual game and 3D Render results with our GTX275 video cards. I will discuss this problem in the game results.
77 Comments
crimson117 - Tuesday, September 15, 2009 - link
"It is items like this that make you lose hair and delay articles. Neither of which I can afford to have happen."
Thank you for making me almost choke on the scone I was eating.
Ocire - Tuesday, September 15, 2009 - link
Those are some strange numbers you got there...
Just an idea that popped into my mind: Could effective PCIe bandwidth be the key here?
You can run the bandwidth test that comes with NVIDIA's CUDA, using the shmoo option for pinned and unpinned memory, on both platforms.
If you get higher numbers on the AMD platform, it could be that with P55 and X58 the card is in some cases interface bandwidth bound. (Which isn't that uncommon in some GPGPU applications, too)
Holly - Tuesday, September 15, 2009 - link
I was thinking about bandwidth as well, but then I realised that with P55 the CPU has direct lanes to the first PCIe 16x slot while the X58 platform runs via the Northbridge... The two approaches are way too different to both produce the same problem imho...
This seems like a driver issue to me. Maybe the CPU and GPU parts of the engine+drivers run asynchronously and the communication in between gets suffocated (something like the i7 manages to compute too fast and data has to wait for the next loop to go through)...
no clue though, it's just my best guess...
TA152H - Tuesday, September 15, 2009 - link
A lot of people are getting confused by the PCIe being on the processor as opposed to the Northbridge. Because the IMC shows advantages on the processor die in terms of memory latency, as moving the floating point unit did with the 486, it's assumed moving things onto the processor die should be faster. If you look closer though, it becomes clear this just isn't so.
The reason the memory controller on the processor has advantages comes down to two important considerations - it's transferring to and from the processor, and it's got finer granularity than something running at lower clock speeds.
Let's look at the second first. Let's say I'm running the processor at eight times the speed of the northbridge. The number isn't so important, and it doesn't even have to be an integer, to illustrate the point, so I'm just picking one out of the air.
Let's call clock cycle eight the one where the northbridge also gets a cycle and can work on the request from the processor, and for the sake of simplicity, the memory controller can work on it. If I get something on processor cycles one through seven, I could start the memory read on the IMC, but the slower clock speed doesn't allow that level of granularity, so the read, or write, request waits. This is a gross oversimplification, but you probably get the point.
Perhaps more importantly, you're transferring from memory to the processor. It's to the actual device the memory controller is attached to. And since the memory controller is on the processor itself, there's less overhead in generating the request to the memory controller.
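The granularity argument above can be sketched with a toy model: a fast clock running at eight times a slow clock, where a request that arrives on fast cycle one through eight must wait for the slow clock's edge on cycle eight. The function name and the 8:1 ratio are illustrative assumptions taken from the simplified example in the comment, not a description of any real chipset.

```python
def wait_cycles(arrival_cycle: int, ratio: int = 8) -> int:
    """Fast-clock cycles a request waits for the next slow-clock edge.

    Slow-clock edges fall on fast cycles ratio, 2*ratio, and so on.
    A request arriving exactly on an edge waits zero cycles.
    """
    return (-arrival_cycle) % ratio


# Requests arriving on each of the eight fast cycles in one slow period.
waits = [wait_cycles(cycle) for cycle in range(1, 9)]
average_wait = sum(waits) / len(waits)

print("wait per arrival cycle:", waits)
print("average wait in fast cycles:", average_wait)
```

In this toy model a controller clocked at the fast rate would start every request immediately, while the slow-clocked one adds an average of 3.5 fast cycles of delay.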
The PCIe 16 slot is a different animal. You're not generally using the processor for this, except for now. It's going from one device, typically to memory, with possible disastrous consequences for the brain-damaged Lynnfield line.
With a proper setup, there wouldn't be any real difference. I'm not sure how the Lynnfield is set up though, and I'm not sure it's a proper setup like the x58. I'll explain. If we look at the x58, the video card will transfer memory to the Northbridge, and then the Northbridge will interface to the memory. It has channels on it to do that, and it doesn't involve the processor at all, or need to.
On the Lynnfield, now you're involving the processor, for no good reason except for cost (which is a really good reason, actually). So, now the video card sends a request to the PCIe section of the processor, and the processor has to do the transfer. Now, this is the big question - just how brain-damaged is the Lynnfield? Does it actually have a separate path to handle these transfers, or does it basically multiplex the existing narrow memory bus to handle these? It's almost certainly the latter, since I don't know how much sense it would make to only use part of the memory bus for all other transfers. Anand can hopefully answer these questions, although this site tends to never look very deep at things like this, so I'm doubtful. I guess they don't think we're interested, but I surely am.
The end result is, you could have competition for the already narrower memory bus, where the processor can get locked out of it while PCIe transfers are going on. This is consistent with some benchmarks other sites show, where the Lynnfield struggles more than it should on games.
I don't want to make this sound worse than it is though. Most video cards have a lot of memory, and the actual number of requests to memory hopefully isn't very high. Even on a Bloomfield, any request to main memory is a slowdown since video memory will always be faster. Also, cards will keep getting more memory, whereas the human eye is not going to be able to discern better resolution, so presumably cards will not have to use main memory at all in the future. And, keep in mind, processors have big L3 caches, so they don't need to go out to memory all that often. So, it's not catastrophic, but you should see it, and more as you stress the CPU and GPU. Again, if you look at other websites that did some serious game stressing, you do see the Bloomfields distance themselves from their brain-damaged siblings as you stress the system more.
yacoub - Tuesday, September 15, 2009 - link
I think we'd all be interested to know the answer to this quandary, however I also think you're exaggerating a bit when you call it brain damaged and say things like "disastrous consequences". In reality what you mean is that in extreme bandwidth saturation situations like SLI it's possible it might be slower than X58, and maybe when the DX11 generation of cards come out they will actually require enough bandwidth for it to be even more noticeable in SLI comparisons. But for the vast majority of us who run a single GPU, so long as a single DX11 card doesn't fully saturate the available bus bandwidth and thus doesn't perform any less than on X58, P55 is just fine.
TA152H - Wednesday, September 16, 2009 - link
I agree, when I was typing it, I meant disastrous consequences with respect to that particular instance when the processor has to multiplex. In the context of overall performance, I don't think it would be that huge. I wasn't clear about that.
But, you're off with regards to the saturation. You don't have to saturate the bus, at all. You don't need two cards. That type of thinking is fallacious, in that it assumes only part of the bus is used.
In reality, ANY time the video card needs the memory bus, and the processor needs to read memory, you've got a collision, and one or the other has to wait. This would happen more often if you have more stress on the processors, or video subcomponent, but could happen even with one card, and one processor being used. It just would be much less frequent, and probably insignificant.
These are the type of compromises these web sites should be bringing up, but they simply don't. It's not just this one, but shouldn't a tech site bring up questions like this, instead of just publishing benchmarks (which, as we know, are far from objective and can paint a different picture based on the parameters and benchmarks chosen)?
I wouldn't expect this from PC Magazine, but from 'tech' sites? They just aren't very technical.
Gary Key - Tuesday, September 15, 2009 - link
P55 will be just fine with the DX11 cards.. whistles and grins evilly looking at the results, however, for top performance in SLI or CF, it is X58 all the way. That said, I have not been able to tell the difference between 130 FPS and 128 FPS in HAWX yet, nor between 211 FPS and 208 FPS in L4D. :)
jonup - Wednesday, September 16, 2009 - link
does your comment mean that you have/have seen a HD 5870 in action?
yacoub - Tuesday, September 15, 2009 - link
It appears more driver/optimization-related given that some games that are less bandwidth-intensive are showing the strange performance and others are not.
yacoub - Tuesday, September 15, 2009 - link
Hey I wonder if Gary tried renaming the .exe just to see if it was a driver bug with certain game engine optimizations! :)