AMD Virtualization Improvements

The performance-related virtualization improvement in Barcelona is faster virtualized address translation. In a virtualized software stack, where multiple guest OSes run on a hypervisor, there's a new form of memory address translation to deal with: guest OS to hypervisor address translation, since each guest OS has its own independent memory management. According to AMD, this extra layer of address translation is currently handled in software through a technique called shadow paging. What Barcelona offers is a hardware-accelerated alternative to shadow paging, which AMD is calling Nested Paging.

Supposedly up to 75% of the hypervisor's time can be spent dealing with shadow pages, overhead that AMD eliminates by teaching the hardware about both guest and host page tables. The translated addresses are cached in Barcelona's new, larger TLBs to further improve performance. AMD indicates that supporting Nested Paging takes very little software work; simply setting a mode bit should suffice, making the change easy for software vendors to adopt.
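To make the two layers of translation concrete, here is a minimal Python sketch of the idea. It is a toy model under stated assumptions, not AMD's implementation: the flat dictionaries standing in for page tables, the 4KB page size, and the names NestedMMU, guest_table and host_table are all ours. Real hardware walks multi-level page tables, but the shape is the same: a guest virtual page maps to a guest physical page, which maps to a host physical page, and the combined result is cached in a TLB.

```python
PAGE_SIZE = 4096  # assumed 4KB pages, for illustration only


class NestedMMU:
    """Toy model: hardware that knows both the guest and the host page tables."""

    def __init__(self, guest_table, host_table):
        self.guest_table = guest_table  # guest virtual page -> guest physical page
        self.host_table = host_table    # guest physical page -> host physical page
        self.tlb = {}                   # guest virtual page -> host physical page
        self.walks = 0                  # how many full two-stage walks were needed

    def translate(self, guest_virtual_addr):
        page, offset = divmod(guest_virtual_addr, PAGE_SIZE)
        if page not in self.tlb:
            # TLB miss: walk the guest table, then the host table.
            self.walks += 1
            guest_physical_page = self.guest_table[page]
            self.tlb[page] = self.host_table[guest_physical_page]
        # TLB hit (or freshly filled entry): no table walk needed.
        return self.tlb[page] * PAGE_SIZE + offset


mmu = NestedMMU(guest_table={0: 7, 1: 3}, host_table={7: 42, 3: 19})
first = mmu.translate(0x10)   # miss: two-stage walk, result cached
second = mmu.translate(0x20)  # same page: served from the TLB
print(hex(first), hex(second), mmu.walks)
```

In this model the hypervisor never has to maintain a merged shadow copy of the guest's tables in software; it only supplies host_table, and the second access to the same page costs no walk at all.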

Power Management

The most recent aspect of Barcelona's design that AMD revealed is how it handles power management. Although all four cores still operate on the same power plane (same voltage), Barcelona's Northbridge now runs on a separate power plane. Barcelona's core and Northbridge voltages can vary between 0.8V - 1.4V independently of one another.

In a conventional platform architecture, the Northbridge and the CPU are already on separate power planes given that the Northbridge is external to the CPU. The benefit of this arrangement is that the two chips can power down independently of one another, so when the memory controller has little to do, it can power down until needed. With AMD's K8, this wasn't true as the Northbridge and CPU core(s) were on the same power plane. In Barcelona, they are separated to improve power efficiency.

The individual cores still share the same reference voltage, but each core has its own PLL so that they can run at different clock speeds depending on load. While the voltages of all four cores have to be equal, clock speed and thus current draw can be reduced depending on load, which will amount to power savings under normal usage conditions. The implications on the desktop are particularly interesting since most desktop workloads rarely keep all cores pegged at 100% utilization.
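A rough way to see why per-core clocks help even with a shared voltage is the textbook approximation that dynamic power scales with voltage squared times frequency. The sketch below applies that approximation; the function name, the 1.2V operating point and the 2.0GHz/1.0GHz clocks are illustrative assumptions on our part, not figures from AMD, and leakage power is ignored.

```python
def relative_dynamic_power(voltage, frequencies_ghz, ref_voltage, ref_frequency_ghz):
    """Total dynamic power of all cores, in units of one core at the reference point.

    Uses the common approximation P ~ V^2 * f and ignores leakage.
    """
    voltage_scale = (voltage / ref_voltage) ** 2
    return sum(voltage_scale * f / ref_frequency_ghz for f in frequencies_ghz)


# All four cores share one voltage (assumed 1.2V here), but clocks differ per core.
all_busy = relative_dynamic_power(1.2, [2.0, 2.0, 2.0, 2.0], 1.2, 2.0)
one_busy = relative_dynamic_power(1.2, [2.0, 1.0, 1.0, 1.0], 1.2, 2.0)
print(all_busy, one_busy)  # 4.0 2.5
```

Under these assumptions, halving the clock on three lightly loaded cores cuts dynamic power from 4.0 to 2.5 units without touching the shared voltage. Dropping the voltage as well would help quadratically, but that is only possible when every core can tolerate the lower setting.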

Barcelona supports up to 5 independent p-states for each core, varying only in clock speed. The p-states are completely hardware controlled, so you will not need a driver to enable support for the power management features. AMD also increased the amount of clock gating done on Barcelona compared to K8 at both the block level and logic level. AMD wouldn't give us any more detail than this, but given how long it's been since the K8's introduction we'd expect that there's a lot that can be done.
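AMD didn't describe how the hardware picks a p-state, so the following is only a sketch of one plausible policy: give each core the slowest of five clock settings that still covers its current load. The clock fractions in the table and the pick_p_state function are hypothetical and exist purely for illustration.

```python
# Hypothetical clock fractions for five p-states (P0 is full speed).
# AMD did not publish these values; they are placeholders.
P_STATE_CLOCK_FRACTION = [1.0, 0.85, 0.7, 0.55, 0.4]


def pick_p_state(utilization):
    """Return the index of the slowest p-state whose clock still covers the load.

    utilization is the core's demand as a fraction of full-speed capacity (0.0 to 1.0).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    # Scan from the slowest state (highest index) toward the fastest.
    for index in range(len(P_STATE_CLOCK_FRACTION) - 1, -1, -1):
        if P_STATE_CLOCK_FRACTION[index] >= utilization:
            return index
    return 0


# Each core is evaluated independently, since only clock speed varies per core.
core_loads = [0.95, 0.3, 0.6, 0.05]
print([pick_p_state(load) for load in core_loads])  # [0, 4, 2, 4]
```

Because the states differ only in clock speed, each core can be evaluated on its own without coordinating a voltage change with its neighbors.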

The performance efficiency enhancements to Barcelona, coupled with updated power management, further clock gating and a 65nm process, allow AMD's first quad-core part to operate within the same thermal envelope as current Opterons.


83 Comments


  • agaelebe - Friday, March 2, 2007 - link

    Wow! A lot of discussion in here.
    And, by the way, very interesting article.

    I'm a software engineer from Brazil and I'm planning to change my PC this year.
    I've been using AMD processors since the K6.
    Today I have an XP Mobile 2500+ (@2.2GHz), 1GB RAM, 200GB and an AGP 6600GT.
    My PC is not very slow, but I'm thinking of going dual core to speed things up (office applications, web development and some games).
    I can run some of the newest games, but not at high graphics settings.
    I expect that my PC can run C&C 3 (I already ran the demo at 1024 medium; it had some crashes, although it wasn't running slow).

    So, today I'm thinking of 3 options:
    1) Stay with this computer and wait until AMD launches its new architecture (I intend to go with an average-priced Kuma)

    2) Go with an Intel Core 2 Duo (e6300 or e6400). They're not expensive and for games I can easily overclock and gain more performance.

    3) Buy a good AM2 board and a cheap Athlon X2 (3600), wait for new AMD processors, and then change only the processor.

    Here in Brazil the taxes are too high, so I'm planning on buying a PC with these specs:

    - CORE 2 Duo e6300/6400 or X2 3600/3800
    - mid-tier motherboard (
    - 2 x 1gb DDR 800 4-4-4-12
    - 2 x 250 gb
    - X1950pro 256 or 512
    - 500watts power

    So the prices are below:

    e6300 box US$ 300 (same price for a X2 4200+ box)

    X2 3800 box US$ 220

    motherboard: US$ 220

    ram: US$ 400

    video: US$ 450

    DVD: US$ 70

    case: US$ 150

    HDs : US$ 250

    Power: US$ 180

    So I plan to spend about 2000 dollars (sadly, I could buy this same PC in the US for half the price).

    My new PC should not use too much power, so I can leave it turned on all day long (max 150 watts at idle without monitor); otherwise I'll keep my old computer turned on just for downloading stuff.

    So, if someone has an opinion, I'd like to "hear" it. You can suggest other options too, or make some comments about the specs I'm choosing now.

    I had a Pentium 75 and after that only AMD CPUs... Should I now surrender to the Core 2 Duo or believe that AMD can really beat it by the end of 2008?

    And thanks for the cooperation and patience.
  • Zebo - Saturday, March 3, 2007 - link

    Athlon 64 AM2s aren't exactly slow, so if you're an AMD fan just get one, like a 3800+ or 3600+, and overclock it. It will be at least 4x faster than what you have now and will accept a K8L Agena core later. It will be cheaper than C2D by about $50 USD, and you'll also pay less for a GeForce 6100 motherboard, which is only $50 USD. Overall expect the AM2 system to be about $100 USD cheaper.

    Keep in mind that C2D is 20% faster clock for clock in most apps, so it's not exactly a quantum leap getting a C2D. The gap gets a lot larger when overclocking since C2Ds overclock higher (3.2GHz is common on air vs. only 2.8GHz for AM2), so at the end of the day a C2D setup can be about 40% faster across most benchmarks. That is getting significant, and it's why enthusiasts are buying C2Ds.
  • agaelebe - Friday, March 2, 2007 - link

    And, as always, sorry for the errors and not so good writing...
  • Kiijibari - Thursday, March 1, 2007 - link

    Hi,

    never heard of that before, does anybody know what it is?
    So far I see 2 pad areas in the die photo, therefore I assume that there would also be 2 interfaces, e.g. x8 PCIe like Sun uses?

    bb

    Kiijibari
  • mino - Friday, March 2, 2007 - link

    It should be some management/coordination stuff (can't remember the name of that bus).
    Every northbridge and CPU has that.
  • davecason - Thursday, March 1, 2007 - link

    Anand,

    Great article! I know it took a lot of time and I wanted you to know I really appreciate your effort. It is the kind of article that keeps me coming back to your site.

    -Dave
  • yyrkoon - Thursday, March 1, 2007 - link

    quote:

    On average, about 1/3 of all instructions in a program end up being loads, thus if you can improve load performance you can generally impact overall application performance pretty significantly.


    Page 5, paragraph 4 'pretty significantly'. Well is it, or is it not it ?

    http://www.wikihow.com/Avoid-Colloquial-%28Informa...

    Aside from my gripe concerning writing style, good article :)
  • trisweb2 - Friday, March 16, 2007 - link

    Usually we criticize writing style based on a whole experience... obviously Anand is one of the best technical review writers on the Internet; if you bother to read his articles more fully perhaps you'd realize that. The colloquial writing sometimes brings it to a more personal level that a reader can better relate to and understand -- it works especially well in this case, where it's a future design, we really don't know how it's going to perform. That he can guess and say "pretty significantly" tells me he understands the uncertainty of the situation, and the language communicates that fact perfectly well. It would be more confusing if he said it would impact performance "significantly" as you want him to, as that would imply that he was more certain than he might actually have been.

    Masters are allowed to bend the rules, and Anand is one, so lay off.
  • yyrkoon - Thursday, March 1, 2007 - link

    *Is it, or is it not*

    /me hangs head in shame
  • baronzemo78 - Thursday, March 1, 2007 - link

    Any rough guess as to how Barcelona will compete with Core2 in gaming? Many articles have shown how Core2 gets you a slight FPS boost in games that aren't graphics card limited. I'm curious how Barcelona will fit in with the overall picture of DX10 cards like G80 and R600.
