AMD Puts Servers, Datacenters in a Piledriver with Opteron 6300 Series Processors

5

Comments

Andrew.Hodge

Strong bump in clock speed from Interlagos. Nice to see that the performance per watt has gone up for AMD. Can't wait to see what Steamroller does.

limitbreaker

If only I could get an AM3+ based 16-core chip capable of being overclocked to 5.5GHz. Now that would be one hell of a CPU if it could cost less than $500.

Diablo-D3

AM3+ is a problem because these server chips (just like every Opteron before them, and every server socket before this one) are designed to use more HyperTransport lanes than AM2/2+/3/3+ provide, and they are also designed to allow registered memory (in short, it allows double the number of DIMM slots via longer traces at a slight increase in latency).

AMD currently has two Opteron sockets, C32 and G34. C32 is not expensive for a higher end workstation machine, and comes in both single socket and dual socket boards. G34 is the highly expensive one, meant for two and four socket configurations... the 16 core CPUs mentioned in the article are meant for G34 and are not available for C32.

These 12 and 16 core CPUs are MCM (multi-chip module) CPUs using two conventional 6 or 8 core dies put together inside the package. As in, imagine it as if you put two 8 core Piledrivers in the same package, ending up with 16 cores, two 8MB banks of L3, twice as many internal HTX lanes, dual memory controllers (acting in unison to serve all 16 cores), 12 DIMMs per socket in a quad channel arrangement, and more than twice as many external HT lanes (due to quad socket arrangements). The only downside is you use DDR3-1600 instead of 1866 (but still get almost twice as much memory bandwidth due to quad vs dual channel).
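As a rough check on the "almost twice as much memory bandwidth" figure, here is a minimal sketch of the theoretical peak numbers. It assumes the standard 64-bit DDR3 channel width and uses the nominal transfer rates from the speed grade names; the function name `peak_bandwidth_gb_s` is made up for illustration, and real-world throughput will be lower than these peaks.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s, channels, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB).

    Assumes each channel is a standard 64-bit DDR3 channel and that
    every channel is populated and running at the nominal rate.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mt_s * bytes_per_transfer * channels / 1000


# G34: quad channel DDR3-1600; AM3+: dual channel DDR3-1866
g34 = peak_bandwidth_gb_s(1600, channels=4)
am3_plus = peak_bandwidth_gb_s(1866, channels=2)
ratio = g34 / am3_plus

print(f"G34 quad channel DDR3-1600:  {g34:.1f} GB/s")
print(f"AM3+ dual channel DDR3-1866: {am3_plus:.1f} GB/s")
print(f"Ratio: {ratio:.2f}x")
```

By this estimate the quad channel setup comes out around 1.7x the dual channel one on paper, so "almost twice" is on the generous side but in the right ballpark.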

The problem is cost: the cheapest 6300 with 16 cores is $800 (2.3GHz, 3.2GHz turbo), you need to put at least four DIMMs in per CPU for maximum memory bandwidth, and DDR3-1600 registered ECC is not cheap. Let's say, for fun, we want 32GB of memory (the maximum of AM3+, btw; G34 maxes out at 128GB per socket, 512GB possible).

Cheapest DDR3-1600 registered ECC, 4x8GB: Around $225
Cheapest single socket G34 mobo that supports Abu Dhabi: About $220
Total cost: $1245 ($77.81 per core @ 16 cores)

Cheapest DDR3-1600 registered ECC, 8x4GB: Around $232
Cheapest dual socket G34 mobo that supports Abu Dhabi: Around $400
Total cost: $2232 ($69.75 per core @ 32 cores)

Cheapest DDR3-1600 registered ECC, 16x2GB: Around $280
Cheapest quad socket G34 mobo that supports Abu Dhabi: Around $800
Total cost: $4280 ($66.88 per core @ 64 cores)

Want to do it again with C32?
Cheapest 43xx C32 8 core Piledriver: Not released yet, but around $250

Cheapest DDR3-1600 registered ECC, 4x8GB: Around $225
Cheapest single socket C32 mobo that supports C32 Piledrivers: $230
Total cost: $705 ($88.13 per core @ 8 cores)

Cheapest DDR3-1600 registered ECC, 8x4GB: Around $232
Cheapest dual socket C32 mobo that supports C32 Piledrivers: $300
Total cost: $1032 ($64.50 per core @ 16 cores)

And want to stick with AM3+ with a normal Piledriver?
Cheapest 8 core AM3+ Piledriver: $180
Cheapest DDR3-1866, 4x8GB that isn't a crappy brand and is 1.5V compliant: $154
Cheapest AM3+ board with a 990FX that isn't a crappy brand: $135
Total cost: $469 ($58.63 per core @ 8 cores)

So, really, it's up to you what you really want.
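For anyone who wants to rerun the numbers with their own prices, here is a minimal sketch of the arithmetic. The prices are just the approximate figures quoted above, and `build_cost` is a made-up helper name. It uses `decimal` with half-up rounding so the per-core figures round the way you'd do it by hand. Summing the quoted prices this way gives $705 for the single socket C32 build and $77.81 per core for the single socket G34 build.

```python
from decimal import Decimal, ROUND_HALF_UP


def build_cost(cpu_price, sockets, cores_per_cpu, ram_price, board_price):
    """Return (total, cost_per_core) for a build with `sockets` identical CPUs.

    Prices are whole dollars; cost_per_core is rounded half-up to cents.
    """
    total = cpu_price * sockets + ram_price + board_price
    cores = cores_per_cpu * sockets
    per_core = (Decimal(total) / Decimal(cores)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return total, per_core


# (cpu_price, sockets, cores_per_cpu, ram_price, board_price) from the comment
builds = {
    "G34 single socket": build_cost(800, 1, 16, 225, 220),
    "G34 dual socket": build_cost(800, 2, 16, 232, 400),
    "G34 quad socket": build_cost(800, 4, 16, 280, 800),
    "C32 single socket": build_cost(250, 1, 8, 225, 230),
    "C32 dual socket": build_cost(250, 2, 8, 232, 300),
    "AM3+ single socket": build_cost(180, 1, 8, 154, 135),
}

for name, (total, per_core) in builds.items():
    print(f"{name}: ${total} total, ${per_core} per core")
```

Swap in current prices for the CPU, memory kit, and board to see how the per-core cost shifts between platforms.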

limitbreaker

Thanks for adding all that up :-)
The only good option would be if AMD designed a 16 core Piledriver/Steamroller on an 18nm die based on AM3+.
With better efficiency and a smaller die they could fit it all together, but I doubt there would be any real market for it when Intel could already make 8 core chips (non-Xeon) in no time if they wanted to. They have everything they need to do it but the desire.

Diablo-D3

Not enough room. The G34 socket is at least twice the size of AM3, so they can produce MCM CPUs on current tech.

The Bulldozer family design works like this: each module has two hardware thread frontends, 4 integer ALUs, and 2 floating point ALUs. This means that two threads can reach maximum hardware efficiency more easily by, say, thread 1 doing lots of integer math, and thread 2 doing lots of FP math.

In conventional CPUs, you'd have two physical cores each with two int ALUs and two FP ALUs, and if one core is doing lots of int and the other core is doing lots of FP, two FP ALUs go idle and two int ALUs go idle.

What you REALLY want is to connect every hardware thread frontend to every ALU on the Bulldozer family chip (all 16 int ALUs and 8 FP ALUs). As in, if you're running a single threaded app and have no other active processes on the machine at the moment, you could concurrently issue 8 SSE instructions (assuming no dependency chain between the 8 instructions) and load up the entire chip using a single thread.

x86 chips do not exploit instruction level concurrency enough, imo.
