What multi-core CPUs mean for the industry
There's a change coming. Plentiful RAM and rising clock rates were the story of the last 5 years. Massive parallelization in the form of multi-core CPUs will be the story of the next 5. A year from now, every system in a data center will have 1-4 CPUs with 2-4 cores each. In just a few years, I predict you'll be able to buy $7500 systems from Dell with upwards of 32 cores each (4 CPUs, each having 8 cores). Intel sees an 80-core CPU by 2011. That's a lot of power.
It's pretty easy to see how to take advantage of this processing power in easily parallelized computing applications (image processing, video processing, linear algebra), but it gets more difficult to imagine the techniques when you start thinking about lowly Unix tools like sort or uniq. I'm sure these can be made multithreaded (partition the data into n sets, sort each set concurrently, then merge the sorted sets in one pass), and they should be if, in a few short years, we'll have 2-8 cores in most boxes. (If I'm not working, maybe my contribution to OSS will be to multithread common utilities.)
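Here's a rough sketch of that partition, sort, merge idea in Python. It isn't how any real sort utility is implemented; the names parallel_sort and uniq and the default n=4 are made up for illustration. It uses threads to match the description above, but in standard CPython the global interpreter lock means threads won't give much speedup on a CPU-bound sort, so treat this as a picture of the structure rather than a benchmark (swapping in a process pool would be the usual workaround).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def parallel_sort(lines, n=4):
    """Sort `lines` by sorting up to n partitions concurrently, then merging once.

    Hypothetical helper for illustration; n is assumed to be >= 1.
    """
    if not lines:
        return []
    # Partition the data into at most n contiguous sets.
    size = -(-len(lines) // n)  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Sort each set concurrently (one task per set).
    with ThreadPoolExecutor(max_workers=n) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Merge the already-sorted sets in a single pass.
    return list(heapq.merge(*sorted_chunks))


def uniq(sorted_lines):
    """Drop adjacent duplicates, which removes all duplicates on sorted input."""
    return [key for key, _ in groupby(sorted_lines)]


words = ["pear", "apple", "fig", "apple", "kiwi"]
ordered = parallel_sort(words, n=2)
print(ordered)        # ['apple', 'apple', 'fig', 'kiwi', 'pear']
print(uniq(ordered))  # ['apple', 'fig', 'kiwi', 'pear']
```

The merge step is the only serial part, and it touches each line once, so the more sets you can sort at the same time, the more the total time is dominated by that single pass.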
Google, with their map-reduce mindset, is well positioned to reap the benefits of this change. Say some problem takes 1000 units of CPU time. Further, say map-reduce adds a whopping 50% overhead to that problem (slicing up the work, sending it out to nodes, waiting for their results, doing the union, etc). So, for a 1 processing element (PE) cluster, that job would take 500 (the overhead) + 1000 (work) == 1500 ticks. But what if you have 2 PEs? Immediately you're back to 500 + 1000/2 == 1000 ticks, the same as the plain job on one PE with no map-reduce, so you've broken even. It's a hard pitch to your boss to buy 2x the hardware to get the same results as you already get, but what happens when you can get 20 PEs doing the work for you? You can finish the job in 500 + 1000/20 == 550 ticks, and eventually, you're gated only by the map-reduce overhead.
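To play with these numbers, here's the same back-of-the-envelope model as a few lines of Python. The function name job_ticks and its parameters are my own; the model just assumes a fixed overhead (a fraction of the total work) plus the work split evenly across PEs, which is the simplification used above and not a measurement of any real map-reduce system.

```python
def job_ticks(work, pes, overhead_fraction=0.5):
    """Ticks to finish a job: fixed overhead plus work divided evenly across PEs.

    Toy model only: overhead is assumed constant regardless of the PE count.
    """
    overhead = work * overhead_fraction
    return overhead + work / pes


for pes in (1, 2, 4, 20):
    print(pes, job_ticks(1000, pes))
# 1 1500.0
# 2 1000.0
# 4 750.0
# 20 550.0
```

No matter how many PEs you add, the result never drops below the 500 ticks of overhead, which is the "gated only by the map-reduce overhead" point.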
These are big changes, and figuring out how to make the most of this new class of hardware will be a cornerstone of high-performance system design.
1 Comment:
Turns out the Intel 80-core behemoth won't have 80 x86 cores. Instead, they'll just be FPUs.