Another random topic: this time, a viewpoint on hardware.
If anyone's noticed the trend in how many cores a processor has had over time, they might've seen that mainstream parts have pretty much flatlined since 2007 (taking the AMD Phenom as the first true quad-core part). Sure, higher-end parts and servers have gone up, but we've only quadrupled the core count in the past 7 years, to 16 cores, and that's for the general-purpose processor with the highest core count. So why is it that CPUs have flatlined, but GPUs, which are many-core parts, continue to rise? My best guess is that GPUs, unlike CPUs, are handed the two things that make a multicore processor of any kind work really well: problems that are deterministic, and problems that are independent of each other.
What do I mean by deterministic? Essentially, that an operation done on a computer completes in the exact same amount of time, every time it's run. If I feed the computer 1+1, then I can say it will always take, for example, 1 clock cycle to get the result. It's easy to see why: the values are known and the computer doesn't need to gather any more information. However, if the operation were 1 plus some user-inputted value, then the operation is no longer deterministic. Sure, once the value is received, the computer will churn out the answer in 1 clock cycle, but that input could take a second or years to arrive. For a GPU running graphics, all of the data is present and accounted for. Some steps, like post-processing effects, do require an input, but that input has already been generated by the time the step is reached. Even with physics, the inputs are known and can be processed independently of each other, regardless of whether the result ends up being, say, a collision; that gets accounted for in the next slice of time. A CPU, on the other hand, handles every other kind of program, and a lot of those programs spend their time waiting for something to happen.
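To make that concrete, here's a minimal C sketch of the difference (the 1-cycle figure is obviously an idealization): the first addition has everything it needs, while the second has to sit and wait for a human.

#include <stdio.h>

int main(void) {
    /* Deterministic: both operands are known, so the add finishes in a
       fixed, predictable amount of time every run. */
    int sum = 1 + 1;

    /* Not deterministic from the machine's point of view: the add itself
       is still quick, but the program stalls until the user types
       something, which could take a second or years. */
    int value = 0;
    if (scanf("%d", &value) == 1) {
        sum = 1 + value;
    }

    printf("%d\n", sum);
    return 0;
}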
Take, for instance, your word processor. 99% of the time it'll probably just sit there, waiting for input. Sure, you could be hammering away at the keyboard at 100WPM (which, by the way, works out to about 500 characters a minute, or roughly 8 per second), but to a computer, that's an eternity. By the time you've let go of the key, it's already processed the input and is waiting for you to press the next one. Even a web browser mostly sits there waiting for you to click a link or for something to arrive from the network. Even if some gaudy Flash animation is playing in the background, it's probably not running any faster than 30FPS, or 33ms per frame. Still, that's a long time for a computer (a tick on a 4.0GHz processor is 0.25 nanoseconds, or 250 picoseconds, which makes a 30FPS frame about 133 million ticks long). There's no point in throwing more workers at a problem that spends most of its time idling.
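Just to put rough numbers on that gap, here's a quick back-of-the-envelope calculation in C, assuming a 4.0GHz clock, 30FPS, and the usual 5-characters-per-word convention for WPM:

#include <stdio.h>

int main(void) {
    /* 100 words per minute, at the usual 5 characters per word. */
    double chars_per_second = 100.0 * 5.0 / 60.0;   /* ~8.3 */

    /* One clock tick on a 4.0 GHz core, in seconds. */
    double tick = 1.0 / 4.0e9;                      /* 0.25 ns */

    /* One frame at 30 FPS, in seconds. */
    double frame = 1.0 / 30.0;                      /* ~33 ms */

    printf("gap between keystrokes: %.0f million ticks\n",
           (1.0 / chars_per_second) / tick / 1e6);  /* ~480 million */
    printf("one 30FPS frame:        %.0f million ticks\n",
           frame / tick / 1e6);                     /* ~133 million */
    return 0;
}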
The other area is independence. Ever since we decided to make computers do more than just compute numbers, we've run into a bit of a snag known as resource management. The classic example is two or more programs fighting over the same resource, which can be anything from needing the hard drive to something as simple as owning a single byte in memory. And then there's a separate but related problem: what if one core is working on something, switches context to another program, and then another core picks up where the first left off? Since we'd like our programs to run as much as possible out of cache memory, where access is very fast, we have to make sure each core's cache contains the same data when they're working on the same problem (this is called cache coherency). If the cache weren't updated every time a core picked up the program, it would be working on stale, incorrect data. It would be like two collaborators working on a Wikipedia page: one person does some work but needs to do something else, so another user (presumably elsewhere in the world) picks up the job. Except how does that person know what was already added? The copy they picked up is stale, and if they work it to completion, the result will be incorrect.
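Here's a rough sketch of the software side of that problem in C with pthreads (compile with -pthread; the shared counter and the million iterations are just made up for illustration): two workers share one value, and the result only stays correct if each one locks before touching it.

#include <pthread.h>
#include <stdio.h>

/* Shared state both workers want to update: the software-level version
   of the "stale Wikipedia page" problem. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        /* Without the lock, each core may increment a stale copy of the
           value held in its own cache/registers, and updates get lost. */
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%ld\n", counter);   /* 2000000 with the lock; anything without */
    return 0;
}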
While we can have hardware ensure the cache is coherent across all cores working on the problem... it becomes a little silly when much of your silicon isn't devoted to actual processing, but to the infrastructure for keeping data consistent across all the players. So why isn't this a problem for GPUs? Simply put, despite their flexibility and computational prowess, we treat GPUs like simple computers. Simple like the old mainframes of the '60s and '70s, with the CPU standing in for the human who sorted out the punch cards for processing. They still have some CPU-like logic for managing and reordering jobs, but most of their programs are "do this until it's done, then spit out the results". A GPU core can't afford to switch contexts, because of the types of jobs we feed it and the purpose it serves. Imagine playing a game and, halfway through rendering a frame, the GPU decides it needs to render the Windows GUI instead. Nobody would like that.
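This isn't actual GPU code, just a C sketch of the shape of the work we hand a GPU: every output element depends only on its own input, all the data is already sitting in memory, and the whole thing runs to completion with nothing to wait on.

#include <stddef.h>

/* The shape of a typical GPU job: each element is computed independently,
   all inputs are already present, and the work just runs until it's done,
   with no waiting and no context switches.  On a real GPU, each iteration
   would be one thread of a kernel launch. */
void brighten(const float *in, float *out, size_t n, float gain) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * gain;   /* out[i] depends only on in[i] */
    }
}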
So, to take away from all this... most of the programs we use daily are ill-suited to multi-core processors. You'd be throwing silicon into creating more workers, rather than using that silicon to give the workers you have better tools. They'd all need to be kept constantly up to date, because any of them can be interrupted at any time. And while we can blame software for not catching up with hardware and where it's going, the problem is fundamental: you can only parallelize your operations so much before you hit a wall (this is essentially Amdahl's law). If going from one core to two doubled your throughput, adding a third might only gain you another 50%, the next 25%, the next 12.5%, and so on. But the cost of adding each core stays constant, if it doesn't increase outright because of the infrastructure you have to build to support it.
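That wall has a name, Amdahl's law, and it falls out of a few lines of C (the 90% parallel fraction below is just an assumed example, not a measurement of any real program):

#include <stdio.h>

/* Amdahl's law: if a fraction p of the work can be parallelized, the best
   possible speedup on n cores is 1 / ((1 - p) + p / n).  The serial part
   puts a hard ceiling on the gains, no matter how many cores you add. */
static double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    double p = 0.9;   /* assume 90% of the program parallelizes */
    for (int n = 1; n <= 64; n *= 2) {
        printf("%2d cores: %.2fx\n", n, speedup(p, n));
    }
    /* Even with infinite cores, the ceiling here is 1 / (1 - 0.9) = 10x. */
    return 0;
}

Each doubling of cores buys less than the one before it, while the last 10% of serial work caps the whole thing, which is the wall the paragraph above is describing.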