The PC just had its 30th birthday, yet its own father and Steve Jobs have both been proclaiming its demise, claiming that we are now in the 'post-PC' era of computing. At the same time, AMD and ARM have announced a 'heterogeneous computing' initiative. Intel has announced the SCC (Single-chip Cloud Computer), a 48-core chip whose cores are connected by a high-speed on-die interconnect. PCs have evolved with Thunderbolt, a 10 Gbit/s serial interface, and PCI Express devices (up to 128 Gbit/s), most commonly GPUs, the modern-day math coprocessor. Furthermore, with SIMD extensions such as SSE, AMD and Intel are differentiating their CPUs with hardwired functionality that performs specific tasks at breakneck speeds. Multi-core CPUs from AMD and Intel also use HyperTransport and QPI respectively to provide incredible inter-core and memory bandwidth, further adding to the buses used in modern computing. And finally, with the enhancements in silicon fabrication, primarily pursued by Intel, as well as the integrated SoC, primarily pursued by ARM and copied by AMD in its APUs, we have many patterns developing. Quietly, the networking side is also keeping up, with 10-gigabit Ethernet matching the speed of Thunderbolt's serial interface and both bus technologies offering remote direct memory access.
To an old hand like me, two things stand out. Firstly, the PC is anything but dead. Cloud computing and virtualization are all the rage, pushed to ensure that big-iron hardware with its lucrative profit margins gets sold, but buyers are seeing through this and stacking up many cheaper machines, PCs, to do the same job. The second thing I've observed is that the humble PC has metamorphosed into a completely different beast from the one it was 30 years ago. Even the x86 instruction set has moved on: a modern Intel chip running a 64-bit operating system can't run old 16-bit programs without software emulation (not that it matters, as the emulated code is still a lot faster than it was on the old chips). More importantly though, all of the advancements in the PC give off more than a fair whiff of the Transputer. Heck, even the 'new' iWARP (Internet Wide Area RDMA Protocol) is not fundamentally very different from iWarp (Intel's answer to Inmos' Transputer).
Distributed computing, or to use its trendier modern names, grid and cloud computing, has always been and always will be the best architecture for any system, be it a software architecture, a hardware one, or both (an appliance). With enhancements in distributed computing, primarily in the file system and database area as championed by Google (Bigtable, GFS) and Apache (Hadoop), two of the most difficult applications to distribute have been cracked, and we see the results running riot, powering mega sites such as Facebook and Google.
The problem is that hardware giants such as AMD, Intel and ARM don't want to give up their ability to pursue vendor lock-in by introducing hardware ingredients specific to their own platforms, but at the same time don't want to upset the golden goose that is the PC. The deeper problem is that the PC is a system architecture that harks back to the original 1981 IBM PC, a cheap machine that has grown in popularity since its introduction because it was an affordable, independent computer. With the web, a new lease of life has been given to the client/server architecture of the mainframe, with extremely large monolithic servers sending information to a thin-client piece of software, the browser. However, the monolithic server has also had to evolve and has itself become a farm of cheaper PCs working together to meet the computing requirements of the client, and the client is still a fat client, as smartphones and PCs can do more themselves than ever before.
Distributed computing therefore is necessary to power the 'cloud' of clients and servers alike. But the architecture remains resolutely the independent PC.
I think it's time that the silicon vendors standardise around three extremely fast buses. First, we need an open HyperTransport/QPI/PCI Express-style interface that, like PCIe, allows multiple lanes to be used in parallel to provide the bandwidth necessary for low-latency, highly parallel, high-performance devices such as memory and GPUs. We also need a serial interface, perfectly encapsulated in Thunderbolt, and finally we need a WAN bus, also perfectly encapsulated in 10 and 100 Gbit Ethernet. All three buses have two things in common: they provide at least 10 Gbit/s of bandwidth, and they allow direct memory access from a remote device, the necessary requirements for a group of PCs to share resources across the boundary of a single computer.
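To put those link speeds in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the raw line rates quoted in this post and ignores encoding and protocol overhead, so real transfers would be slower. The function name `transfer_seconds` and the `LINKS` table are my own illustrative inventions, not part of any standard.

```python
# Rough sketch: how long does it take to move a block of remote memory
# over each link at its quoted raw line rate? Overheads are ignored.

# Line rates in Gbit/s, taken from the figures quoted in this post.
LINKS = {
    "Thunderbolt": 10,
    "10-gigabit Ethernet": 10,
    "100-gigabit Ethernet": 100,
    "PCI Express (peak figure)": 128,
}


def transfer_seconds(size_bytes, link_gbit_per_s):
    """Idealised transfer time: payload bits divided by raw line rate."""
    bits = size_bytes * 8
    return bits / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    one_gib = 2**30  # 1 GiB of memory to share with a remote node
    for name, rate in LINKS.items():
        print(f"{name}: {transfer_seconds(one_gib, rate):.3f} s per GiB")
```

Even at the 10 Gbit/s floor, a gibibyte of remote memory is under a second away in the ideal case, which is why I treat that figure as the minimum for sharing resources between boxes.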
What next then? The OS is necessary, and RIM's QNX is a perfect example of how an OS can share resources across a network of machines, or across a network of cores within a machine (Linux, VxWorks and many other OSes can also be configured to do so just as well; Windows, on the other hand, would need quite a bit of tweaking). An OS needs to arbitrate the resources available to it, and the 'cluster' OS aggregates the resources available on an entire network and presents them to any given node on that network. It is important for an OS to see the entire network of hardware resources as one homogeneous machine, whether the 'machine' is one box with multiple modules of CPUs, RAM and GPUs or a network of machines interlinked with Thunderbolt or Ethernet.
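As a toy illustration of that 'one machine' view, the Python sketch below sums the resources of several nodes into a single aggregate. It is not how QNX or any real cluster OS does it: the `Node` and `ClusterView` types, the `aggregate` function and the example machines are all hypothetical, and a real system would also have to handle discovery, scheduling and failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One box on the network; every field here is illustrative."""
    name: str
    cpu_cores: int
    ram_gib: int
    gpus: int
    link: str  # how the node is attached: "local", "thunderbolt", "ethernet"


@dataclass(frozen=True)
class ClusterView:
    """The single 'machine' that an application would see."""
    cpu_cores: int
    ram_gib: int
    gpus: int
    node_count: int


def aggregate(nodes):
    """Sum per-node resources into one homogeneous view."""
    nodes = list(nodes)
    return ClusterView(
        cpu_cores=sum(n.cpu_cores for n in nodes),
        ram_gib=sum(n.ram_gib for n in nodes),
        gpus=sum(n.gpus for n in nodes),
        node_count=len(nodes),
    )


# Hypothetical machines: one local box plus two remote ones.
nodes = [
    Node("desk", cpu_cores=4, ram_gib=8, gpus=1, link="local"),
    Node("closet", cpu_cores=8, ram_gib=16, gpus=0, link="thunderbolt"),
    Node("rack-1", cpu_cores=16, ram_gib=64, gpus=2, link="ethernet"),
]
view = aggregate(nodes)
print(view)
```

The application only ever sees `view`; whether a core lives in the local box or at the far end of an Ethernet cable is the OS's business, recorded here in the `link` field.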
Let's hope the researchers see this and evolve the PC into a true new Transputer. This would mean that the hardware we purchase is simply 'compute nodes': if we need more power, we simply add more nodes, and scale up and down as the application at hand requires. And that's proper computing.