Some of the technologies that I see showing up more and more include the following:
- The MapReduce approach was made popular by Google using it to build their search indexes, but it now seems to be showing up as the answer to everyone’s massively parallel data processing needs. Apache Hadoop is a Java implementation of a MapReduce framework, and the page listing their users runs several pages long. The impressive list of usage examples ranges from clusters of a few machines to clusters with thousands of machines and thousands of CPUs handling up to 100 terabytes of data. Those are numbers I’ve personally never come across when implementing typical business systems, even ones with thousands of users. There are definitely some interesting problems being solved in these areas.
- Memcached shows up when reading about the implementation of most online services. It is used as a performance optimization: keep as much of the frequently accessed data in memory as possible, and avoid hitting the database for it.
- There’s no shortage of ‘my tried and tested old school technology is still better than your newfangled web technology’ arguments, and this article commenting on some misconceptions about what MapReduce is and is not makes for interesting reading too.
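
To make the MapReduce idea from the list above a little more concrete, here is a minimal single-process sketch of the classic word count example. It is not Hadoop's API and it runs on one machine. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, chosen only to show the shape of the model: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (or chunk) would be mapped on a
    # different machine; here everything runs sequentially in one process.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
print(counts)
```

The appeal of the model is that the map and reduce functions share no state, so in principle a framework can spread them across as many machines as the data needs.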
What’s clear is that there’s plenty going on out on the edge of massive multi-user online services right now, and plenty of interesting developments to keep an eye on.
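
As a footnote to the Memcached point, the usual way of avoiding database hits is often called the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result for next time. The sketch below is only an illustration under my own assumptions. `TinyCache` is a hypothetical in-process stand-in for a Memcached client, and `load_user_from_db` is a made-up placeholder for a real query.

```python
import time


class TinyCache:
    """Hypothetical in-memory stand-in for a Memcached client (get/set with expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


db_hits = 0


def load_user_from_db(user_id):
    """Made-up placeholder for an expensive database query."""
    global db_hits
    db_hits += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id, ttl_seconds=60):
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        # Cache miss: hit the database once, then remember the answer.
        user = load_user_from_db(user_id)
        cache.set(key, user, ttl_seconds)
    return user


cache = TinyCache()
first = get_user(cache, 42)   # miss, goes to the "database"
second = get_user(cache, 42)  # hit, served from memory
print(db_hits)
```

The expiry time is the main design knob here: a longer one saves more database work, but readers may see stale data for longer.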