In my work on new systems for what I call ‘regular’ businesses, it’s rare nowadays (for the past 6-7 years at least) to work on anything that isn’t web-enabled: some mix of HTML/CSS/JavaScript on the front end, Java on the server side, and an RDBMS for data storage and retrieval. What I find interesting is the mix of technologies used by some of the more popular web presences (Google, Yahoo, MySpace, Facebook, Twitter, Flickr etc). They reach beyond this standard stack to address the different problems of running an online service with potentially millions of users or, in the case of Google and Yahoo, providing search engines whose indexes cover millions of websites.
The current technologies that I see showing up more and more include some of the following:
- The MapReduce algorithm was made popular by Google, which used it to build its search indexes, but the approach now seems to be showing up as the answer to everyone’s massively parallel data processing needs. Apache Hadoop is a Java implementation of a MapReduce framework, and its page listing users (several pages long) gives an impressive range of usage examples, from clusters of a few machines to clusters with 1000s of machines and 1000s of CPUs with up to 100 terabytes of data. Those are numbers I’ve personally never come across implementing typical business systems, even ones with 1000s of users. There are definitely some interesting problems being solved in these areas.
- Memcached shows up in write-ups of how most online services are implemented, usually as a performance optimization: avoid database hits by keeping as much of the frequently accessed data in memory as possible.
- There’s no shortage of ‘my tried and tested old-school technology is still better than your newfangled web technology’ arguments, and this article on some misconceptions about what MapReduce is and is not makes for interesting reading too.
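To make the MapReduce idea above a little more concrete, here is a minimal single-process sketch in Python of the classic word-count example. It is not Hadoop's API or Google's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are my own assumptions, chosen only to show the three steps: map each input to key/value pairs, group the pairs by key, then reduce each group to one result.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map_phase call is independent of the others, which is what
    # would let a real framework spread them across many machines.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, purely for illustration.
documents = ["the quick brown fox", "the lazy dog", "The fox"]
counts = word_count(documents)
```

Because no map call depends on any other, and no reduce call needs data from another key, both phases can in principle be run in parallel, which is the property the large clusters rely on.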
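The way Memcached is typically used to avoid database hits can be sketched as a check-the-cache-first lookup. The Python below is only an illustration under stated assumptions: a plain dictionary stands in for the Memcached server, and `CacheAside`, `slow_user_lookup` and the `database_hits` counter are hypothetical names, not part of any real client library.

```python
class CacheAside:
    """Check the cache first; only go to the database on a miss."""

    def __init__(self, load_from_database):
        self._cache = {}  # assumption: a dict standing in for Memcached
        self._load = load_from_database
        self.database_hits = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # served from memory, no database hit
        self.database_hits += 1
        value = self._load(key)  # the expensive call we want to avoid
        self._cache[key] = value  # remember it for the next request
        return value


def slow_user_lookup(user_id):
    # Hypothetical stand-in for a real database query.
    return {"id": user_id, "name": f"user-{user_id}"}


users = CacheAside(slow_user_lookup)
first = users.get(42)   # miss: loads from the "database"
second = users.get(42)  # hit: served from the in-memory cache
```

A real deployment would also need to deal with expiry and with invalidating cached entries when the underlying data changes, which this sketch leaves out.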
What’s clear is that there’s plenty going on out on the edge of massively multi-user online services right now, and plenty of interesting developments to keep an eye on.