Become.com’s system is a massive web search engine that crawls the web for products being sold and offers comparisons between similar products. The crawler engine is written in Java; during a 7-day run it indexes more than 3 billion web pages and generates over 8 terabytes of index data across 30 distributed servers.
The crawler consists of 39,000 lines of code and runs on 40 to 50 machines, with 180 GB of total allocated memory and up to 5,000 threads.
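These figures imply some useful averages. The Java sketch below works them out as a back-of-envelope estimate. It assumes several things the original does not state: decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), a uniform crawl rate across the full 7 days, and all 5,000 threads busy the whole time. The class and method names are illustrative only.

```java
// Back-of-envelope averages derived from the crawler figures quoted above.
// Assumptions (not from the source): decimal units, a uniform crawl rate over
// the whole 7-day run, and every thread active for the entire run.
public class CrawlScaleEstimate {
    static final double SECONDS_PER_DAY = 24 * 60 * 60;

    // Average pages indexed per second across the whole cluster.
    static double pagesPerSecond(double pages, double days) {
        return pages / (days * SECONDS_PER_DAY);
    }

    // Average bytes of index data produced per page.
    static double bytesPerPage(double indexBytes, double pages) {
        return indexBytes / pages;
    }

    // Split a total evenly across workers (threads or machines).
    static double perWorker(double total, double workers) {
        return total / workers;
    }

    public static void main(String[] args) {
        double pages = 3e9;         // "more than 3 billion web pages"
        double indexBytes = 8e12;   // "over 8 terabytes" of index data
        double days = 7;            // one 7-day run
        double threads = 5000;      // "up to 5,000 threads" (a peak, not an average)
        double memoryBytes = 180e9; // 180 GB of total allocated memory

        double rate = pagesPerSecond(pages, days);
        System.out.printf("Pages/second (cluster):  %.0f%n", rate);
        System.out.printf("Pages/second per thread: %.2f%n", perWorker(rate, threads));
        System.out.printf("Index bytes per page:    %.0f%n", bytesPerPage(indexBytes, pages));
        System.out.printf("Memory per thread (MB):  %.0f%n", perWorker(memoryBytes, threads) / 1e6);
    }
}
```

Run as written, it reports roughly 4,960 pages per second for the cluster, about one page per second per thread, about 2.7 KB of index data per page, and about 36 MB of memory per thread. The source figures are lower bounds ("more than", "over") and the thread count is a peak, so these are rough averages, not measured rates.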