Using Lucene to Index Code

This is an interesting project if you have some spare time to play with it. Lucene is a fascinating open source project from the Apache stable that builds and searches full-text indexes.

OnJava.com have an article showing how you can use the Lucene engine and API to build a facility to index and search source code. It would be interesting to compare this with the source search facility in Eclipse, as I believe the search facilities in Eclipse are also based on Lucene.

Intro to StAX XML API

DevX.com have an introductory article on using the StAX XML API. The DOM API loads the entire document into memory and treats it as a tree structure, while SAX parses the document in a single pass, pushing events to the application's handler code as it encounters each node. StAX takes an approach somewhere between the two: it treats the XML as a stream and lets the application pull content from that stream on demand, when needed.
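
To make the pull model concrete, here is a minimal sketch using the cursor-style `XMLStreamReader` from the JDK's `javax.xml.stream` package. It is not taken from the DevX article; the class name `StaxPullExample`, the `readTitles` helper, and the `<title>` element in the sample document are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullExample {

    // Hypothetical helper: collects the text of every <title> element.
    // The calling code drives the parser by asking for the next event,
    // rather than receiving callbacks as it would with SAX.
    static List<String> readTitles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next(); // pull the next event on demand
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Reads the element's text and advances the cursor
                        // to the matching end tag.
                        titles.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        String xml = "<books><book><title>First</title></book>"
                + "<book><title>Second</title></book></books>";
        for (String title : readTitles(xml)) {
            System.out.println(title);
        }
    }
}
```

Because the loop is in the application's hands, it could just as easily stop after the first match and leave the rest of the stream unread, which is awkward to do with SAX callbacks.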

The database engine that holds the most relational data in the world, and yet no-one talks about it.

The developer of this database engine claims that more relational data is stored in it worldwide than in any other. According to the Winter Corporation survey, the largest OLTP database in the world by data size (the Land Registry, at 23,101 GB) runs on this database, as does the largest by row count (UPS, at 89,621 million rows).

So what is this database that no-one talks about? The Register’s Developer site has an interesting article laying out these facts and more, and asks why DB2, the database in question, does not receive more press or advertising from IBM. The article suggests that IBM has possibly become complacent and feels it has no need to advertise or push the product: why should it, if it is already the database market leader (in some but not all areas)? That seems a dangerous policy to me.