How do you determine what information to retrieve? By searching. From the Lucene home page:
"Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform."
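I think that page is somewhat misleading, because it lumps together several pieces of search-related software that play different roles. It features Lucene (a search engine library that you embed in your own application), Solr (a standalone search server built on Lucene) and Nutch (a web crawler that grew out of the Lucene project and can pass what it fetches to Solr for indexing). A library, a server and a crawler are not directly comparable to one another.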