Hobson is the real expert on this area of NLP, but in a nutshell... LSI (Latent Semantic Indexing) is one method to create a dense vector (a mathematical representation) of the content of a document in a particular corpus.
For the entire corpus, find the list of unique words.
For each document in the corpus count up the occurrences of each of those words. Some will be zero.
Then use a dimensionality reduction technique such as SVD on those counts to get a much shorter vector that represents the meaning of that document 'in relation to' all the other docs.
You can then run the same steps on new documents (count the same words, apply the same reduction) and use a similarity metric such as cosine similarity to find similar documents in the corpus and surface them (in a search query, say).
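If it helps to see those steps end to end, here is a rough sketch in Python with NumPy. The four-document corpus, the helper names (tokenize, count_vector, project, most_similar) and the choice of k = 2 dimensions are all made up for illustration, and it uses raw word counts as described above; real LSI setups often weight the counts (TF-IDF, for instance) and use a dedicated library.

```python
import numpy as np

# Toy corpus, invented for this example: two documents about cats, two about markets.
corpus = [
    "the cat sat on the mat",
    "a cat and a kitten sat together",
    "stock markets fell sharply today",
    "investors sold stock as markets fell",
]

def tokenize(text):
    # Naive whitespace tokenizer; a real system would do more cleanup.
    return text.lower().split()

# Step 1: the list of unique words in the entire corpus.
vocab = sorted({word for doc in corpus for word in tokenize(doc)})
index = {word: i for i, word in enumerate(vocab)}

def count_vector(text):
    # Step 2: count occurrences of each vocabulary word (most entries stay zero).
    vec = np.zeros(len(vocab))
    for word in tokenize(text):
        if word in index:  # words never seen in the corpus are ignored
            vec[index[word]] += 1
    return vec

# Document-term matrix: one row per document, one column per word.
X = np.array([count_vector(doc) for doc in corpus])

# Step 3: SVD, keeping only the k strongest dimensions.
k = 2  # assumed value; picked by hand for this tiny corpus
U, S, Vt = np.linalg.svd(X, full_matrices=False)
doc_vectors = U[:, :k] * S[:k]  # dense k-dimensional vector per document

def project(text):
    # Map a new document into the same k-dimensional space as the corpus.
    return count_vector(text) @ Vt[:k].T

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def most_similar(query):
    # Step 4: score every corpus document against the query.
    q = project(query)
    scores = [cosine_similarity(q, d) for d in doc_vectors]
    return int(np.argmax(scores)), scores

best, scores = most_similar("kitten on the mat")
print(corpus[best], scores)
```

The thing to notice is that a new document is never run through SVD on its own. It is counted against the corpus vocabulary and then projected with the same Vt that came out of the corpus decomposition, which is what puts it in the same space as the stored document vectors.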
Hope that helps,