
What is latent semantic indexing?

Hi all,

Can someone explain what latent semantic indexing is? Where can it be applied?

Sorry, I'm still new to this.

Hi Randy,

Hobson is the real expert on this point of NLP, but in a nutshell... LSI is one method to create a dense vector/mathematical representation of the content of a document in a particular corpus.

For the entire corpus, find the list of unique words.
For each document in the corpus, count up the occurrences of each of those words. Some will be zero.
Then use a dimensionality reduction technique such as SVD on those counts to get a vector that represents the meaning of that document 'in relation to' all the other docs.
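
To make those three steps concrete, here is a rough sketch in plain Python. It is only an illustration under some assumptions of mine: the toy corpus, the function names, and the choice of k = 2 are all made up, and the SVD step is approximated with power iteration on A^T A so that it runs without any libraries. In practice you would more likely hand the count matrix to a linear algebra library.

```python
import math
import random
from collections import Counter


def build_term_counts(docs):
    """Steps 1 and 2: find the unique words, then count them per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One count per vocabulary word; many of these are zero.
        rows.append([float(counts[word]) for word in vocab])
    return vocab, rows


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_right_singular_vectors(rows, k, iterations=200, seed=0):
    """Step 3 (approximate): top-k right singular vectors of the count matrix.

    Hypothetical helper: power iteration on A^T A with deflation.
    Assumes k is no larger than the rank of the matrix.
    """
    rng = random.Random(seed)
    n_terms = len(rows[0])
    basis = []
    for _component in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
        for _step in range(iterations):
            doc_side = [dot(row, v) for row in rows]  # A v
            v = [sum(rows[i][j] * doc_side[i] for i in range(len(rows)))
                 for j in range(n_terms)]  # A^T (A v)
            # Deflation: remove the directions that were already found.
            for found in basis:
                overlap = dot(v, found)
                v = [x - overlap * y for x, y in zip(v, found)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        basis.append(v)
    return basis


# Made-up toy corpus: three documents about pets, two about finance.
docs = [
    "cat dog pet",
    "dog cat animal pet",
    "stock market trade",
    "market trade stock price",
    "pet cat food",
]

vocab, rows = build_term_counts(docs)
basis = top_right_singular_vectors(rows, k=2)

# Each document becomes a dense 2-dimensional vector instead of a
# sparse vector with one entry per vocabulary word.
doc_vectors = [[dot(row, axis) for axis in basis] for row in rows]
```

In this toy corpus the pet documents and the finance documents share no words, so they end up along different axes of the reduced space.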

You can then follow the same path as above for new documents (or a search query), and use a similarity metric such as cosine similarity to find similar documents in the corpus and surface them (in search results, say).
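
As a sketch of that lookup step, again in plain Python: the four-word vocabulary and the two hand-written axes below are stand-ins I invented for whatever the SVD step would actually produce on a real corpus, and the helper names are hypothetical. Words outside the vocabulary are simply ignored here.

```python
import math
from collections import Counter


def fold_in(text, vocab, basis):
    """Count the known words in `text`, then project onto the reduced axes."""
    counts = Counter(text.lower().split())
    row = [float(counts[word]) for word in vocab]
    return [sum(c * w for c, w in zip(row, axis)) for axis in basis]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a vector with no known words matches nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, doc_vectors):
    """Return (similarity, document index) pairs, most similar first."""
    scored = [(cosine_similarity(query_vector, vector), index)
              for index, vector in enumerate(doc_vectors)]
    return sorted(scored, reverse=True)


# Assumed inputs: a tiny vocabulary and two made-up axes
# (one "pets" direction, one "finance" direction).
vocab = ["cat", "dog", "market", "stock"]
s = 1.0 / math.sqrt(2.0)
basis = [[s, s, 0.0, 0.0],
         [0.0, 0.0, s, s]]

corpus = ["cat dog dog", "stock market", "dog cat stock"]
doc_vectors = [fold_in(doc, vocab, basis) for doc in corpus]

query_vector = fold_in("dog", vocab, basis)
ranking = rank_documents(query_vector, doc_vectors)
```

With these made-up inputs the query "dog" ranks the first document highest, the mixed third document next, and the finance-only second document last, even though the query never mentions "cat".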

Hope that helps,