What is latent semantic indexing?

Ranch Foreman
Posts: 252
Hi all,

Can someone explain what latent semantic indexing is? Where can it be applied?

Sorry, I'm still new to this.

Posts: 5
Hi Randy,

Hobson is the real expert on this area of NLP, but in a nutshell, LSI (latent semantic indexing) is one method for creating a dense vector, i.e. a mathematical representation, of the content of a document in a particular corpus.

For the entire corpus, find the list of unique words (the vocabulary).
For each document in the corpus, count the occurrences of each of those words. Some counts will be zero.
Then use a dimensionality reduction technique such as SVD (singular value decomposition) to get a vector that represents the meaning of that document in relation to all the other docs.
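
To make those three steps concrete, here is a minimal sketch in plain Python. Everything in it is my own illustration, not from a particular library: the function names (count_matrix, top_term_directions, reduce_documents), the toy corpus, and the choice of k=2 are all assumptions. Note also that it does not call a real SVD routine; it approximates the top right singular vectors with power iteration plus deflation, which is a stand-in you would normally swap for a library SVD.

```python
import math
from collections import Counter

def count_matrix(docs):
    """Steps 1 and 2: build the vocabulary, then count words per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocab])  # absent words count as 0
    return vocab, rows

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_term_directions(rows, k, iterations=200):
    """Step 3 (approximation): top-k right singular vectors of the count
    matrix A, found by power iteration with deflation on A^T A.
    This stands in for a library SVD call."""
    n_terms = len(rows[0])
    directions = []
    for c in range(k):
        # Arbitrary deterministic start vector (an assumption of this sketch).
        v = [1.0 + 0.01 * ((7 * j + 3 * c) % 11) for j in range(n_terms)]
        for _ in range(iterations):
            doc_side = [dot(row, v) for row in rows]            # A v
            v = [sum(rows[i][j] * doc_side[i] for i in range(len(rows)))
                 for j in range(n_terms)]                       # A^T (A v)
            for d in directions:                                # deflation
                proj = dot(v, d)
                v = [x - proj * y for x, y in zip(v, d)]
            norm = math.sqrt(dot(v, v))
            if norm < 1e-12:
                raise ValueError("fewer than k independent directions")
            v = [x / norm for x in v]
        directions.append(v)
    return directions

def reduce_documents(rows, directions):
    """Project each document's count vector onto the k term directions."""
    return [[dot(row, d) for d in directions] for row in rows]

docs = ["cat dog cat", "dog cat dog", "stock market stock", "market stock market"]
vocab, rows = count_matrix(docs)
directions = top_term_directions(rows, k=2)
doc_vectors = reduce_documents(rows, directions)
```

Each entry of doc_vectors is now a 2-number vector instead of a 4-number count vector. In this toy corpus the two pet documents end up pointing the same way and the two finance documents point a different way, which is the "in relation to all the other docs" part.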

You can then run new documents through the same steps and use a similarity metric such as cosine similarity to find similar documents in the corpus and surface them (in response to a search query, say).
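
Here is a rough sketch of that lookup step, again in plain Python. The vocabulary and the two TOPICS rows are hard-coded, hypothetical values standing in for whatever the dimensionality reduction produced for your corpus, and embed, cosine_similarity and most_similar are names I made up for the example.

```python
import math
from collections import Counter

VOCAB = ["cat", "dog", "market", "stock"]
R = 1 / math.sqrt(2)
# Hypothetical term directions (one row per reduced dimension). In a real
# system these would come from the SVD of the corpus, not be typed in by hand.
TOPICS = [[R, R, 0.0, 0.0],
          [0.0, 0.0, R, R]]

def embed(text):
    """Count vocabulary words in the text, then project onto the topic rows.
    Words outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    term_vector = [counts[word] for word in VOCAB]
    return [sum(t * x for t, x in zip(topic, term_vector)) for topic in TOPICS]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query, corpus):
    """Return (score, document) pairs, best match first."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(doc)), doc) for doc in corpus]
    return sorted(scored, reverse=True)

corpus = ["cat dog cat", "stock market stock", "dog dog market"]
ranking = most_similar("my dog chased a cat", corpus)
for score, doc in ranking:
    print(f"{score:.3f}  {doc}")
```

With these made-up directions, the query about a dog and a cat ranks "cat dog cat" first, "dog dog market" second, and "stock market stock" last, even though the query and the documents are not identical strings.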

Hope that helps,