Taming Text

paul nisset
Hi,
The book uses a question-answering lookup system as an example project, with Mary Shelley's Frankenstein as the source text.
Are the technologies and techniques in the book applicable to a web-based search engine?
I'm thinking in terms of the amount of input text and performance issues/limitations.

Thanks,
Paul
Grant Ingersoll
Author
Hi Paul,

The Frankenstein example in the first chapter is really just a toy to get people thinking about the problem space. Chapter 8 contains a system that is a few levels up, but still not production ready, IMO. I would suggest that the concepts and basic principles are applicable to a web-based engine, but a whole lot more engineering and capabilities need to go into a system to make it effective in that area. It is somewhat closer to ready if you are working at a smaller scale, but you still have a lot of work to do, as the example only handles simple fact-based questions and only returns a window around the candidate answer.

As for performance at web scale, you will often need to leverage some type of distributed text analysis pipeline up front to handle the incoming documents.

HTH,
Grant
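
To make the "window around the candidate answer" idea above concrete, here is a minimal Python sketch. It is not the book's code (the book's examples are Java-based), and everything in it is an assumption made for illustration: the helper name `answer_window`, the whitespace tokenization, the punctuation stripping, and the default window width. The sample sentence is invented for the example.

```python
def answer_window(text, term, width=5):
    """Return up to `width` tokens on each side of the first token matching `term`.

    Hypothetical helper: whitespace tokenization and case-insensitive
    matching (ignoring surrounding punctuation) are simplifying assumptions.
    """
    tokens = text.split()
    for i, tok in enumerate(tokens):
        # Compare the token without surrounding punctuation, ignoring case.
        if tok.strip(".,;:!?\"'").lower() == term.lower():
            start = max(0, i - width)
            return " ".join(tokens[start:i + width + 1])
    return None  # no candidate answer found in the text


# Invented sample sentence, used only to exercise the helper.
sample = "The monster was created by Victor Frankenstein in his laboratory at Ingolstadt."
print(answer_window(sample, "victor", width=2))
```

A real system would pick the candidate position with an answer-type classifier or a ranking step rather than a literal term match; the sketch only shows the windowing part.
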
paul nisset
Thanks.

I was thinking about different use cases for text search applications.
Search is a particularly big problem when it comes to company documentation. The answer is in there... somewhere.
Grant Ingersoll
It is probably closer to ready for company documentation or an intranet, but it is still a non-trivial exercise. What do you have in place for search? I'd probably start there.
 