
Fast indexing / searching of a text file

 
jay vas
Ranch Hand
Posts: 407
Hi: I am writing programs that read data from massive text files.
What is the best way to do this in Java? Is indexing a possibility, and if so, how do I jump from one index to another? Thanks, Jay
 
Nitesh Kant
Bartender
Posts: 1638
IntelliJ IDE Java MySQL Database
Yes, indexing is definitely a way to reduce the time it takes to fetch a record.
I am not sure what your requirements are, but Apache Lucene is a free text search engine that you may be interested in.
This article gives an insight into how to use a RandomAccessFile to build a small database. It may not fit your requirements perfectly, but it may give you a head start on indexes and on accessing records through them.
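
To make the RandomAccessFile idea concrete, here is a minimal sketch of my own (it is not taken from the article). It assumes the file is line-oriented and UTF-8 encoded; the class and method names (LineOffsetIndex, buildIndex, readLine) are made up for illustration. One sequential pass records the byte offset at which each line starts, and after that any line can be fetched with a single seek.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: an in-memory index of line start offsets.
class LineOffsetIndex {

    // One buffered pass over the file; records the byte offset of each line start.
    static List<Long> buildIndex(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true;
                }
                pos++;
            }
        }
        return offsets;
    }

    // Jumps straight to line lineNo (0-based) and reads only that line's bytes.
    static String readLine(Path file, List<Long> offsets, int lineNo) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long start = offsets.get(lineNo);
            long end = lineNo + 1 < offsets.size() ? offsets.get(lineNo + 1) : raf.length();
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            // Drop the trailing line terminator (\n or \r\n), if any.
            int len = buf.length;
            while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r')) {
                len--;
            }
            return new String(buf, 0, len, StandardCharsets.UTF_8);
        }
    }
}
```

For a really massive file, a List<Long> with one boxed entry per line may itself get large; storing the offsets in a long[] or writing them to a separate index file would be the obvious next step.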
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
Reading data from a file is different from searching an index of the file, because the index typically does not contain the full text of the indexed documents. So whether an index would help depends on what exactly you need to do with the text.

I don't understand what you mean by "jump from one index to another" - random access of the file contents?
 
Nitesh Kant
Bartender
Posts: 1638
IntelliJ IDE Java MySQL Database
Ulf: the index typically does not contain the full text of the indexed documents.

True, but the index will typically give me the record pointer, won't it?
So if I have indexed a text file to give me record pointers and record lengths for records containing a particular value of the indexed field, I can quickly retrieve the record from the file, can't I?
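
As a rough sketch of what I mean, under assumptions that may not match the original poster's files: one record per line, comma-separated fields, UTF-8, and an index small enough to hold in a HashMap. The names FieldIndex, RecordPointer, build and lookup are invented for this example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps a field value to the records that contain it.
class FieldIndex {

    // Where a record lives in the file: byte offset and length in bytes.
    static final class RecordPointer {
        final long offset;
        final int length;

        RecordPointer(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // One sequential pass: index every record by the value of field number fieldNo (0-based).
    static Map<String, List<RecordPointer>> build(File file, int fieldNo) throws IOException {
        Map<String, List<RecordPointer>> index = new HashMap<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            long lineStart = 0;
            long pos = 0;
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b == '\n') {
                    add(index, line, lineStart, fieldNo);
                    line.reset();
                    lineStart = pos;
                } else {
                    line.write(b);
                }
            }
            if (line.size() > 0) {
                add(index, line, lineStart, fieldNo); // last record without a trailing newline
            }
        }
        return index;
    }

    private static void add(Map<String, List<RecordPointer>> index,
                            ByteArrayOutputStream line, long start, int fieldNo) {
        String text = new String(line.toByteArray(), StandardCharsets.UTF_8);
        if (text.trim().isEmpty()) {
            return; // skip blank lines
        }
        String[] fields = text.split(",", -1);
        if (fieldNo < fields.length) {
            index.computeIfAbsent(fields[fieldNo].trim(), k -> new ArrayList<>())
                 .add(new RecordPointer(start, line.size()));
        }
    }

    // Fetch the matching records with one seek each; the rest of the file is never read.
    static List<String> lookup(File file, Map<String, List<RecordPointer>> index,
                               String value) throws IOException {
        List<String> records = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            for (RecordPointer p : index.getOrDefault(value, Collections.<RecordPointer>emptyList())) {
                byte[] buf = new byte[p.length];
                raf.seek(p.offset);
                raf.readFully(buf);
                records.add(new String(buf, StandardCharsets.UTF_8).trim());
            }
        }
        return records;
    }
}
```

Building the index still costs one full read of the file, so this only pays off if the same file is queried more than once.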

Ulf: I don't understand what you mean by "jump from one index to another" - random access of the file contents?

I assumed so! It's worthwhile getting this confirmed.
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
the index will typically give me the record pointer, won't it? So if I have indexed a text file to give me record pointers and record lengths for records containing a particular value of the indexed field, I can quickly retrieve the record from the file, can't I?


It is possible to create an index like that. But that may or may not address the underlying problem. In particular, we don't know if there's a notion of structure or records within the files. That's why I asked the original poster to clarify what he's trying to do.
 
Nitesh Kant
Bartender
Posts: 1638
IntelliJ IDE Java MySQL Database
Ulf: That's why I asked the original poster for clarification what he's trying to do.

Oh yeah, absolutely, your question was perfectly valid. I was just confirming my understanding.
 