Indexed text files

Jan-Henrik Clausen
Greenhorn

Joined: Jun 28, 2010
Posts: 13
I am very new to Java (coming from Delphi). In Delphi I wrote routines to handle an indexed text file. I am now looking for a roughly similar set of routines in Java, or for tips on how to implement them. I do not want to use anything outside standard Java, especially not databases.

Here is a rough outline of how the Delphi routines worked:

There is a text file. It contains plain text (8-bit ASCII chars). An "@" char at the first position in a line means that the line is not part of the text itself but a command (or commands) that controls the indexing. In the simplest case, the chars following the @ form the key string that indexes the text that follows, up to the end of the file or the next occurrence of an @-line (whichever comes first).

Example for a textfile:

@Greetings
May I heartily express my best wishes to you, Earthlings!
@Message
I am happy to provide you with a vast improvement in hyperspace road traffic.
As you know these improvements are fundamental for our prospering
galactic economy.
@Explanation
I am sure that you are as overwhelmed as I am about these prospects and that
you will gladly contribute your planet as source of debris needed to outfit the
road curb of this new way to happiness, peace, wealth and overall welfare.



Now I am hoping for something like this:



Where k contains the text after the key.

Example:

GetIndexedTextEntry ("Message") should return:

I am happy to provide you with a vast improvement in hyperspace road traffic.
As you know these improvements are fundamental for our prospering
galactic economy.


The method should also check whether an index file exists (its name is the same as the text file's, but with the extension ".tix") and whether the index file's date of last change is the same as or newer than that of the text file being indexed. If no index file exists, one is generated automatically; it contains alphabetically sorted pairs of a key (fixed length, 20 chars) and the byte position of the first char of the indexed text chunk.
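The freshness check described above could be sketched roughly as follows. This is only an illustration using java.io.File; the class and method names (IndexFreshness, indexFileFor, isIndexUpToDate) are made up here and are not the Delphi originals.

```java
import java.io.File;

class IndexFreshness {

    /** Same location and base name as the text file, but with the extension ".tix". */
    static File indexFileFor(File textFile) {
        String name = textFile.getName();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // A null parent is allowed: the file is then resolved like new File(child).
        return new File(textFile.getParentFile(), base + ".tix");
    }

    /** True if the index file exists and is at least as new as the text file. */
    static boolean isIndexUpToDate(File textFile) {
        File index = indexFileFor(textFile);
        return index.exists() && index.lastModified() >= textFile.lastModified();
    }
}
```

When isIndexUpToDate returns false, the caller would rebuild the index before looking anything up.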

I hope you can help with the transfer of this quick, dirty and very useful text file management routine...
Sona Patel
Ranch Hand

Joined: Mar 30, 2009
Posts: 75
Hi Claudius,

You can do it like this: read the indexed file and store all the values in a hash table. A hash table stores values as key-value pairs. So store the values in the hash table as key="Message" and value="I am happy to provide you with a vast improvement in hyperspace road traffic. As you know these improvements are fundamental for our prospering galactic economy."

When retrieving, you can look up the value by its key. Read more about Hashtable. It will help.
Jan-Henrik Clausen
Greenhorn

Joined: Jun 28, 2010
Posts: 13
Sona Patel wrote: Hi Claudius,

You can do it like this: read the indexed file and store all the values in a hash table. A hash table stores values as key-value pairs. So store the values in the hash table as key="Message" and value="I am happy to provide you with a vast improvement in hyperspace road traffic. As you know these improvements are fundamental for our prospering galactic economy."

When retrieving, you can look up the value by its key. Read more about Hashtable. It will help.


Thanks for the answer. I thought about that, but the disadvantage is that I have to keep all the text in the memory of my computer. The text file I want to index is about 1.5 MB long, with about 1000 text chunks. If I read the specs correctly, that should cause a problem on several platforms, at least with the JVM running on my Windows 7 computer.

That is the reason I need something like the Hashtable operating on the hard disk. The Delphi routines I outlined above do exactly that, and they do it quite fast. Maybe I should rewrite them myself and treat that as training for Java (which I really need anyway).
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14156
    

Welcome to JavaRanch!

You can read a text file line by line in Java in the following way:
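A minimal sketch of such a loop, assuming Java 7 (for try-with-resources) and ISO-8859-1 as the encoding of the 8-bit text. Both are assumptions, and the names LineReading and readLines are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

class LineReading {

    /** Reads the file line by line and returns the lines without their line terminators. */
    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<String>();
        // try-with-resources closes the reader even if an exception is thrown.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), "ISO-8859-1"))) {
            String line;
            // readLine() returns null at the end of the file.
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

In real code you would usually process each line inside the loop instead of collecting them all in a list.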

As Sona says, storing the data in a hash table sounds like a good idea. Use a HashMap for this (don't use class Hashtable, it's a legacy collection class that has been superseded by HashMap):
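For illustration only, a HashMap from key to text could be used like this (the class name EntryMapDemo and the method sampleEntries are hypothetical; the sample entry is taken from the first post):

```java
import java.util.HashMap;
import java.util.Map;

class EntryMapDemo {

    /** Builds a small map from index key to the text stored under that key. */
    static Map<String, String> sampleEntries() {
        Map<String, String> entries = new HashMap<String, String>();
        entries.put("Greetings", "May I heartily express my best wishes to you, Earthlings!");
        return entries;
    }
}
```

Calling get("Greetings") on the returned map gives the stored text; get with an unknown key returns null.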

I'm not going to give you a complete solution (here at JavaRanch we'd like to help people to learn Java, not to hand out complete solutions), but you'd have to do something like this: inside the loop, check if the line starts with an @; if it does, start collecting lines that form the content of a new entry. Also, if the line starts with an @, you'd have to store the previous entry that you were collecting data for (if there is one) in the map, and at the end of the file, you'd have to store the last entry.
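The shape of that loop might look roughly like the sketch below. It is only one possible way to do it; the class name IndexedTextParser and the choice to keep a newline after every collected line are assumptions, not something from the original Delphi routines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

class IndexedTextParser {

    /** Parses "@key" lines and the text that follows them; the caller closes the reader. */
    static Map<String, String> parse(Reader source) throws IOException {
        Map<String, String> entries = new HashMap<String, String>();
        BufferedReader in = new BufferedReader(source);
        String key = null;                        // key of the entry being collected
        StringBuilder text = new StringBuilder(); // lines collected for that entry
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("@")) {
                if (key != null) {
                    entries.put(key, text.toString()); // store the previous entry
                }
                key = line.substring(1).trim();
                text.setLength(0);
            } else if (key != null) {
                text.append(line).append('\n');
            }
            // Lines before the first @-line have no key and are skipped.
        }
        if (key != null) {
            entries.put(key, text.toString());         // store the last entry
        }
        return entries;
    }
}
```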

Please ask more questions or let us know if you've solved the problem!

Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14156
    

Claudius Calvus wrote: Thanks for the answer. I thought about that, but the disadvantage is that I have to keep all the text in the memory of my computer. The text file I want to index is about 1.5 MB long, with about 1000 text chunks. If I read the specs correctly, that should cause a problem on several platforms, at least with the JVM running on my Windows 7 computer.

1.5 MB is not a lot of memory on today's desktop computers; this should easily fit in memory.

However, if you really don't want to keep the data in memory, then instead of storing the data itself you could store the position of the data in the text file, and when you need the text, open the text file again, jump to that position and read it. Unfortunately, classes like BufferedReader and FileReader don't have methods to tell you at which position in the file they are exactly, so this might be harder to implement (you'd have to keep track of the position yourself).
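One way to keep track of the position yourself is to read the file byte by byte through a buffered stream and count as you go. The sketch below assumes single-byte text, as described in the first post, and uses a TreeMap so the keys come out alphabetically sorted; the names OffsetIndexer and buildIndex are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;

class OffsetIndexer {

    /** Maps each "@key" to the byte offset of the first char after that key line. */
    static Map<String, Long> buildIndex(String fileName) throws IOException {
        Map<String, Long> index = new TreeMap<String, Long>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(fileName))) {
            long position = 0;          // number of bytes consumed so far
            boolean atLineStart = true;
            boolean inKeyLine = false;
            StringBuilder key = new StringBuilder();
            int b;
            while ((b = in.read()) != -1) {
                position++;             // now the offset of the next unread byte
                if (atLineStart && b == '@') {
                    inKeyLine = true;
                    key.setLength(0);
                    atLineStart = false;
                } else if (b == '\n') {
                    if (inKeyLine) {
                        // trim() also drops a '\r' left over from a "\r\n" line ending
                        index.put(key.toString().trim(), position);
                        inKeyLine = false;
                    }
                    atLineStart = true;
                } else {
                    if (inKeyLine) {
                        key.append((char) b);
                    }
                    atLineStart = false;
                }
            }
            if (inKeyLine) {
                // key line at the very end without a newline: its chunk is empty
                index.put(key.toString().trim(), position);
            }
        }
        return index;
    }
}
```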
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1763
    

If you don't want to read the entire file contents into memory, you should have a look at the documentation for java.io.RandomAccessFile.
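As a rough sketch of how RandomAccessFile could be used here: seek to a known byte offset and read lines until the next @-line or the end of the file. The class name ChunkReader and the assumption that the offset comes from some previously built index are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class ChunkReader {

    /** Reads the text chunk starting at the given byte offset. */
    static String readChunk(String fileName, long offset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
            file.seek(offset); // jump straight to the first char of the chunk
            String line;
            // RandomAccessFile.readLine() maps each byte to one char, which suits 8-bit text.
            while ((line = file.readLine()) != null && !line.startsWith("@")) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

Note that RandomAccessFile.readLine() is unbuffered, so this suits reading single chunks better than scanning a whole file.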

Jan-Henrik Clausen
Greenhorn

Joined: Jun 28, 2010
Posts: 13
My original intent was to save myself this recoding in the hope somebody has already written routines for "random access text files". I didn't find anything like that with Google, so I just started to write it myself now.

Thanks for all the tips, I hope I can use them.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Claudius Calvus wrote:My original intent was to save myself this recoding in the hope somebody has already written routines for "random access text files". I didn't find anything like that with Google, so I just started to write it myself now.

You don't actually want generic routines for "random access text files" though--you're looking instead for "text files indexed in a way nobody else does", which is a very different thing. Random file access already exists. All you need to do is create a map of keys and offsets. But for such a small amount of data, I'd just keep it in memory, because there's no real reason not to.
Jan-Henrik Clausen
Greenhorn

Joined: Jun 28, 2010
Posts: 13
David Newton wrote:
Claudius Calvus wrote:My original intent was to save myself this recoding in the hope somebody has already written routines for "random access text files". I didn't find anything like that with Google, so I just started to write it myself now.

You don't actually want generic routines for "random access text files" though--you're looking instead for "text files indexed in a way nobody else does", which is a very different thing. Random file access already exists. All you need to do is create a map of keys and offsets. But for such a small amount of data, I'd just keep it in memory, because there's no real reason not to.


Yes, you are right. I did not look for generic routines (although, being a newbie, I wasn't sure if something like that existed). I looked for routines that can index a text file and then get a chunk out of it as described in my first post. Thanks for all the hints. I have now written my own class "rndacctext" for it and it works quite nicely. If anyone is interested, drop me a message...

Oh, and it IS a time-saver on my one-year-old laptop. Even with something like (only) 1+ MB of text. Quite a time-saver in fact. Java is not very good at searching through 1 MB of text for a certain phrase.

I don't know exactly what the custom here is for threads that are ready to close. This is one ^^.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

What do you mean Java isn't "good" at searching through a text file? It's only minorly less convenient than in a scripting language if you're using reasonable libraries, and generally faster after JVM startup.
 