Whether you use Java or C++ is somewhat irrelevant here. Java has binary trees and linked lists too. You can use the standard libraries, or make your own. I'd probably use whichever language you and/or your team are most comfortable with. For the remainder of this post I'll talk in terms of Java, though you could do similar things in other languages.
LinkedList would probably be my first approach, assuming that the text file is small enough (or RAM is large enough) that the whole thing can be comfortably kept in memory all at once. With LinkedList you'd usually want to access the list using an Iterator, not the List itself. Avoid any method where you specify an index. If that's not possible, then LinkedList may not be a good fit.
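To make the iterator point concrete, here is a minimal sketch of walking a LinkedList with a ListIterator and modifying elements in place, instead of calling get(i) in a loop (the strings here are just placeholder data):

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class IteratorDemo {
    public static void main(String[] args) {
        List<String> lines = new LinkedList<>();
        lines.add("alpha");
        lines.add("beta");
        lines.add("gamma");

        // O(n) total: the iterator steps from node to node.
        for (ListIterator<String> it = lines.listIterator(); it.hasNext(); ) {
            String line = it.next();
            if (line.startsWith("b")) {
                it.set(line.toUpperCase()); // modify in place via the iterator
            }
        }

        // Avoid this with LinkedList: each get(i) re-walks the list from
        // the head, making the whole loop O(n^2).
        // for (int i = 0; i < lines.size(); i++) { String s = lines.get(i); ... }

        System.out.println(lines); // [alpha, BETA, gamma]
    }
}
```

The same iterator also gives you it.remove() and it.add(), which are the cheap ways to edit a LinkedList mid-traversal.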
A binary tree (e.g. TreeSet) might well be a good option if the file is normally supposed to be sorted in some way, e.g. alphabetically or numerically.
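For instance, a TreeSet keeps its contents sorted as you insert, with O(log n) add/remove/contains, and gives you range views for "alphabetical slice" style lookups. A small sketch with made-up data:

```java
import java.util.TreeSet;

public class SortedDemo {
    public static void main(String[] args) {
        // Elements stay in natural (alphabetical) order regardless of
        // insertion order; add/remove/contains are O(log n).
        TreeSet<String> words = new TreeSet<>();
        words.add("pear");
        words.add("apple");
        words.add("orange");

        System.out.println(words);                   // [apple, orange, pear]
        System.out.println(words.first());           // apple
        System.out.println(words.headSet("orange")); // [apple]
    }
}
```

If you need duplicates or a key-to-record mapping rather than a plain set, TreeMap is the equivalent sorted structure.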
Either way, if the file is too big to fit in memory, one option would be to split it into smaller segments of a more appropriate size, then process the segments one at a time by reading them into a LinkedList or whatever. Or you might just use a database instead if the operations you need to perform start becoming complex.
IF (big if) your text is plain ASCII, you might be able to process it as bytes - thus saving the time- and memory-consuming conversion to characters.
There is one very important question when thinking about the process: can it be done in one straight-through pass, only looking at a small chunk of text at any one time, or do the later contents affect the treatment of the early contents?
The file will be a 1M+ line ASCII text file of instructions. Each line would then need to be tokenized (basically a CSV) and then executed sequentially.
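That workload can be sketched as a straight-through pass: read one line, split it on commas, execute it, move on. Only one line is in memory at a time. The file name and the execute() body below are placeholders - the real opcodes depend on your instruction format:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class InstructionRunner {
    public static void main(String[] args) throws IOException {
        // Stream the file line by line rather than loading all 1M+ lines.
        try (BufferedReader in = Files.newBufferedReader(Path.of("instructions.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] tokens = line.split(","); // naive CSV: no quoting/escaping
                execute(tokens);
            }
        }
    }

    // Hypothetical executor: dispatch on the first token as the opcode.
    static void execute(String[] tokens) {
        System.out.println("op=" + tokens[0] + " args=" + (tokens.length - 1));
    }
}
```

Note that String.split with "," handles only plain comma-separated fields; if your format ever allows quoted fields containing commas, you'd want a real CSV tokenizer.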
OK, sounds good. After you process one line, do you even need to remember it? It sounds like you may not need a List at all here. What sort of result do you expect to get, and what do you ultimately need to do with it? Write a new file? Put data in a database? Something else? Maybe you can simply write new lines to the output file (or insert new rows to the database) as you go, instead of first putting them in one big List and then reading the List into a file or DB. That way you would avoid needing the memory to store everything in a List at once.
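A minimal sketch of that read-transform-write pattern, assuming one output line per input line (the file names and the transform() body are placeholders):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TransformFile {
    public static void main(String[] args) throws IOException {
        // Read one line, transform it, write it out immediately --
        // no List ever holds the whole file.
        try (BufferedReader in = Files.newBufferedReader(Path.of("in.txt"));
             BufferedWriter out = Files.newBufferedWriter(Path.of("out.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(transform(line));
                out.newLine();
            }
        }
    }

    // Hypothetical per-line transformation; substitute your real logic.
    static String transform(String line) {
        return line.trim().toUpperCase();
    }
}
```

Memory use stays constant no matter how many lines the input has, which is exactly the property you want for a 1M+ line file.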
Also, before you spend a lot of time trying to find the fastest, most efficient code possible, it would be a good idea to just write some code that does what you want, as simply as possible, and find out how fast it is. It may well be that it's already fast enough - and if not, you can make measurements to find out which parts, specifically, are making it slow. It's usually much better to improve performance using hard data rather than speculation.
All data would need to stay in memory. There will not be that many entries, but their fields will need to be modified and searched very often.
The best way to work with massive amounts of data is to use stream processing, or to process the data in buffered blocks one at a time.
However, since the entire file needs to (and can) be placed in memory, maybe it is not that large after all. It is better to put more focus on designing a data structure that optimizes the speed of the operations you'll need to perform on these data.
With the right data structures, array resizing overhead is just a small concern.
There will not be that many entries, but their fields will need to be modified and searched very often.
Is this a place for a custom class where each instance represents the content of a single line and there are methods to aid the modification and searching?
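A minimal sketch of that idea, assuming each line parses into a couple of fields (the field names "name" and "value" and the helper methods are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// One instance per input line. The fields here are hypothetical --
// use whatever the real columns of your CSV mean.
public class Entry {
    private final String name;
    private int value;

    Entry(String name, int value) {
        this.name = name;
        this.value = value;
    }

    // Parse a CSV line like "widget,42" into an Entry.
    static Entry fromLine(String line) {
        String[] t = line.split(",");
        return new Entry(t[0], Integer.parseInt(t[1]));
    }

    boolean matches(String n) { return name.equals(n); } // search helper
    void adjust(int delta)    { value += delta; }        // modification helper
    int getValue()            { return value; }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(Entry.fromLine("widget,42"));
        entries.add(Entry.fromLine("gadget,7"));

        for (Entry e : entries) {
            if (e.matches("widget")) e.adjust(10);
        }
        System.out.println(entries.get(0).getValue()); // 52
    }
}
```

Once the per-line logic lives in the class, the choice of container (ArrayList, LinkedList, a Map keyed by name for fast searching) becomes a separate, easily changed decision.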