| Author |
Accessing a File
|
alan partridge
Greenhorn
Joined: Jan 08, 2002
Posts: 4
|
|
I am using a dictionary file consisting of thousands of lines, each line containing one word. In my program I want to randomly access just one of these words. I am hoping that someone could give me the most efficient solution, because at the moment I call BufferedReader.readLine() to read the whole lot into an ArrayList, then randomly select from that and then ditch the object.
|
 |
Jim Yingst
Wanderer
Sheriff
Joined: Jan 30, 2000
Posts: 18652
|
|
Hmmmm... I assume that these lines are variable-length, and that you don't know in advance how many lines there are. That would seem to make it necessary to read the whole file through once, at least to get a line count, and then either store some data in memory as you go or be willing to re-read from the beginning to count up to a randomly generated line number. Neither seems particularly fast or elegant.

What about this: use File.length() to get the number of bytes in the file, and generate a random number in that range. Open a FileInputStream and use skip() to get to the desired offset. Open a BufferedReader wrapping an InputStreamReader wrapping the FileInputStream, and read a line twice. The first line read is just to move to a line boundary, since the randomly generated offset has most likely put you in the middle of a line. The second line read will be a normal line. If either readLine() comes back null, go back to the beginning of the file instead. Close all streams when you're done.

This is about the fastest, lowest-memory-overhead method I can think of for this. The only problem is that the words are not equally likely: the probability of selecting a given word is approximately proportional to the length of the line just before it, since that is the line the random offset has to land in. If that's not acceptable, you'll have to try something else.

How often will this procedure be performed on a given file? If it's more than once, it's probably worthwhile to store some info about the file to facilitate subsequent accesses. The line count is of course highly useful. So, perhaps, is a sort of limited index which stores the offsets of every 10th word, so that to find line 738 you look up the position of line 730, then read 8 lines forward (as opposed to reading all 738 lines). Naturally "every 10th word" could be any other number; you would probably want to make that configurable, to optimize it later.
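For what it's worth, the "limited index" idea might look roughly like the sketch below. The class name SparseLineIndex, its method names, and the step parameter are all made up for illustration. It also assumes lines end in "\n" or "\r\n" and that the file uses a single-byte encoding such as ASCII, because RandomAccessFile.readLine() turns each byte into one char.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: remembers the byte offset of every step-th line. */
class SparseLineIndex {
    private final String path;
    private final int step;
    // offsets.get(k) is the byte offset at which line k * step begins
    private final List<Long> offsets = new ArrayList<>();
    private long lineCount = 0;

    SparseLineIndex(String path, int step) throws IOException {
        if (step < 1) throw new IllegalArgumentException("step must be >= 1");
        this.path = path;
        this.step = step;
        // One sequential pass over the raw bytes to count lines and record offsets.
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    if (lineCount % step == 0) offsets.add(pos);
                    lineCount++;
                    atLineStart = false;
                }
                if (b == '\n') atLineStart = true;
                pos++;
            }
        }
    }

    long lineCount() { return lineCount; }

    /** Returns line n (zero-based), reading at most step lines from disk. */
    String line(long n) throws IOException {
        if (n < 0 || n >= lineCount) throw new IndexOutOfBoundsException("line " + n);
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offsets.get((int) (n / step)));   // jump to the nearest indexed line
            String s = null;
            for (long i = n % step; i >= 0; i--) s = raf.readLine(); // read forward to line n
            return s;
        }
    }

    /** Picks a line number uniformly, so every word is equally likely. */
    String randomLine(Random rng) throws IOException {
        return line((long) (rng.nextDouble() * lineCount));
    }
}
```

You would build it once, for example new SparseLineIndex("words.txt", 10), and then call randomLine(new Random()) as often as needed. A larger step means a smaller index but more lines read per lookup.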
|
"I'm not back." - Bill Harding, Twister
|
 |
alan partridge
Greenhorn
Joined: Jan 08, 2002
Posts: 4
|
|
Thank you Jim, I used the first of your ideas and am happy with the way it works:

private String getRandomWord(){
    File file = new File("words.txt");
    String str = null;
    long ran = 0;
    try{
        if(!file.exists()){
            System.out.println("The words.txt File does not Exist\n"+
                "it must exist in the same directory as this program");
            System.exit(1);
        }else{
            long randomRange = file.length();
            System.out.println(randomRange);
            do{
                ran = ranGen.nextLong() % randomRange;
                ran = (ran < 0)? -ran: ran;
                FileInputStream fis = new FileInputStream(file);
                BufferedReader br = new BufferedReader(new InputStreamReader(fis));
                fis.skip(ran);
                br.readLine();
                str = br.readLine();
                fis.close();
                br.close();
            }while(str == null);
        }
    }catch(IOException io){
        io.printStackTrace();
    }
    return str;
}

alan
[ February 14, 2002: Message edited by: alan partridge ]
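As a possible variation on the code above, here is a sketch that wraps around to the first line when the second read hits end-of-file, rather than retrying with a new offset. It swaps FileInputStream.skip() plus BufferedReader for RandomAccessFile.seek() and RandomAccessFile.readLine(), which is my substitution and not what was posted. The class and method names (RandomWordPicker, pick) are hypothetical, and it assumes a single-byte encoding such as ASCII. The selection is still biased by line length, as discussed earlier in the thread.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

/** Hypothetical helper: picks a line by jumping to a random byte offset. */
class RandomWordPicker {
    /** Returns one line from the file, or null if the file is empty. */
    static String pick(String path, Random rng) throws IOException {
        // try-with-resources closes the file even if a read throws
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            if (length == 0) return null;                 // nothing to pick from
            raf.seek((long) (rng.nextDouble() * length)); // offset in [0, length)
            raf.readLine();                // discard the (probably partial) line we landed in
            String word = raf.readLine();  // the next complete line
            if (word == null) {            // we landed in the last line: wrap to the start
                raf.seek(0);
                word = raf.readLine();
            }
            return word;
        }
    }
}
```

A call would look like RandomWordPicker.pick("words.txt", new Random()). Checking for an empty file up front avoids the division by zero that a zero-length words.txt would cause in a modulo-based version.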
|
 |
 |