I have a very simple code snippet which counts the number of lines in a text file by sequentially reading the lines with a BufferedReader and incrementing a counter. There's no problem reading lines with text, or even single blank lines between paragraphs, but once the readLine() method encounters multiple blank lines together it *sometimes* fails to count one or more lines, according to a pattern I can't quite determine. Does anyone know what sort of checks I need to perform to make sure that all lines (even the blank ones) are counted until the EOF?
OK, it just didn't make sense, so I checked out the test file in another program ... and lo and behold, no problem. D'oh! Pity about having already made a public fool of myself. Ah well, probably won't be the last time. Real questions to follow soon, hopefully.
It's possible that the problem is that different files have different types of line separators, and the number of lines depends on what you consider a line separator. On Unix \n is a separator, while on Windows it's \r\n, and on classic Mac OS it's a lone \r. BufferedReader and many programs have a flexible interpretation which accepts any of these as a line terminator. So consider something like: "foo\r\n\nbar" If you look at this in Notepad, it's two lines (with a funny box just before "bar", indicating the lone \n). In Unix, it's three lines (middle line blank, with a funny char after "foo"). On a classic Mac, it may look like two lines (split at the \r, with funny chars before "bar"). Using BufferedReader, it's three lines, middle line blank, no funny chars. OK, so this is complex, I know. What it boils down to is, you need to consider whether this is worth worrying about, and if so, what exactly do you want to consider the definition of a line? Note that the BufferedReader API for readLine() explains exactly how that method works: a line is terminated by \n, by \r, or by \r immediately followed by \n. If you want to use a different definition, you'll have to implement it yourself.
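To make the difference concrete, here is a minimal sketch that counts lines two ways: once with readLine(), and once with a hand-rolled definition where only \n ends a line. The class name LineCountDemo and both method names are made up for illustration, and the "\n only" rule is just one possible alternative definition, not anything from the original poster's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LineCountDemo {

    // Counts lines the way BufferedReader.readLine() defines them:
    // a line ends at \n, at \r, or at \r followed by \n (or at end of input).
    public static int countWithReadLine(Reader source) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                count++; // blank lines come back as "", not null, so they count
            }
        }
        return count;
    }

    // Hypothetical alternative definition: only \n ends a line,
    // so a lone \r is treated as ordinary text.
    public static int countNewlineOnly(String text) {
        if (text.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                count++;
            }
        }
        // A final line with no trailing \n still counts as a line.
        if (text.charAt(text.length() - 1) != '\n') {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String mixed = "foo\r\n\nbar";   // the example from the post
        String macStyle = "a\rb\nc";     // contains a lone \r

        System.out.println(countWithReadLine(new StringReader(mixed)));    // prints 3
        System.out.println(countNewlineOnly(mixed));                       // prints 3
        System.out.println(countWithReadLine(new StringReader(macStyle))); // prints 3
        System.out.println(countNewlineOnly(macStyle));                    // prints 2
    }
}
```

The two counts agree on the "foo\r\n\nbar" example but differ as soon as a lone \r shows up, which is the sort of mismatch you might see when comparing your count against another program's.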