End of file indication

Vlad Rabkin
Ranch Hand

Joined: Jul 07, 2003
Posts: 555
Hi,
There is a record number in the header of the file in my assignment.
To cache the database I have to loop over all records.
There are two possibilities to indicate the end of the file:
1. EOFException
2. Get the file size from FileChannel.size() and calculate the number of records (I know the header length and the record length).
I like 2 better, but since I cache records using a DataInputStream instead of a FileChannel (I use the FileChannel only to write to the DB), the FileChannel is not yet open at the moment I cache the database.
Is it OK to use EOFException, or is it bad style?
Tx,
Vlad
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Personally I think using the exception is bad style unless you can't find an alternative. Note that the File class has a length() method too, so you can use that to get the size easily without a FileChannel.
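A minimal sketch of the file-length approach Jim suggests. The header and record lengths here (70 and 156 bytes) are hypothetical stand-ins for whatever the assignment's header declares; the modulus check falls out for free as a cheap corruption test:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class RecordCounter {

    /**
     * Computes the number of records in a database file from its total size,
     * with no EOFException needed. An inconsistent length doubles as a
     * cheap check that the file is not corrupt.
     */
    static long countRecords(File dbFile, long headerLength, long recordLength) {
        long dataLength = dbFile.length() - headerLength;
        if (dataLength < 0 || dataLength % recordLength != 0) {
            throw new IllegalStateException("File size inconsistent with header/record layout");
        }
        return dataLength / recordLength;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: 70-byte header followed by three 156-byte records.
        File tmp = File.createTempFile("db", ".db");
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[70 + 3 * 156]);
        }
        System.out.println(countRecords(tmp, 70, 156)); // prints 3
    }
}
```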


"I'm not back." - Bill Harding, Twister
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Vlad,
I don't have such a record count in the header of my file. But if I had one, I think I would throw what I call an InvalidDataFileException if that number were incompatible with the normal file size as computed.
I would do that for consistency with my own current design: when opening the database, such an InvalidDataFileException is thrown if one of these cases occurs:
  • the file signature is invalid;
  • the record length as computed from the individual field lengths read from the header differs from the record length mentioned in the header;
  • the record length is incompatible with the current Charset.
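The first two of those open-time checks could look roughly like this. InvalidDataFileException is Phil's own exception name; the magic-cookie value and the one-byte deleted flag are assumptions about the file layout, not part of any spec:

```java
import java.io.IOException;

public class HeaderValidator {

    // Assumed magic value; the real file signature comes from the assignment.
    static final int EXPECTED_MAGIC = 0x00000101;

    /** Custom exception, as named in the thread. */
    static class InvalidDataFileException extends IOException {
        InvalidDataFileException(String msg) { super(msg); }
    }

    /**
     * Validates header values read at open() time: the file signature, and
     * the declared record length against the sum of the field lengths
     * (plus one assumed byte for the deleted flag).
     */
    static void validateHeader(int magic, int declaredRecordLength,
                               int[] fieldLengths) throws InvalidDataFileException {
        if (magic != EXPECTED_MAGIC) {
            throw new InvalidDataFileException("Invalid file signature: " + magic);
        }
        int computed = 1; // one byte for the deleted flag (assumed convention)
        for (int len : fieldLengths) {
            computed += len;
        }
        if (computed != declaredRecordLength) {
            throw new InvalidDataFileException("Record length mismatch: header says "
                    + declaredRecordLength + ", fields sum to " + computed);
        }
    }
}
```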


Best,
Phil.
    Vlad Rabkin
    Ranch Hand

    Joined: Jul 07, 2003
    Posts: 555
    Hi Jim and Phil,
I forgot to write "no" in my text. So, I wanted to say:
There is NO record number in the header of the file in my assignment.
Sorry.
OK, what Jim said is exactly what I was afraid of.
OK. Do you think it is OK to do the following in the constructor of Data (a Singleton):
1. Open the file (File file = new File("...")).
2. Create a FileChannel for writing.
3. Calculate the size of the file and then the number of records (I need it to loop over the records till the end, to cache them later).
4. Create a DataInputStream, read and cache all records, and close the DataInputStream.
    Do you think it is acceptable?
    Tx,
    Vlad
    [ September 13, 2003: Message edited by: Vlad Rabkin ]
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Vlad,
    I have forgot to write "no" in my text. So, I wanted to say:

Forget my previous post too, then!
    4. Create DataInputStream read and cache all record and close DataInputStream.

Why don't you use your FileChannel to read records?
Personally, I use RandomAccessFile directly to read the header (no need for a DataInputStream), then I get the FileChannel to read the records.
    Best,
    Phil.
    [ September 13, 2003: Message edited by: Philippe Maquet ]
    Bharat Ruparel
    Ranch Hand

    Joined: Jul 30, 2003
    Posts: 493
    Hello Vlad,
    You wrote:

    Calculate size of the file and then number of records...

    That is what I do and therefore I don't have to deal with EOF marker or exception.
    Regards.
    Bharat


    SCJP,SCJD,SCWCD,SCBCD,SCDJWS,SCEA
    Vlad Rabkin
    Ranch Hand

    Joined: Jul 07, 2003
    Posts: 555
    Hi,
    Bharat

    That is what I do and therefore I don't have to deal with EOF marker or exception

    Ok. You all have convinced me not to use EOFException!
    Phil
    I use RandomAccessFile directly to read the header (no need for DataInputStream), then I get the FileChannel to read the records.

Well, I used a DataInputStream for the header to avoid problems with the requirements, and a DataInputStream for records just to avoid using another stream or RAF. I know you wrote some time ago that Sun just said the header is in DataInputStream format; nobody restricts you from using RAF for reading.
Since Max gave the idea to use nio, I use a FileChannel for writing records (at least because it is faster), but I was too lazy to rewrite the code for reading. Moreover, there is one small advantage to using DataInputStream/RAF instead of FileChannel for reading records:
1) I first have to initialize an array to read a record. If the record is marked as deleted, I still have to read the whole record first, or I can first initialize an array with one element just to read the flag and then, if it is not deleted, initialize a second array to read the rest. DataInputStream/RAF allows me to read the flag without initializing an array. By the way: what would you prefer? I don't remember exactly, but I think Andrew said it is not a problem to read the whole record instead of reading the flag first, since the performance of reading 1 byte and 156 bytes is almost the same.
2) Why should I use nio for reading? Performance? IOInterruptedException? Atomicity? I don't need it, since I do it only once, when the server is started, to cache the database...
Phil, I hate it: I am lazy and always find excuses not to do something, and you come with your ideas, which make me refactor some things!
    Best,
    Vlad
    [ September 13, 2003: Message edited by: Vlad Rabkin ]
    [ September 13, 2003: Message edited by: Vlad Rabkin ]
    Andrew Monkhouse
    author and jackaroo
    Marshal Commander

    Joined: Mar 28, 2003
    Posts: 11460
        

    Hi Vlad,
    Calculating number of records based on file size gives you an additional bonus as well: you have an extra check that the file is reasonable (not corrupt).
Andrew said it is not a problem to read the whole record instead of reading the flag first, since the performance of reading 1 byte and 156 bytes is almost the same.

    I would go further: I would say performance in nearly every modern OS will be the same. Most operating systems have block sizes in excess of 512 bytes. Some operating systems have considerably larger block sizes (necessary when you start talking about hard drives in excess of 1GB). So when your program reads one byte, the operating system actually reads one block. It has already done the slow bit of getting all the data off the hard drive. With read ahead caching (which should really kick in if you are reading the entire file sequentially to load your cache) you may find that the operating system is reading 4 or 5 blocks at a time.
    Regards, Andrew


    The Sun Certified Java Developer Exam with J2SE 5: paper version from Amazon, PDF from Apress, Online reference: Books 24x7 Personal blog
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Vlad and Andrew,
    Vlad:
Phil, I hate it: I am lazy and always find excuses not to do something, and you come with your ideas, which make me refactor some things!

Sorry about that Vlad, but as long as you come with questions, you take the risk that we come with answers. Before I come back to your last post, I have a mistake to correct in my first post here, related to what Andrew just wrote:
    Andrew:
    Calculating number of records based on file size gives you an additional bonus as well: you have an extra check that the file is reasonable (not corrupt).

    Phil (first post):
I don't have such a record count in the header of my file. But if I had one, I think I would throw what I call an InvalidDataFileException if that number were incompatible with the normal file size as computed.
I would do that for consistency with my own current design: when opening the database, such an InvalidDataFileException is thrown if one of these cases occurs:
- the file signature is invalid;
- the record length as computed from the individual field lengths read from the header differs from the record length mentioned in the header;
- the record length is incompatible with the current Charset.

It contradicts my post (because I forgot a little bit what I did in the db part of the assignment), but I don't check that type of corruption, and I did so on purpose:
My open() method may throw an IOException (InvalidDataFileException is different) if and only if an IOException is thrown while I am reading the header part. While reading the rest of the file (maybe to feed my cache, but for other purposes too), I catch the IOExceptions thrown and ... ignore them, to avoid individual corrupted records preventing us from using the database as a whole. As those records don't go in the cache, they will automatically be read from file if they are later accessed, throwing a DataIOException. BTW, that's a side benefit of a cache with an optional maxSize.
    Now, back to your last post, Vlad.
    Vlad:
1) I first have to initialize an array to read a record. If the record is marked as deleted, I still have to read the whole record first, or I can first initialize an array with one element just to read the flag and then, if it is not deleted, initialize a second array to read the rest. DataInputStream/RAF allows me to read the flag without initializing an array. By the way: what would you prefer? I don't remember exactly, but I think Andrew said it is not a problem to read the whole record instead of reading the flag first, since the performance of reading 1 byte and 156 bytes is almost the same.

First, I agree with what Andrew just wrote on the subject. But I would add this: if you read the deleted flag separately from the record, you are sure to be slower, because your two reads (two in 90% of cases?) will be translated by the JVM into two calls to the OS. So if, as Andrew explains, reading one record is as fast as reading one byte, reading the file with your technique will be as fast in only one case: when all records in the file are deleted.
Vlad: Well, I used a DataInputStream for the header to avoid problems with the requirements, and a DataInputStream for records just to avoid using another stream or RAF. I know you wrote some time ago that Sun just said the header is in DataInputStream format; nobody restricts you from using RAF for reading.

Sorry, but there is no issue with our requirements in using RAF to read the header. Here is what is stated in the instructions about that (BTW, notice the English mistake in the sentence):
    All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes.

"... use the formats ..." doesn't mean that you must use those classes! DataInputStream implements DataInput, DataOutputStream implements DataOutput, and RAF implements both interfaces. So RAF is compatible with both mentioned classes.
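Phil's interface argument can be shown in a few lines. The header fields below (magic, record length, field count) are a hypothetical layout, but the key point is real: a RandomAccessFile can be passed anywhere a DataInput is expected, and its readInt()/readShort() decode bytes exactly as DataInputStream's would:

```java
import java.io.DataInput;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class HeaderReader {

    /**
     * Reads numeric header values through the DataInput interface. Since
     * RandomAccessFile implements DataInput, this method works identically
     * whether driven by a DataInputStream or a RAF — same "format",
     * different class.
     */
    static int[] readHeader(DataInput in) throws IOException {
        int magic = in.readInt();         // hypothetical field layout
        int recordLength = in.readInt();
        short fieldCount = in.readShort();
        return new int[] { magic, recordLength, fieldCount };
    }

    public static void main(String[] args) throws IOException {
        // Write a header with DataOutputStream, read it back with RandomAccessFile.
        File tmp = File.createTempFile("hdr", ".db");
        tmp.deleteOnExit();
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(tmp))) {
            out.writeInt(0x101);
            out.writeInt(156);
            out.writeShort(7);
        }
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
            int[] header = readHeader(raf); // RAF passed as a DataInput
            System.out.println("record length = " + header[1]); // prints 156
        }
    }
}
```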
    Vlad:
As Max gave the idea to use nio, I use a FileChannel for writing records (at least because it is faster), but I was too lazy to rewrite the code for reading. Moreover, there is one small advantage to using DataInputStream/RAF instead of FileChannel for reading records:

The "small advantage" is not one, as Andrew explained about reading blocks. But you also mention the fact that you save an array allocation if a record is deleted. I don't understand that, because as records all have the same size, and as you store them in your cache converted into an array of String values, you don't need more than one record byte array to read all records. I use NIO for reading, and that's what I do in open(): I allocate only one ByteBuffer, reused over and over again for each record read. Anyway, it's just a performance issue at worst, and performance is not important in this assignment, so you should be OK IMO.
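A sketch of the single-reused-ByteBuffer read loop Phil describes, assuming a fixed-width layout (header followed by equal-size records). The positional FileChannel.read(buffer, position) calls leave the channel's own position untouched, which is what lets the same approach coexist with other readers:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class RecordCache {

    /**
     * Reads every record sequentially into byte arrays using a single
     * ByteBuffer, allocated once and cleared before each reuse. Assumed
     * layout: headerLength bytes of header, then fixed-size records.
     */
    static byte[][] readAllRecords(FileChannel channel, long headerLength,
                                   int recordLength) throws IOException {
        int recordCount = (int) ((channel.size() - headerLength) / recordLength);
        byte[][] records = new byte[recordCount][];
        ByteBuffer buffer = ByteBuffer.allocate(recordLength); // allocated once
        for (int i = 0; i < recordCount; i++) {
            buffer.clear(); // reset position/limit before reuse
            long pos = headerLength + (long) i * recordLength;
            while (buffer.hasRemaining()) {
                // positional read: channel position is never moved
                if (channel.read(buffer, pos + buffer.position()) < 0) {
                    throw new IOException("Unexpected end of file at record " + i);
                }
            }
            buffer.flip();
            byte[] record = new byte[recordLength];
            buffer.get(record);
            records[i] = record; // in a real Data class, convert to String[] here
        }
        return records;
    }
}
```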
Now what I don't like that much (why this euphemism here, Phil?) is this: you mix both technologies. I would expect a choice to be made between "old" IO and NIO, and you chose to use both: IO for reading and NIO for writing. Of course it works, but it seems hard to defend IMO. Remember what Max wrote about FileChannels in his book (p 283), among their other advantages: "FileChannels represent a two-way connection to a file, allowing you to read and write to the file using that single connection". You use that two-way connection only one way, with the additional work of covering the other way with another technology.
    Best,
    Phil.
    [ September 14, 2003: Message edited by: Philippe Maquet ]
    Tony Collins
    Ranch Hand

    Joined: Jul 03, 2003
    Posts: 435
Is mixing nio and standard io a problem? I myself use old io to read the header and nio for the records.
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Tony,
Is mixing nio and standard io a problem? I myself use old io to read the header and nio for the records.

As I do. You are not mixing technologies in the same area; that's the big difference: reading the file header and managing reads/writes of records are two different fields of operation, so we should be OK IMO. BTW, using FileChannels to read the file header would just add complexity (to do the conversion between bytes and the numeric primitive types we have to read), with no benefit.
    Best,
    Phil.
    Vlad Rabkin
    Ranch Hand

    Joined: Jul 07, 2003
    Posts: 555
    Hi,
    Andrew:
    when your program reads one byte, the operating system actually reads one block.

    Agreed.

    Phil:
    First, I agree with what Andrew just wrote on the subject. But I would add this : if you read the deleted flag separately from the record, you are sure to be slower because your two reads

    Agreed.

    So RAF is compatible with both mentioned classes.

    Agreed.

    I allocate only one ByteBuffer, reused all over again for each read record.

    Agreed.
    FileChannels represent a two-way connection to a file, allowing you to read and write to the file using that single connection".

So what? DataInputStream doesn't, but RAF does it too.
Phil, you can't say that you use only nio, since you read the file header with io (RAF)...
OK, I agree on all these issues and will refactor my Data class.
    Thanx for all your suggestions!
    Best,
    Vlad
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Vlad,
So what? DataInputStream doesn't, but RAF does it too.

    Agreed.
    Phil, you can't say that you use only nio, since you read file header with io (RAF)...

Agreed (but just to be nice to you)

    Ok, I agree on all this issues and will refactor my Data class.

    Sorry again about it...
    Thanx for all your suggestions!

    You're welcome. I just wonder whether you are sincere...
    Best,
    Phil.
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    Phil, you can't say that you use only nio, since you read file header with io (RAF)...
Also, you can't even create a FileChannel without first creating one of the IO classes: RAF, FileInputStream, or FileOutputStream. (In the latter two cases the FileChannel is not bidirectional, by the way.) NIO is designed to be used alongside the IO classes, whichever you find convenient for a given task. For me, the FileChannel is unquestionably faster; the only reason not to use it is that it may be seen as more complex. For reading a header, it's easier to just use IO's DataInput methods and ignore the FileChannel. But once you've gone to the trouble of figuring out how to write records with the FileChannel - well, reading is almost exactly the same, just a different method call. There's really no added complexity at this point, as you've already crossed that threshold with the write. So why not use FileChannel for reads as well, since it's faster?
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    I allocate only one ByteBuffer, reused all over again for each read record.
And you've both got Data-level sync which prevents multiple threads from using things concurrently, correct? For comparison, with record-level sync I need to allocate() each ByteBuffer anew. Which still seems to be pretty darn quick, I think because the system is good at caching these behind the scenes. Not a big deal, but it's something that could have caused trouble if I wasn't paying attention.
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Jim,
    And you've both got Data-level sync which prevents multiple threads from using thins concurrently, correct?

Yes, correct, but I only use that little optimization from within my open() method (where I read all records). It's correct because open() guards itself against all public methods of my Data class (all other callers block while open is in progress). For a typical read from file (in case the record is not in the cache), a temporary (and locally owned) ByteBuffer is allocated. In fact, I use overloaded read() methods, one of them receiving a preallocated ByteBuffer as a parameter.

    For comparison, with Record-level sync I need to allocate() each ByteBuffer as new.

I don't understand that. As I wrote above (in one of the previous posts), like Vlad you store your records in their converted form, right? So why couldn't your preread ByteBuffer be preallocated just once and reused? Maybe I missed something about your own design.
    Which still seems to be pretty darn quick, I think because the system is good at caching these behind the scenes.

You are probably right, and I am probably too much influenced by my quite old worries about memory fragmentation. In the early nineties I had to work on a C++ database project (a db layer on top of Btrieve files), with interfaces to VB and later to Delphi. In that project I had to write my own "heap" manager to boost performance. I am a Java newbie and I don't know what really happens behind the scenes, but as there is no miracle in software, I just suppose that keeping good reflexes as far as optimization is concerned is still good practice. In an application of the C++ DB project I mentioned, an application-writer colleague came to me one day with a crazy (anyway, in my mind) way of querying the DB. I tried to explain to him why he was wrong, and he replied something like (free translation from French): "There is a powerful machine behind the scenes anyway to do the job...". It was in 1992, and you can remember how "powerful" machines were in those old times...
    Best,
    Phil.
    [ September 14, 2003: Message edited by: Philippe Maquet ]
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    [Jim]: For comparison, with Record-level sync I need to allocate() each ByteBuffer as new.
[Philippe]: I don't understand that. As I wrote above (in one of the previous posts), like Vlad you store your records in their converted form, right? So why couldn't your preread ByteBuffer be preallocated just once and reused? Maybe I missed something about your own design.

The stored records keep their data in String[] arrays, so no ByteBuffers are used in storage. But for a write() we need to put that data in a ByteBuffer before writing. In my design, if there are 100 clients trying to update 100 different records simultaneously, they can all use the same FileChannel simultaneously, writing to different sections. No additional sync is needed. If there were a single shared ByteBuffer, we'd have to sync on that buffer (or on something else at the Data level rather than the record level) for the duration of the write, to ensure that no other thread uses the same buffer at the same time. Which would sort of defeat the point of record-level locking in the first place. It seems to be a bit faster to let each thread allocate its own ByteBuffer as needed; then they don't need to contend for any more sync locks on high-level shared objects. Otherwise, I might as well just sync on the whole FileChannel for the duration of the buffer loading + write(). Which is also not such a horrible thing really, especially since write() is pretty uncommon compared to read(). But it's not what I chose to do.
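A sketch of the per-thread-buffer write Jim describes, under an assumed fixed-width layout with a one-byte deleted flag (0 = valid). Each call builds its own local ByteBuffer and uses a positional FileChannel.write(buffer, position), so concurrent threads writing different records never touch a shared buffer or the channel's position:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class RecordWriter {

    /**
     * Writes one fixed-width record at its computed offset. The ByteBuffer
     * is local to the call (and hence to the calling thread), so no sync
     * on a shared buffer is needed — only record-level locking elsewhere.
     */
    static void writeRecord(FileChannel channel, long headerLength, int recordLength,
                            int recNo, String[] fields, int[] fieldLengths)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(recordLength); // per-call buffer
        buffer.put((byte) 0); // 0 = valid record flag (assumed convention)
        for (int i = 0; i < fields.length; i++) {
            byte[] padded = new byte[fieldLengths[i]]; // zero-padded fixed width
            byte[] data = fields[i].getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(data, 0, padded, 0, Math.min(data.length, fieldLengths[i]));
            buffer.put(padded);
        }
        buffer.flip();
        long pos = headerLength + (long) recNo * recordLength;
        while (buffer.hasRemaining()) {
            pos += channel.write(buffer, pos); // positional write, channel position untouched
        }
    }
}
```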
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
Understood and agreed! I forgot about your concurrent writes design.
    Best,
    Phil.
    Vlad Rabkin
    Ranch Hand

    Joined: Jul 07, 2003
    Posts: 555
    Hi Jim and Phil,
I agree with Jim's point, but Phil is right that we have somewhat different designs:
1) Phil and I don't allow concurrent writes at all.
2) I can allocate the ByteBuffer as Phil suggested, because I cache the database only once (so I actually read only once in the server's life).

    Best,
    Vlad
     