NX: (HTL) FileChannel & Threads

james render
Ranch Hand

Joined: May 08, 2003
Posts: 72
Trying to reach a decision about my file access.
Should I keep one reference to my FileChannel and leave it open for all threads to share (thus having to deal with thread synchronisation)?
Or should I open and close a connection to the file as and when required in methods, to prevent threads stepping on each other's toes?
I know that it would be more efficient to keep one reference, but am I right in thinking that it creates more issues than it solves (threads and shutdown are what I'm specifically thinking of)?
I notice in the sample project with the J2SE Developer Exam book (Denny's DVDs) that connections are made as and when required.
I worry about deadlock!


[SCJP][SCWCD][SCJD]
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
I suggest establishing a FileChannel as you need to, not in case you need to.
M


Java Regular Expressions
james render
Ranch Hand

Joined: May 08, 2003
Posts: 72
thanks Max, just to clarify, create FileChannels on demand?
Jeff Wisard
Ranch Hand

Joined: Jan 07, 2002
Posts: 89
Question in the case where you keep a single FileChannel open for the duration of your server:
What are the threading issues? If they are simply about where the file position is, that can easily be circumvented by using the absolute indexing methods on the channel. Are there other threading issues outside of file position?


Jeff Wisard, Sun Certified Java Programmer (Java 2), Sun Certified Web Component Developer
james render
Ranch Hand

Joined: May 08, 2003
Posts: 72
hey Jeff,
haven't spent that long thinking about it. Off the top of my head, not many other issues; possibly channel closure could be a problem, but I was thinking of the pointer positioning.
I know that you can do an absolute position (and indeed am doing that), but isn't there a risk that another thread could cut in between setting the position and writing?
thread a: fileChannel.position(1000);
thread b: fileChannel.position(10);
thread a: fileChannel.write(someBuffer);
thread b: fileChannel.write(someOtherBuffer);
(btw does it make a difference if you one-line it, i.e. fileChannel.position(100).write(buffy)? I wouldn't have thought so; could another thread potentially cut in after the position call?)
Guess it depends on the interpretation of the javadoc:
File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface. Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes. Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.

Just remembering something from Max's book about how, if it says an object is thread-safe, it just means that each method call is atomic, i.e. a thread won't cut in in the middle of a fileChannel.position() call.
am I making any sense?
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[James]: I know that you can do an absolute position (and indeed am doing that), but isn't there a risk that another thread could cut in in-between setting the position and writing.
Yes, if you use separate method calls for position() and read() (or write()). Another thread could reposition before the read/write, and the write will be in the wrong place. However there are other methods that allow you to specify the position as part of the read/write call - these methods are completely atomic and thread safe. So:

But:

Note that in general (for other classes and methods), you can't assume that a single method call is atomic - you might have to synchronize to prevent interruption. But FileChannel specifically guarantees atomicity for each method call, so this works here. Note also:

This may look OK, as it's on a single line. But it's still two separate method calls, so it's possible for another thread to interrupt in between.
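A minimal sketch of the three cases discussed in this post, assuming a FileChannel shared between threads. The class and method names are made up for illustration. It relies only on the documented behaviour that `write(ByteBuffer, long)` takes the offset as an argument and does not modify the channel's position, whereas `position(long)` followed by `write(ByteBuffer)` is two separate calls.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PositionalWrites {

    // Two calls: on a shared channel another thread can move the position
    // between them, so the bytes may land at the wrong offset.
    static void writeInTwoSteps(FileChannel channel, long offset, byte[] data) throws IOException {
        channel.position(offset);
        channel.write(ByteBuffer.wrap(data));
    }

    // Still two calls, even though they share a line.
    static void writeChained(FileChannel channel, long offset, byte[] data) throws IOException {
        channel.position(offset).write(ByteBuffer.wrap(data));
    }

    // One call: the offset is an argument, and the channel's own position
    // is neither read nor changed.
    static int writeAt(FileChannel channel, long offset, byte[] data) throws IOException {
        return channel.write(ByteBuffer.wrap(data), offset);
    }

    // Writes two 4-byte blocks out of order and returns the file contents.
    static String demo() throws IOException {
        Path file = Files.createTempFile("positional", ".db");
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            writeAt(channel, 4, "BBBB".getBytes(StandardCharsets.US_ASCII));
            writeAt(channel, 0, "AAAA".getBytes(StandardCharsets.US_ASCII));
            ByteBuffer all = ByteBuffer.allocate(8);
            while (all.hasRemaining()) {
                if (channel.read(all, all.position()) < 0) {
                    break; // end of file
                }
            }
            return new String(all.array(), StandardCharsets.US_ASCII);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Only `writeAt` is safe to call from several threads without extra synchronization. The other two need an external lock around the position/write pair.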
[Max]: I suggest establishing a FileChannel as you need to, not in case you need to.
Really? That's surprising to me. I assume you mean, create a new RandomAccessFile, FileInputStream, or FileOutputStream (as needed), get the FileChannel, do your business, and then close everything? For each update, at least? (And probably each read as well, unless you're saving everything in memory after startup.) Seems like more trouble than it's worth though. It's pretty easy to just create a single RandomAccessFile on startup, and then retain its associated FileChannel as long as that file needs to be open. I'm doing NX-Contractors, not hotel, so maybe there's something different which warrants this different approach.
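For comparison, a rough sketch of what the open-on-demand approach might look like: open a RandomAccessFile, get its FileChannel, do one operation, close everything. The class name, method names, and record sizes are hypothetical, and try-with-resources is a modern convenience that was not available in J2SE 1.4.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OnDemandChannel {

    // Opens the file, writes one record at the given offset, closes again.
    static void writeRecord(Path dbFile, long offset, byte[] record) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dbFile.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            channel.write(ByteBuffer.wrap(record), offset);
        }
    }

    // Opens the file read-only, reads one record, closes again.
    static byte[] readRecord(Path dbFile, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dbFile.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            while (buffer.hasRemaining()) {
                if (channel.read(buffer, offset + buffer.position()) < 0) {
                    break; // end of file before the record was complete
                }
            }
            return buffer.array();
        }
    }

    // Round trip through two separately opened channels.
    static String demo() throws IOException {
        Path file = Files.createTempFile("ondemand", ".db");
        try {
            writeRecord(file, 0, "hello".getBytes(StandardCharsets.US_ASCII));
            return new String(readRecord(file, 0, 5), StandardCharsets.US_ASCII);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Every call pays for an open and a close, which is the overhead being weighed here against the simplicity of having no long-lived channel to shut down.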
The thing is, I'm not entirely sure what's supposed to happen if you have more than one RAF or file stream open on the same file at once. Part of me expects the operating system to throw some sort of error - I swear I've had troubles with this sort of thing in the distant past, though I can't seem to replicate them now. (Maybe the problem was with File.delete(), trying to delete something that still had an open file stream.) But since I can't find any guarantees in the API about whether an OS will allow this sort of simultaneous access, I prefer to use one file access class at a time, shared among all threads. That way I can explicitly control how and when it's accessed, without worrying about what the OS might do. Maybe this is paranoid or unnecessary - but it's also pretty easy, given the explicit atomicity guarantees made by FileChannel. As long as you're careful not to set the position() in a separate call.
[ May 30, 2003: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Thomas Kijftenbelt
Ranch Hand

Joined: Feb 13, 2002
Posts: 73
Jim,
I do the same thing: in the constructor of my Data class, I call an init() method, which opens the RandomAccessFile and obtains the FileChannel. In the various methods of the data class, I get a buffer from this FileChannel (which is an instance var).
This means that when you have multiple clients, each client gets its own Data instance (and therefore its own FileChannel)... question is whether this is the expected behaviour, or if you should create one Data instance for all clients (like a singleton).
Greetings,
TK
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Originally posted by Jim Yingst:

[Max]: I suggest establishing a FileChannel as you need to, not in case you need to.

Really? That's surprising to me. I assume you mean, create a new RandomAccessFile, FileInputStream, or FileOutputStream (as needed), get the FileChannel, do your business, and then close everything? For each update, at least? (And probably each read as well, unless you're saving everything in memory after startup.)

This isn't a cut-and-dried issue, so I don't want to give the impression that I think there's only a single correct answer. Here are some of the issues I struggled with.
  • How would you deal with real db connections?
  • What does your application spend most of its time doing? File I/O, or other sorts of processing?
  • Do you want to export a remote FC to the client, as you will when it's a member variable?
  • What do you gain by making the FC a member variable?
  • In the case of a crash, do you want the remote gc to control when a FileChannel is released, or do you want to control it?

    For me, the weight of these issues tilted towards using FCs on a JIT basis.
    [Jim]: The thing is, I'm not entirely sure what's supposed to happen if you have more than one RAF or file stream open on the same file at once. Part of me expects the operating system to throw some sort of error - I swear I've had troubles with this sort of thing in the distant past, though I can't seem to replicate them now.

    What's supposed to happen is that they play nicely together, so long as none of the streams are changing the file size, but that hasn't always been the case. But we don't talk about that
    M
    [ May 31, 2003: Message edited by: Max Habibi ]
    S. Ganapathy
    Ranch Hand

    Joined: Mar 26, 2003
    Posts: 194
    Hi All,
    I have already implemented the read/write functionality for the NX:Contractor assignment using RAF. Now I am wondering whether to use FC. Is it really worth introducing FC at this stage? Currently I am using
    raf.readFully(byteArray); // full record in a single go
    and then I read each field from the byte array.
    Similarly,
    raf.writeBytes(record.toString()); // writes full record
    raf.writeByte(DELETE_FLAG); // to delete a record, just change the delete flag from 0 to 1
    To implement NIO, if we use MappedByteBuffer, then it is really fast. But what about the clarity of the program?
    Once we have read the ByteBuffer (a full record of 183 bytes, including the delete flag), how do we extract each field? I really don't know. Please guide me. I really want to learn NIO at this stage.
    Regards,
    Ganapathy.
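One way the field extraction asked about above could be sketched, assuming a 1-byte delete flag followed by fixed-width ASCII text fields. The field widths below are assumptions chosen so the record totals 183 bytes; substitute the widths from your own assignment. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RecordParser {
    // Hypothetical layout: these widths are an assumption, not taken from the thread.
    static final int[] FIELD_WIDTHS = {32, 64, 64, 6, 8, 8};
    // One flag byte plus the fields: 1 + 182 = 183 bytes.
    static final int RECORD_LENGTH = 1 + Arrays.stream(FIELD_WIDTHS).sum();

    // Absolute get: does not disturb the buffer's position.
    static boolean isDeleted(ByteBuffer record) {
        return record.get(0) != 0;
    }

    // Slices the bytes after the flag into trimmed strings, one per field.
    static String[] fields(ByteBuffer record) {
        ByteBuffer view = record.duplicate(); // independent position, shared bytes
        view.position(1);                     // skip the delete flag
        String[] result = new String[FIELD_WIDTHS.length];
        for (int i = 0; i < FIELD_WIDTHS.length; i++) {
            byte[] raw = new byte[FIELD_WIDTHS[i]];
            view.get(raw);
            // trim() drops trailing spaces and null padding alike
            result[i] = new String(raw, StandardCharsets.US_ASCII).trim();
        }
        return result;
    }

    // Builds a space-padded sample record with two fields filled in.
    static ByteBuffer sample() {
        byte[] bytes = new byte[RECORD_LENGTH];
        Arrays.fill(bytes, (byte) ' ');
        bytes[0] = 0; // not deleted
        byte[] name = "Acme Roofing".getBytes(StandardCharsets.US_ASCII);
        byte[] city = "Springfield".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(name, 0, bytes, 1, name.length);
        System.arraycopy(city, 0, bytes, 1 + FIELD_WIDTHS[0], city.length);
        return ByteBuffer.wrap(bytes);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields(sample())));
    }
}
```

The same method works on a buffer filled by `FileChannel.read(buffer, offset)`, since it uses absolute indexing and a duplicate rather than the caller's position.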
    james render
    Ranch Hand

    Joined: May 08, 2003
    Posts: 72
    [Max]: What's supposed to happen is that they play nicely together, so long as none of the streams are changing the file size, but that hasn't always been the case. But we don't talk about that
    So multiple FileChannels on a single file are bad if any of them increase the file size? What happens when creating records then??
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Multiple anything is bad. As a matter of fact, FileChannels are your best bet, because they are, at least, completely atomic per action.
    Regarding what to do for adds/deletes: you have two choices. You can either lock the entire file programmatically for such, or you can cache the contents in memory. However, AFAIK, you're not required to provide access to add/delete.
    M
    james render
    Ranch Hand

    Joined: May 08, 2003
    Posts: 72
    oh no, I've lost the plot here... can we go back a bit..
    currently I've not synchronized anything for File I/O.
    once a thread has obtained a lock on a particular record (or not if it's reading), it opens up a FileChannel and does its business.
    I thought that because all the FileChannel business is done at a method level there was no danger of threads stepping on each other's toes (because of the record locking mechanism).
    Jim was arguing in favour of a single FileChannel instance so long as you used it atomically, but Max said it's okay to use them JIT.
    BUT are you saying, Max, that if you use them JIT you'll need to make sure that no two threads are hitting the file at the same time? In which case you'd use some sort of static instance monitor, which is not that different from having a static FileChannel: you have the hassle of ensuring the synchronization but without the worries of closing the file..
    I had created a mechanism for ensuring that two creates didn't occur at the same time..
    sorry, brain is fried @ the end of day here...
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    No, you're ok if two Threads are modifying the File @ the same time, so long as 1) they're not working in the same section (which your lock makes sure of), and 2) neither changes the size of the file (which is what this discussion is about). As for the latter, the only thing you have to do is lock down the entire db when a create or delete is taking place, say by using a lock(-1). Otherwise, all's well.
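A minimal sketch of how record locks and a whole-database lock(-1) might coexist, assuming a simple monitor-based lock table. The class name, `WHOLE_DB`, and `tryLock` are made up for illustration, and the sketch leaves out lock ownership and fairness: any caller can unlock any record, and a steady stream of record locks could starve the whole-database request.

```java
import java.util.HashSet;
import java.util.Set;

class RecordLocks {
    static final int WHOLE_DB = -1; // hypothetical sentinel, as in lock(-1)

    private final Set<Integer> lockedRecords = new HashSet<>();
    private boolean dbLocked = false;

    // Blocks until the record (or the whole db) can be locked.
    synchronized void lock(int recNo) throws InterruptedException {
        while (!available(recNo)) {
            wait();
        }
        take(recNo);
    }

    // Non-blocking variant: returns false instead of waiting.
    synchronized boolean tryLock(int recNo) {
        if (!available(recNo)) {
            return false;
        }
        take(recNo);
        return true;
    }

    synchronized void unlock(int recNo) {
        if (recNo == WHOLE_DB) {
            dbLocked = false;
        } else {
            lockedRecords.remove(recNo);
        }
        notifyAll(); // wake every waiter so each can re-check its condition
    }

    // A record is free if nobody holds it and the db is not locked;
    // the whole db is free only when no record lock is held at all.
    private boolean available(int recNo) {
        if (dbLocked) {
            return false;
        }
        return recNo == WHOLE_DB ? lockedRecords.isEmpty() : !lockedRecords.contains(recNo);
    }

    private void take(int recNo) {
        if (recNo == WHOLE_DB) {
            dbLocked = true;
        } else {
            lockedRecords.add(recNo);
        }
    }
}
```

A create would call `lock(RecordLocks.WHOLE_DB)`, append the record, then `unlock(RecordLocks.WHOLE_DB)` in a finally block. Updates lock only their own record number.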
    Make sense?
    M
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    Sounds good to me. I still prefer the single FileChannel, but multiple FileChannels seem to work too if used as Max described. I just have more of a paranoid distrust of anything I don't see explicitly specified somewhere (as you've probably gathered from this and other threads).
    Thomas Kijftenbelt
    Ranch Hand

    Joined: Feb 13, 2002
    Posts: 73
    Hi,
    A small addition:
    [Max]: As for the latter, the only thing you have to do is lock down the entire db when a create or delete is taking place, say by using a lock(-1). Otherwise, all's well.

    I would say you only have to lock the entire database in case of a create; in case of a delete, the actual file size does not change (only a delete flag is modified) -> so locking that particular record should be enough.
    TK
    Thomas Kijftenbelt
    Ranch Hand

    Joined: Feb 13, 2002
    Posts: 73
    Jim,
    [Jim]: I still prefer the single FileChannel

    Do you mean a single FileChannel per Data() instance or a single FileChannel which is shared by all Data() instances?
    TK
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    Well, I'm still sorting out how RMI really works and how my own design works / will work. So here's my plan, some of which is already implemented. Y'all can holler out if there's something here that makes no sense to you; it quite possibly means I've misunderstood something.
    I've got just one Data object in the server JVM - well, one per different db file that a client somewhere is accessing. I figure a connection factory will be keeping a WeakHashMap of different files using canonical names as keys, and when a connection to a particular file is requested, it will either return an already-existing Data object associated with that file, or it will create a new one and put it in the map.
    So, most common scenario: one Data object on server, containing one FileChannel, which is the only one to access that file. Multiple RemoteData implementations running on the server (one per client) - each forwards calls to the single Data instance, which it had gotten a reference to via the connection factory when the client first connected. Client JVMs are seeing (or think they're seeing) a single RemoteData object, which they can invoke methods on. But it's really just a skeleton auto-generated by RMI, right? (I'm using 1.4). No actual Data instance exists in the client JVM, and no FileChannel.
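A rough sketch of the one-Data-per-file idea, assuming a factory keyed by canonical path. `DataFactory`, `Data`, and `forFile` are hypothetical names. The sketch swaps in a plain HashMap for the WeakHashMap mentioned above to keep the lifetime rules simple, so entries are never evicted here.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;

class DataFactory {

    // Stand-in for the assignment's Data class: owns the only channel to its file.
    static final class Data implements AutoCloseable {
        private final RandomAccessFile file;
        private final FileChannel channel; // private, so it never leaves the server

        Data(File dbFile) throws IOException {
            this.file = new RandomAccessFile(dbFile, "rw");
            this.channel = file.getChannel();
        }

        long size() throws IOException {
            return channel.size();
        }

        @Override
        public void close() throws IOException {
            file.close(); // also closes the associated channel
        }
    }

    // Canonical path -> shared Data instance.
    private static final Map<String, Data> OPEN = new HashMap<>();

    // Returns the existing Data for this file, or creates and remembers one.
    static synchronized Data forFile(String path) throws IOException {
        File dbFile = new File(path);
        String key = dbFile.getCanonicalPath(); // different spellings, same key
        Data data = OPEN.get(key);
        if (data == null) {
            data = new Data(dbFile);
            OPEN.put(key, data);
        }
        return data;
    }
}
```

Each per-client remote object would call `DataFactory.forFile(...)` once and forward its calls to the shared instance it gets back.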
    Note - if my second paragraph above makes no sense, it may be that I'm misinterpreting the assignment, so let me run this part by you. I see in the spec, "the program must allow the user to specify the location of the database, and it must also accept an indication that a local database is to be used..." OK, the latter part is handled by command line arguments defined elsewhere. But does "location of the database" mean "where's the server?" Or does it mean "once we've found the server (or localhost, which ought to be pretty easy if that's what's requested) - what's the path to the DB file?" I'm assuming it's the latter, and that this means different clients may connect to different DB files through the same server. But I may be way off here; please let me know.
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Sounds like a good analysis and a reasonable design. By using only a single Data object, you're cutting down on the complexity, and by making the FileChannel a private member of Data, you're keeping it serverside. Sounds like a candidate for a perfect score.
    M
    james render
    Ranch Hand

    Joined: May 08, 2003
    Posts: 72
    that's brilliant guys, makes perfect sense to me now.. thanks for the re-cap. I think that I would have been okay as I've a mechanism in place for ensuring that there is only one create going on at a time (but still allowing updates/deletes and reads whilst it's happening..)
    Jamie Orme
    Greenhorn

    Joined: Apr 24, 2003
    Posts: 22
    Originally posted by Max Habibi:
    Sounds like a good analysis and a reasonable design. By using only a single Data object, you're cutting down on the complexity, and by making the FileChannel a private member of Data, you're keeping it serverside. Sounds like a candidate for a perfect score.
    M

    Hello All
    Been reading this post with interest. It has certainly got me thinking. I was wondering if you could comment on my design:
    I have a Data class which has a file channel as a private member variable. I then have a remote data adapter which sits in front of this. Each client will get its own remote data adapter instance (from an RMI factory), with each adapter containing its own private instance of Data. This way there will be no contention on the file channels (I hope!). Due to this there is no synchronization except for the locking, which I have implemented as a lock manager (done with future expansion in mind: locking of different tables, etc.). My only special locking consideration is for creates - while a create is taking place no other locks will be granted (to prevent problems with new records being overwritten by different clients).
    I have tested this with multiple threads inserting/updating/deleting hundreds of records at the same time with no problems.
    After reading Max's post about JIT file channels, I did create a version of my Data class that created them as required, which still worked, but performance seemed to be considerably worse (maybe that's my fault in my particular implementation!).
    I would be extremely grateful for any thoughts on the above. I'm hoping to submit fairly soon, and was just in the process of documentation until I read this!
    Many Thanks,
    Jamie
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    So, each adapter has its own Data instance, which has its own FileChannel, which refers to the same single data file, right? Sounds reasonable. I still have a certain paranoid distrust of multiple channels attached to the same file, but you're [ahem] probably safe.
    It might be worthwhile to create some tests for this. One thread could keep updating a record so all fields are filled with XXXXXXXX, while another thread tries to update the same record with YYYYYYYY. Meanwhile another thread keeps reading the record to see what the fields look like. They might observe all X or all Y, but if they ever see X and Y in the same record, that's an error. A similar test could look at simultaneous creates. One thread creates 1000 new X records, another creates 1000 new Y records. Afterwards check that there are 2000 new records, 1000 of each type. (Whatever order they occur in.) Something similar can probably be done with delete too, but I'll leave that to you.
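A minimal sketch of the X/Y mixed-record check described above, assuming a single shared FileChannel and a plain monitor standing in for the record lock. The reader also takes that lock here, which is a simplification to keep the sketch deterministic. The class name, record size, and round count are hypothetical; a real test would go through your own Data class and lock manager instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

class TornRecordCheck {
    static final int RECORD_LENGTH = 64; // hypothetical record size
    static final int ROUNDS = 2000;      // raise this to hunt harder for corruption

    static byte[] filled(char c) {
        byte[] record = new byte[RECORD_LENGTH];
        Arrays.fill(record, (byte) c);
        return record;
    }

    // Repeatedly overwrites record 0 with one repeated character.
    static void writer(FileChannel channel, Object recordLock, char c, AtomicBoolean failed) {
        for (int i = 0; i < ROUNDS; i++) {
            try {
                synchronized (recordLock) {
                    channel.write(ByteBuffer.wrap(filled(c)), 0);
                }
            } catch (IOException e) {
                failed.set(true);
                return;
            }
        }
    }

    // Repeatedly reads record 0 and flags any record that mixes characters.
    static void reader(FileChannel channel, Object recordLock, AtomicBoolean failed) {
        ByteBuffer buffer = ByteBuffer.allocate(RECORD_LENGTH);
        for (int i = 0; i < ROUNDS; i++) {
            buffer.clear();
            try {
                synchronized (recordLock) {
                    while (buffer.hasRemaining()) {
                        if (channel.read(buffer, buffer.position()) < 0) {
                            break; // unexpected end of file
                        }
                    }
                }
            } catch (IOException e) {
                failed.set(true);
                return;
            }
            for (int j = 1; j < RECORD_LENGTH; j++) {
                if (buffer.get(j) != buffer.get(0)) {
                    failed.set(true); // saw X and Y in the same record
                }
            }
        }
    }

    // Returns true if no mixed record and no I/O error was observed.
    static boolean run() throws IOException, InterruptedException {
        Path file = Files.createTempFile("torn", ".db");
        Object recordLock = new Object(); // stand-in for the real record lock
        AtomicBoolean failed = new AtomicBoolean(false);
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap(filled('X')), 0); // seed the record
            Thread x = new Thread(() -> writer(channel, recordLock, 'X', failed));
            Thread y = new Thread(() -> writer(channel, recordLock, 'Y', failed));
            Thread r = new Thread(() -> reader(channel, recordLock, failed));
            x.start(); y.start(); r.start();
            x.join(); y.join(); r.join();
        } finally {
            Files.deleteIfExists(file);
        }
        return !failed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run() ? "no mixed records seen" : "FAILED");
    }
}
```

A passing run only means no corruption was observed in that run. It does not prove the design is safe, which is why a larger number of rounds is worth trying.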
    Jamie Orme
    Greenhorn

    Joined: Apr 24, 2003
    Posts: 22
    Hi Jim
    Thanks for your reply.
    I have completed the tests you have mentioned, in addition to the ones I had already done, and everything worked well. I ran it against both my original Data class, and the one with JIT file channels. Whilst the results were identical, the performance hit using the latter was very noticeable. Now I know I was inserting/updating/reading 1000 records in each thread, and that for a handful of hits by a few clients the gap wouldn't be that noticeable, but as far as good practice (and code readability - I'm thinking junior programmer here!) is concerned, which mechanism would you suggest I adopt? Anyone else got any ideas on this?
    Thanks again
    Jamie
    Thomas Kijftenbelt
    Ranch Hand

    Joined: Feb 13, 2002
    Posts: 73
    Hi Jim,
    Sounds good. The FileChannel is used as private member of the Data class, and active during the life of a Data class... That's the same approach I took -> the trick however seems to be to have the various remote clients point to the one data instance... In case of the local configuration this is not a problem as there is one client and one server.
    I am not completely into RMI, but I'll have a look at it and do some tests... the approach however looks good.
    Regarding your question:
    [Jim]: Note - if my second paragraph above makes no sense, it may be that I'm misinterpreting the assignment, so let me run this part by you. I see in the spec, "the program must allow the user to specify the location of the database, and it must also accept an indication that a local database is to be used..." OK, the latter part is handled by command line arguments defined elsewhere. But does "location of the database" mean "where's the server?" Or does it mean "once we've found the server (or localhost, which ought to be pretty easy if that's what requested) - what's the path to the DB file?" I'm assuming it's the latter, and this this means different clients may connect to different DB files through the same server. But I may be way off here; please let me know.

    In fact there are a number of options which the user must be able to specify:
    - standalone mode -> database name
    - server mode -> database name / rmi server port
    - client mode -> server host name / rmi client port
    Based on the mode in which the app is running, I present the user with a dialog box, where he / she can enter the appropriate options. These are then written into a properties file, and retrieved when needed.
    E.g. when started in server mode, the user can enter database name / rmi server port -> this is written into the properties; then an instance of the Data class is created and the database name property is retrieved.
    In either case the remote client cannot be used to specify the db location (as this client probably has no knowledge of the server machine).
    Does this answer the question?
    TK
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    [Thomas]: In fact there are a number of options which the user must be able to specify:
    Thanks. My prototypes so far have just used defaults here; I forgot that in the real world yes, people might want to configure this too. So, the person who starts the server can choose which local DB file they want to use for the DB, but people starting clients just get to choose where their server is - they don't also determine file name. OK, that makes sense. (I could still allow client choice of file, but it seems an unnecessary and dubious choice to allow them.) Thanks.
    [Jamie]: I have completed the tests you have mentioned, in addition to the ones I had already done, and everything worked well. I ran it against both my original data class, and the one with JIT file channels. Whilst the results were identical, the performance hit using the latter was very noticeable. Now I know I was inserting/updating/reading 1000 records in each thread, and that for a handful of hits by a few clients the gap wouldn't be that noticeable, but as far as good practice (and code readability - I'm thinking junior programmer here!) is concerned which mechanism would you suggest I adopt? Anyone else got any ideas on this?
    First I should maybe note that I suggested 1000 records not to test performance, but to try to increase the chances of observing data corruption if it is indeed possible at all. But it's nice to get performance data too as a side effect. Heck, if possible, try going for a million records, or whatever can be done in a reasonable amount of time.
    As for readability - well, it may be a matter of taste, and what the programmer is expecting to see. Personally I find the idea of a single FileChannel easier to understand and closer to what I would naturally expect, so I will probably view code that matches this expectation as "more readable". But others can have very different expectations; I'm really not sure what's the best answer here based on "readability".
     