JavaRanch » Java Forums » Certification » Developer Certification (SCJD/OCMJD)
RandomAccessFile & Threading

Manoj Dixit
Ranch Hand

Joined: Sep 13, 2003
Posts: 31
Hello,
I am using RandomAccessFile for my I/O, and it's an instance variable of my Data class. I use it throughout the Data class. I open the file in the constructor and just leave it open (I don't know where to close it).
There is a separate instance of the Data class for each client.
Will this create any threading problems, since different clients will access db.db at the same time to manipulate it?
I am really lost and don't want to switch to FileChannel.
Please guide me.

Manoj
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Manoj,
I am using RandomAccessFile for my I/O, and it's an instance variable of my Data class. I use it throughout the Data class. I open the file in the constructor and just leave it open (I don't know where to close it).

If you open the raf from within your Data constructor, I suppose that a good place to close it could be the finalize() method. I use separate open() / close() methods in Data, so I don't have that issue.
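A minimal sketch of the explicit open()/close() approach, assuming a hypothetical `DataFile` class (the name and methods are illustrative, not from this thread) and Java 7+ for `AutoCloseable` and try-with-resources. Unlike finalize(), which the JVM may never call at all (and which has been deprecated since Java 9), close() runs exactly when the caller decides:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical data-access class that opens and closes its file explicitly. */
class DataFile implements AutoCloseable {
    private RandomAccessFile raf;

    /** Opens the database file when the caller decides, instead of in the constructor. */
    public synchronized void open(String path) throws IOException {
        if (raf != null) {
            throw new IllegalStateException("already open");
        }
        raf = new RandomAccessFile(path, "rw");
    }

    public synchronized boolean isOpen() {
        return raf != null;
    }

    /** Releases the file handle deterministically; calling it twice is harmless. */
    @Override
    public synchronized void close() throws IOException {
        if (raf != null) {
            try {
                raf.close();
            } finally {
                raf = null;   // mark as closed even if close() threw
            }
        }
    }
}
```

Usage would look like `try (DataFile data = new DataFile()) { data.open("db.db"); /* reads and writes */ }`, so the handle is released even when an exception escapes.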
There is a separate instance of the Data class for each client.
Will this create any threading problems, since different clients will access db.db at the same time to manipulate it?

If you make sure (through a static object created for that purpose) that createRecord() will not be called by multiple threads at the same time, and will not be called concurrently with delete operations, you should be OK IMO (as updates and deletes are protected by the locking system).
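The static-lock idea might look like this, assuming a hypothetical `RecordSlots` helper and an in-memory list of deleted record numbers (similar in spirit to the Vector Manoj describes in the next post). Because the lock and the list are static, they are shared by every Data instance even though each client has its own; the file I/O itself is left out:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper: one static lock serializes createRecord() and
 * deleteRecord(), because both change which record slots are in use.
 */
class RecordSlots {
    // static, so every Data instance (one per client) shares the same lock and list
    private static final Object CREATE_DELETE_LOCK = new Object();
    private static final Deque<Integer> deletedSlots = new ArrayDeque<>();
    private static int recordCount = 0;

    /** Returns the slot number a new record should be written to. */
    static int createRecord() {
        synchronized (CREATE_DELETE_LOCK) {
            Integer reused = deletedSlots.pollFirst();
            if (reused != null) {
                return reused;        // reuse a deleted slot first
            }
            return recordCount++;     // otherwise append at the end
        }
    }

    static void deleteRecord(int recNo) {
        synchronized (CREATE_DELETE_LOCK) {
            // a real implementation would also set the record's deleted flag in the file here
            deletedSlots.addLast(recNo);
        }
    }
}
```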
Best,
Phil.
Manoj Dixit
Ranch Hand

Joined: Sep 13, 2003
Posts: 31
Thanks Phil for the quick reply.
As finalize() will be called whenever the JVM wishes, the raf instance will keep the file open (I am not taking care of crashed clients).
By opening the file in the constructor, I read the header info for the FieldInfo object.
About create and delete: I am using a Vector to store deleted record numbers, and I synchronize on it in delete and create. That guarantees that delete and create will not be called at the same time.
My concern is that with each raf instance I move the file pointer back and forth in read, delete, update and create.
Delete and create are taken care of (synchronization on the Vector). But what about read and update?
Waiting for your reply.

Regards,
Manoj
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Manoj,
I suppose that your Vector is static (it should be).
My concern is that with each raf instance I move the file pointer back and forth in read, delete, update and create.
Delete and create are taken care of (synchronization on the Vector). But what about read and update?

For reads and updates you should be OK. At the raf level, not being thread-safe means it's not safe for multiple threads to access a shared raf instance. In your design, as each thread will use a separate raf instance, thread safety is just a logical issue IMO, in relation to creates and deletes. But as you said that you solved them ...
Regards,
Phil.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Are you going to use RMI? If you do, then even if each client has a separate Data instance, you may have more than one thread associated with a single client, or you may have one thread associated with more than one client. Maybe not at the same time, so concurrent access may not be a problem, but the way the Java memory model works you may get some strange effects anyway if you let different threads access a single raf instance without synchronization. Also, even if the raf instances were not shared between threads - they all connect to the same single file on the server, right? What happens if one client is writing to the file while another is reading from the same record? You may well need some sort of synchronization to prevent problems here. Switching to FileChannel may or may not help you in this department - FileChannel does offer some better protections against problems than RAF does, but my opinion is that you'd still need some synchronization somewhere to prevent possible problems. That's a long discussion in itself; see here if you want lots of gory details. In practice, if you use FileChannel the way Max advocates you probably won't observe any problems; conversely, if you just synchronize as I advocate, you also shouldn't have any problems. The stuff Max and I argue about in that thread is mostly theoretical, so don't worry about it too much; I just included the URL in case you really want to know more details.
Back to your design - truth is, I have a hard time imagining it being made into something thread-safe if every client has a separate Data instance which in turn has its own separate RAF instance which refers to a single shared file. You have a much better chance of atomic operations by using FileChannel rather than RAF, and you can avoid many other threading issues by opening a FileChannel only during execution of a method that needs to use it, and closing it when that method terminates. That way different threads will never access the same FileChannel. But that's somewhat different from what you propose. Personally I prefer to create a single FileChannel or RAF which all clients use, and protect access via synchronization. But there are a lot of ways to do this; I don't want to railroad you down a particular path; I just want you to realize that the path you're on is not as safe as you might think. Good luck...
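A sketch of the per-method FileChannel idea, assuming fixed-length records and Java 7+ (`java.nio.file.Path` and try-with-resources are both newer than this thread); `PerCallChannelStore` and its methods are hypothetical. Each call gets its own channel, so no FileChannel is ever shared between threads. The file itself still is shared, though, so record locking is still needed to keep a read and a write of the same record from overlapping:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical store that opens a FileChannel per call and closes it before returning. */
class PerCallChannelStore {
    private final Path file;
    private final int recordLength;

    PerCallChannelStore(Path file, int recordLength) {
        this.file = file;
        this.recordLength = recordLength;
    }

    byte[] read(int recNo) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(recordLength);
            long start = (long) recNo * recordLength;
            while (buf.hasRemaining()) {   // a single read() may return fewer bytes
                if (ch.read(buf, start + buf.position()) < 0) {
                    throw new EOFException("record " + recNo + " is past end of file");
                }
            }
            return buf.array();
        }   // the channel is closed here, even on exception
    }

    void write(int recNo, byte[] record) throws IOException {
        if (record.length != recordLength) {
            throw new IllegalArgumentException("record must be " + recordLength + " bytes");
        }
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            long start = (long) recNo * recordLength;
            while (buf.hasRemaining()) {
                ch.write(buf, start + buf.position());
            }
        }
    }
}
```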
[ October 09, 2003: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Bharat Ruparel
Ranch Hand

Joined: Jul 30, 2003
Posts: 493
Hello Manoj,
The following is a sentence from Jim's excellent post:

Personally I prefer to create a single FileChannel or RAF which all clients use, and protect access via synchronization.

I use RAF and I am doing what Jim states above. It works.
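A minimal sketch of the single shared RAF guarded by synchronization, assuming fixed-length records; `SharedRafStore` is a hypothetical name. All clients call the same instance, and each method holds the instance lock across seek() and the following read or write, so no other thread can move the file pointer in between. Since every read goes through the same object that performed the writes, one RAF never has to see another RAF's writes:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical record store: one RandomAccessFile shared by every client. */
class SharedRafStore implements AutoCloseable {
    private final RandomAccessFile raf;
    private final int recordLength;

    SharedRafStore(String path, int recordLength) throws IOException {
        this.raf = new RandomAccessFile(path, "rw");
        this.recordLength = recordLength;
    }

    /** seek() and readFully() run as one unit under the instance lock. */
    synchronized byte[] read(int recNo) throws IOException {
        byte[] record = new byte[recordLength];
        raf.seek((long) recNo * recordLength);
        raf.readFully(record);   // keeps reading until full, or throws EOFException
        return record;
    }

    synchronized void write(int recNo, byte[] record) throws IOException {
        if (record.length != recordLength) {
            throw new IllegalArgumentException("record must be " + recordLength + " bytes");
        }
        raf.seek((long) recNo * recordLength);
        raf.write(record);
    }

    synchronized int recordCount() throws IOException {
        return (int) (raf.length() / recordLength);
    }

    @Override
    public synchronized void close() throws IOException {
        raf.close();
    }
}
```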
Regards.
Bharat


SCJP,SCJD,SCWCD,SCBCD,SCDJWS,SCEA
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Jim,
Back to your design - truth is, I have a hard time imagining it being made into something thread-safe if every client has a separate Data instance which in turn has its own separate RAF instance which refers to a single shared file.

I got the same feeling, and that design surprised me. Only after a while did I conclude that there was no reason for Manoj's design to lead to threading issues. The only possible issues relate to the sharing of the file IMO (create and delete), but they can be solved separately. The issue you mention, "What happens if one client is writing to the file while another is reading from the same record?", is the dirty reads issue, which most people here (AFAIK) don't address despite their more "classical" design.
About RMI :
Maybe not at the same time, so concurrent access may not be a problem,

why did you write maybe (twice) ? "Not at the same time, so concurrent access will not be a problem," is true too IMO.
but the way the Java memory model works you may get some strange effects anyway if you let different threads access a single raf instance without synchronization.

What kind of "strange effects" ? Can you give an example ?
Best,
Phil.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[PM]: The issue you mention, "What happens if one client is writing to the file while another is reading from the same record?", is the dirty reads issue, which most people here (AFAIK) don't address despite their more "classical" design.
Yeah, I suppose so. However, I think people should at least consider the possibility and decide why it's OK from their perspective, rather than naively thinking that their code is completely thread-safe. In my own case the sync is not a significant impediment, because it's mostly record-level and the full caching makes things extremely fast anyway, so who cares if there's some uncontended synchronization slowing things down a tiny bit? But I understand that other designs have more incentive to reduce synchronization during slow operations like I/O, so OK...
[JY]: Maybe not at the same time, so concurrent access may not be a problem,
[PM]: why did you write maybe (twice) ? "Not at the same time, so concurrent access will not be a problem," is true too IMO.

You're right. I was thinking there was a loophole, but further thought indicates there isn't (at least, not the one I was thinking of) so remove the "maybe". Unless I think of something else later.
[JY]: but the way the Java memory model works you may get some strange effects anyway if you let different threads access a single raf instance without synchronization.
[PM]: What kind of "strange effects" ? Can you give an example ?

What I was thinking of is instance data that is out of date because, when a previous thread changed it, it only changed a local copy of the data, and without synchronization there's been no need yet to write that changed data to main memory. So when you access the instance from a separate thread, it sees an older version of the data, missing the last thread's change. This sort of thing can crop up in all sorts of subtle and unexpected ways when two different threads access shared data without synchronization. However, in this case I can't think of a particular way that this would actually lead to problems, as there's no significant mutable data in the RAF, it seems. There's a file pointer, which is actually implemented in native code and thus probably immune to the JVM's creative optimizations. But even if it were implemented with an instance variable, it seems that for our project it's unlikely anyone would ever try to use the RAF without first resetting the file position to whatever particular value is required. So it probably doesn't matter if the file position is incorrect when a thread first accesses it, since the first thing that thread will do is use seek() to set it to something else, then use either read() or write() from the same thread to do whatever it needs to do at that position. So I think that in this case, for this assignment, this particular issue (multiple threads accessing one RAF non-concurrently without synchronization) is not actually a problem. However, it's the sort of thing that is dangerous to do in general; there are many other scenarios where it may lead to trouble. E.g. suppose you had several different threads that were simply trying to append consecutive entries to the end of a file, relying on the file position from the last write (by another thread). Even if these threads were never concurrent, one thread could easily overwrite the data from another thread's write because it was using an out-of-date file position.
So in general, I'm very paranoid about subtle bugs due to accessing mutable data from different threads. But I acknowledge that in this case, for a number of subtle reasons, it seems to be OK.
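To make the append scenario concrete, here is a sketch of the defensive habit described above, using a hypothetical `AppendLog` class: never trust the file pointer left behind by an earlier call. append() derives the end-of-file offset itself with length() and seeks there, and that step and the write happen under one lock, so threads can neither use a stale position nor interleave between the seek and the write. The synchronized methods also provide the cross-thread visibility discussed above:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical append-only log on one RandomAccessFile shared by several threads.
 * No method trusts the file pointer left behind by an earlier call.
 */
class AppendLog implements AutoCloseable {
    private final RandomAccessFile raf;

    AppendLog(String path) throws IOException {
        raf = new RandomAccessFile(path, "rw");
    }

    /** Appends an entry and returns the offset it was written at. */
    synchronized long append(byte[] entry) throws IOException {
        long offset = raf.length();   // derive the end of file instead of relying on the last write
        raf.seek(offset);
        raf.write(entry);
        return offset;
    }

    /** Reads length bytes at an explicit offset. */
    synchronized byte[] readAt(long offset, int length) throws IOException {
        byte[] buf = new byte[length];
        raf.seek(offset);
        raf.readFully(buf);
        return buf;
    }

    @Override
    public synchronized void close() throws IOException {
        raf.close();
    }
}
```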
Actually there's a more serious issue I didn't address, but it's also one with a reasonably easy fix. If you open an RAF in "rw" mode, there's no guarantee when any writes you make will actually be written to the underlying file. (Except when you call close; I think it's safe to assume that any buffered data is fully flushed at that point.) So you could have something like this:
  • Client A locks record 1.
  • Client A reads record 1.
  • Client A changes part of the read data (e.g. customer ID field).
  • Client A updates record 1 with the new data, using A's RAF. Data is not yet flushed to file.
  • Client A unlocks record 1.
  • Client B locks record 1.
  • Client B reads record 1 using B's RAF. This reads from the file, which does not yet have the data from A's update.
  • Client B modifies data.
  • Client B writes record 1 using B's RAF. This may or may not be written promptly to the file.
  • Client B unlocks record 1.

What data will be in record 1 when this is done? Client B made changes as though A didn't exist, because A's data hadn't been written to the file yet. Perhaps B wasn't supposed to make the change at all, and should instead have discovered that A had already reserved the record for another customer. Instead, they've made two independent changes, and the final data in the file will be determined by whichever RAF gets around to writing its changes last. That's no good, way too risky.
Fortunately, there are two solutions to this. One is the one Max advocates for FileChannel: create it just before you use it, then close it immediately afterwards, and let GC take it. That will force any outstanding data to be written to the file. Or, instead of opening the file in "rw" mode, you can use "rws" or "rwd" mode, both of which force writes to update the underlying file immediately, before the write method returns. Which is what most people assume happens anyway. It's a simple enough fix, once you realize the problem is there. But that's far from obvious initially.
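A small sketch of the mode fix, assuming a hypothetical `SyncedRaf.open` helper. Per the RandomAccessFile javadoc, "rwd" requires every update to the file's content, and "rws" every update to content and metadata, to be written synchronously to the underlying storage device (the guarantee is stated for files on a local device). For a file already opened with plain "rw", `raf.getFD().sync()` is an explicit way to force the same thing at a chosen point:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper: opens the shared database file in a synchronous-write mode. */
final class SyncedRaf {
    private SyncedRaf() {
    }

    /**
     * @param alsoSyncMetadata true for "rws" (content and metadata such as length),
     *                         false for "rwd" (file content only)
     */
    static RandomAccessFile open(String path, boolean alsoSyncMetadata) throws IOException {
        return new RandomAccessFile(path, alsoSyncMetadata ? "rws" : "rwd");
    }
}
```

In the scenario above, Client A and Client B would each get their RAF from `SyncedRaf.open("db.db", ...)`, so A's write() has reached the file before it returns and before A unlocks record 1.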
So alright, it seems that this design can be made reasonably safe. Well, for the "dirty reads" crowd anyway. Carry on then.
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Jim,
[PM]: What kind of "strange effects" ? Can you give an example ?
What I was thinking of is instance data that is out of date because, when a previous thread changed it, it only changed a local copy of the data, and without synchronization there's been no need yet to write that changed data to main memory. So when you access the instance from a separate thread, it sees an older version of the data, missing the last thread's change. This sort of thing can crop up in all sorts of subtle and unexpected ways when two different threads access shared data without synchronization.

Wow! I didn't think of that.
Or, instead of opening the file in "rw" mode, you can use "rws" or "rwd" mode, both of which force writes to update the underlying file immediately, before the write method returns.

I use "rwd".
So alright, it seems that this design can be made reasonably safe.

But it still looks weird. You've understood that my purpose was not to advocate that design (mine is quite different), but to answer Manoj's questions about its thread safety (or lack of it).
Thanks for the interesting stuff.
Best,
Phil.
Arun Kumar
Ranch Hand

Joined: Aug 29, 2003
Posts: 67
I was wondering: if you have one RAF instance per client, what about the OS limit on the number of file handles allowed on a single file? I believe some OSes have 256 as the limit. So in this case, what will happen when more than 256 clients connect to the server and open more than 256 RAFs? Shouldn't this kind of design be concerned about such an issue? This was one of the reasons why I synchronized on a static RAF instance.
And I read the thread pointed out by Jim. It was really confusing. In some of the posts I wasn't sure if they were talking about "1 thread - 1 FileChannel" or "multiple threads - 1 FileChannel".
Arun


SCJP (1.4), SCWCD, SCJD
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I was wondering: if you have one RAF instance per client, what about the OS limit on the number of file handles allowed on a single file?
I'm not sure, but the API never really says that you can have multiple RAFs open for the same file at one time, does it? I think this is unspecified behavior, and so yes, you may get an IOException if you have too many RAFs open at once for the same file. It's because of these unspecified details that I don't trust having multiple RAFs. Having one per file, I put in synchronization to ensure there are no problems. Having multiple RAFs, there's a lot more uncertainty. I have similar concerns with FileChannel: though the API does offer more and better guarantees for FileChannel than it does for RAF, I still find them insufficient for me to fully trust the implementation to do what I want.
And I read the thread pointed out by Jim. It was really confusing. In some of the posts I wasn't sure if they were talking about "1 thread - 1 FileChannel" or "multiple threads - 1 FileChannel".
Yeah, sorry about that. In general I was trying to focus on multiple threads - 1 FileChannel in that discussion, and on explaining why you still need synchronization in that case.
Dushy Inguva
Ranch Hand

Joined: Jun 24, 2003
Posts: 264
Hello All,
I am using CREW (Concurrent Read Exclusive Write) locking to access the records. I am also caching the records in memory. When I have to read/write from/to the database, I acquire the corresponding lock, and I am currently planning to use multiple RAFs, one for each client, opened in "rws" mode. So I think I would not have problems with threads stepping on each other's toes.
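CREW locking maps naturally onto `java.util.concurrent.locks.ReentrantReadWriteLock` (Java 5+). Here is a hypothetical sketch with one lock per record number, assuming an in-memory cache as described; `RecordCache` is an illustrative name, and the lock map uses `computeIfAbsent` (Java 8+) so that two threads asking for the same record always get the same lock object:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical CREW record cache: concurrent reads, exclusive writes, per record. */
class RecordCache {
    private final ConcurrentMap<Integer, ReadWriteLock> locks = new ConcurrentHashMap<>();
    private final ConcurrentMap<Integer, String> records = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(int recNo) {
        return locks.computeIfAbsent(recNo, k -> new ReentrantReadWriteLock());
    }

    /** Many threads may hold the read lock for the same record at once. */
    String read(int recNo) {
        ReadWriteLock lock = lockFor(recNo);
        lock.readLock().lock();
        try {
            return records.get(recNo);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The write lock waits until no reader or writer holds this record's lock. */
    void update(int recNo, String value) {
        ReadWriteLock lock = lockFor(recNo);
        lock.writeLock().lock();
        try {
            records.put(recNo, value);
            // a real implementation would also write the record to db.db here;
            // that multi-step file work is what the lock actually protects
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```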
But I am limited by the number of RAF handles I can have open on a file. The other design, in which we share one RAF between all the users, seems too restrictive.
The middle approach is to maintain a pool of RAFs and allocate them on demand. But this looks like overkill.
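For comparison, the pool approach is not much code with `java.util.concurrent` (Java 5+). This is a hypothetical sketch (`RafPool` is an illustrative name), not a recommendation: acquire() blocks when every handle is checked out, which caps the number of open handles on the file. The pool only spreads I/O over several handles, so record locking is still needed, and each caller should release its handle in a finally block:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical fixed-size pool of RandomAccessFile handles on one file. */
final class RafPool implements AutoCloseable {
    private final List<RandomAccessFile> all = new ArrayList<>();
    private final BlockingQueue<RandomAccessFile> idle;

    RafPool(String path, int size) throws IOException {
        idle = new ArrayBlockingQueue<>(size);
        try {
            for (int i = 0; i < size; i++) {
                // "rwd": a write through one handle reaches the file before another handle reads it
                RandomAccessFile raf = new RandomAccessFile(path, "rwd");
                all.add(raf);
                idle.add(raf);
            }
        } catch (IOException e) {
            try {
                close();   // don't leak the handles opened so far
            } catch (IOException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }

    /** Blocks until a handle is free; the caller must hand it back with release(). */
    RandomAccessFile acquire() throws InterruptedException {
        return idle.take();
    }

    void release(RandomAccessFile raf) {
        idle.add(raf);
    }

    int available() {
        return idle.size();
    }

    @Override
    public void close() throws IOException {
        for (RandomAccessFile raf : all) {
            raf.close();
        }
    }
}
```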
So, once again (I was stuck between 3 approaches in some other problem as well ;-) ), I have to choose one of these three!
What do you guys think?
Dushy


SJCP, SCBCD, SJCD, SCDJWS, SCEA (Part I)
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
The other design, in which we share one RAF between all the users, seems too restrictive.
What's restrictive about it?
Dushy Inguva
Ranch Hand

Joined: Jun 24, 2003
Posts: 264
Jim,
That reply was fast! I feel it would be a point of contention between all the threads. But, in reality, I wonder... Since the computer has a fixed bandwidth to the disk, opening parallel RAF connections (from memory to disk) could just end up sharing the available bandwidth.
But we have no control over what goes on underneath. It is hard to take this approach, because while one thread is reading/writing a record to the database, all the others would have to wait!
Dushy
Manoj Dixit
Ranch Hand

Joined: Sep 13, 2003
Posts: 31
I was building the TOP (GUI) of the house without realizing that the FOUNDATION is too weak. I will think about all the possibilities discussed in this thread and may come up with something.
Thanks, guys, for sharing your knowledge, views and opinions.
Arun Kumar
Ranch Hand

Joined: Aug 29, 2003
Posts: 67
Originally posted by Jim Yingst:

[Arun]: And I read the thread pointed out by Jim. It was really confusing. In some of the posts I wasn't sure if they were talking about "1 thread - 1 FileChannel" or "multiple threads - 1 FileChannel".

[Jim]: Yeah, sorry about that. In general I was trying to focus on multiple threads - 1 FileChannel in that discussion, and on explaining why you still need synchronization in that case.

I know I am asking for too much, but Jim, if you have time, can you explain to me why you think that FileChannel is not thread-safe (talking about multiple clients - single FileChannel)?
From what I understood reading that long thread you mentioned, implicit-position reads and writes (where you don't pass the position as a parameter to your read/write method) change the FileChannel's position during the process, so they are blocking. So these FileChannel operations are thread-safe.
Those reads/writes with the position as a parameter (explicit-position methods) can occur concurrently. But even then the read method (both explicit and implicit position) is blocking according to the ReadableByteChannel API, so either way only one read can happen at a time. And the explicit-position write method is also blocking, because it might lead to a change in the file size. So all the methods seem to happen only one at a time; can I say they are atomic? Or am I missing something?
Thank you, Jim
Arun
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Well, that's a long story - that's what the other thread is about, after all, and there are many subpoints. But a few key points are:
There's an excellent chance that actual FileChannel implementations really are completely thread-safe. They seem to be pretty safe from the tests I've run. I'm just arguing that if that's Sun's intent, they have failed to adequately document that thread safety.
The biggest hole by far is the one that I was discussing at the end of the FileChannel thread. (I dropped the other points since we weren't making much progress, and the other points were moot if we couldn't reach agreement on something I considered more fundamental.) When you call read(ByteBuffer) to read a record of length N, there's no guarantee you'll read all N bytes, even if the file has at least N bytes and your ByteBuffer has space for N bytes. This means that you need to wrap the read() in a loop, which means another thread could slip in and do something else with the FileChannel between reads, which means that (a) you can't rely on implicit-position methods; you need to make sure the position is specified as part of the read() call, and (b) you need to use synchronization somewhere if you want to make sure that your read does not return an inconsistent record because another thread wrote to the record in the middle of your read. Or (c) you can accept a potential dirty read as OK, an acceptable risk that's not critical to the application. That's fine too, as long as the risk is understood and accepted. (In practice it seems to be an extremely small risk.)
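Points (a) and the read loop can be sketched as below; `ChannelReads.readFully` is a hypothetical helper. It uses the explicit-position `FileChannel.read(ByteBuffer, long)`, which per its javadoc does not modify the channel's own position, so another thread's implicit-position calls cannot shift where this loop reads. It does not address point (b): another thread can still write those bytes between iterations, so consistent records still need synchronization somewhere (or the dirty-read risk in (c) has to be accepted):

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper for reading a whole record from a possibly shared FileChannel. */
final class ChannelReads {
    private ChannelReads() {
    }

    /**
     * Fills buf completely, starting at the given file position. A single
     * read() may return fewer bytes than requested, hence the loop.
     */
    static void readFully(FileChannel ch, ByteBuffer buf, long position) throws IOException {
        long pos = position;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);   // explicit position: channel's own position is untouched
            if (n < 0) {
                throw new EOFException("end of file reached before the buffer was filled");
            }
            pos += n;                    // advance our own cursor, not the channel's
        }
    }
}
```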
My other main point, separate from the above, is that while FileChannel will not allow two read() calls to run simultaneously on the same file (that is, one of the calls will be delayed slightly so they're not concurrent), and also won't allow two write() calls to be concurrent, there's nothing in the spec preventing a read() and a write() from being concurrent. That leads to the same sort of potential dirty read problem described above. Again, that may be acceptable; it's probably not a big deal. I'm just arguing that you can't guarantee it won't happen unless you use synchronization.
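One way to close the read/write gap, sketched under assumptions: a single shared FileChannel (hypothetical `LockedChannelStore`, fixed-length records, Java 7+) guarded by a `ReentrantReadWriteLock`, so reads may overlap each other but never a write. This is one option among several; a plain synchronized block would also work, at the cost of serializing reads too:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical shared FileChannel: shared lock for reads, exclusive lock for writes. */
class LockedChannelStore implements AutoCloseable {
    private final FileChannel ch;
    private final int recordLength;
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    LockedChannelStore(Path file, int recordLength) throws IOException {
        this.ch = FileChannel.open(file, StandardOpenOption.READ,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE);
        this.recordLength = recordLength;
    }

    byte[] read(int recNo) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(recordLength);
        long start = (long) recNo * recordLength;
        rw.readLock().lock();          // other readers may proceed; writers must wait
        try {
            while (buf.hasRemaining()) {
                if (ch.read(buf, start + buf.position()) < 0) {
                    throw new EOFException("record " + recNo + " is past end of file");
                }
            }
        } finally {
            rw.readLock().unlock();
        }
        return buf.array();
    }

    void write(int recNo, byte[] record) throws IOException {
        if (record.length != recordLength) {
            throw new IllegalArgumentException("record must be " + recordLength + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(record);
        long start = (long) recNo * recordLength;
        rw.writeLock().lock();         // no reader can see a half-written record
        try {
            while (buf.hasRemaining()) {
                ch.write(buf, start + buf.position());
            }
        } finally {
            rw.writeLock().unlock();
        }
    }

    @Override
    public void close() throws IOException {
        ch.close();
    }
}
```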
Marcus Beale
Ranch Hand

Joined: Apr 13, 2004
Posts: 33

[JY]
What I was thinking of is instance data that is out of date because, when a previous thread changed it, it only changed a local copy of the data, and without synchronization there's been no need yet to write that changed data to main memory. So when you access the instance from a separate thread, it sees an older version of the data, missing the last thread's change. This sort of thing can crop up in all sorts of subtle and unexpected ways when two different threads access shared data without synchronization.

So true. That is exactly the problem the volatile keyword takes care of for you. I don't know if that's what you meant by synchronization, but I don't want people to confuse the two keywords, even though you can sometimes take care of the issue you describe by synchronizing a method.
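To make the distinction concrete: under the Java 5 memory model, a `volatile` field guarantees that a write by one thread is visible to later reads by other threads, but it does not make a compound action (read, then modify, then write; or seek, then read) atomic. `synchronized` gives both visibility and mutual exclusion. A hypothetical sketch (`OffsetTracker` is an illustrative name, not from this thread):

```java
/** Hypothetical example contrasting volatile (visibility) with synchronized (atomicity too). */
class OffsetTracker {
    // volatile: a single write is visible to other threads' subsequent reads
    private volatile long lastWrittenOffset = -1;

    // guarded by "this": only touched inside synchronized methods
    private long nextFreeOffset = 0;

    void publishLastWritten(long offset) {
        lastWrittenOffset = offset;   // one plain write: volatile is enough
    }

    long lastWritten() {
        return lastWrittenOffset;
    }

    /**
     * Read-modify-write: with only volatile, two threads could read the same
     * nextFreeOffset and both reserve it. synchronized makes the step atomic.
     */
    synchronized long reserve(int length) {
        long offset = nextFreeOffset;
        nextFreeOffset += length;
        return offset;
    }
}
```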
Thanks for the "rwd" tip too. That tripped me up for a day or two, until I read your post and realized that of course that was my problem.
Don Wood
Ranch Hand

Joined: Dec 05, 2003
Posts: 65

Philippe said:
I use "rwd".

It seems to me that "rws" is a better choice because it also updates the metadata. Since the length of the file changes when a record is added to the end of the file, it seems like a good idea to keep the metadata as current as the data.
This seems even more important if you choose to close the raf in a finalize method. If you use "rwd" and close the raf in a finalize method, I think you would run the risk of losing records added to the end of the file if the finalize method does not get called.
As an aside, I think closing the raf in a finalize method is not a good idea, but I have seen a number of people say they are doing this. For those who are closing in the finalize method, though, it seems to me that "rws" is a must.
[ April 17, 2004: Message edited by: Don Wood ]
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Don,
As an aside, I think closing the raf in a finalize method is not a good idea, but I have seen a number of people say they are doing this. For those who are closing in the finalize method, though, it seems to me that "rws" is a must.

There is no guarantee that finalize() will ever be called, indeed!
It seems to me that "rws" is a better choice because it also updates the metadata. Since the length of the file changes when a record is added to the end of the file, it seems like a good idea to keep the metadata as current as the data.

Interesting! Do you know of any reference describing what's supposed to be part of a file's metadata, and the risk of having a file's metadata and its contents out of sync?
RandomAccessFile's constructor javadoc:
The "rwd" mode can be used to reduce the number of I/O operations performed. Using "rwd" only requires updates to the file's content to be written to storage; using "rws" requires updates to both the file's content and its metadata to be written, which generally requires at least one more low-level I/O operation.

FileChannel.force() method javadoc (related):
Invoking this method may cause an I/O operation to occur even if the channel was only opened for reading. Some operating systems, for example, maintain a last-access time as part of a file's metadata, and this time is updated whenever the file is read. Whether or not this is actually done is system-dependent and is therefore unspecified.

"rwd" is an optimization compared to "rws", and reading those excerpts, I mainly thought of the file timestamps, which in this context I don't really care about. As far as the file length is concerned, checking the sources to see what could happen is not of great help: RandomAccessFile.length() is native, and File.length() delegates its job to FileSystem.getLength(), which is abstract.
Regards,
Phil.
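For the FileChannel side of the same trade-off, the RandomAccessFile javadoc describes "rws" and "rwd" as working much like `FileChannel.force(true)` and `force(false)` respectively, except that the modes apply to every I/O operation. A hypothetical sketch (`ForceDemo.appendAndForce` is an illustrative name) that appends a record and then forces it, with `metadata` choosing between the two behaviors. Computing the end offset from `size()` is not atomic with the write, so concurrent appenders would need to hold a lock:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper: append a record, then force it to the storage device. */
final class ForceDemo {
    private ForceDemo() {
    }

    /**
     * @param metadata true behaves like "rws" (content and metadata such as length),
     *                 false like "rwd" (content only)
     * @return the offset the record was written at
     */
    static long appendAndForce(FileChannel ch, byte[] record, boolean metadata) throws IOException {
        long offset = ch.size();   // not atomic with the write below: callers must serialize appends
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            ch.write(buf, offset + buf.position());
        }
        ch.force(metadata);
        return offset;
    }
}
```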
     