Concurrent File Write [Idea for comment]

Tom Coupland
Greenhorn

Joined: Jul 20, 2010
Posts: 1
Hi Ranchers,

I'm looking for a way to write to a log file concurrently. The synchronized blocks around the write in my current application are killing the throughput of the system; getting rid of them would be a massive boon.

I'm thinking of using a FileChannel, which gives the ability to write at a specific position in the file, calculating that position with an atomic variable and thus writing without synchronization. A bit like this:
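
A minimal sketch of the idea as described, not the code originally posted. The class name PositionalLogWriter and the method log are hypothetical, and the sketch leaves out the archiving step and the latch mentioned in the edit note. Each thread reserves its own byte range with an AtomicLong and then calls the positional write(ByteBuffer, long), so no two threads target the same region:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class, for illustration only.
class PositionalLogWriter implements AutoCloseable {
    private final FileChannel channel;
    // Next free offset in the file; threads reserve ranges from it atomically.
    private final AtomicLong nextPosition = new AtomicLong();

    PositionalLogWriter(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        nextPosition.set(channel.size()); // continue after any existing content
    }

    void log(String message) throws IOException {
        byte[] bytes = (message + "\n").getBytes(StandardCharsets.UTF_8);
        // Reserve [position, position + bytes.length) for this thread alone.
        long position = nextPosition.getAndAdd(bytes.length);
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        // A single call may write fewer bytes than requested, so loop until done.
        while (buffer.hasRemaining()) {
            position += channel.write(buffer, position);
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Whether the writes actually run in parallel is up to the FileChannel implementation; the sketch only guarantees that the byte ranges do not overlap.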



It's just an idea really, but I would be interested in somebody poking a big hole in it sooner rather than later.

Cheers for any ideas or thoughts in this area!

Tom

[Edit: Inserted the latch to prevent a write to a new file before the position is reset]
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

If I understand correctly, your application has multiple threads writing to the same log file at high concurrency and you want to optimize your code.

If that is the problem statement, then I guess the simplest thing would be a log writer that queues messages in memory, with a single background thread draining the queue into the log file.
You can use a lock-free queue such as ConcurrentLinkedQueue to get good performance at high concurrency.
Of course, this means that if the server crashes you will lose whatever log messages are still in the queue.
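
A minimal sketch of that queue-and-drain approach, under stated assumptions: the class name QueuedLogWriter, the thread name, and the one-millisecond idle pause are all hypothetical choices, and the queue is unbounded, so a slow disk lets memory grow. Application threads only enqueue; one background thread does all the file I/O:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Hypothetical class, for illustration only.
class QueuedLogWriter implements AutoCloseable {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    private final Thread drainer;
    private volatile boolean running = true;

    QueuedLogWriter(Path file) {
        drainer = new Thread(() -> drain(file), "log-drainer");
        drainer.start();
    }

    // Called by application threads: no lock and no I/O on this path.
    void log(String message) {
        queue.add(message);
    }

    private void drain(Path file) {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            while (true) {
                // Read the flag before polling, so messages queued before close() are not missed.
                boolean stopRequested = !running;
                String message = queue.poll();
                if (message != null) {
                    out.write(message);
                    out.newLine();
                } else if (stopRequested) {
                    break; // queue is empty and no more messages are expected
                } else {
                    out.flush();                       // nothing pending: push buffered text to disk
                    LockSupport.parkNanos(1_000_000L); // idle briefly instead of spinning
                }
            }
        } catch (IOException e) {
            e.printStackTrace(); // a real logger would need a better failure policy
        }
    }

    // Call only after the producing threads have finished logging.
    @Override
    public void close() throws InterruptedException {
        running = false;
        drainer.join();
    }
}
```

Anything still in the queue when the process dies is lost, which is the trade-off described above.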

In your current code you have actually just passed the locking overhead on to the file channel. FileChannel cannot perform multiple writes at once, as specified in the javadocs:

Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes.



Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2969
    
Nitesh, you need to read the rest of the section you quote:
Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.

It looks to me like Tom is carefully avoiding methods that rely on the channel's own position (an internal field), and instead using only methods that take an explicit position (passed as a method parameter). That is, he's using write(ByteBuffer, long) rather than write(ByteBuffer). As far as I can tell, this means he is eligible for concurrent writes under the spec, provided his system's FileChannel implementation supports them.
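
The difference between the two overloads can be seen in a small sketch (the class and method names here are hypothetical). The relative write(ByteBuffer) advances the channel's position, while the positional write(ByteBuffer, long) leaves it untouched:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical class, for illustration only.
class PositionDemo {
    // Returns the channel position after a relative write and after a positional write.
    static long[] positionsAfterWrites(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Relative write: uses and advances the channel's own position.
            channel.write(ByteBuffer.wrap(new byte[8]));
            long afterRelative = channel.position();

            // Positional write: the offset is a parameter; the channel's position is not modified.
            channel.write(ByteBuffer.wrap(new byte[8]), 100L);
            long afterPositional = channel.position();

            return new long[] { afterRelative, afterPositional };
        }
    }
}
```

The two returned values are equal, since only the relative write moves the position.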

Tom, I'm a bit skeptical this technique will produce significant benefits under most circumstances. Most systems will still have just one disk drive, I imagine, and you still need to wait for that drive to finish writing one thing before it can write another. It could be worth testing this to see, but I wouldn't expect great success. The message queueing idea Nitesh mentions is much more likely to benefit you, I think. It's possible to combine the two, I suppose.
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

Mike Simmons wrote: Nitesh, you need to read the rest of the section you quote:
Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.

It looks to me like Tom is carefully avoiding methods that rely on the channel's own position (an internal field), and instead using only methods that take an explicit position (passed as a method parameter). That is, he's using write(ByteBuffer, long) rather than write(ByteBuffer). As far as I can tell, this means he is eligible for concurrent writes under the spec, provided his system's FileChannel implementation supports them.


Maybe I am wrong, but isn't it the case that a write may (and in most cases, except when overwriting, will) change the file size? If so, then according to the statement I quoted earlier:

Only one operation that involves the channel's position or can change its file's size may be in progress at any given time


the implementation may not allow concurrent writes.

Scanning through the JDK code, it looks like there is a lock that FileDispatcher takes before it writes to the underlying file. However, I have not looked at the code carefully and I may be wrong.
I am really not sure how a file system would allow concurrent writes to a file, because one thread may affect the position where another thread is writing. Maybe I am missing a point here.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2969
    
Nitesh Kant wrote: Maybe I am wrong, but isn't it the case that a write may (and in most cases, except when overwriting, will) change the file size?

Hmmm, good point. I think the spec is actually ambiguous here, as it's possible to write your code in such a way that you guarantee that the file length will not change. Tom's code seems to try to do that, but it's a little buggy in that it performs checkForArchive() after already having exceeded the previous file length. But that can be fixed.

So in general the write() method can change the length of the file. But we can also call it with parameters that ensure it cannot change the length of the file. Is that then an operation that can change the length, or not? To me it's ambiguous whether FileChannel's use of the word "operation" refers to just the method, regardless of the values of its parameters, or whether it includes the parameter values. Either way, they really could have written this part of the spec more clearly. I think there's enough wiggle room that an implementation could allow concurrent writes for method calls that don't actually change the file length (this can be determined easily at the beginning of the method, after all). But I don't know how common it is in practice. Note that FileChannel is abstract, so whatever default implementation you're looking at isn't necessarily the same as what's available for another system.
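
One way to guarantee that a write cannot change the file length is to extend the file once, up front, and to refuse any write that would run past that size. A minimal sketch under those assumptions follows; the class name PreallocatedLog, the method tryWrite, and the fixed capacity are hypothetical, and the bounds check happens before the write rather than after it:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class, for illustration only.
class PreallocatedLog implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileChannel channel;
    private final long capacity;
    private final AtomicLong nextPosition = new AtomicLong();

    PreallocatedLog(Path path, long capacity) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
        this.file.setLength(capacity); // set the file size once, before any write
        this.channel = file.getChannel();
        this.capacity = capacity;
    }

    // Returns false, writing nothing, if the record would extend the file.
    boolean tryWrite(byte[] record) throws IOException {
        long position = nextPosition.getAndAdd(record.length);
        if (position + record.length > capacity) {
            return false; // the caller would roll over to a new file at this point
        }
        ByteBuffer buffer = ByteBuffer.wrap(record);
        while (buffer.hasRemaining()) {
            position += channel.write(buffer, position);
        }
        return true;
    }

    @Override
    public void close() throws IOException {
        file.close(); // also closes the channel obtained from it
    }
}
```

Every accepted write lands inside the existing file, so the length stays at the preallocated capacity. Whether a given implementation then lets such writes proceed concurrently remains unspecified.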

Nitesh Kant wrote: I am really not sure how a file system would allow concurrent writes to a file, because one thread may affect the position where another thread is writing. Maybe I am missing a point here.

Well, if a file system is spread across multiple disks (a striped RAID array, for example) then different disks can write at the same time. That's not common on most home computers or laptops, but it's not unheard of for a dedicated server system. That's the only way that comes to mind offhand, though there could be others.
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

Mike Simmons wrote: Tom's code seems to try to do that, but it's a little buggy in that it performs checkForArchive() after already having exceeded the previous file length. But that can be fixed.

I am a little confused here. Unless the file is created pre-populated at a fixed size, each write will increase the file size, won't it? I don't see the code pre-populating the file.

Mike Simmons wrote: Note that FileChannel is abstract, so whatever default implementation you're looking at isn't necessarily the same as what's available for another system.

True. Actually, I looked at FileOutputStream, which explicitly opens the channel using FileChannelImpl, which in turn uses the class I mentioned above.
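
Which concrete class a given JDK hands back can be checked at runtime. A small sketch (the class and method names are hypothetical); the name it returns depends on the JDK in use, so none is assumed here:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

// Hypothetical class, for illustration only.
class ChannelImplProbe {
    // Returns the name of the concrete FileChannel class behind a FileOutputStream.
    static String implementationName(Path file) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile());
             FileChannel channel = out.getChannel()) {
            return channel.getClass().getName();
        }
    }
}
```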

Mike Simmons wrote: Well, if a file system is spread across multiple disks (a striped RAID array, for example) then different disks can write at the same time. That's not common on most home computers or laptops, but it's not unheard of for a dedicated server system. That's the only way that comes to mind offhand, though there could be others.


Yes, a RAID setup can have multiple concurrent writes, but I was wondering whether they can be to the same file. I am not a file system guru, so I really do not know the internals.

Anyway, it's been a good discussion and I learnt a few things from it. Thanks for the input, Mike!
 