Modifying a text file without using a temp file

Steve Bassoli
Greenhorn

Joined: Feb 22, 2010
Posts: 4
Hi all,

I have a very large HTML log file; it can be several hundred megabytes in size. I need to be able to append to the end of the file as my program runs, so I have to remove the </body> and </html> tags at the very end of the existing log file before appending to it.

My current solution is to read the existing HTML file and write each line to a temporary HTML file, leaving out the </body> and </html> tags at the end. I then delete the original HTML file and rename the temporary file to the original's name.

The problem with this is performance, of course: I would like to avoid rewriting the entire file every time the log is written to. Is there a better way to handle this?

Thanks!
Steve Bassoli
Greenhorn

Joined: Feb 22, 2010
Posts: 4
This looks promising...

http://download.oracle.com/javase/6/docs/api/java/nio/channels/FileChannel.html

Am I on the right track?
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3050
    

Why don't you just find the address of the body end tag and overwrite the file from there? When you're done, you simply add the end tags for body and html again. This assumes you always write enough to completely overwrite both tags.
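As a rough illustration of "finding the address of the body end tag", the sketch below scans only the last few bytes of the file rather than reading all of it. The class name `BodyTagLocator`, the method `findLastBodyClose`, and the `tailSize` parameter are made up for this example and assume the tag sits within the final `tailSize` bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the thread: locates the last "</body>" near the end of a file.
class BodyTagLocator {

    /** Returns the byte offset of the last "</body>" within the final tailSize bytes, or -1. */
    static long findLastBodyClose(RandomAccessFile file, int tailSize) throws IOException {
        byte[] tag = "</body>".getBytes(StandardCharsets.US_ASCII);
        long length = file.length();
        long start = Math.max(0, length - tailSize);

        // Read only the tail of the file into memory.
        byte[] tail = new byte[(int) (length - start)];
        file.seek(start);
        file.readFully(tail);

        // Search backwards so the last occurrence wins.
        for (int i = tail.length - tag.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < tag.length; j++) {
                if (tail[i + j] != tag[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start + i;
            }
        }
        return -1;
    }
}
```

The returned offset is where a subsequent `seek` and write would start. If the new content can be shorter than what it replaces, `RandomAccessFile.setLength` can cut off the leftovers.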
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19216

There is just one issue with that. FileChannel and RandomAccessFile (another class that could be used) both deal with bytes, not text. With simple ASCII files that isn't a big problem, but as soon as you get more exotic characters you will need a proper encoding. I'm not sure how FileChannel handles that. Perhaps it's possible using Charset / CharsetEncoder: you take a CharBuffer or String and convert it into a ByteBuffer first, which you then write to the FileChannel. There is one catch though: finding a safe place to start writing. What if the < of </body> is encoded in two bytes, with the first byte also being used for the previous character? (In other words, one byte contains data on two different characters.)
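The Charset route mentioned above could look roughly like the following sketch. It assumes the file is UTF-8 and that the caller already knows a safe byte position to write at; `ChannelTextWriter` and `writeAt` are hypothetical names used only here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not from the thread: encodes text and writes it at a byte position.
class ChannelTextWriter {

    /** Encodes text as UTF-8 (an assumption), writes it at position, and drops anything after it. */
    static void writeAt(Path path, long position, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE)) {
            // Charset.encode turns the String into a ByteBuffer of encoded bytes.
            ByteBuffer bytes = StandardCharsets.UTF_8.encode(text);
            long pos = position;
            while (bytes.hasRemaining()) {
                pos += channel.write(bytes, pos);
            }
            // If the old content was longer than the new content, cut off the remainder.
            channel.truncate(pos);
        }
    }
}
```

For example, `ChannelTextWriter.writeAt(path, offset, newEntry + "</body></html>")` would replace everything from `offset` onwards with the new entry and fresh closing tags.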


James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Most browsers, if not all, are very tolerant: they try to display HTML even when the syntax is not perfect (i.e. not well-formed). I suspect that you could get away with not having the closing </body></html> tags at all. Then it is a simple matter of just appending to the file.

In the unlikely event that you find that the </body></html> closing tags are needed, you could make the process that presents the file (JSP or servlet?) append the closing tags.
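If the closing tags are left out of the file as suggested, appending becomes a plain append-mode write. A minimal sketch, assuming UTF-8 output; `PlainAppender` and `appendEntry` are illustrative names only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not from the thread: appends a fragment without touching existing content.
class PlainAppender {

    /** Appends an HTML fragment to the log, creating the file if it does not exist yet. */
    static void appendEntry(Path log, String htmlFragment) throws IOException {
        Files.write(log,
                htmlFragment.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE,
                StandardOpenOption.APPEND);
    }
}
```

Nothing is read or rewritten here, so the cost of each append does not depend on how large the log already is.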


Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 16483
    

I would have to go back and question the design. An HTML file which is hundreds of megabytes? Why? Who is going to load that into their browser and look at it? Or if it isn't intended to be loaded into a browser, then why is it HTML?
Steve Bassoli
Greenhorn

Joined: Feb 22, 2010
Posts: 4
Thanks for the suggestions everyone, I ended up going with a RandomAccessFile (Thanks Rob Prime). We're always encoding the output file as UTF8 so it's not a concern that a character consists of more than one byte. RandomAccessFile is very useful in that you can actually easily write UTF8.



It's not the slickest solution, but I had to know I could remove the characters after my <!--REMOVE_HTML--> comment.
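A minimal sketch of this kind of marker-based approach with RandomAccessFile follows. It is not Steve's actual code. It assumes the file is UTF-8, that it ends with `<!--REMOVE_HTML--></body></html>`, and that the marker lies within the last `tailSize` bytes. Unlike the description above, it overwrites from the start of the marker and writes a fresh marker after the new entry, so only one marker stays in the file. `MarkerAppender` and `append` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the thread: appends an entry in front of a marker comment.
class MarkerAppender {
    static final String MARKER = "<!--REMOVE_HTML-->";
    static final String FOOTER = "</body></html>";

    static void append(String path, String entry, int tailSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            long length = file.length();
            long start = Math.max(0, length - tailSize);
            byte[] tail = new byte[(int) (length - start)];
            file.seek(start);
            file.readFully(tail);

            // ISO-8859-1 maps every byte to exactly one char, so a String index
            // in tailText equals a byte offset in tail. The marker is pure ASCII.
            String tailText = new String(tail, StandardCharsets.ISO_8859_1);
            int index = tailText.lastIndexOf(MARKER);
            if (index < 0) {
                throw new IOException("Marker not found in the last " + tailSize + " bytes");
            }

            // Overwrite from the marker onwards: new entry, then marker and closing tags again.
            file.seek(start + index);
            file.write((entry + MARKER + FOOTER).getBytes(StandardCharsets.UTF_8));
            // Drop any leftover bytes in case the old tail was longer than what was written.
            file.setLength(file.getFilePointer());
        }
    }
}
```

Each call touches only the tail of the file, so the cost stays roughly constant no matter how large the log has grown.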
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Steve Bassoli wrote: We're always encoding the output file as UTF8 so it's not a concern that a character consists of more than one byte.


Err ... since UTF-8 is a variable-length encoding, surely this should be a concern. You will probably get away with it, since the characters in your </body></html> tags are always 1 byte each under UTF-8.
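The byte counts are easy to check. The sketch below compares the UTF-8 length of the closing tags with that of a non-ASCII character. It also shows what `writeUTF` produces, on the assumption (the thread does not say) that "easily write UTF8" refers to `RandomAccessFile.writeUTF`: that method writes a 2-byte length prefix followed by modified UTF-8, so its output is not plain UTF-8 text. `Utf8Lengths` is a made-up class name, and `DataOutputStream.writeUTF` stands in for the RandomAccessFile method because both use the same `DataOutput` format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from the thread.
class Utf8Lengths {

    /** Number of bytes the string occupies when encoded as plain UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes written by DataOutput.writeUTF (length prefix plus modified UTF-8). */
    static int writeUtfLength(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new DataOutputStream(out).writeUTF(s);
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(utf8Length("</body></html>"));     // prints 14: one byte per ASCII char
        System.out.println(utf8Length("\u00e9"));             // prints 2: e-acute needs two bytes
        System.out.println(writeUtfLength("</body></html>")); // prints 16: 2-byte prefix + 14
    }
}
```

Writing `text.getBytes(StandardCharsets.UTF_8)` with `RandomAccessFile.write(byte[])` avoids the length prefix.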
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19216

Steve Bassoli wrote:

In Java, chars are assignable to ints, so you can use randomAccessFile.writeByte(' ') directly.
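A small sketch of the point about `writeByte`: the method takes an `int`, and a `char` literal widens to `int` implicitly, so no cast is needed. The scenario (blanking out a byte range with spaces) and the names `SpaceFiller` and `blankOut` are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not from the thread: overwrites a byte range with ASCII spaces.
class SpaceFiller {

    /** Overwrites bytes in the range [from, to) with spaces. */
    static void blankOut(RandomAccessFile file, long from, long to) throws IOException {
        file.seek(from);
        for (long i = from; i < to; i++) {
            file.writeByte(' '); // char widens to int; only the low eight bits are written
        }
    }
}
```

This is only safe for characters that fit in one byte; `writeByte` drops everything above the low eight bits.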
 