Best Way to Edit a Byte Array

Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30
Hi Guys,

Another byte array question, I'm afraid. First, a bit of context: I'm trying to compare a large number of AFP files, but the product I use to generate them writes a timestamp at the top of each document. This causes my comparison software to show differences. On top of that, if I produce the AFPs before 10am, the files are all one byte smaller because the hour has no leading zero (thanks for that).

So my question is: what would be the best way of taking a 5MB file, reading in only the first 150 bytes, editing these and rewriting the file? Reading only the first 150 bytes is the part I'm struggling with most, and I'm not sure it can be done.

For the edit I will probably replace the timestamp bytes with a time of 00:00:00 to ensure there are no differences.

Is this possible? If so, how would you suggest going about it?

Thanks in advance
Mitch
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12823
    
That sounds like a job for java.io.RandomAccessFile, because it can write into the middle of an existing file.

Consult the JavaDocs for read(byte[]), write(byte[]) and similar methods. You will also need seek(long) to position the file pointer at the start of the file before writing.
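
A minimal sketch of that approach, assuming the timestamp sits at a known offset and the replacement has the same length as the original. The class name HeaderPatch and both method names are made up for illustration; only the first bytes are read, and the write changes bytes in place without touching the rest of the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class HeaderPatch {

    /** Reads at most count bytes from the start of the file (hypothetical helper). */
    static byte[] readHead(String path, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            int n = (int) Math.min(count, raf.length());
            byte[] head = new byte[n];
            raf.readFully(head); // reads exactly n bytes, nothing more
            return head;
        }
    }

    /** Overwrites patch.length bytes at offset; the file size does not change. */
    static void overwrite(String path, long offset, byte[] patch) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.seek(offset);    // position the file pointer
            raf.write(patch);    // replaces existing bytes in place
        }
    }
}
```

For example, readHead(path, 150) returns the first 150 bytes so the timestamp can be located, and overwrite(path, offset, "00:00:00".getBytes(StandardCharsets.US_ASCII)) writes the neutral time back.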

Bill
Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30
Hi,

Thanks for the quick reply. A quick question about RandomAccessFile: can it be used to insert a byte (add to the file)? I didn't see this when I was reading through the java.io docs.

Also, does it read the whole file into memory, or just open it for random access?

At the moment I use the RandomAccessFile to overwrite the bytes, which seems to work OK. But for a timestamp from before 10am I would need to insert an extra byte to bring the file to the correct size for the comparison.

Thanks again,
Mitch
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42612
    
No, RAF can't insert bytes - only change existing ones.


Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30
Thanks for the reply Ulf,

What would be the best way of inserting a byte, again trying to avoid reading the whole file in, as some files can be very large?

I now have my RAF working, so any files created after 10:00 am are now corrected; it's only files created before then that give me the problem. I've raised it as a bug with the supplier, but as it's pretty minor it won't be changed anytime soon...

Thanks again
Mitch
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19761
    

Ulf Dittmer wrote: No, RAF can't insert bytes - only change existing ones.

Technically it can, but you have to do the hard work yourself. To insert n bytes at position m, shift every byte from position m onwards up by n positions (which lengthens the file by n bytes), then write the new bytes at position m.
The shifting is the hardest part, but it can be done one block at a time (for example n bytes per block), starting at the end of the file.
It is important to start at the end because each shift overwrites bytes; if you start at m, you will overwrite bytes that you still need to shift later on.
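
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a listing from this thread: the class name RafInsert and the separate blockSize parameter are invented here (the description uses n for both the insert count and the block size), and the file is grown explicitly with setLength before shifting.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class RafInsert {

    /** Inserts data at position m by shifting the tail up, last block first. */
    static void insert(RandomAccessFile raf, long m, byte[] data, int blockSize)
            throws IOException {
        int n = data.length;
        long oldLength = raf.length();
        if (m < 0 || m > oldLength || blockSize <= 0) {
            throw new IllegalArgumentException("bad position or block size");
        }
        raf.setLength(oldLength + n);       // make room for the new bytes

        byte[] buffer = new byte[blockSize];
        long remaining = oldLength - m;     // bytes that still have to move
        long end = oldLength;               // exclusive end of the unshifted region
        while (remaining > 0) {
            int len = (int) Math.min(remaining, buffer.length);
            long from = end - len;          // never below m, since len <= remaining
            raf.seek(from);
            raf.readFully(buffer, 0, len);  // read the whole block first
            raf.seek(from + n);
            raf.write(buffer, 0, len);      // then write it n positions higher
            end = from;
            remaining -= len;
        }

        raf.seek(m);
        raf.write(data);                    // finally drop the new bytes into the gap
    }
}
```

Because each block is read completely before it is written, the block size does not have to equal n; a larger buffer simply means fewer seeks.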


Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30
Rob,

Thanks for the reply. Is this the way you would suggest doing it?

Also, do you know how this will affect performance? The byte I need to add is within the first 73 bytes of the file, so it looks like nearly every byte will be both read from the file and written back to it.

Thanks,
Mitch
Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30


Example: a file of 5000 bytes, m is 73, and a buffer of 1000. Is my logic correct?

Seek to position 4000 --> Read fully from 4000-5000 --> Seek to position 4001 --> Write to 4001-5001
Seek to position 3000 --> Read fully from 3000-4000 --> Seek to position 3001 --> Write to 3001-4001
Seek to position 2000 --> Read fully from 2000-3000 --> Seek to position 2001 --> Write to 2001-3001
Seek to position 1000 --> Read fully from 1000-2000 --> Seek to position 1001 --> Write to 1001-2001
Seek to position 73 --> Read fully from 73-1000 --> Seek to position 74 --> Write to 74-1001

Mitch


William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12823
    
I am guessing you are better off just using the stream classes to insert or remove bytes: write to a new file and then delete the old one.

My reasoning is that the operating system's disk cache and buffers are probably better organized for reading and writing in sequence, whereas using random access and working backwards through the file would involve many more disk seeks.
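
A rough sketch of the stream-based alternative, assuming the new file can be written next to the original and then moved over it. The class name StreamInsert and the 8192-byte buffer are arbitrary choices for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamInsert {

    /** Copies source to a temp file with data inserted at position m, then replaces source. */
    static void insert(Path source, long m, byte[] data) throws IOException {
        Path temp = Files.createTempFile(source.toAbsolutePath().getParent(), "insert", ".tmp");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(temp))) {
            byte[] buffer = new byte[8192];
            long toCopy = m;
            while (toCopy > 0) {                 // copy the first m bytes unchanged
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, toCopy));
                if (read < 0) {
                    throw new EOFException("file is shorter than the insert position");
                }
                out.write(buffer, 0, read);
                toCopy -= read;
            }
            out.write(data);                     // the inserted bytes
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);      // the rest of the file, in sequence
            }
        }
        // Swap the new file into place; on failure the temp file is left behind.
        Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Everything is read and written once, front to back, at the cost of temporarily needing disk space for a second copy of the file.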

Let us know if you do time trials with both methods.

Bill
Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30
Rob,

I'm getting really confused by the logic needed to fill in your pseudocode.



Currently I've got code which gets me down to the last block smaller than my buffer size, by using AFPFile.seek((AFPFile.length()-(i*n)-1));

Do I need an int which increments, or do I literally just fill in the sections in your pseudocode?

int len = Math.min(remaining bytes, n); For this bit I'm understanding remaining bytes as:
Total no. of bytes in file - m / no. of bytes written
(raf.length() - m) / (raf.length() - m) - raf.getFilePointer()??

Am I on the right track?

Thanks
Mitch



Satya Maheshwari
Ranch Hand

Joined: Jan 01, 2007
Posts: 368
Is the replacement string always shorter than the original one (since the timestamp is set to 0)? If so, you could take advantage of that by padding instead of moving all the bytes.


Thanks and Regards
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19761
    

Mitch Robinson wrote: Do I need a int which increments or literally just to fill in the sections in your psudeo code?

You'll need at least two ints (or perhaps longs):
One for the remaining bytes. Let's call it remaining. Initially it is raf.length() - m - n, since all bytes after m + n must be shifted up. After each shift you decrease it by the number of bytes shifted, usually n.
One for the index to start shifting from. Initially it is raf.length() - n, and it gets decreased by n each time (but make sure it does not go below m). You have mimicked this behaviour with (AFPFile.length()-(i*n)-1), but simple addition / subtraction is faster than multiplication. The index to shift to can be calculated from this index; for simple insertions it is the index + n.
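
To see how the two counters move without touching a file, the bookkeeping can be sketched on its own. This is only an illustration: the class name ShiftPlan is invented, fileLength means the length before any bytes are inserted, and the insert count and block size are separate parameters here even though the description above uses n for both.

```java
import java.util.ArrayList;
import java.util.List;

final class ShiftPlan {

    /** Returns {readFrom, writeTo, length} for each block, last block of the file first. */
    static List<long[]> plan(long fileLength, long m, int insertCount, int blockSize) {
        List<long[]> steps = new ArrayList<>();
        long remaining = fileLength - m;   // bytes that still have to be shifted
        long index = fileLength;           // exclusive end of the unshifted region
        while (remaining > 0) {
            int len = (int) Math.min(remaining, blockSize);
            index -= len;                  // cannot drop below m because len <= remaining
            steps.add(new long[] {index, index + insertCount, len});
            remaining -= len;
        }
        return steps;
    }
}
```

With the 5000-byte example from earlier in the thread, plan(5000, 73, 1, 1000) yields five steps, starting with a read at 4000 written to 4001 and ending with the 927 bytes at 73 written to 74.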
Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30


Hello again,

I've got the code above, which seems to work as expected when I step through it, BUT when I view the modified AFP it seems to have blanked all of the bytes (not shifted them) after position m.

What would be the reason for this? As for the speed testing, once I've got past this problem I will run some speed comparisons and report back with the results.

Thanks,
Mitch

Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30
Thanks for the help guys, I've finally sorted it.

I will try to get some speed comparison tests going, but initial testing shows it takes ~20 seconds to shift all bytes from position 73 in a 70MB file, so the speed seems OK to me, especially for my needs.

Again thanks for the help

Mitch
Making Progress....
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19761
    

Mitch Robinson wrote:

Here's the code I used for something similar. Instead of inserting one byte, it replaces several bytes: length is the number of bytes to replace (0 in your case), and data is the byte[] to replace them with.
When applied to your example, size == raf.length(), length == 0, diff == data.length == 1, offset == m, index == posToShift and COPY_BUFFER_SIZE == n (but it can be anything).
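
A sketch of what such a replace routine might look like, reusing the names mentioned above (size, length, diff, offset, index, COPY_BUFFER_SIZE). It is a reconstruction for illustration, not Rob's original listing; the class name RafReplace, the buffer size and the handling of a shorter replacement are assumptions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class RafReplace {
    private static final int COPY_BUFFER_SIZE = 8192; // arbitrary block size

    /** Replaces length bytes at offset with data, growing or shrinking the file as needed. */
    static void replace(RandomAccessFile raf, long offset, int length, byte[] data)
            throws IOException {
        long size = raf.length();
        int diff = data.length - length;      // > 0 grows the file, < 0 shrinks it
        long tailStart = offset + length;     // first byte that has to move
        if (offset < 0 || length < 0 || tailStart > size) {
            throw new IllegalArgumentException("range outside file");
        }
        byte[] buffer = new byte[COPY_BUFFER_SIZE];
        if (diff > 0) {
            raf.setLength(size + diff);
            long index = size;                // shift up: work from the end backwards
            while (index > tailStart) {
                int len = (int) Math.min(index - tailStart, buffer.length);
                index -= len;
                raf.seek(index);
                raf.readFully(buffer, 0, len);
                raf.seek(index + diff);
                raf.write(buffer, 0, len);
            }
        } else if (diff < 0) {
            long index = tailStart;           // shift down: work from the front forwards
            while (index < size) {
                int len = (int) Math.min(size - index, buffer.length);
                raf.seek(index);
                raf.readFully(buffer, 0, len);
                raf.seek(index + diff);
                raf.write(buffer, 0, len);
                index += len;
            }
            raf.setLength(size + diff);       // cut off the now unused tail
        }
        raf.seek(offset);
        raf.write(data);
    }
}
```

For the timestamp case, replacing the 7-byte "9:15:30" with the 8-byte "00:00:00" gives diff == 1, which is the single-byte insert discussed in this thread.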

As you see it is nearly the same; the only difference is the calculation of index / posToShift, but that seems to be just fine in your code as well.
Mitch Robinson
Ranch Hand

Joined: Oct 29, 2009
Posts: 30
Thanks for that Rob,

It seems yours is more reusable than mine, as mine was built for the specific purpose of inserting a single byte, but it wouldn't take much change to accommodate additional bytes being inserted. I'm also surprised by the speed; I thought it would be slower than what my tests are showing....

Mitch
 