Roberto Lo Giacco

Greenhorn
since Feb 01, 2005

Recent posts by Roberto Lo Giacco

Originally posted by David Harkness:
Not if you set the ByteBuffer's position and limit before decoding it. Loop over the mapped buffer, setting up a good block size using position and limit. Decoding will now just decode the bytes in the range you specify.

Use CharsetDecoder.decode(ByteBuffer, CharBuffer) or one of the other similar methods so you can reuse the same CharBuffer. Since decoding advances the position, it should leave you at the next correct spot, dealing with multi-byte character encodings for you; just set limit to be position + BLOCK_SIZE and keep going.

If you want ultimate speed, cannot count on ASCII files, and don't want to write your own specialized decoder, this is the way to go.
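
In code, the block-by-block decoding described above might look roughly like this; the charset, block size, file handling and error policy are illustrative choices, not anything from the original posts:

import java.io.RandomAccessFile;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class BlockDecoder {
    private static final int BLOCK_SIZE = 64 * 1024; // illustrative block size

    public static void main(String[] args) throws Exception {
        RandomAccessFile file = new RandomAccessFile(args[0], "r");
        FileChannel channel = file.getChannel();
        MappedByteBuffer bytes = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

        CharsetDecoder decoder = Charset.forName("ISO-8859-1").newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer chars = CharBuffer.allocate((int) (BLOCK_SIZE * decoder.maxCharsPerByte()));

        int total = bytes.capacity();
        while (bytes.position() < total) {
            boolean lastBlock = total - bytes.position() <= BLOCK_SIZE;
            // Restrict decoding to the next block by moving the limit forward;
            // the decoder leaves any split multi-byte sequence for the next pass.
            bytes.limit(Math.min(bytes.position() + BLOCK_SIZE, total));
            chars.clear();
            decoder.decode(bytes, chars, lastBlock);
            chars.flip();
            process(chars); // hand the decoded block to the parser
        }
        // (a complete implementation would also call decoder.flush() here)
        channel.close();
        file.close();
    }

    private static void process(CharBuffer chars) {
        // ... run the regexp / parsing over this block ...
    }
}

Reusing the same CharBuffer keeps the memory footprint at one block, no matter how large the mapped file is.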



You are right, but my needs don't allow me to perform the operations you described: the CharBuffer I want to get out of the big log file is going to be parsed by a regexp...

I ended up with this solution: wrapping the MappedByteBuffer in a custom CharSequence implementation named MappedCharBuffer!

The result works correctly with ASCII files only, but log files are usually ASCII anyway...

Here is the code:
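
What follows is only a minimal sketch of the idea, assuming pure ASCII content so that every byte maps directly to one char; the class name follows the description above:

import java.nio.MappedByteBuffer;

// Sketch of a CharSequence view over a MappedByteBuffer. It assumes pure ASCII
// content: every byte is widened directly to a char, so anything outside ASCII
// would come out wrong (which matches the limitation mentioned above).
public class MappedCharBuffer implements CharSequence {
    private final MappedByteBuffer buffer;
    private final int start;
    private final int end;

    public MappedCharBuffer(MappedByteBuffer buffer) {
        this(buffer, 0, buffer.capacity());
    }

    private MappedCharBuffer(MappedByteBuffer buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
    }

    public int length() {
        return end - start;
    }

    public char charAt(int index) {
        // absolute get(): the buffer's position and limit are never touched
        return (char) (buffer.get(start + index) & 0xFF);
    }

    public CharSequence subSequence(int from, int to) {
        return new MappedCharBuffer(buffer, start + from, start + to);
    }

    public String toString() {
        StringBuffer result = new StringBuffer(length());
        for (int i = 0; i < length(); i++) {
            result.append(charAt(i));
        }
        return result.toString();
    }
}

Because java.util.regex.Matcher works on any CharSequence, the regexps can run directly against the mapped file without ever building a giant String in memory.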



Actually, the code performs the equivalent of this SQL statement:

SELECT COUNT(*),username FROM log WHERE message LIKE '%LOGIN OK%' GROUP BY username ORDER BY username
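
For the regexp side, the counting can be done with a Matcher and a TreeMap. The log line format below is made up purely for illustration, and in practice the CharSequence would be the MappedCharBuffer rather than a String literal:

import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoginReport {
    // Hypothetical line format: "2005-02-01 10:15:00 [jdoe] LOGIN OK"
    private static final Pattern LOGIN_OK = Pattern.compile("\\[(\\w+)\\] LOGIN OK");

    // Equivalent of the SQL above: count LOGIN OK lines per username, sorted by username
    static Map<String, Integer> countLogins(CharSequence log) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        Matcher matcher = LOGIN_OK.matcher(log);
        while (matcher.find()) {
            String user = matcher.group(1);
            Integer current = counts.get(user);
            counts.put(user, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        CharSequence log = "2005-02-01 10:15:00 [jdoe] LOGIN OK\n"
                         + "2005-02-01 10:16:30 [asmith] LOGIN OK\n"
                         + "2005-02-01 10:17:12 [jdoe] LOGIN OK\n";
        System.out.println(countLogins(log)); // prints {asmith=1, jdoe=2}
    }
}

The GROUP BY and ORDER BY parts fall out of the TreeMap: one entry per username, iterated in key order.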
19 years ago
Sorry, but the memory problem is not solved by using MappedByteBuffers and Charset.decode(...): that method tries to decode the entire buffer of bytes, causing an out-of-memory error...

I need something that decodes the buffer while it's being read from the file system...
19 years ago