Memory leak while using String tokenizer

vijay kumar
Greenhorn

Joined: Feb 13, 2002
Posts: 10
Hi All,

I have a program that reads a 3 MB text file of comma-separated values. The file is read into a StringBuffer, which is then tokenized using StringTokenizer.
I see a memory leak in the JVM when I store the strings returned by nextToken() in a static HashMap.

When I run JProbe, I find a char array created in the JVM that is not released until the key or value referencing the string returned by nextToken() is removed from the HashMap.
The heap memory occupied in this case is around 7 MB.

Why is the JVM not releasing the char array's memory during garbage collection? Do you have any idea?

I have attached the sample program below


[ October 21, 2008: Message edited by: vijay kumar ]


[Nitesh: Added code tags. Kindly use code tags while posting code.]
[ October 21, 2008: Message edited by: Nitesh Kant ]
Dmitri Bichko
Greenhorn

Joined: Jun 16, 2007
Posts: 15
StringTokenizer uses substring() to return the tokens, so the token objects don't store their own char data, just a reference to the original string's char[] plus an offset and a length. As long as the token strings are referenced (here, held in the map), the original string's data will be referenced, too.
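
To make the sharing concrete, here is a simplified model of that layout. SharedSlice and its methods are hypothetical names used for illustration only; this is not the JDK source. It describes JDKs of this era, and as far as I know, since Java 7 update 6 substring() copies the characters instead of sharing the array.

```java
import java.util.Arrays;

// Simplified model of a string that shares its backing array with its
// substrings. Hypothetical class for illustration; not the JDK source.
final class SharedSlice {
    final char[] value;  // backing array, possibly shared with other slices
    final int offset;    // index of this slice's first character in 'value'
    final int count;     // number of characters in this slice

    SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Like the old substring(): a new object, but the same backing array.
    SharedSlice slice(int begin, int end) {
        return new SharedSlice(value, offset + begin, end - begin);
    }

    // Like new String(token): copies only the characters this slice needs,
    // so the large original array is no longer reachable through the copy.
    SharedSlice compactCopy() {
        char[] own = Arrays.copyOfRange(value, offset, offset + count);
        return new SharedSlice(own, 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model, a four-character token sliced from a 3 MB array keeps all 3 MB reachable, while its compactCopy() holds only four chars.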

This is a common gotcha when parsing small chunks out of large strings. One way around it is to create a new String from the token ( token = new String(token) ). This copies the token's characters and drops the reference to the original string's data.
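
A minimal sketch of that workaround follows. The original program is not shown here, so the class name TokenCopyDemo, the CACHE map, and the alternating key/value layout are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class TokenCopyDemo {
    // Hypothetical stand-in for the static map in the original program.
    static final Map<String, String> CACHE = new HashMap<String, String>();

    // Assumed layout: tokens alternate key, value, key, value, ...
    static void load(String csv) {
        StringTokenizer st = new StringTokenizer(csv, ",\r\n");
        while (st.hasMoreTokens()) {
            // new String(...) gives each stored token its own char data,
            // so the map does not keep the whole input reachable.
            String key = new String(st.nextToken());
            if (!st.hasMoreTokens()) {
                break; // odd trailing token without a value; ignore it
            }
            String value = new String(st.nextToken());
            CACHE.put(key, value);
        }
    }

    public static void main(String[] args) {
        load("a,1,b,2");
        System.out.println(CACHE);
    }
}
```

On JVMs where substring() already copies (Java 7 update 6 and later, to my knowledge), the extra new String(...) is harmless but no longer needed.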

More importantly, there's a much easier way to do what you want. You are already using BufferedReader, so why not use the .readLine() method? Similarly, .split() is an easier way to get the fields.
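
A rough sketch of the readLine()/split() approach, assuming one comma-separated record per line. LineSplitDemo and readRows are hypothetical names, and the Reader argument stands in for however the file is actually opened.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineSplitDemo {
    // Reads the source line by line and splits each line into its fields.
    static List<String[]> readRows(Reader source) {
        List<String[]> rows = new ArrayList<String[]>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // split(",") drops trailing empty fields;
                // use split(",", -1) if they must be kept.
                rows.add(line.split(","));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = readRows(new StringReader("a,1\nb,2\n"));
        System.out.println(rows.size() + " rows read");
    }
}
```

Because each line is its own String, a stored field can at most keep that one line's characters alive, never the whole file.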


[ October 21, 2008: Message edited by: Dmitri Bichko ]
vijay kumar
Greenhorn

Joined: Feb 13, 2002
Posts: 10
Thanks for the explanation. I never knew that substring() returns a reference instead of a new string.
Don Solomon
Ranch Hand

Joined: Jul 20, 2008
Posts: 48


Software development is an exercise in thinking not coding.
vijay kumar
Greenhorn

Joined: Feb 13, 2002
Posts: 10
The above code says that substring() will return a new String and not the reference unless the count is begin or end. So if I retrieve the string from the center, then it is a new String, and the reference should not hold here. So the original string has to be garbage collected, which is not the case?
Dmitri Bichko
Greenhorn

Joined: Jun 16, 2007
Posts: 15
Originally posted by vijay kumar:
The above code says that substring() will return a new String and not the reference unless the count is begin or end. So if I retrieve the string from the center, then it is a new String, and the reference should not hold here. So the original string has to be garbage collected, which is not the case?


I can't say I see where it says that. It always returns a new String object; it's the backing data we are talking about.

Nothing beats experimentation. Throw something like this into a debugger and look at the 'value' field of both objects:


They will reference the same char[].
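
A minimal sketch of an experiment along these lines (the class name and the sample string are made up for illustration):

```java
class SubstringExperiment {
    public static void main(String[] args) {
        String original = "the quick brown fox";
        String token = original.substring(4, 9); // "quick"

        // Set a breakpoint on the next line and inspect the 'value' field
        // of both 'original' and 'token' in the debugger.
        System.out.println(token);
    }
}
```

This applies to the JDKs current at the time of this thread. On Java 7 update 6 and later, as far as I know, substring() copies the characters, so the two objects show separate arrays.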
vijay kumar
Greenhorn

Joined: Feb 13, 2002
Posts: 10
Yes, the char[] still refers to the original string's data even though substring() returns a new String. The memory held by the char[] is not released if we store the returned string in a HashMap.

Thanks.
[ October 23, 2008: Message edited by: vijay kumar ]
Fernando ref
Greenhorn

Joined: Dec 20, 2009
Posts: 1
You just made my day!
I was actually stuck on this nagging issue for almost a week: my tenured heapspace was overbroiling, no matter how big I made it...
Thanks a bunch!
 