So I made a simple web crawler which works very well until, after fetching a few hundred pages, it throws an OutOfMemoryError. More specifically, it's the next() method of a Scanner object that throws it. I've tried everything from forcing garbage collection to shaking my laptop pretty hard, but I just couldn't figure it out. I'm sure it's something pretty stupid.
I would be immensely thankful if someone could help me with this, and I'd happily buy that person a virtual beer.
If you're keeping references to the objects with a "lot" of data, eventually you'll run out of memory--that's just the way it is. You could either clean up more than you currently are, allocate more memory to the JVM, or figure out ways to conserve memory while still retaining all the data you need.
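One common way crawlers leak memory is by never closing the stream behind each Scanner, so every fetched page pins a buffer. As a sketch of the "clean up more than you currently are" point (the method and class names here are hypothetical, not the OP's actual code; the stream is a ByteArrayInputStream stand-in for a page fetched over the network):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Scanner;

public class FetchDemo {
    // Reads every token from the stream, and makes sure the Scanner
    // (and the stream it wraps) is closed even if next() throws.
    // One unclosed stream per page is a classic slow leak.
    static String readAll(InputStream in) {
        Scanner sc = new Scanner(in);
        try {
            StringBuilder sb = new StringBuilder();
            while (sc.hasNext()) {
                sb.append(sc.next()).append(' ');
            }
            return sb.toString().trim();
        } finally {
            sc.close(); // closing the Scanner also closes the wrapped stream
        }
    }

    public static void main(String[] args) {
        // Stand-in for url.openStream() on a fetched page.
        InputStream page = new ByteArrayInputStream("hello crawler world".getBytes());
        System.out.println(readAll(page)); // prints: hello crawler world
    }
}
```

If the real crawler opens a stream per URL and only the Scanner for the *current* page is ever closed (or none are), that alone can account for running out of memory after a few hundred pages.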
Thanks for taking the time to reply.
My program doesn't store all the fetched pages, just a select few, and I encountered the same problem when I ran it without storing anything, so the problem is definitely in this method.
What exactly do you mean by cleaning up references, setting them to null when I'm done with them? Shouldn't this be done automatically by garbage collection?
GC is non-deterministic, meaning it may or may not happen. It'd be unusual if it *didn't* happen when the JVM was running low, though. Setting references to null can help, but everything in that method is local, so it goes out of scope when the method ends.
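To make the local-vs-long-lived distinction concrete, here's a toy sketch (class and method names are made up for illustration): nulling a reference only buys you anything when the reference would otherwise outlive the work, e.g. a field or a collection entry.

```java
import java.util.ArrayList;
import java.util.List;

public class Refs {
    // Long-lived: entries stay reachable until removed or nulled.
    final List<String> kept = new ArrayList<String>();

    void crawlOne(String page) {
        String upper = page.toUpperCase(); // local: unreachable once this returns
        kept.add(upper);                   // field reference: pins the string
    }

    void release(int i) {
        kept.set(i, null); // explicit nulling only matters for refs like this
    }

    public static void main(String[] args) {
        Refs r = new Refs();
        r.crawlOne("page one");
        r.release(0);
        System.out.println(r.kept.get(0)); // prints: null
    }
}
```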
It's *possible* there are memory leaks in, say, the Scanner class... but I don't know how *probable* it is. I get nervous when you say "the problem is definitely in this method": how have you proven that? If you have code that does *nothing* but run this method, does the program still throw an OOME? How many URLs does it take before it blows up? If you run it with the same list of blow-uppy URLs, does it always blow up on the same one? Have you checked with VisualVM (bundled with Java 6+) to see if that helps identify what's holding on to memory? Have you searched the web for Scanner memory leak bugs?
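An isolation harness along those lines could be as small as this (processPage is a stand-in for the OP's actual method; run it with a deliberately small heap, e.g. `java -Xmx16m Harness`, so a leak shows up quickly, and the progress prints tell you which iteration blows up):

```java
import java.util.Scanner;

public class Harness {
    // Stand-in for the suspect method: tokenize one page with a Scanner.
    static int processPage(String pageText) {
        Scanner sc = new Scanner(pageText);
        int tokens = 0;
        while (sc.hasNext()) {
            sc.next();
            tokens++;
        }
        sc.close();
        return tokens;
    }

    public static void main(String[] args) {
        String fakePage = "some repeated page content";
        // Do *nothing* but run the method in a loop; if this alone
        // throws an OOME, the leak really is in the method.
        for (int i = 1; i <= 1000; i++) {
            processPage(fakePage);
            if (i % 100 == 0) {
                System.out.println("survived " + i + " iterations");
            }
        }
    }
}
```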