Well, if you keep putting values into a HashSet, it gets bigger.

I wouldn't call that a "leak"; it's the expected behavior here. You just need to find a more efficient way to do it.
First, do as Jessica suggested and increase the memory allocated to your JVM. This will be simplest, if you have the memory to spare.
Try separating your loop into two loops - one for hs1; another for hs2. Sure, you'll spend twice as much time reading lines, but this way you won't have to keep both in memory at the same time. (Set hs1 = null once you're done with it, before the second loop.) This will only have a significant effect if the two HashSets are of roughly comparable size. If hs2 only has a few hundred entries at most, it's not worthwhile putting it in a separate loop.
Next, try replacing
with
(Likewise for hs2.) This may seem strange, but the
String created by substring() uses the same character array as the original String you got from readLine(). This means that even though you're only interested in, say, the first 7 characters of it, you're actually keeping the whole line in memory when you save a reference to the substring. By using the new String() constructor, you create a new String that uses only enough memory for the substring - not for the extra chars of the full original line.
It looks like the first field is always a number. If so, you can probably get additional space savings by storing it as an Integer rather than a String. (An int would be even more efficient, but collections don't work with primitives.

)
Going a step further, if you're going to have a
lot of different integer values, it may well be more efficient to use a BitSet rather than a HashSet. If the first column is always an integer less than 10,000,000, you can allocate a BitSet with 10,000,000 bits (~1.2 MB) which will keep track of each number individually. This may seem like a lot, but the advantage is that it will never have to grow in size (unless you try to store a number greater than 10 million). If you have 100 unique values to save, the HashSet will be smaller - but if you have 1,000,000, the BitSet will certainly be smaller. You can optimize further if you can restrict the range of possible values. E.g. if the number will always be between 5 and 6 million, you can subtract 5 million, and store numbers 0 to one million - taking a tenth as much memory as 0 to ten million did.
I'm guessing these techniques (particularly the BitSet) will take care of the problem for you. If not, you'll probably need to look into alternate techniques which do not require you to represent all values read in memory, but instead rely on writing some things to intermediate files instead. Proably the easiest way to do this is to use a database, which will have already coded this functionality for you. Good luck...
[ August 07, 2002: Message edited by: Jim Yingst ]