Piet Souris wrote:Keep in mind the difference between a reference variable and the object it references. Suppose we have
what will be garbage collected (if at all)?
Campbell Ritchie wrote:I am not happy with that solution. The read() method is really awkward to use, and may give slower performance than readLine() (not certain about performance). I do my level best to avoid read(), so there should be a better way to do this with readLine(). Why are you not using an XML parser? Can you check whether each line is closed with a tag corresponding to its opening tag, and if not, concatenate the two lines?
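For what it's worth, here is a minimal sketch of the "use an XML parser" suggestion, using the DOM parser that ships with the JDK (javax.xml.parsers). The class name ParseWithDom, the helper attribute() and the sample element are made up for illustration; they are not from the original poster's code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseWithDom {

    // Parses the XML text and returns the named attribute of the root element.
    // Hypothetical helper: checked exceptions are wrapped to keep the example short.
    static String attribute(String xml, String name) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute(name);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample: one element whose attributes are split across two lines.
        String xml = "<Tag attr1=\"val1\"\nattr2=\"val2\"/>";
        System.out.println(attribute(xml, "attr2")); // prints val2
    }
}
```

The line break inside the start tag is ordinary whitespace to the parser, so there is no need to stitch lines together by hand before parsing.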
Tony Docherty wrote:I suggest you add a System.out.println() statement to print out the value of 'line' before you add it to docContent. This will show if your symptom-category tag contains a newline char as Paul suspects is the case.
However, if it prints that line out as a single line without the space between the tags, then you need to check your XML file to see what that whitespace character actually is; it clearly isn't a standard ASCII space character. Write some code to read in that line byte by byte and dump the byte values to the console. An ASCII space is 20H (0x20, decimal 32).
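A rough sketch of the byte dump suggested above might look like this. The class name ByteDump, the helper toHex() and the sample line are hypothetical; the sample assumes the culprit is a no-break space (U+00A0), which is only one possibility. In real code you would pass in the raw bytes from the file (for example from Files.readAllBytes) rather than re-encoding a String.

```java
import java.nio.charset.StandardCharsets;

public class ByteDump {

    // Formats each byte as two upper-case hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative byte values print as 80..FF.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An ordinary ASCII space shows up as 20.
        System.out.println(toHex("a b".getBytes(StandardCharsets.UTF_8)));      // 61 20 62
        // Assumed example: a no-break space is C2 A0 in UTF-8, not 20.
        System.out.println(toHex("a\u00A0b".getBytes(StandardCharsets.UTF_8))); // 61 C2 A0 62
    }
}
```

Any value other than 20 where the space should be tells you which character is really sitting between the attributes.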
Paul Clapham wrote:One thing I notice is that your posted code reads the input, line by line, and appends the line to some object, dropping the line-feed characters between the lines. So if a fragment of your XML looked like this:
then that object would contain this substring:
<Tag attr1="val1"attr2="val2">
Which is indeed not valid XML. There's no reason to drop the line-feed characters from your XML; the parser will deal with them appropriately, and as you can see, dropping them can make the XML malformed. So I'd start by not doing that any more.
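To make the point concrete, here is a small sketch of a readLine() loop that puts the line feed back after each line. The class name KeepLineFeeds, the method readAll() and the two-line input are assumptions for illustration (the input is a guess at the kind of fragment that would produce the substring above); docContent is borrowed from the earlier post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class KeepLineFeeds {

    // Reads every line and appends the given separator after each one.
    // readLine() strips the line terminator, so passing "" reproduces the bug.
    static String readAll(String input, String separator) {
        StringBuilder docContent = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = reader.readLine()) != null) {
                docContent.append(line).append(separator);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return docContent.toString();
    }

    public static void main(String[] args) {
        // Assumed input: a start tag whose attributes sit on separate lines.
        String xml = "<Tag attr1=\"val1\"\nattr2=\"val2\">";
        // Dropping the line feeds glues the attributes together (malformed).
        System.out.println(readAll(xml, ""));
        // Keeping them leaves whitespace between the attributes.
        System.out.println(readAll(xml, "\n"));
    }
}
```

With the separator restored, the attributes stay separated by whitespace and the parser has something valid to work with.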