What would be an elegant way to remove carriage return / linefeeds from an XML file?
I have a byte array (or String) containing an XML file that, when printed to output, spans multiple lines because every node is followed by CR/LF characters.
I would rather not use String.replaceAll(..) because the data in the XML itself might deliberately contain CR/LF characters.
So I am looking for a way to 'intelligently' remove the CR/LF chars between the nodes.
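To make the concern concrete, here is a minimal sketch of what a blanket replace does to line breaks that sit inside element content. The class name NaiveStrip and the sample XML are made up for illustration only.

```java
// Hypothetical demo class; not part of the original post.
class NaiveStrip {

    // Removes every CR and LF, with no regard for where it appears.
    static String naiveStrip(String xml) {
        return xml.replaceAll("[\\r\\n]", "");
    }

    public static void main(String[] args) {
        String xml = "<a>\r\n<b>line1\r\nline2</b>\r\n</a>";
        // The breaks between the tags are gone, but so is the one inside <b>.
        System.out.println(naiveStrip(xml)); // prints <a><b>line1line2</b></a>
    }
}
```

The two lines of data inside the b element are merged into one, which is exactly the data loss I want to avoid.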
I thought of using SAX parsing to read the elements from top to bottom and 'rebuild' the XML content that way.
But there must be a simpler way to do this?
Well, I am trying to put a java.util.Properties object into a SQL Server XML column. Properties.storeToXML(..) generates a nice XML representation of the properties object and I got that working.
There's no need to do the conversion for that task, but in some cases I do want to output that XML representation to a log line. I thought it would be nice to just put it on one line.
Kjeld Sigtermans wrote: Isn't an XSLT transformation in Java known to be relatively 'slow'?
Let me just reference my brother Knuth's comment about "premature optimization" for the N-th time. If you really haven't heard it before then a Google search will find it for you.
Anyway you asked for "elegant" as your primary requirement.
This is my code, without any 'premature optimization' (had to look it up, and I don't agree, at least not in this context).
But I went with the XSLT solution.
The XSL file xmlFormatter.xsl:
Obviously, at this point the code is eligible for optimization.
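For anyone following along, an XSLT approach of this kind might be sketched as below. This is not the xmlFormatter.xsl from the post: the stylesheet here is an assumed minimal version (an identity template plus xsl:strip-space), inlined as a string to keep the example self-contained, and the class and method names are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; names are illustrative only.
class XmlOneLiner {

    // Assumed stylesheet: copies everything as-is (identity template) and
    // drops text nodes that consist of whitespace only (strip-space).
    private static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" indent=\"no\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toSingleLine(String xml) throws TransformerException {
        // Creating the Transformer on every call is the simple version;
        // a javax.xml.transform.Templates object could be cached instead.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Whitespace between the tags is removed; the line break inside <b>
        // is data and is kept (the parser normalises CR LF to a single LF).
        System.out.println(toSingleLine("<a>\r\n  <b>line1\r\nline2</b>\r\n</a>"));
    }
}
```

Note that the result is only truly a single line if the data itself contains no line breaks, which is the behaviour asked for at the top of the thread.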
I think making the choice of whether or not to go for XSLT is an architectural decision and not a premature form of optimization. Indeed, once we have chosen XSLT we should probably first get it to work and then we can optimize all we want.
I assumed high performance to be almost always an obvious requirement. Maybe I should have been clearer and said: elegant as well as fast. But then again I still don't think the XSLT solution above is elegant... and requirements change all the time.
Kjeld Sigtermans wrote: I assumed high performance to be almost always an obvious requirement. Maybe I should have been clearer and said: elegant as well as fast. But then again I still don't think the XSLT solution above is elegant... and requirements change all the time.
High performance is not always a requirement. Sometimes you need a quick and dirty program to do something once. If it takes 10 minutes instead of 10 seconds you don't really care. But you dismissed XSLT as "not fast" just based on some rumours or vague opinion. That statement qualifies as "premature optimization". I'm willing to bet (or at least consider the possibility) that the XSLT solution is going to be similar in performance to whatever you put together in a DOM.
Personally I think that writing an extension of an identity template is far more elegant than writing some DOM code to implement the rules. But I'm a mathematician so I use the mathematician's definition of "elegant". There is almost no DOM code which I would consider "elegant".
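For comparison, the DOM route could look roughly like the sketch below. It is only one possible way to "implement the rules" (remove text nodes that contain nothing but whitespace, leave everything else alone); the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; names are illustrative only.
class DomWhitespaceStripper {

    // Recursively removes text nodes that contain only whitespace.
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before removing
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    static String toSingleLine(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        stripWhitespaceNodes(doc.getDocumentElement());

        // A Transformer without a stylesheet is only used to serialise the DOM.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        serializer.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The tree walk has to be written and maintained by hand, whereas the stylesheet version states the same rule declaratively in a single xsl:strip-space element.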
And yes, you do have to choose whether you want to include XSLT in the set of languages which you want to have in your environment. If you want to reject it because it's yet another language to learn, then you could certainly do that (and call it an architectural decision). I find it to be a useful tool myself.