JavaRanch » Java Forums » Engineering » XML and Related Technologies

How to remove carriage return and linefeeds from XML files

Kjeld Sigtermans
Ranch Hand

Joined: Aug 10, 2006
Posts: 125
Hello,

What would be an elegant way to remove carriage returns/linefeeds from an XML file?
I have a byte array (or String) containing an XML document that, when printed, spans multiple lines because every element is followed by CR/LF characters.

I would rather not use String.replaceAll(..) because the data in the XML might deliberately contain CR/LF characters.
So I am looking for a way to 'intelligently' remove the CR/LF chars between the nodes.

I thought of using SAX parsing to read the elements from top to bottom and 'rebuild' the XML content that way.
But there must be a simpler way to do this?
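For comparison, the SAX idea does not have to mean rebuilding the XML by hand. A minimal sketch, assuming that whitespace-only text between tags is never significant in this data: an XMLFilterImpl sits between the parser and an identity Transformer, buffers each run of text (a parser may deliver one text node in several characters() calls) and drops the run if it is all spaces, tabs, CR or LF. The class names SaxWhitespaceStripper and StripFilter are hypothetical; only JDK classes are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxWhitespaceStripper {

    // Buffers the text seen since the last tag and forwards it only if it
    // contains something other than XML whitespace (space, tab, CR, LF).
    static class StripFilter extends XMLFilterImpl {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // One text node may arrive in several chunks, so just buffer here.
            text.append(ch, start, length);
        }

        private void flush() throws SAXException {
            boolean blank = text.chars().allMatch(
                    c -> c == ' ' || c == '\t' || c == '\r' || c == '\n');
            if (!blank) {
                char[] buf = text.toString().toCharArray();
                super.characters(buf, 0, buf.length);
            }
            text.setLength(0);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            flush();
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            flush();
            super.endElement(uri, localName, qName);
        }

        @Override
        public void processingInstruction(String target, String data)
                throws SAXException {
            flush();
            super.processingInstruction(target, data);
        }
    }

    // Parses the XML through the filter and serializes the result with an
    // identity Transformer (no stylesheet involved).
    public static String strip(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();

        StripFilter filter = new StripFilter();
        filter.setParent(parser);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                           new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<props>\r\n  <entry key=\"a\">x y</entry>\r\n</props>";
        System.out.println(strip(xml));
    }
}
```

Text that carries real data, including any CR/LF inside it, is passed through unchanged. One caveat of this sketch: XMLFilterImpl forwards the lexical-handler property straight to the parser, so XML comments bypass the buffer, and a comment adjacent to non-blank text can end up on the wrong side of it.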

Cheers!
Kjeld

Kjeld Sigtermans - SCJP 1.4 - SCWCD 1.4
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18987

Personally I would use XSLT for this: an identity transformation, extended with something that ignores text nodes consisting entirely of whitespace. Perhaps just an <xsl:strip-space> element would do it.
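A minimal sketch of that XSLT approach, using only the JDK's built-in javax.xml.transform API. The stylesheet is the standard identity template plus <xsl:strip-space elements="*"/>, embedded as a string so the example is self-contained; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripWhitespaceXslt {

    // Identity transform: copies every node and attribute unchanged, except that
    // xsl:strip-space removes whitespace-only text nodes from the input tree.
    private static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" indent=\"no\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String stripWhitespace(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<props>\r\n  <entry key=\"a\">x y</entry>\r\n</props>";
        System.out.println(stripWhitespace(xml));
    }
}
```

xsl:strip-space removes only text nodes that consist entirely of whitespace, so CR/LF characters inside real element content are copied through unchanged; that is exactly the distinction String.replaceAll cannot make. The elements attribute is required, and "*" applies the rule to every element.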
Kjeld Sigtermans
Ranch Hand

OK, I had not thought of that, and I think I know how to do it, but I need it to really perform (in a non-time-consuming manner).
Isn't an XSLT transformation in Java known to be relatively 'slow'?
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

It can be, but usually only when the XML gets quite big.

Any reason you need to do this? Such white space is meaningless in XML after all.

JavaRanch FAQ HowToAskQuestionsOnJavaRanch
Kjeld Sigtermans
Ranch Hand

Well, I am trying to put a java.util.Properties object into a SQL Server XML column. Properties.storeToXML(..) generates a nice XML representation of the Properties object, and I got that working.
Removing the whitespace isn't needed for that task, but in some cases I want to write that XML representation to the log, and it would be nice to have it on one line.
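For the logging case specifically, a DOM-based sketch is also possible. It assumes every whitespace-only text node in the storeToXML output is just formatting, which holds unless a property value is itself blank (an entry whose value is only spaces would lose them). storeToXML writes a DOCTYPE referencing http://java.sun.com/dtd/properties.dtd, so the sketch installs an EntityResolver that returns an empty DTD to keep the parser off the network. PropertiesLogLine and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PropertiesLogLine {

    public static String toLogLine(Properties props) throws Exception {
        // Serialize the properties the same way they are stored (no comment).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        props.storeToXML(bytes, null);

        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve the external properties.dtd to an empty document so that
        // parsing never tries to download it.
        db.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = db.parse(new ByteArrayInputStream(bytes.toByteArray()));

        removeWhitespaceText(doc.getDocumentElement());

        // Identity transform back to text, without the XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Depth-first removal of text nodes that contain nothing but whitespace.
    // Iterates backwards because the NodeList is live and shrinks on removal.
    private static void removeWhitespaceText(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                removeWhitespaceText(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.setProperty("greeting", "hello");
        System.out.println(toLogLine(p));
    }
}
```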
Paul Clapham
Bartender


Kjeld Sigtermans wrote:Isn't a XSLT transformation in Java known to be relatively 'slow'?


Let me just reference my brother Knuth's comment about "premature optimization" for the N-th time. If you really haven't heard it before then a Google search will find it for you.

Anyway, you asked for "elegant" as your primary requirement.
Kjeld Sigtermans
Ranch Hand

Alright.

This is my code, without any 'premature optimization' (I had to look that up, and I don't agree, at least not in this context).
But I went with the XSLT solution.

The XSL file xmlFormatter.xsl:

Obviously, at this point the code is eligible for optimization.
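One common first optimization, assuming the stylesheet is applied many times (for example on every log line): compile it once into a javax.xml.transform.Templates object and create a Transformer from it per call. The Templates javadoc requires implementations to be thread-safe; a Transformer must not be shared between threads. The sketch embeds an identity-plus-strip-space stylesheet as a stand-in for xmlFormatter.xsl, whose actual contents are not shown here; in real code one would more likely load that file with new StreamSource(new File("xmlFormatter.xsl")). The class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CachedXmlFormatter {

    // Stand-in for xmlFormatter.xsl: identity transform plus strip-space.
    private static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" indent=\"no\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiled once per JVM and shared; Templates objects are thread-safe.
    private static final Templates TEMPLATES = compile();

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
        } catch (TransformerConfigurationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static String format(String xml) throws TransformerException {
        // Fresh Transformer per call: cheap to obtain from Templates, and
        // never shared between threads.
        Transformer t = TEMPLATES.newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(format("<a>\n  <b>x</b>\n</a>"));
        System.out.println(format("<a>\n\t<b>y</b>\n</a>"));
    }
}
```

Whether this matters in practice is something to measure rather than assume; it only removes the repeated stylesheet compilation, not the cost of the transformation itself.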
I think choosing whether or not to use XSLT is an architectural decision, not premature optimization. Indeed, once we have chosen XSLT, we should first get it to work, and then we can optimize all we want.
I assumed high performance to be an obvious requirement almost always. Maybe I should have been clearer and said: elegant as well as fast. Then again, I still don't think the XSLT solution above is elegant... and requirements change all the time.

Thanks,
Kjeld
Paul Clapham
Bartender


Kjeld Sigtermans wrote:I assumed high performance to be almost always an obvious requirement. Maybe I should have been more clear and have said: elegant as well as fast. But then again I still don't think the XSLT solution hereabove is elegant... and requirements change all the time.


High performance is not always a requirement. Sometimes you need a quick and dirty program to do something once. If it takes 10 minutes instead of 10 seconds you don't really care. But you dismissed XSLT as "not fast" just based on some rumours or vague opinion. That statement qualifies as "premature optimization". I'm willing to bet (or at least consider the possibility) that the XSLT solution is going to be similar in performance to whatever you put together in a DOM.

Personally I think that writing an extension of an identity template is far more elegant than writing some DOM code to implement the rules. But I'm a mathematician so I use the mathematician's definition of "elegant". There is almost no DOM code which I would consider "elegant".

And yes, you do have to choose whether you want to include XSLT in the set of languages which you want to have in your environment. If you want to reject it because it's yet another language to learn, then you could certainly do that (and call it an architectural decision). I find it to be a useful tool myself.
 
 