JavaRanch » Java Forums » Engineering » XML and Related Technologies

How to preserve new lines when parsing

Joe Simone
Greenhorn

Joined: Feb 16, 2005
Posts: 25
I have XML that looks like this:



How do I parse the above with Xalan so that the newlines are preserved?

Everything I have tried fails to preserve the individual lines of the query. It all gets lumped together as "select name from person".

Thanks,
Joe
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Here's what the XML Recommendation says about that:
To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
So those carriage-return characters are going to be discarded by every compliant XML parser. However, you will still have linefeeds separating your lines. If you don't see them, perhaps that's a problem with whatever you are using to view the XML.
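A quick way to watch that end-of-line normalization happen is to feed the parser Windows-style line ends and inspect the text node. This is a minimal sketch assuming the JDK's built-in JAXP/DOM parser; the class name `LineEndDemo` and the sample query are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows CR LF and lone CR becoming LF on input.
public class LineEndDemo {
    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // A CR LF pair and a lone CR in the raw input...
        String xml = "<query>select name\r\nfrom person\rwhere id = 1</query>";
        String text = rootText(xml);
        // ...both arrive as a single LF after parsing.
        System.out.println(text.contains("\r")); // false
        System.out.println(text.equals("select name\nfrom person\nwhere id = 1")); // true
    }
}
```

The carriage returns are gone, but every line is still there, separated by a linefeed.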
Craig Bayley
Ranch Hand

Joined: Sep 27, 2007
Posts: 46
Or just use CDATA.
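A minimal sketch of the CDATA route, assuming the JDK's DOM parser (the class name and sample SQL are hypothetical). CDATA mainly spares you from escaping characters such as `<` in the SQL; the line breaks inside it come back from `getTextContent()` as linefeeds, and CR LF pairs are still normalized exactly as they are outside CDATA.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: reading SQL stored in a CDATA section.
public class CdataDemo {
    // Returns the text of the root element; getTextContent() includes CDATA sections.
    static String rootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Inside CDATA the '<' needs no escaping, and the line breaks are kept.
        String xml = "<query><![CDATA[select name\r\nfrom person\r\nwhere age < 30]]></query>";
        String text = rootText(xml);
        // CR LF is still normalized to LF, just as it is outside CDATA.
        System.out.println(text.equals("select name\nfrom person\nwhere age < 30")); // true
    }
}
```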
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

I'm curious why you want those carriage returns in the middle of your text nodes, which look a lot like SQL statements. SQL certainly doesn't need them, so why do you?
Joe Simone
Greenhorn

Joined: Feb 16, 2005
Posts: 25
Well, here is what I am trying to accomplish.

I have a web page that connects to a data source and lets you write an SQL query in a text area. Clicking a button executes the query and displays the returned result set. All of that works fine.

Now I would like to export this query to an XML file whose format is defined by an XML Schema. Once it is exported, I would like to import any such query file back into the text area and run or edit it. It's almost all fine, except that when I import the query, which can be very complex and span many lines, I lose the newlines!

So I would like to preserve the whitespace structure of the text that is being exported to and imported from an XML file.
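For comparison, here is a rough sketch of that export/import round trip using plain DOM and an identity `Transformer`, with no schema attached. The `<report><query>` layout and the `export`/`importQuery` names are hypothetical stand-ins for the real export format. If the string that comes back equals the one that went in, the XML layer itself is not what's eating the newlines.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class QueryRoundTrip {
    // Hypothetical export: wraps the SQL in <report><query>SQL</query></report>.
    static String export(String sql) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element report = doc.createElement("report");
        Element query = doc.createElement("query");
        query.appendChild(doc.createTextNode(sql)); // text node holds the linefeeds as-is
        report.appendChild(query);
        doc.appendChild(report);

        // Identity transform: serializes the DOM without changing its content.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Hypothetical import: reads the text of the first <query> element back.
    static String importQuery(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("query").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String sql = "select name\nfrom report\nwhere name like 'test'";
        String xml = export(sql);
        System.out.println(xml);
        System.out.println(sql.equals(importQuery(xml))); // true
    }
}
```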

I must be missing something somewhere.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

One of us is missing something somewhere, because I don't see where you have a problem.

Here's my test XML:

Here's my test XSLT to transform it to HTML:

And here's the HTML that it produces:

As you can see, the linefeed characters are preserved. If they aren't being preserved for you, then something in the intermediate processing you described must be stripping them out. And if that's the case, adding more whitespace of the same kind isn't going to solve the problem.
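A stand-in for that kind of test, as a sketch only: the stylesheet, the `<report><query>` input and the class name are hypothetical, not the files from this post. It runs a tiny XSLT through the JDK's `TransformerFactory` and copies the query text into an HTML `<pre>` element, where the linefeeds should come through untouched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PreTransform {
    // Hypothetical stylesheet: drops the query text into an HTML <pre> block.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<html><body><pre><xsl:value-of select='report/query'/></pre></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the serialized HTML.
    static String toHtml(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<report><query>select name\nfrom person\nwhere id = 1</query></report>";
        // The serializer may write each linefeed using the platform line separator.
        System.out.println(toHtml(xml));
    }
}
```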
Joe Simone
Greenhorn

Joined: Feb 16, 2005
Posts: 25
OK, so I mistakenly believed that I needed some additional markup to preserve the newlines. In fact, newlines are preserved by default without any additional measures, as you have demonstrated.

So something along the way in my code must be destroying the whitespace.

Investigating ...
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Let us know; I'm curious what the real problem is.

(One possibility: if that SQL is stored as an attribute in an intermediate XML document somewhere, then attribute-value normalization will turn the linefeeds into spaces.)
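To see the difference, here is a minimal sketch that parses the same SQL once as an attribute value and once as element text (the element and attribute names are hypothetical; the JDK DOM parser is assumed). Attribute-value normalization turns each literal linefeed into a space, while a `&#10;` character reference inside the attribute survives as a real linefeed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: attribute-value normalization vs. element text.
public class AttributeNormalization {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<query sql=\"select name\nfrom person\""
                   + " escaped=\"select name&#10;from person\">"
                   + "select name\nfrom person</query>";
        Element q = parseRoot(xml);
        // Literal linefeed inside an attribute value: normalized to a space.
        System.out.println(q.getAttribute("sql"));                    // select name from person
        // Character reference &#10;: survives as a real linefeed.
        System.out.println(q.getAttribute("escaped").contains("\n")); // true
        // Element text: linefeed kept.
        System.out.println(q.getTextContent().contains("\n"));        // true
    }
}
```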
Joe Simone
Greenhorn

Joined: Feb 16, 2005
Posts: 25
I set up a test case to run through the scenario:

I can export to XML OK.
I can read the XML into a string OK.
When I parse the XML via DOM, the query node loses its newlines.

Going into the parse I have this:





Coming out of the parse I have this:


15:28:35,083 INFO [STDOUT] select name from report where name like 'test'

Here is the code:

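// Pull the SQL text out of the <query> element
if (node.getNodeType() == Node.ELEMENT_NODE && "query".equals(node.getNodeName())) {
    query = node.getTextContent(); // getTextContent() is defined on Node, so no cast is needed
    System.out.println(query);
}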

Could getTextContent() be the culprit, and if so, how do I fix the situation?

Thanks!
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Normally I would expect that method to preserve the text content exactly. As its API documentation says:
No whitespace normalization is performed and the returned string does not contain the white spaces in element content (see the attribute Text.isElementContentWhitespace).
But I see you have a validated document: there's a schema involved. That makes things more complex, and you need to understand what "element content whitespace" means. Start by looking at your schema to see how it describes your <query> element. (I don't know much about XML Schema.)
Joe Simone
Greenhorn

Joined: Feb 16, 2005
Posts: 25
Brilliant! That was it.

I simply changed the incorrect simple type in my schema from "normalizedString" to "string" and that fixed the problem.
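The difference comes from the `whiteSpace` facet: `xs:string` uses `preserve`, while `xs:normalizedString` uses `replace`, which turns every tab, linefeed and carriage return into a space. The sketch below (hypothetical one-element schema and class name, JDK JAXP assumed) hand-codes that replace step and also parses the sample query with schema validation switched on. With `xs:string` the linefeeds come back intact; with `xs:normalizedString`, a parser that reports schema-normalized values (Xerces does so by default) may hand back the query on a single line.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class QuerySchemaType {
    // Mimics the whiteSpace="replace" facet of xs:normalizedString by hand:
    // each tab, LF and CR becomes one space (no collapsing, no trimming).
    static String replaceFacet(String s) {
        return s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }

    // Hypothetical schema declaring a single <query> element of the given built-in type.
    static Schema schemaFor(String type) throws SAXException {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                   + "<xs:element name='query' type='xs:" + type + "'/>"
                   + "</xs:schema>";
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Parses with schema validation switched on and returns the <query> text.
    static String parseValidated(String xml, String type) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schemaFor(type));
        DocumentBuilder builder = dbf.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // treat validation errors as fatal
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<query>select name\nfrom report\nwhere name like 'test'</query>";
        System.out.println(parseValidated(xml, "string"));           // linefeeds kept
        System.out.println(parseValidated(xml, "normalizedString")); // may come back on one line
        System.out.println(replaceFacet("select name\nfrom report")); // select name from report
    }
}
```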

God bless you, Paul.

Thanks again,
Joe
 