Xerces-2 Parsing to DOM and default attributes question

Mark Williams
Ranch Hand

Joined: Aug 01, 2008
Posts: 66
I am trying to use the Xerces implementation of JAXP to load a Hibernate mapping file into a DOM and then write out the XML using Xalan. It works pretty well, but in the final XML every attribute with a default value defined in the DTD appears. The attributes are also written out in alphabetical order. I don't want either of these behaviors. I tried setting a few different parser features but couldn't figure out how to stop it.

Is there any way I can stop Xerces from adding the default attributes and also leave them in the order they were read from the original XML?

Thanks.
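
For readers following along, the setup described above might look roughly like the sketch below. It is only an illustration under assumptions: the class name `RoundTrip`, the element `item`, and the attributes `name` and `status` are made up, and a small internal DTD subset stands in for the Hibernate DTD. It uses plain JAXP calls, so whichever parser and transformer are on the classpath (Xerces and Xalan in this thread) do the actual work.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RoundTrip {
    /** Parses XML text into a DOM, then serializes that DOM back to text. */
    public static String roundTrip(String xml) {
        try {
            // Step 1: the JAXP parser builds the DOM (DTD defaults are applied here).
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Step 2: an identity Transformer writes the DOM out unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document: "status" is declared with a default but never written.
        String xml = "<!DOCTYPE item [<!ATTLIST item status CDATA \"active\">]>"
                   + "<item name=\"Foo\"/>";
        System.out.println(roundTrip(xml));
    }
}
```

In this sketch the `status` attribute shows up in the output even though the input never wrote it, which is the behavior the question is about.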
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

At least for the order of attributes, the XML Recommendation specifically says that is not significant. So there is no point in asking for any particular order.

As for the attributes with default values, if the output document doesn't have the same DTD included in it then you would want those attributes to appear, wouldn't you? And if it does have the DTD included, then those attributes may be redundant but they aren't incorrect and they don't change the meaning of the document in any way.

So it looks to me like you have two non-problems here. Unless there's some operational reason that those things are causing trouble?
Mark Williams
Ranch Hand

Joined: Aug 01, 2008
Posts: 66
The only issue I have is that I am trying to make an automated change to several thousand Hibernate mappings. Where I work, we let our customers have the source code and they often make their own changes. All of the unnecessary changes introduced by the transformation would make it harder for customers to take an upgrade and bring their modifications forward.

Other than that, I agree that it's a non-issue. Thanks for the response.
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
Xerces is a low-level XML parser. It does not produce any XML-based data. Xerces does not "add" any attributes and it does not alphabetize them either.

Your concern lies in whatever is creating the XML-based data, e.g. DOM implementation, Xalan implementation.
Mark Williams
Ranch Hand

Joined: Aug 01, 2008
Posts: 66
Jimmy Clark wrote: Xerces is a low-level XML parser. It does not produce any XML-based data. Xerces does not "add" any attributes and it does not alphabetize them either.

Your concern lies in whatever is creating the XML-based data, e.g. DOM implementation, Xalan implementation.


So, I think I am confused about which implementation is responsible for which portion of JAXP. Where is the line that marks the end of Xerces's responsibility and the beginning of Xalan's responsibility?

I assumed Xalan was parsing the XML and building the DOM. Then I thought Xalan was transforming the DOM into XML. But I guess going from XML to DOM is a transformation and that would seem to also fall in Xalan's court. I am confused...
Mark Williams
Ranch Hand

Joined: Aug 01, 2008
Posts: 66
Well it looks like the Xalan Design link from the Xalan website clears up some of the confusion. I'll have to look at it in more detail after getting some rest. I don't understand the process completely at this point but I think the information I need is all there.
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
Xerces and Xalan are different applications.

Xerces is a "parser." In practice, an application is created which receives information about an XML-based document from Xerces.


Where is the line that marks the end of Xerces's responsibility and the beginning of Xalan's responsibility?


Xerces is simply an XML-based parser. It reads an XML-based document, makes sure it follows all the rules and passes information about the document to an application.

Programmers create applications that receive this information from Xerces. Xalan is an example of one of these applications. An XSLT Engine is another example.

Your application is not interacting with Xerces directly. If it was, you would know a bit more about how it works. Your application is using Xalan which is DOM-based. It is reading an XML document, creating a DOM model of the document and then creating another XML document.




Mark Williams
Ranch Hand

Joined: Aug 01, 2008
Posts: 66
Jimmy, I follow what you are saying but it doesn't appear that my application's behavior matches what you describe.

When I get a new instance of DocumentBuilderFactory - an instance of org.apache.xerces.jaxp.DocumentBuilderFactoryImpl is returned.
The document object is an instance of org.apache.xerces.dom.DeferredDocumentImpl.

When I inspect the DOM in debug using eclipse immediately after calling DocumentBuilder.parse() it appears as if all of the default attributes are in the DOM representation of the source at that time.

Does that make any sense?
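
One way to check this without a debugger is the standard DOM method `Attr.getSpecified()`, which returns false for an attribute whose value came from a DTD default rather than from the document text. The sketch below is an assumption-laden illustration, not code from the application in this thread: the class name `DefaultedAttributes`, the element `item`, and its attributes are hypothetical, and an internal DTD subset stands in for the real DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class DefaultedAttributes {
    /** Returns the names of root-element attributes that were supplied by DTD defaults. */
    public static List<String> defaultedOnRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
            List<String> defaulted = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                // getSpecified() is false when the parser filled the value in from the DTD.
                if (!attr.getSpecified()) {
                    defaulted.add(attr.getName());
                }
            }
            return defaulted;
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document: only "name" is written; "status" comes from the DTD.
        String xml = "<!DOCTYPE item [<!ATTLIST item status CDATA \"active\">]>"
                   + "<item name=\"Foo\"/>";
        System.out.println(defaultedOnRoot(xml));
    }
}
```

If the defaulted attributes are already present right after `DocumentBuilder.parse()`, as observed above, then the parser's DOM builder added them and the serializer is only writing out what it was given.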

Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
I see. See if you can modify the application to not use the DTD. Without the DTD, there is no way to know what the default attributes would be.

If this works, then you could incorporate a validation step against the DTD prior to the operation.
Mark Williams
Ranch Hand

Joined: Aug 01, 2008
Posts: 66
Jimmy, turning off the load of the external DTD did the trick. Thanks.
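
The post does not say exactly how the DTD load was turned off. One common way with Xerces is the parser feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`, set through `DocumentBuilderFactory.setFeature`. The sketch below assumes that approach; the class name `SkipExternalDtd`, the file name `item.dtd`, and the element and attribute names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SkipExternalDtd {
    // Xerces-specific feature URI; other parsers may not recognize it.
    private static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses XML text without reading the external DTD named in its DOCTYPE. */
    public static Document parseWithoutExternalDtd(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);                 // the feature only applies when not validating
            factory.setFeature(LOAD_EXTERNAL_DTD, false); // skip the external subset entirely
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document; item.dtd is never opened, so it need not exist.
        String xml = "<!DOCTYPE item SYSTEM \"item.dtd\"><item name=\"Foo\"/>";
        Document doc = parseWithoutExternalDtd(xml);
        System.out.println(doc.getDocumentElement().getAttributes().getLength());
    }
}
```

Note that this only skips the external subset. Defaults declared in an internal subset (inside the DOCTYPE brackets) would still be applied, and any entities defined only in the external DTD would no longer resolve.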

As for the order of the attributes, I believe I am going to have to take another approach.

From what I read, attributes are stored in org.apache.xerces.dom.NamedNodeMapImpl. I can't prove it, but judging by the logic used to place attribute nodes into NamedNodeMapImpl's Vector-based internal storage, it appears to me that the attributes will always be stored in alphabetical order.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

You could certainly produce a forked version of Xalan which preserved the order of the attributes, but that seems like overkill to me. Are you really proposing to do that?

Actually when I say "certainly" that's an exaggeration. Xalan is getting the attributes from the DOM, so you would have to persuade the DOM to preserve the order of the attributes. Which I don't believe it does. You might have to write your own DOM implementation in that case, which might be harder unless you could find an open-source implementation to start from.
Mark Williams
Ranch Hand

Joined: Aug 01, 2008
Posts: 66
Whoa, don't get carried away! I definitely didn't propose a Xerces fork. I was thinking more along the lines of using SAX to handle the elements I am interested in while passing the ones I don't care about straight to the output.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Okay... but SAX doesn't preserve the order of the attributes either. The startElement method passes you something which amounts to a Map of the attributes attached to the element.
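
To make that concrete: the object passed to startElement is an `org.xml.sax.Attributes`, which can be read by index or by name. The SAX documentation states that the order of attributes in the list is unspecified and varies between implementations, so the index order cannot be relied on to match the source text. A minimal sketch follows; the class name `SaxAttributeNames` and the sample attribute names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxAttributeNames {
    /** Collects the attribute names the SAX parser reports for the root element. */
    public static List<String> rootAttributeNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean seenRoot = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (seenRoot) {
                    return; // only the first (root) element is of interest here
                }
                seenRoot = true;
                // Index access works, but SAX does not promise source order.
                for (int i = 0; i < atts.getLength(); i++) {
                    names.add(atts.getQName(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(rootAttributeNames("<item zeta=\"1\" alpha=\"2\"/>"));
    }
}
```

Whatever order a given parser happens to print here, it is an implementation detail rather than something the API guarantees.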
Mark Williams
Ranch Hand

Joined: Aug 01, 2008
Posts: 66
Paul Clapham wrote: Okay... but SAX doesn't preserve the order of the attributes either. The startElement method passes you something which amounts to a Map of the attributes attached to the element.


Yikes, glad I didn't spend any time on reworking the approach to use SAX then! I guess I'll be hacking something together to suit my needs then. Thanks for the advice.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Yeah, that's the thing. The order of attributes is unimportant so XML software treats it as unimportant. In Java that means some kind of map from names to values. If it's important to you then that's a non-XML requirement and so a non-XML solution would be necessary. I assume that's what you have in mind now?
 
 