How to resolve namespaces for XPath with JDOM

Fred Woosch
Greenhorn

Joined: Dec 02, 2007
Posts: 10
I hacked a small JUnit test together to find out how to use JDOM and XPath. I just started learning these technologies, so maybe I am missing something. This is what I did:



I download an ATOM feed and try to access all titles from all entries. But since I didn't enable the Namespace I can't get any elements. This wouldn't be so bad in itself, but the program I am going to write needs to parse XML data whose structure, and certainly whose namespaces, I know nothing about in advance.

So obviously I would need some kind of Namespace resolver. I found a post here http://blog.davber.com/2006/09/17/xpath-with-namespaces-in-java/ that explains how to do this with DOM, but since I am doing it in JDOM this doesn't seem to be an option.

Is there something similar to a Namespace resolver in JDOM? Something that can just go, fetch the declarations and let me do my thing with xpath? (I mean aside from turning namespace awareness off completely).

regards,
Fred
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

I looked for the JDOM API documentation and found it here. I don't see a JDOMXPath class, but I do see an XPath class. That class does have an addNamespace(Namespace) method. So I would suggest you look in the API documentation for the version of JDOM you are using; I'm sure there must be a similar method.
Fred Woosch
Greenhorn

Joined: Dec 02, 2007
Posts: 10
Right. Of course the JDOMXPath class is imported from Jaxen, sorry for confusing you. The equivalent code in pure JDOM would be:



And it works the same way. And while the JDOM apidocs mention the addNamespace method, I couldn't find out how I am supposed to gather these Namespaces if I don't know anything about the document in advance.

I mean if I know I will parse XHTML I just import the namespace like this:


But I want to programmatically resolve the namespace. How would I do that?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

So you know the element name but you don't know what namespace it is in? That doesn't sound like a very realistic example to me. However I'm pretty sure that XPath has functions that allow you to get the namespace URI of an element, if that's what you had in mind.
Fred Woosch
Greenhorn

Joined: Dec 02, 2007
Posts: 10
Actually this is a real-world example of an online querying tool. I want to enable the user to cut&paste any given xml file into my app, then analyze and return all unique element names with path prefix to the user. Using this information I want to enable the user to build a simplified query for a given element. See? It's not as exotic as it sounds at first.

Update: I tried to get the Namespace from the root element, which actually returned ONE valid Namespace element. But my query still doesn't return anything. This is what I did.


The header looks like this:



If I save the ATOM feed to disk and remove the xmlns entry altogether, my queries work perfectly. Any ideas?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

This query doesn't return anything?

/feed

Then that's probably because there aren't any elements named feed that are not in a namespace. Just getting the namespace from the root element doesn't affect that query in any way. And adding a namespace to that XPath object doesn't change the fact that the query is looking for elements that are not in a namespace.

And your first strategy of getting a namespace from the root element is not going to be reliable because namespaces can be declared anywhere in the tree, and apply only to nodes where they are declared and to children of those nodes.
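To illustrate the point that declarations can sit anywhere in the tree, here is a minimal sketch that walks the whole document and gathers every xmlns declaration it meets. It uses the JDK's built-in DOM parser rather than JDOM so that it runs without extra jars; the class name NamespaceCollector and the example.org URI in the usage note are made up for illustration. JDOM 1.x exposes similar information through Element.getNamespace() and Element.getAdditionalNamespaces(), but that variant is not shown here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JDOM or the JDK.
class NamespaceCollector {

    /** Returns prefix -> URI for every xmlns declaration in the document ("" is the default namespace). */
    static Map<String, String> collect(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> found = new LinkedHashMap<>();
            walk(doc.getDocumentElement(), found);
            return found;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    private static void walk(Element element, Map<String, String> found) {
        // Namespace declarations show up in the DOM as attributes named xmlns or xmlns:prefix.
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            String name = attr.getNodeName();
            if (name.equals("xmlns")) {
                found.putIfAbsent("", attr.getValue());
            } else if (name.startsWith("xmlns:")) {
                // Simplification: the first binding of a prefix wins, later rebindings are ignored.
                found.putIfAbsent(name.substring("xmlns:".length()), attr.getValue());
            }
        }
        // Recurse into child elements, since declarations may appear at any depth.
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, found);
            }
        }
    }
}
```

For a feed whose root declares the Atom namespace as default and whose entry element declares xmlns:media="http://example.org/media", collect would return two entries: "" mapped to the Atom URI and "media" mapped to the example.org URI. Because a prefix can be rebound to a different URI deeper in the tree, a flat map like this is only an approximation.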

You could try a query of the form

/*[local-name() = 'feed' and namespace-uri() = 'http://this-is-a-namespace-URI']
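A query of that form can be tried out with the sketch below. It uses the JDK's javax.xml.xpath API instead of JDOM so it runs as-is, and it assumes the standard Atom namespace URI, http://www.w3.org/2005/Atom, in place of the placeholder URI above. The class name LocalNameQuery is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of JDOM or the JDK.
class LocalNameQuery {

    /** Counts the nodes that the XPath expression selects in a namespace-aware parse of xml. */
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<feed xmlns=\"http://www.w3.org/2005/Atom\"><title>Example</title></feed>";
        // The unprefixed name only matches elements in no namespace, so nothing is found.
        System.out.println(count(xml, "/feed")); // 0
        // Matching on local name plus namespace URI finds the root element.
        System.out.println(count(xml,
                "/*[local-name() = 'feed' and namespace-uri() = 'http://www.w3.org/2005/Atom']")); // 1
    }
}
```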

However the normal thing to do is to create a Namespace object that maps a namespace prefix to a namespace URI, add that Namespace object to the XPath object, and use that namespace prefix in the XPath query. So your query might be

/atom:feed

and you would make that work by creating a Namespace object whose prefix is "atom" and whose namespace URI is the Atom URI.
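The prefix-to-URI mapping described here might look as follows. This is a sketch using the JDK's javax.xml.xpath API, where the mapping is supplied through a NamespaceContext; in JDOM 1.x the corresponding step would be calling addNamespace on the XPath object, which is not shown. The class names PrefixedQuery and SinglePrefixContext are invented for the example, and the Atom URI is the standard one rather than something taken from this thread.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of JDOM or the JDK.
class PrefixedQuery {

    static final String ATOM_URI = "http://www.w3.org/2005/Atom";

    /** A NamespaceContext that knows exactly one prefix-to-URI binding. */
    static final class SinglePrefixContext implements NamespaceContext {
        private final String prefix;
        private final String uri;

        SinglePrefixContext(String prefix, String uri) {
            this.prefix = prefix;
            this.uri = uri;
        }

        @Override
        public String getNamespaceURI(String candidate) {
            return prefix.equals(candidate) ? uri : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String candidateUri) {
            return uri.equals(candidateUri) ? prefix : null;
        }

        @Override
        public Iterator<String> getPrefixes(String candidateUri) {
            String p = getPrefix(candidateUri);
            return p == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(p).iterator();
        }
    }

    /** Evaluates the expression as a string, with the given prefix bound to the given URI. */
    static String evaluate(String xml, String expression, String prefix, String uri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new SinglePrefixContext(prefix, uri));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }
}
```

Calling evaluate(xml, "/atom:feed/atom:title", "atom", PrefixedQuery.ATOM_URI) on a feed that uses the Atom namespace as its default returns the title text, even though the document itself never uses the prefix "atom". The prefix in the query only has to agree with the mapping handed to the XPath object, not with anything in the document.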
[ December 12, 2007: Message edited by: Paul Clapham ]
Fred Woosch
Greenhorn

Joined: Dec 02, 2007
Posts: 10
Your query works! I suppose that's a good thing, since I now know my config isn't broken. But still this doesn't give me the flexibility I would need to parse an arbitrary xml file. Since, as you said (I didn't realize you could do that, thanks!), the namespace declaration can happen anywhere in the xml file, constructing a query which takes every namespace into consideration would require excessive trickery and would most likely end up being very, very slow. I am contemplating using only a query like [local-name() = 'feed'] and dropping the namespace thing altogether, since that also works. But god knows what side effects that will have.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Originally you said:
"I want to enable the user to cut&paste any given xml file into my app, then analyze and return all unique element names with path prefix to the user."
I'm not sure what "path prefix" means there. If you meant the namespace prefix, you're on the wrong track. The namespace prefix is meaningless. The name of an element consists of its local name (the part after the prefix) plus the URI of the namespace, if any. It's possible to declare three (or several hundred) different namespace prefixes that all have the same URI. To a parser, those three different prefixes all represent the same URI so they are all the same. Treating them as different is incorrect.
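A small sketch of that point: the comparison below treats two elements as having the same name when their namespace URI and local name agree, and it never looks at the prefix. It uses the JDK DOM parser rather than JDOM; the class name PrefixIrrelevance and the URI urn:example:ns are made up for illustration.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of JDOM or the JDK.
class PrefixIrrelevance {

    /** Parses xml with namespace awareness switched on and returns the root element. */
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Two elements have the same name when namespace URI and local name match; the prefix is ignored. */
    static boolean sameName(Element a, Element b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                && Objects.equals(a.getLocalName(), b.getLocalName());
    }

    public static void main(String[] args) {
        Element first = root("<a:feed xmlns:a=\"urn:example:ns\"/>");
        Element second = root("<b:feed xmlns:b=\"urn:example:ns\"/>");
        Element bare = root("<feed/>");
        System.out.println(sameName(first, second)); // true: different prefixes, same URI
        System.out.println(sameName(first, bare));   // false: bare is in no namespace
    }
}
```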

It's also possible to declare elements to be in a namespace without using a namespace prefix at all, by declaring a default namespace with a plain xmlns attribute. An "example" element that carries xmlns="http://My-Stupid-Namespace" is in the "http://My-Stupid-Namespace" namespace, even though it doesn't have a namespace prefix. So you wouldn't find it via the XPath expression "/example" unless you specified that namespace URI.
Fred Woosch
Greenhorn

Joined: Dec 02, 2007
Posts: 10
No, I meant the full path to the element. I wanted to represent them like this:

* rss.channel.item.media:thumbnail
* rss.channel.item.media:title
* rss.channel.item.pubDate
* rss.channel.item.title

And this is where the getNamespacePrefix method comes in really handy. Because now I can safely ignore the namespace and still preserve the real element names. Thank you for your help!
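A listing like the one above could be produced roughly as follows. This is only a sketch using the JDK DOM parser, where getNodeName() returns the prefixed name as written in the document; it is not the JDOM code from this thread, and the class name ElementPaths and the example.org namespace URI in the usage note are assumptions.

```java
import java.io.StringReader;
import java.util.SortedSet;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JDOM or the JDK.
class ElementPaths {

    /** Returns the unique dotted paths of all elements, using names as written (prefix:local). */
    static SortedSet<String> paths(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            SortedSet<String> out = new TreeSet<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    private static void walk(Element element, String parentPath, SortedSet<String> out) {
        // getNodeName() keeps the document's own prefix, e.g. "media:title".
        String path = parentPath.isEmpty()
                ? element.getNodeName()
                : parentPath + "." + element.getNodeName();
        out.add(path); // the set drops duplicates from repeated siblings such as several items
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, path, out);
            }
        }
    }
}
```

For an rss document with two item elements and a media prefix bound to http://example.org/media, the result contains each path once, including intermediate ones such as rss.channel. Since prefixes are local to a document, paths built this way are a display aid; two documents may use different prefixes for the same namespace URI.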