Extract a unique identifier for any given Element

Patrick Maue

Joined: Jul 18, 2009
Posts: 1
Hi there,

My task is to parse an arbitrary XML document and search for a certain attribute (which has a unique name). Once I find the attribute, I would like to store its value together with the element holding it. My problem: how do I uniquely identify the element? A Locator as used in SAX is not really feasible, since it relies on the absolute location (line and column number) of the element in the document. I have to assume the documents change regularly, so the location itself is not an option. I first thought about simply extracting an XPath expression that could later be used to identify the element, but apparently (as far as my Google skills reach) this is not that easy and has to be done manually, which means that for each new XML document type I would need to write a new adapter. Do you have any hints on how to approach this problem? Are there any libraries out there that implement this functionality?

Thanks for your help.

This discussion goes in the same direction (with DOM, which I don't really want to use), but it doesn't have an answer either:

OK, I searched around and found this one:
It looks promising, but unfortunately it still depends on DOM. Does something similar exist for SAX/StAX?
Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

I think I agree with those two links, if they are trying to suggest that you should generate an XPath expression which looks like '/a[1]/b[42]/c[137]/@whatsit'.

I don't see why you should have to write different code for each input document. I think it should be fairly straightforward to write something which keeps track of the element names and generates those expressions. Just off the top of my head it looks like you should use a stack to keep track of the elements you are currently in, plus a map at each level of the stack to keep track of what elements there have been at the same level (previous siblings) and how many of each there have been. It could be tricky but it's just basically paperwork.

Namespaces might complicate the issue, but not much.
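
For anyone who wants to see the stack-plus-map idea spelled out, here is a rough sketch with SAX. It is only one way to do the bookkeeping, not a finished library: the class name `XPathTracker`, the `find` helper and the `hits` map are made up for illustration, and it assumes the default (non-namespace-aware) parser, so prefixed names end up in the expression as plain qualified names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records an XPath-like expression for every element
// that carries the attribute we are looking for.
class XPathTracker extends DefaultHandler {
    private final String attrName;
    // Steps of the current path, e.g. "a[1]", "b[2]" (root first).
    private final Deque<String> path = new ArrayDeque<>();
    // One map per open level: how many children of each name were seen so far.
    private final Deque<Map<String, Integer>> counts = new ArrayDeque<>();
    // Generated expression -> attribute value, in document order.
    final Map<String, String> hits = new LinkedHashMap<>();

    XPathTracker(String attrName) {
        this.attrName = attrName;
        counts.push(new HashMap<>()); // counter map for the document level
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Position of this element among same-named siblings (1-based, as in XPath).
        int index = counts.peek().merge(qName, 1, Integer::sum);
        path.addLast(qName + "[" + index + "]");
        counts.push(new HashMap<>()); // fresh counters for this element's children
        String value = atts.getValue(attrName);
        if (value != null) {
            hits.put("/" + String.join("/", path) + "/@" + attrName, value);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
        counts.pop(); // discard the counters of the element we just left
    }

    // Convenience wrapper (an assumption, not part of any library).
    static Map<String, String> find(String xml, String attrName) {
        try {
            XPathTracker handler = new XPathTracker(attrName);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.hits;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling `XPathTracker.find(xml, "whatsit")` returns a map from each generated expression to the attribute value found there. The same bookkeeping should carry over to StAX by doing the push on start-element events and the pop on end-element events. Keep in mind that these positional expressions still shift if siblings are inserted before the element, so they are stable only as long as the structure above the element stays the same.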