
Extract a unique identifier for any given Element

 
Patrick Maue
Greenhorn
Posts: 1
Hi there,

My task is to parse an arbitrary XML document and search for a certain attribute (which has a unique name). Once I find the attribute, I would like to store its value and the element holding it. My problem: how do I uniquely identify the element? A Locator as used in SAX is not really feasible, since it relies on the absolute location (line and column number) of the element in the document. I have to assume the documents change regularly, so the location itself is not an option. I first thought about simply extracting an XPath expression that could later be used to identify the element, but apparently (as far as my Google skills reach) this is not that easy and has to be done manually, which would mean writing a new adapter for each new XML document type. Do you have any hints on how to approach this problem? Are there any libraries out there that implement this functionality?

Thanks for your help.

Update:
This discussion goes in the same direction (with DOM, which I don't really want to use), but it doesn't have an answer either:
http://marc.info/?l=xerces-j-dev&m=116645837127134&w=2

Update 2:
OK, I searched around and found this one: http://lekkimworld.com/2007/06/19/building_xpath_expression_from_xml_node.html
It looks promising, but unfortunately it still depends on DOM. Does something similar exist for SAX/StAX?
 
Paul Clapham
Sheriff
Posts: 20966
I think I agree with those two links, if they are trying to suggest that you should generate an XPath expression which looks like '/a[1]/b[42]/c[137]/@whatsit'.

I don't see why you should have to write different code for each type of input document. I think it should be fairly straightforward to write something generic which keeps track of the element names and generates those expressions. Just off the top of my head, it looks like you should use a stack to keep track of the elements you are currently inside, plus a map at each level of the stack which records, for each element name, how many elements with that name have appeared so far at that level (the previous siblings). It could be tricky, but it's basically just paperwork.

Namespaces might complicate the issue, but not much.
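
For what it's worth, here is a rough sketch of the stack-plus-map idea as a SAX handler. Treat it as an illustration under assumptions rather than a finished solution: the class name XPathTracker and the helper find are made up for this example, the parser is left in its default non-namespace-aware mode (so elements are matched by their raw qualified names), and only the path to the element is recorded, without the trailing '/@whatsit' step.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: records a positional XPath for every element
// that carries the attribute we are looking for.
class XPathTracker extends DefaultHandler {
    private final String targetAttribute;
    // Steps such as "b[2]" for the elements currently open, root first.
    private final Deque<String> path = new ArrayDeque<>();
    // One map per nesting level: element name -> number of siblings seen so far.
    private final Deque<Map<String, Integer>> siblingCounts = new ArrayDeque<>();
    // Result: XPath of the owning element -> attribute value, in document order.
    private final Map<String, String> matches = new LinkedHashMap<>();

    XPathTracker(String targetAttribute) {
        this.targetAttribute = targetAttribute;
    }

    @Override
    public void startDocument() {
        path.clear();
        siblingCounts.clear();
        matches.clear();
        siblingCounts.push(new HashMap<>()); // counts for the document level
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Increment the count for this name among the siblings at the current level.
        int position = siblingCounts.peek().merge(qName, 1, Integer::sum);
        path.addLast(qName + "[" + position + "]");
        siblingCounts.push(new HashMap<>()); // fresh counts for this element's children
        String value = atts.getValue(targetAttribute);
        if (value != null) {
            matches.put("/" + String.join("/", path), value);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
        siblingCounts.pop(); // discard the counts for the children of the closed element
    }

    // Parses the XML text and returns the element paths with their attribute values.
    static Map<String, String> find(String xml, String attributeName) {
        XPathTracker handler = new XPathTracker(attributeName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
        return handler.matches;
    }

    public static void main(String[] args) {
        String xml = "<a><b/><b><c whatsit=\"x\"/></b></a>";
        find(xml, "whatsit").forEach((xpath, value) ->
                System.out.println(xpath + " -> " + value));
        // prints: /a[1]/b[2]/c[1] -> x
    }
}
```

Note that these expressions survive reformatting of the file (line and column changes), but not the insertion or removal of earlier siblings with the same name. With a namespace-aware parser you would probably want to key the maps on namespace URI plus local name instead of the qualified name.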
 