
XML Searching via java

Dean Reedy
Ranch Hand

Joined: Sep 10, 2001
Posts: 89
Does anyone have ideas on searching an XML document using Java? I want to be able to search for a specific word in any part of the XML document: nodes, attributes, text, CDATA, and so on.
What is the best way to do this?
Is there anything already written to do this type of searching?
I have a Java class which starts at the beginning and recursively looks at each part of the XML document, but is there a better way?
Any ideas would be great.
Ajith Kallambella

Joined: Mar 17, 2000
Posts: 5782
I think SAX would be a good tool for this kind of searching. Although I'm tempted to suggest DOM or JDOM, the amount of recursion you're talking about can take a toll on performance for large documents.
Simply program a ContentHandler that does the text pattern match and, when a match is found, try to interpret the location reference and store the results in your own data structure. Since you are talking about *all kinds* of elements in the document, you will have to make provisions in the data structure to store the type of the element when a match is found.
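A minimal sketch of the handler idea described above, not a definitive implementation. It assumes a plain substring match is good enough and that the element path is a sufficient "location". The class name SaxWordSearch, the Match holder and the sample document are made up for illustration. A bare ContentHandler does not see CDATA boundaries, so the sketch extends DefaultHandler2 and registers it as the SAX lexical handler as well.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

class SaxWordSearch {

    /** One hit: the kind of construct that matched and the element path to it (hypothetical holder). */
    static final class Match {
        final String kind; // "element", "attribute", "text" or "cdata"
        final String path; // for example /po/@ref
        Match(String kind, String path) { this.kind = kind; this.path = path; }
        @Override public String toString() { return kind + " " + path; }
    }

    private static final class SearchHandler extends DefaultHandler2 {
        private final String word;
        private final List<Match> matches = new ArrayList<>();
        private final List<String> open = new ArrayList<>();   // stack of open element names
        private final StringBuilder text = new StringBuilder(); // character data may arrive in chunks
        private boolean inCdata;

        SearchHandler(String word) { this.word = word; }

        private String path() { return "/" + String.join("/", open); }

        /** Checks the buffered character data for the word, then clears the buffer. */
        private void flushText() {
            if (text.indexOf(word) >= 0) {
                matches.add(new Match(inCdata ? "cdata" : "text", path()));
            }
            text.setLength(0);
        }

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            flushText();
            open.add(qName);
            if (qName.contains(word)) matches.add(new Match("element", path()));
            for (int i = 0; i < atts.getLength(); i++) {
                if (atts.getQName(i).contains(word) || atts.getValue(i).contains(word)) {
                    matches.add(new Match("attribute", path() + "/@" + atts.getQName(i)));
                }
            }
        }

        @Override public void endElement(String uri, String local, String qName) {
            flushText();
            open.remove(open.size() - 1);
        }

        @Override public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        // Lexical-handler callbacks: flush so plain text and CDATA are reported separately.
        @Override public void startCDATA() { flushText(); inCdata = true; }
        @Override public void endCDATA() { flushText(); inCdata = false; }
    }

    /** Parses the XML string once and returns every place where the word occurs. */
    static List<Match> search(String xml, String word) {
        SearchHandler handler = new SearchHandler(word);
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.matches;
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        String xml = "<po ref=\"widget-7\"><item>blue widget</item>"
                + "<note><![CDATA[widget rush]]></note></po>";
        for (Match m : search(xml, "widget")) {
            System.out.println(m);
        }
    }
}
```

Because the handler only keeps the stack of open element names and the current text buffer, memory use stays small even for large documents, which is the point of choosing SAX here.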
Hope this helps!

Open Group Certified Distinguished IT Architect. Open Group Certified Master IT Architect. Sun Certified Architect (SCEA).
Meera Chandrasekaran

Joined: Jan 08, 2002
Posts: 10
Hi Dean,
I did something very much like what Ajith suggested here a few months ago.
Use a document builder to parse the file and iterate through the contents starting from the root. By doing so, you can examine every node and match it against whatever you are looking for.
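A rough sketch of the document-builder walk, assuming the whole document fits in memory and that a substring match is what is wanted. The class name DomWordSearch and the sample document are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomWordSearch {

    /** Parses the XML string into a DOM tree and collects every node that contains the word. */
    static List<Node> search(String xml, String word) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<Node> hits = new ArrayList<>();
            visit(doc.getDocumentElement(), word, hits);
            return hits;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Depth-first walk: checks this node, then recurses into its children. */
    private static void visit(Node node, String word, List<Node> hits) {
        short type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) {
            if (node.getNodeName().contains(word)) hits.add(node);
            // Attributes are not children in DOM, so they need their own loop.
            NamedNodeMap atts = node.getAttributes();
            for (int i = 0; i < atts.getLength(); i++) {
                Node att = atts.item(i);
                if (att.getNodeName().contains(word) || att.getNodeValue().contains(word)) {
                    hits.add(att);
                }
            }
        } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            if (node.getNodeValue().contains(word)) hits.add(node);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            visit(child, word, hits);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        String xml = "<po ref=\"widget-7\"><item>blue widget</item>"
                + "<note><![CDATA[widget rush]]></note></po>";
        for (Node hit : search(xml, "widget")) {
            System.out.println(hit.getNodeType() + " " + hit.getNodeName() + " = " + hit.getNodeValue());
        }
    }
}
```

Each hit is a live Node, so the caller can still navigate to its parent or siblings. The cost is that the whole tree is built before the search starts.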
Hope this helps.
Scott Bain
Ranch Hand

Joined: Dec 21, 2001
Posts: 46
I agree with the SAX approach, so long as the context of the datum is not important. However, if you need to find an element that is subordinate to another element, or belongs to a particular structure, then this will be very difficult with SAX.
To search for, say, "an ID within a PO" rather than just any occurrence of "ID", I would use an object tree (DOM or JDOM) and use an XPath expression to find the node (or nodes) that qualify.
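As an illustration of the XPath idea, here is a small sketch using DOM together with the javax.xml.xpath API that ships with current JDKs. The element names PO and ID, the expression //PO/ID, the class name XPathSearch and the sample document are assumptions made up for the example. Adjust the expression to the real document structure.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XPathSearch {

    /** Returns the text of every ID element that is a direct child of a PO element. */
    static List<String> idsWithinPo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "//PO/ID" ignores ID elements that live anywhere else in the document.
            NodeList nodes = (NodeList) xpath.evaluate("//PO/ID", doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent());
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch: two POs and one unrelated ID.
        String xml = "<orders>"
                + "<PO><ID>42</ID></PO>"
                + "<PO><ID>43</ID></PO>"
                + "<invoice><ID>99</ID></invoice>"
                + "</orders>";
        System.out.println(idsWithinPo(xml));
    }
}
```

The ID under invoice is skipped because the expression encodes the context, which is the part that is awkward to track by hand in a SAX handler.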

Scott Bain, Senior Consultant, Net Objectives