GeeCON Prague 2014*



Searching the XML string content using Regular Expression

kathir je
Greenhorn

Joined: Jun 07, 2006
Posts: 12
Hi,

I have XML content as a string, and I'm using the W3C DOM to get values from it.

The XML file is very large, with attributes and elements similar to the ones below:

<Shares>
<bookDetails bookName="How to Learn English" bookAuthor="English Writer">
<Chapter chapterName="From Alphabetes" chapterPage="23"/>
</bookDetails>
<company>
<name>test</name>
<address>test address</address>
<contact>test contact</contact>
<C02>10.5</C02>
</company>
</Shares>

Currently I have a method that accepts the root element and a name to search for. It finds the attribute or element with that name and returns its value.

I use XPathAPI.selectNodeList to retrieve the value, with the XPath expressions below to look for the given name as both an attribute and an element:

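// Elements that carry an attribute named inAttr
String xpath = "//*[@" + inAttr + "]";
// Text of elements named inAttr
String xpathElement = "//" + inAttr + "/text()";

NodeList attrNodes = XPathAPI.selectNodeList(root, xpath);
NodeList elementNodes = XPathAPI.selectNodeList(root, xpathElement);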
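
For comparison, here is a minimal sketch of the same two lookups using the JDK's standard javax.xml.xpath API instead of XPathAPI. The class and method names (XPathLookup, findValue) are hypothetical, and it assumes the name is a valid XML name, since it is concatenated straight into the expression. It selects attribute nodes directly with //@name and unions them with the element text, so one evaluation covers both cases.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathLookup {

    /** Returns the first attribute value or element text named `name`, or null if none. */
    public static String findValue(String xml, String name) {
        // Union of: attribute nodes named `name`, and text nodes of elements named `name`.
        // Assumption: `name` is a plain XML name (it is not escaped or validated here).
        String expr = "//@" + name + " | //" + name + "/text()";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            // For both attribute and text nodes, getNodeValue() is the string value.
            return hits.getLength() > 0 ? hits.item(0).getNodeValue() : null;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate lookup for: " + name, e);
        }
    }
}
```

Calling XPathLookup.findValue(xml, "bookName") on the sample document should give the attribute value, and "address" or "C02" the element text. Whether this is any faster than XPathAPI is not something the sketch tries to show.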

Sample input and output:

Input: bookName

Output: How to Learn English

Input: address

Output: test address

Input: C02 (the element name contains a numeric character too)

Output: 10.5

Problem: XPathAPI.selectNodeList() performs poorly; it takes a long time to search for and return the value.
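
If the slowdown comes from running a full-document "//" search for every name (an assumption; the post does not say where the time goes), one option is to walk the DOM once and keep a name-to-value map for later lookups. The sketch below is illustrative only: NameIndex, build and collect are hypothetical names, it records only the first value seen for each name, and it indexes only leaf elements (those without child elements).

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NameIndex {

    /** Parses the XML once and maps attribute/leaf-element names to their first value. */
    public static Map<String, String> build(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> index = new HashMap<>();
            collect(doc.getDocumentElement(), index);
            return index;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk: attributes first, then child elements, then the element itself
    // if it is a leaf. putIfAbsent keeps the first value recorded for a name.
    private static void collect(Node element, Map<String, String> index) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            index.putIfAbsent(attr.getNodeName(), attr.getNodeValue());
        }
        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collect(child, index);
            }
        }
        if (!hasChildElement) {
            // Leaf element: its text content is the value (empty string for <Chapter .../>).
            index.putIfAbsent(element.getNodeName(), element.getTextContent());
        }
    }
}
```

After Map<String, String> index = NameIndex.build(xml), each lookup is a plain index.get("bookName") with no further tree search. The trade-off is memory for the map and the loss of duplicates, so it only fits if one value per name is enough.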

I plan to use regular expressions (Pattern, Matcher) to search the XML string and get the values instead.

Can anyone please share a regular expression, with a code snippet, that retrieves the value of an element or attribute whose name matches the given name?

Thanks,
Kathir
Darryl Burke
Bartender

Joined: May 03, 2008
Posts: 4571
    

kathir, on another occasion and forum you wrote, after you had been referred to this JavaRanch FAQ page:


So why weren't you forthright this time around?
http://www.java-forums.org/new-java/41822-searching-xml-string-content-using-regular-expression.html


luck, db
There are no new questions, but there may be new answers.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

If you're asking for a regex that can extract data from an XML document, such a thing doesn't exist. XML allows arbitrarily nested elements, and matching that kind of structure takes a more powerful grammar than regular expressions can express.
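
As a possible alternative to a regex (not something proposed in this thread), a streaming parser can scan the XML string and stop at the first match without building a DOM. The following is a rough sketch using the JDK's StAX API; StaxLookup and findFirst are hypothetical names, namespaces are ignored, and it assumes a matching element contains only text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxLookup {

    /** Returns the first attribute value or element text named `name`, or null if none. */
    public static String findFirst(String xml, String name) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    // A null namespace URI means the namespace is not checked.
                    String attrValue = reader.getAttributeValue(null, name);
                    if (attrValue != null) {
                        return attrValue;
                    }
                    if (reader.getLocalName().equals(name)) {
                        // Throws if the element has child elements (text-only assumption).
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML while looking for: " + name, e);
        }
    }
}
```

Because the reader returns as soon as it finds a match, a name that appears early in a large document is found without reading the rest. A name that is absent still costs one full pass.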
 