JavaRanch » Java Forums » Java » Java in General
XML parsing in Java

raghav srinivasan
Greenhorn

Joined: Oct 18, 2009
Posts: 16
Hi,

I am looking to parse an XML stream that contains namespace and schema definitions. Below is a sample of the XML I need to parse.

<ns1:Sample xmlns:ns1="bp:Profile" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<in0 href="#id0"></in0>
<in1 href="#id1"></in1>
<in2 xsi:type="xsd:string">1234567890</in2>
<in3 href="#id2"></in3>
</ns1:Sample>

I need to extract the string "1234567890" from the XML above. It would be great if someone could guide me on how to achieve this.

Thanks,
Raghav.
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1763

First, you'll need to decide which parsing strategy is appropriate and choose a parser accordingly.
Generally speaking, your options are tree-based parsing (Document Object Model, or DOM) versus event-based parsing (SAX).

A DOM parser reads the entire XML content and builds the full hierarchical object graph in memory. That is great if you need to traverse or manipulate the object graph frequently, but it's not a great choice for large bodies of XML, because the object graph would consume enormous amounts of memory.

Conversely, a SAX parser reads XML content in chunks and pushes those chunks to the application through an event model. That is useful for dealing with huge amounts of XML data, but it offers very little control to the client compared to the DOM alternative.
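To make the push model concrete, here is a minimal sketch using the SAX support in JAXP. The class name SaxExtractSketch, the TextCollector handler and the simplified sample document are assumptions made for illustration; they are not from the original post. The handler only records character data while the parser is inside the element of interest.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxExtractSketch {

    // Simplified, hypothetical stand-in for the posted XML (prefixes left out).
    static final String SAMPLE =
        "<Sample><in1 href=\"#id1\"></in1><in2>1234567890</in2></Sample>";

    /** Collects the character data of the first element with the given name. */
    static class TextCollector extends DefaultHandler {
        private final String target;
        private final StringBuilder text = new StringBuilder();
        private boolean inside;
        private boolean done;

        TextCollector(String target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (!done && qName.equals(target)) {
                inside = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks, so append.
            if (inside) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inside && qName.equals(target)) {
                inside = false;
                done = true;
            }
        }

        String result() {
            return done ? text.toString() : null;
        }
    }

    /** Returns the text of the first element named qName, or null if there is none. */
    static String textOfFirst(String xml, String qName) {
        try {
            TextCollector handler = new TextCollector(qName);
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.result();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOfFirst(SAMPLE, "in2"));
    }
}
```

The sketch does not handle a target element nested inside another element of the same name; that would need a depth counter.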

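Another alternative to both DOM and SAX is StAX, which sort of bridges the gap between the two: it streams the content like SAX, but the application pulls events from the parser when it wants them instead of having them pushed to it.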
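As a rough illustration of the pull style, the following sketch uses the standard javax.xml.stream API. The application drives the loop and asks the reader for the next event itself. The class name StaxExtractSketch and the simplified sample document are assumptions for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxExtractSketch {

    // Simplified, hypothetical stand-in for the posted XML (prefixes left out).
    static final String SAMPLE =
        "<Sample><in1 href=\"#id1\"></in1><in2>1234567890</in2></Sample>";

    /** Returns the text of the first element with the given local name, or null. */
    static String textOfFirst(String xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application asks for the next event; nothing is called back.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(localName)) {
                        // Valid only for text-only elements, which in2 is.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOfFirst(SAMPLE, "in2"));
    }
}
```

Because the loop returns as soon as the element is found, the rest of the document is never read, which is one practical difference from a SAX handler.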

If the example XML snippet is an accurate representation of the size of the XML content you'll be processing, a DOM parser shouldn't be a problem memory-wise, unless you'll be processing massive numbers of documents concurrently, and it's by far the easiest approach to get started with.

You could get started with JAXP (which supports both DOM and SAX), which is part of the core Java library, or you could look at a popular third-party library like JDOM or dom4j.
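For the question as asked, a minimal JAXP DOM sketch might look like the following. One assumption is flagged loudly: the posted fragment uses the soapenv, xsi and xsd prefixes without declaring them (presumably the enclosing SOAP envelope does), so the sample string here adds those declarations on the root element. The class name DomExtractSketch is also made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomExtractSketch {

    // The posted fragment, with the soapenv/xsi/xsd declarations added (assumption:
    // in the full message they would be declared on the SOAP envelope).
    static final String SAMPLE =
        "<ns1:Sample xmlns:ns1=\"bp:Profile\""
        + " xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
        + " xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\""
        + " soapenv:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">"
        + "<in0 href=\"#id0\"></in0>"
        + "<in1 href=\"#id1\"></in1>"
        + "<in2 xsi:type=\"xsd:string\">1234567890</in2>"
        + "<in3 href=\"#id2\"></in3>"
        + "</ns1:Sample>";

    /** Returns the text of the first element with the given tag name, or null. */
    static String textOfFirst(String xml, String tagName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // The whole document is read into memory as a tree.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // in2 carries no prefix, so its tag name is simply "in2".
            NodeList matches = doc.getElementsByTagName(tagName);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOfFirst(SAMPLE, "in2"));
    }
}
```

For a document this small, building the whole tree costs very little, and the lookup is a single call.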


Build a man a fire, and he'll be warm for a day. Set a man on fire, and he'll be warm for the rest of his life.
raghav srinivasan
Greenhorn

Joined: Oct 18, 2009
Posts: 16
Hi Jelle,

Thanks for your reply. I guess I will go with the DOM parser, since my XML file is small. In addition to the example above, there will be SOAP multireference inclusions. Kindly let me know whether DOM would be able to parse those as well.
I have one more query: I am searching for a tool that would de-serialize the multireferences contained in SOAP messages into simple XML tags. It would be great if you could share your views on this too.

Thanks again,
Raghav.
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1763

Oh, you need to process SOAP requests? I guess I skimmed over the namespace too quickly, but the XML example doesn't appear to be structured as a valid SOAP message.
I'm still not quite sure what the use case is exactly, but it looks like you'd be better off adopting a specialized SOAP library like Apache Axis.

raghav srinivasan
Greenhorn

Joined: Oct 18, 2009
Posts: 16
Hi Jelle,

Thanks for your reply.

Yes, it is a SOAP message. Sorry for posting only part of it; that was just the body. As you suggested, I am presently using Axis2 for processing SOAP messages. I also had a requirement to process plain XML tags, which went well with the DOM parser. For the SOAP messages, I was looking for a tool that would deserialize the multiRef elements in the messages into simple XML tags. Axis2 does have SOAP libraries to process the request, but my requirement is for study purposes: my idea was to understand SOAP messages better, and I got held up by the multireferences. I have learned how to resolve them by hand from the tutorials, but it takes time to deserialize every message, and it becomes a tough task when there are many references and the SOAP message is big. It would be great if a tool could do it in seconds.

Kindly share your ideas.

Thanks,
Raghav.
salvin francis
Ranch Hand

Joined: Jan 12, 2009
Posts: 928

Jelle Klap wrote: First thing, you'll need to decide what kind of parsing strategy would be appropriate, ...


I know one strategy for SAX: use a stack to hold the intermediate objects as they are read from the XML content.
When the parser reports a start element, push a bean onto the stack.
When it reports an end element, pop the bean off the stack.
I have left out many details here.

Just curious: are there any other similar strategies or patterns?
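The stack strategy described above could be sketched roughly as follows. The post leaves out the details, so everything here is an assumption: the Item bean, the StackHandler class and the choice to attach each popped bean to its parent are one possible way to fill them in, not the poster's actual code.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StackSaxSketch {

    /** Hypothetical bean holding one element's name, text and child beans. */
    static class Item {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<Item> children = new ArrayList<>();

        Item(String name) {
            this.name = name;
        }
    }

    static class StackHandler extends DefaultHandler {
        private final Deque<Item> stack = new ArrayDeque<>();
        Item root;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Start element: push a fresh bean for it.
            stack.push(new Item(qName));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text belongs to the element currently on top of the stack.
            if (!stack.isEmpty()) {
                stack.peek().text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // End element: pop the finished bean and hand it to its parent, if any.
            Item finished = stack.pop();
            if (stack.isEmpty()) {
                root = finished;
            } else {
                stack.peek().children.add(finished);
            }
        }
    }

    /** Parses the XML and returns the bean built for the root element. */
    static Item parse(String xml) {
        try {
            StackHandler handler = new StackHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Item root = parse("<a><b>x</b><c>y</c></a>");
        for (Item child : root.children) {
            System.out.println(child.name + " = " + child.text);
        }
    }
}
```

The stack depth never exceeds the nesting depth of the document, so memory use stays small as long as finished beans are processed and discarded rather than kept, which this sketch does not attempt.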


My Website: [Salvin.in] Cool your mind:[Salvin.in/painting] My Sally:[Salvin.in/sally]
raghav srinivasan
Greenhorn

Joined: Oct 18, 2009
Posts: 16
Hi,

Kindly share your ideas, if any. I have given up trying to find tools that would do this.


Many thanks,
Raghav.
salvin francis
Ranch Hand

Joined: Jan 12, 2009
Posts: 928

Hmm, I don't know of any tools,
but I don't think it's that trivial to implement on your own.
Post your code here and we can help if you get stuck anywhere.
raghav srinivasan
Greenhorn

Joined: Oct 18, 2009
Posts: 16
Hi Salvin,

Thanks for your reply. Below is the piece of the SOAP message that I would like to de-serialize.

<soapenv:Body>
<ns1:Profile xmlns:ns1="BP:ProfileMS" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<in0 href="#id0"></in0>
<in1 href="#id1"></in1>
<in2 xsi:type="xsd:string">1234567890</in2>
<in3 href="#id2"></in3>
</ns1:Profile>
<multiRef xmlns:ns2="http://test.myTest.services.BP.com" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" id="id0" soapenc:root="0" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xsi:type="ns2:SSE">UMB</multiRef>
<multiRef xmlns:ns3="http://test.myTest.services.BP.com" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" id="id2" soapenc:root="0" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xsi:type="ns3:STE">NUMBER</multiRef>
<multiRef xmlns:ns4="http://test.myTest.services.BP.com" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" id="id1" soapenc:root="0" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xsi:type="ns4:ACE">UM</multiRef>
</soapenv:Body>

Many Thanks,
Raghav.
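No tool is named in the thread, so what follows is only a sketch of how the multiRef elements in a body like the one above might be inlined with plain JAXP DOM. Several assumptions apply. The class name MultiRefInlinerSketch and the trimmed sample are hypothetical. The parser is left namespace-unaware, so names such as multiRef and href are matched literally. The xsi:type and other attributes on each multiRef are dropped rather than carried over. References are assumed not to form cycles.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MultiRefInlinerSketch {

    // Trimmed-down, hypothetical version of the posted Body.
    static final String SAMPLE =
        "<soapenv:Body xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<ns1:Profile xmlns:ns1=\"BP:ProfileMS\">"
        + "<in0 href=\"#id0\"></in0>"
        + "<in2>1234567890</in2>"
        + "<in3 href=\"#id2\"></in3>"
        + "</ns1:Profile>"
        + "<multiRef id=\"id0\">UMB</multiRef>"
        + "<multiRef id=\"id2\">NUMBER</multiRef>"
        + "</soapenv:Body>";

    static Document parse(String xml) {
        try {
            // Default factory settings: not namespace-aware.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Replaces every href="#id" with a copy of the matching multiRef content. */
    static void inlineMultiRefs(Document doc) {
        // Index the multiRef elements by their id attribute.
        Map<String, Element> byId = new HashMap<>();
        List<Element> multiRefs = new ArrayList<>();
        NodeList found = doc.getElementsByTagName("multiRef");
        for (int i = 0; i < found.getLength(); i++) {
            Element ref = (Element) found.item(i);
            multiRefs.add(ref);
            byId.put(ref.getAttribute("id"), ref);
        }
        // The list is live, so copied-in elements are visited too (nested references).
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            String href = el.getAttribute("href");
            Element target = href.startsWith("#") ? byId.get(href.substring(1)) : null;
            if (target == null) {
                continue;
            }
            el.removeAttribute("href");
            for (Node c = target.getFirstChild(); c != null; c = c.getNextSibling()) {
                el.appendChild(c.cloneNode(true));
            }
        }
        // The multiRef elements are no longer needed once their content is copied.
        for (Element ref : multiRefs) {
            ref.getParentNode().removeChild(ref);
        }
    }

    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        inlineMultiRefs(doc);
        System.out.println(toXml(doc));
    }
}
```

Carrying the type information across would mean copying xsi:type together with the namespace declaration its prefix relies on, which is left out here to keep the sketch short.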
salvin francis
Ranch Hand

Joined: Jan 12, 2009
Posts: 928

Hey, that's great.

So, what Java code have you written to parse this?
Post the code and maybe we can help you with the details.
raghav srinivasan
Greenhorn

Joined: Oct 18, 2009
Posts: 16
Hi Salvin,

I have just started on the code. Will post it soon.

Thanks,
Raghav.
 