
XML Parser

Satish Kandagadla
Greenhorn

Joined: Jul 03, 2006
Posts: 27
Hi All,

I have a very large XML file that is not well-formed.

Is there any way to find out exactly where it is not well-formed by reading the file in Java?

The file is about 100 MB, and it eats up a lot of memory when I open it in any XML editor. It also has 2 million lines, so an editor really can't help. If I try to parse it with any parser, it fails at the first step, saying that the document is not well-formed. Any input would really help.

Thanks,
Satish
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18155

You could do that with a SAX parser. Here is an example of how to use a SAX parser and a Locator to find the current parse location. Save the information and use it when the exception is thrown.
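The linked example isn't reproduced here, so the following is only a minimal sketch of the same idea, using the standard JDK SAX API (`SAXParserFactory`, `DefaultHandler`, `Locator`). The class and method names (`WellFormednessCheck`, `check`) are hypothetical. The handler saves the line of the most recent start tag via the `Locator`. When parsing stops, the `SAXParseException` also reports the line and column where the parser gave up.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormednessCheck {

    /** Remembers the line on which the most recent start tag was seen. */
    static class LocatingHandler extends DefaultHandler {
        private Locator locator;
        int lastStartLine = -1;

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator; // supplied by the parser before other events
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (locator != null) {
                lastStartLine = locator.getLineNumber();
            }
        }
    }

    /** Returns null if the input is well-formed, otherwise a description of the first error. */
    static String check(InputSource source) {
        LocatingHandler handler = new LocatingHandler();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(source, handler);
            return null;
        } catch (SAXParseException e) {
            // The exception itself carries the position where parsing stopped.
            return "Not well-formed at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber()
                + " (last start tag seen on line " + handler.lastStartLine + "): "
                + e.getMessage();
        } catch (Exception e) {
            throw new RuntimeException(e); // I/O or parser configuration problem
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <a>text</a>\n  <b>oops</c>\n</root>";
        System.out.println(check(new InputSource(new StringReader(xml))));
    }
}
```

Because SAX streams the input instead of building a tree in memory, the same call should work on the 100 MB file, for example with `check(new InputSource(new java.io.FileInputStream(path)))`, where `path` is your file. In a real program you would close that stream afterwards.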

I assume this enormous malformed document is being generated by a computer program, and you're trying to fix that program?
Satish Kandagadla
Greenhorn

Joined: Jul 03, 2006
Posts: 27

Thanks for the reply. The XML comes from a product, and I don't have access to how the product generates it. My intention is to figure out how many tags are missing and then work out how to fix them.

Does the code you pointed me to assume that the XML is well-formed, or does it work on any XML?
[ November 19, 2008: Message edited by: Satish Kandagadla ]
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18155

Only well-formed documents are XML. But a SAX parser processes a document sequentially, delivering events to your handler until it reaches the first place where it sees a problem; at that point it throws an exception.
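Because the handler receives every event up to the failure point, it can keep its own stack of open elements. When the exception arrives, that stack shows which elements were still unclosed, which is a natural starting point for hunting missing tags. This is a minimal sketch using the JDK SAX API. The names `OpenElementTracker` and `openAtFailure` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class OpenElementTracker {

    /** Pushes each start tag and pops it again at the matching end tag. */
    static class StackHandler extends DefaultHandler {
        final Deque<String> open = new ArrayDeque<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            open.push(qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop();
        }
    }

    /**
     * Returns the elements still open (outermost first) when the parser hit its
     * first fatal error, or an empty list if the document parsed cleanly.
     */
    static List<String> openAtFailure(InputSource source) {
        StackHandler handler = new StackHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
            return new ArrayList<>();
        } catch (SAXParseException e) {
            List<String> path = new ArrayList<>(handler.open); // innermost first
            Collections.reverse(path);                          // now outermost first
            return path;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><a><b>text</a></root>";
        System.out.println(openAtFailure(new InputSource(new StringReader(xml))));
    }
}
```

This only reports the first problem. To count every missing tag you would still have to fix that spot and run the parser again, since SAX does not recover from a fatal well-formedness error.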

In my opinion you would be better off getting the people who are sending you garbage documents to fix those documents themselves.
Satish Kandagadla
Greenhorn

Joined: Jul 03, 2006
Posts: 27

Thanks, Paul. Yes, there are challenges in the project in getting the XML from them, but I see no way to get proper XML other than approaching them. My life will be a lot easier if I get well-formed XML.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12682
IF the XML you are getting is consistent in terms of where the markup is incorrect, you might be able to code fixup routines as part of an XML pipeline processing model.

I wrote this article and this followup article as an introduction to "pipeline" processing of XML.
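The articles aren't reproduced here, so this is only a hypothetical illustration of one fixup stage, not the design those articles describe. Because a parser rejects malformed input outright, a fixup has to work on the raw text before the parser sees it. The sketch assumes, purely for illustration, that the recurring defect is unescaped `&` characters. It streams the input line by line and rewrites any `&` that does not start a character or entity reference. The class and method names (`AmpersandFixup`, `fixLine`, `fix`) are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.regex.Pattern;

public class AmpersandFixup {

    // Matches '&' that does NOT begin a named, decimal, or hex reference.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?![A-Za-z_][A-Za-z0-9._-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    /** Hypothetical fixup for one line: escapes stray ampersands. */
    static String fixLine(String line) {
        return BARE_AMP.matcher(line).replaceAll("&amp;");
    }

    /** Streams the input line by line so a large file is never held in memory. */
    static void fix(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(fixLine(line));
            out.write('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        fix(new StringReader("<item>AT&T</item>"), out);
        System.out.print(out); // <item>AT&amp;T</item>
    }
}
```

The fixed output could be written to a temporary file and then handed to a SAX parser as the next stage. A rule this blunt would also rewrite ampersands inside CDATA sections and comments, where they are legal, so any real fixup should be based on the defects you actually observe.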

Bill


Java Resources at www.wbrogden.com