
XML Parser

 
Satish Kandagadla
Greenhorn
Posts: 27
Hi All,

I have a really huge XML file that is not well-formed.

Is there any way to find out exactly where it is not well-formed by reading the file in Java?

The file is about 100 MB, and opening it in any XML editor eats up a lot of memory. Moreover, it has 2 million lines, so an editor surely can't help. If I run it through any parser, it fails at the first step, saying that the document is not well-formed. Any input would really help.

Thanks,
Satish
 
Paul Clapham
Sheriff
Posts: 20955
You could do that with a SAX parser. Here is an example of how to use a SAX parser and a Locator to find the current parse location. Save the information and use it when the exception is thrown.
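
The example link did not survive in this copy of the thread, so here is a minimal sketch of the idea rather than the linked code. It assumes the JDK's built-in SAX parser; the class name `WellFormednessCheck`, the handler, and the report format are made up for illustration. The handler stores the Locator and the last start tag it saw, and the report combines that with the position carried by the `SAXParseException`. SAX streams the input, so the 100 MB file is not held in memory.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, not taken from the thread.
final class WellFormednessCheck {

    /** Remembers the parser's Locator and the most recent start tag. */
    static final class TrackingHandler extends DefaultHandler {
        private Locator locator;
        String lastElement = "(none)";
        int lastElementLine = -1;
        long elementsSeen = 0;

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            lastElement = qName;
            lastElementLine = (locator == null) ? -1 : locator.getLineNumber();
            elementsSeen++;
        }
    }

    /** Returns a one-line report: "well-formed ..." or the first fatal error and its position. */
    static String firstError(InputSource source) {
        TrackingHandler handler = new TrackingHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // DefaultHandler.fatalError rethrows the exception
            reader.parse(source);
            return "well-formed (" + handler.elementsSeen + " elements)";
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage()
                    + " [last start tag: <" + handler.lastElement + "> near line "
                    + handler.lastElementLine + ", after " + handler.elementsSeen + " elements]";
        } catch (SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the suspect file
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(firstError(new InputSource(in)));
        }
    }
}
```

Note that the parser stops at the first fatal error, so this reports one problem per run; you would fix it and run again to find the next one.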

I assume this enormous malformed document is being generated by a computer program, and you're trying to fix that program?
 
Satish Kandagadla
Greenhorn
Posts: 27
Originally posted by Paul Clapham:
You could do that with a SAX parser. Here is an example of how to use a SAX parser and a Locator to find the current parse location. Save the information and use it when the exception is thrown.

I assume this enormous malformed document is being generated by a computer program, and you're trying to fix that program?


Thanks for the reply. The XML comes from a product, and I do not have access to how the product generates it. My intention is to figure out how many tags are missing and then find out how to fix them.

Does the code you pointed me to assume that the XML is well-formed, or does it work on any XML?
[ November 19, 2008: Message edited by: Satish Kandagadla ]
 
Paul Clapham
Sheriff
Posts: 20955
Only well-formed documents are XML. But a SAX parser will process a document until it reaches a place where it sees a problem.

In my opinion you would be better off getting the people who are sending you garbage documents to fix those documents themselves.
 
Satish Kandagadla
Greenhorn
Posts: 27
Originally posted by Paul Clapham:
Only well-formed documents are XML. But a SAX parser will process a document until it reaches a place where it sees a problem.

In my opinion you would be better off getting the people who are sending you garbage documents to fix those documents themselves.


Thanks Paul. Yes, there are challenges in the project in getting the XML from them, but I see no way of getting proper XML other than approaching them. My life will be a lot easier if I get well-formed XML.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13058
If the XML you are getting is consistent in terms of where the markup is incorrect, you might be able to code fixup routines as part of an XML pipeline processing model.

I wrote this article and this followup article as an introduction to "pipeline" processing of XML.
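
As a rough illustration of a fixup stage (not the code from the articles), here is a sketch that assumes the consistent defect is an unescaped `&` in text. The thread never says what is actually wrong with the file, so that defect, the class name `AmpersandFixupReader`, and the regular expression are all assumptions. The reader repairs the character stream line by line before the SAX parser sees it, so nothing large is buffered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical fixup stage: escapes bare '&' characters, one line at a time.
final class AmpersandFixupReader extends Reader {
    // An '&' that does not start an entity or character reference ending in ';'
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z_:][A-Za-z0-9_.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    private final BufferedReader in;
    private String current = "";
    private int pos = 0;

    AmpersandFixupReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    static String fixLine(String line) {
        return BARE_AMP.matcher(line).replaceAll("&amp;");
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (pos >= current.length()) {
            String line = in.readLine();
            if (line == null) return -1;      // end of input
            current = fixLine(line) + "\n";   // line endings are normalised to \n
            pos = 0;
        }
        int n = Math.min(len, current.length() - pos);
        current.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Next pipeline stage: parse the repaired stream and count its elements. */
    static long countElements(Reader raw) {
        long[] count = {0};
        DefaultHandler counter = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new AmpersandFixupReader(raw)), counter);
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parse failed after fixup", e);
        }
        return count[0];
    }
}
```

A line-based fix like this is blind to context: it would also rewrite an `&` inside a CDATA section or comment, where a bare `&` is legal. It only makes sense once you know where the real defects are.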

Bill