Removing prolog from an XML file

 
Prashant Mishra
Greenhorn
Posts: 12
I have a program that uses JDOM to read through and extract information from an XML file. It works fine with normal XML files; however, some files that I receive from users have a prolog before the root element.
When I run the program on these files, I get the following error:

org.xml.sax.SAXParseException: Content is not allowed in prolog.

I know it's because of the prolog, but I can't ask the users to remove it. Can anybody suggest a workaround? Or is there some way in which this offending prolog can be removed within my module?

Thanks in advance!!
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056
That sounds like a job for an input stream filter: a custom class that reads the input file up to the desired legal starting point and then acts like a normal input stream to feed the parser. I don't use JDOM, so I can't be more specific.

Bill
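
A minimal sketch of such an input stream filter, assuming the "legal starting point" is the first "<" byte and that the file uses an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (it would not work for UTF-16). The class name SkipToMarkupInputStream is hypothetical and not part of JDOM or the JDK. If the unwanted header can itself contain a "<", a more specific marker would be needed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/**
 * Hypothetical helper: discards every byte that precedes the first '<'
 * and afterwards behaves like an ordinary InputStream.
 * Assumes an ASCII-compatible encoding, where '<' is always the byte 0x3C.
 */
final class SkipToMarkupInputStream extends PushbackInputStream {

    SkipToMarkupInputStream(InputStream in) throws IOException {
        super(in, 1); // one byte of pushback is enough to hand back the '<'
        int b;
        // Consume bytes until the first '<' or the end of the stream.
        while ((b = super.read()) != -1) {
            if (b == '<') {
                unread(b); // put the '<' back so the parser sees it first
                break;
            }
        }
    }
}
```

The wrapped stream can then be handed to the parser in place of the raw file stream, for example through JDOM's SAXBuilder.build(InputStream). An XML declaration such as <?xml version="1.0"?> survives, because it also starts with "<".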
 
Prashant Mishra
Greenhorn
Posts: 12
Is there any documentation, or a link, that you can point me to?

Thanks!!
 
Prashant Mishra
Greenhorn
Posts: 12
I found some resources. I will read through them.

Thanks!!
 
Prashant Mishra
Greenhorn
Posts: 12
Hi William,

I tried out the idea you suggested, but I still get the same result.

Any other ideas would be welcome.

Thanks!!
 
Tim McGuire
Ranch Hand
Posts: 820
Try:

1. Read the file into a StringBuffer.

2. Write it back out to a new file.
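
A minimal sketch of these two steps, under some loudly stated assumptions: the cut point is taken to be the first "<" in the file, the class name PrologCleaner is hypothetical, and StringBuilder stands in for StringBuffer. The bytes are decoded as ISO-8859-1 only because that charset maps every byte to one character and back, so everything from the "<" onward is written out byte for byte (this relies on the file using an ASCII-compatible encoding).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: copies a file, dropping whatever precedes the first '<'. */
final class PrologCleaner {

    static void clean(Path source, Path target) throws IOException {
        // Step 1: read the whole file into a buffer.
        // ISO-8859-1 round-trips every byte unchanged, so no content is altered.
        StringBuilder buffer = new StringBuilder(
                new String(Files.readAllBytes(source), StandardCharsets.ISO_8859_1));

        // Assumed intermediate step: cut everything before the first '<'.
        int start = buffer.indexOf("<");
        if (start > 0) {
            buffer.delete(0, start);
        }

        // Step 2: write the remaining content out to a new file.
        Files.write(target, buffer.toString().getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

If the file contains no "<" at all, it is copied unchanged, and the parser will still reject it.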

[ August 08, 2007: Message edited by: Tim McGuire ]
 
Paul Clapham
Sheriff
Posts: 20769
Originally posted by Prashant Mishra:
I know it's because of the prolog
This message quite often means there is content before the prolog. Commonly this content is whitespace which you don't notice.

It might help if you looked at the document again. Does the prolog start at the beginning of the first line? If it doesn't, then you have a malformed document. And you do have the right to ask people not to send you malformed documents.
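
One way to look at the document again is to dump the bytes that come before the first "<", since whitespace and other invisible bytes do not show up in most editors. The following is only a diagnostic sketch; the class and method names are hypothetical.

```java
/** Hypothetical diagnostic helper for inspecting what precedes the markup. */
final class LeadingBytes {

    /**
     * Returns the bytes that precede the first '<' as space-separated hex.
     * Returns an empty string if the data starts with '<', and the whole
     * input as hex if there is no '<' at all.
     */
    static String beforeFirstMarkup(byte[] data) {
        StringBuilder hex = new StringBuilder();
        for (byte b : data) {
            if (b == '<') {
                break; // reached the start of the markup
            }
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF)); // unsigned two-digit hex
        }
        return hex.toString();
    }
}
```

For a real file, the input could come from Files.readAllBytes(Paths.get("input.xml")), where the file name is a placeholder. An output such as "20 0A" would mean a space and a line feed precede the markup; an empty string would mean the file really does start with "<".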
 
Prashant Mishra
Greenhorn
Posts: 12
The problem is that these XML files come from an integration scenario in which the middleware sometimes adds header information before the root element of the file. These headers might contain odd characters and might not be the same for all files. I have to work on these XML files, hence the need to somehow cut off this unneeded information.
 
Paul Clapham
Sheriff
Posts: 20769
Then you need to get that middleware fixed. There's no excuse for sending malformed XML, especially if you are a program that's supposed to provide a service of transmitting XML documents.

But if you can't (yes, I know, we live in the real world) then remove everything before the first "<" character.
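
A minimal in-memory sketch of that last suggestion, assuming the file content is already available as a String and that the unwanted header never contains a "<" itself. The class and method names are hypothetical, and the JDK's built-in DOM parser is used here only as a stand-in for JDOM to show that the cleaned text parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: strips leading non-XML content, then parses. */
final class StripAndParse {

    /** Returns the text from the first '<' onward (unchanged if there is none). */
    static String stripBeforeMarkup(String raw) {
        int start = raw.indexOf('<');
        return start > 0 ? raw.substring(start) : raw;
    }

    /** Parses the stripped text and returns the name of the root element. */
    static String rootName(String raw) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripBeforeMarkup(raw))));
        return doc.getDocumentElement().getTagName();
    }
}
```

With JDOM, the cleaned string could be passed to SAXBuilder.build(Reader) through a StringReader instead.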
 