Removing prolog from an XML file

Prashant Mishra
Greenhorn

Joined: May 05, 2005
Posts: 12
I have a program that uses JDOM to read through and extract information from XML files. It works fine with normal XML files; however, some files I receive from users have a prolog before the root element.
When I run the program on these files, I get the following error:

org.xml.sax.SAXParseException: Content is not allowed in prolog.

I know it's because of the prolog, but I can't ask the users to remove it. Can anybody suggest a workaround? Or is there some way to remove this offending prolog within my module?

Thanks in advance!!
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12682
That sounds like a job for an input stream filter: a custom class that reads the input file up to the desired legal starting point and then acts like a normal input stream to feed the parser. I don't use JDOM, so I can't be more specific.
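A minimal sketch of that kind of filter, assuming the file uses an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, so that '<' is the single byte 0x3C (this will not work for UTF-16). It wraps the input in a `PushbackInputStream` (a standard `FilterInputStream` subclass), discards bytes up to the first '<', pushes that byte back, and returns the stream for the parser. `PrologSkipper` and `skipToFirstTag` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: skips everything before the first '<' byte.
// Assumes an ASCII-compatible encoding (UTF-8, ISO-8859-1); not for UTF-16.
class PrologSkipper {

    /** Returns a stream positioned at the first '<' byte of {@code in}. */
    static InputStream skipToFirstTag(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 1);
        int b;
        while ((b = pushback.read()) != -1) {
            if (b == '<') {
                pushback.unread(b); // put the '<' back so the parser sees it
                break;
            }
        }
        // If no '<' was found, the returned stream is simply at end of input
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "HEADER junk\r\n<?xml version=\"1.0\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        InputStream cleaned = skipToFirstTag(new ByteArrayInputStream(data));
        String rest = new String(cleaned.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(rest); // prints <?xml version="1.0"?><root/>
    }
}
```

With JDOM, the returned stream should be usable anywhere an `InputStream` is accepted, for example `new SAXBuilder().build(PrologSkipper.skipToFirstTag(in))`, though that call has not been checked against a specific JDOM version.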

Bill


Java Resources at www.wbrogden.com
Prashant Mishra
Greenhorn

Joined: May 05, 2005
Posts: 12
Is there any documentation, or are there any links, that you can point me to?

Thanks!!
Prashant Mishra
Greenhorn

Joined: May 05, 2005
Posts: 12
I found some resources. I will read through them.

Thanks!!
Prashant Mishra
Greenhorn

Joined: May 05, 2005
Posts: 12
Hi William,

I tried out the idea you suggested, but I still get the same result.

Any other ideas will be welcome.

Thanks!!
Tim McGuire
Ranch Hand

Joined: Apr 30, 2003
Posts: 820

Try:

1. Read the file into a StringBuffer using:



2. Write it back out to a new file:
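The code for both steps is missing from this copy of the post. A hedged reconstruction, assuming the cut point is the first '<' character and the document is in an ASCII-compatible encoding (the post states neither), might look like this. `StripHeaderToNewFile`, `readFile` and `writeCleaned` are hypothetical names, and `StringBuilder` is used in place of `StringBuffer`; it offers the same methods used here, without synchronization.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical two-step cleanup: read the file, then write it out from the first '<'.
// ISO-8859-1 maps every byte to exactly one char and back, so bytes are copied
// unchanged (apart from line endings). Assumes an ASCII-compatible encoding; not UTF-16.
class StripHeaderToNewFile {

    /** Step 1: read the whole file into a StringBuilder, line by line. */
    static StringBuilder readFile(Path source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // line endings are normalized to '\n'
            }
        }
        return sb;
    }

    /** Step 2: write everything from the first '<' onward to a new file. */
    static void writeCleaned(StringBuilder content, Path target) throws IOException {
        int start = content.indexOf("<");
        String cleaned = start >= 0 ? content.substring(start) : "";
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.ISO_8859_1)) {
            writer.write(cleaned);
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("incoming", ".xml");
        Path out = Files.createTempFile("cleaned", ".xml");
        Files.write(in, "MW-HEADER 42\r\n<root><a/></root>\r\n".getBytes(StandardCharsets.UTF_8));
        writeCleaned(readFile(in), out);
        System.out.print(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
        Files.delete(in);
        Files.delete(out);
    }
}
```

Reading and writing with ISO-8859-1 is a deliberate choice: odd bytes in the header cannot trigger a decoding error, and any UTF-8 content after the cut is copied through byte for byte.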

[ August 08, 2007: Message edited by: Tim McGuire ]
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18141

Originally posted by Prashant Mishra:
I know it's because of the prolog
This message quite often means there is content before the prolog. Commonly this content is whitespace which you don't notice.

It might help if you looked at the document again. Does the prolog start at the beginning of the first line? If it doesn't, then you have a malformed document. And you do have the right to ask people not to send you malformed documents.
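To see what actually comes before the prolog, it can help to dump the first few bytes in hex, because whitespace and byte-order marks are invisible in most editors. This is a hypothetical helper, not a library API. In the output, `3C` is '<'; `20`, `09`, `0D` and `0A` are space, tab, CR and LF; and `EF BB BF` is a UTF-8 byte-order mark. For an ASCII-compatible encoding, a first byte other than `3C` means something sits in front of the first tag.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic: print the leading bytes of a file in hex.
class PrologInspector {

    /** Formats up to {@code n} leading bytes of the stream as hex, e.g. "EF BB BF 3C". */
    static String leadingBytesHex(InputStream in, int n) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = in.read();
            if (b == -1) {
                break; // stream shorter than n bytes
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                System.out.println(leadingBytesHex(in, 16));
            }
        } else {
            // Demo input: a space and a newline before the XML declaration
            byte[] demo = " \n<?xml".getBytes(StandardCharsets.US_ASCII);
            System.out.println(leadingBytesHex(new ByteArrayInputStream(demo), 16));
            // prints 20 0A 3C 3F 78 6D 6C
        }
    }
}
```

Run it as `java PrologInspector incoming.xml`; with no argument it prints the bytes of the built-in demo string.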
Prashant Mishra
Greenhorn

Joined: May 05, 2005
Posts: 12
The problem is that these XML files come from an Integration Scenario in which the middleware sometimes adds header information before the root element of the file. These headers may contain unusual characters and are not the same for every file. I have to process these XML files, so I need some way to cut off this unwanted information.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18141

Then you need to get that middleware fixed. There's no excuse for sending malformed XML, especially if you are a program that's supposed to provide a service of transmitting XML documents.

But if you can't (yes, I know, we live in the real world), then remove everything before the first "<" character.
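A minimal sketch of that rule, working on the document as a `String` and parsing the result with the JDK's built-in DOM parser so it runs without extra libraries. `LeadingJunkStripper`, `stripLeadingJunk` and `parse` are hypothetical names; with JDOM, the cleaned string could presumably be handed to `SAXBuilder.build(Reader)` through a `StringReader` instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: cut everything before the first '<', then parse.
class LeadingJunkStripper {

    /** Drops every character before the first '<'. */
    static String stripLeadingJunk(String raw) {
        int start = raw.indexOf('<');
        // No '<' at all: return the input unchanged and let the parser report it
        return start < 0 ? raw : raw.substring(start);
    }

    /** Strips the leading junk and parses the rest with the JDK's DOM parser. */
    static Document parse(String raw)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(stripLeadingJunk(raw))));
    }

    public static void main(String[] args) throws Exception {
        String incoming = "MW-HDR 0042 \u00a7\u00a7\r\n<?xml version=\"1.0\"?><order id=\"1\"/>";
        Document doc = parse(incoming);
        System.out.println(doc.getDocumentElement().getTagName()); // prints order
    }
}
```

If the middleware header can itself contain a '<', cutting at the first `<?xml` (falling back to the first '<') would be a safer variant, though that assumes every document carries an XML declaration.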
 