JavaRanch » Java Forums » Engineering » XML and Related Technologies

Parsing an Xml file with No xml Declaration

zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

Hello guys,

I want to parse an XML file that I am retrieving from a URL, but the file does not start with an XML declaration such as <?xml version="1.0" encoding="utf-8"?>, as I expected it to; it starts without one. While trying to parse this file I am getting these errors:


03-26 01:07:31.181: WARN/System.err(274): at org.apache.harmony.xml.ExpatParser.finish(ExpatParser.java:553)

03-26 01:07:31.181: WARN/System.err(274): at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:483)

03-26 01:07:31.181: WARN/System.err(274): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:320)

03-26 01:07:31.181: WARN/System.err(274): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:277)


How can I solve this problem? I am using SAXParser. Any help would be appreciated.

Thanks & Regards,
Zoheb

P.S.: I also found this error, which I failed to include when I asked this question:
03-26 10:17:03.018: WARN/System.err(274): org.apache.harmony.xml.ExpatParser$ParseException: At line 2, column 0: no element found
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18150

It's perfectly legitimate to have an XML document without a prolog. And given what you have posted, I don't see any evidence at all to point to that being your problem. I would suggest parsing the document with something which produces better error messages, so you can determine the actual problem.
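Both of Paul's points, that a missing XML declaration is legal and that a parser with better diagnostics helps, can be checked with the parser that ships with the JDK. A minimal sketch, assuming a desktop JDK's javax.xml.parsers SAX implementation rather than the Android Expat parser from the stack traces (its messages will differ); the class name ProbeParse and the sample documents are hypothetical. Expat's "no element found" is commonly reported when the input ends before any root element is seen, for example an empty or already-consumed stream, so an empty string is included for comparison:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class ProbeParse {

    /** Parses xml and returns "ok, N elements" or the location and text of the first fatal error. */
    static String probe(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] elements = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                elements[0]++;
            }
        };
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return "ok, " + elements[0] + " elements";
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows, so the parser's own diagnosis lands here.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // No <?xml ...?> declaration: still a well-formed document.
        System.out.println(probe("<weather><temp>72</temp></weather>"));
        // Empty input, for comparison with Expat's "no element found".
        System.out.println(probe(""));
    }
}
```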
zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

Hey Paul, thanks for the reply. Can you suggest how I can determine the problem? I also found this error but failed to include it when I asked the question:

03-26 10:17:03.018: WARN/System.err(274): org.apache.harmony.xml.ExpatParser$ParseException: At line 2, column 0: no element found

This might help in determining the problem.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12682

I suggest that your input has leading blank lines or spaces before the root element tag.

Bill

Java Resources at www.wbrogden.com
zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

Hey William, thanks for the reply. Is there a way to handle those blank spaces and still parse the file successfully?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18150

If somebody is sending you an XML document with spaces at the beginning, then they are sending you a document which isn't well-formed. In other words, it isn't XML. Tell them to send you well-formed documents in the future if they expect you to process them.
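Whether leading whitespace is actually fatal depends on what follows it. In the XML 1.0 grammar the prolog may contain whitespace before the root element, but an XML declaration, if present, must be the very first characters of the document. A small check, assuming the JDK's built-in SAX parser (the Android Expat parser used in this thread may behave or report differently); the class name is hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class LeadingWhitespaceCheck {

    /** Returns true if the JDK's SAX parser accepts the document as well-formed. */
    static boolean isWellFormed(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Whitespace before the root element, no declaration: allowed by the grammar.
        System.out.println(isWellFormed("\n   <weather/>"));
        // Whitespace before an XML declaration: rejected, the declaration must come first.
        System.out.println(isWellFormed("\n   <?xml version=\"1.0\"?><weather/>"));
    }
}
```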
zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

I have reported this to them, and their response was that they would look into it. However, I am still curious whether there is any way to parse such a document.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12682

Sure!

Because of the way the Java IO classes are designed, it is easy to create your own extension of (for example) java.io.FilterReader and let it process the input stream of characters, or an extension of java.io.FilterInputStream if you are reading bytes.

When first created, your custom class would read the input up to the first < character, then let subsequent characters be read by the parser.
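Bill's first suggestion could look roughly like this. It is a minimal sketch, not code from the thread: the class name SkipToMarkupReader and its drain helper are hypothetical, it works on characters (so the encoding must already be known when the Reader is created), and it drops everything before the first '<', which would include a byte-order mark or any stray text.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical reader that discards every character before the first '<'. */
final class SkipToMarkupReader extends FilterReader {
    private boolean started;   // true once the leading junk has been skipped
    private int pending = -1;  // the '<' consumed while searching, not yet delivered

    SkipToMarkupReader(Reader in) {
        super(in);
    }

    /** On first use, read and drop characters until '<' or end of input. */
    private void skipJunk() throws IOException {
        if (started) {
            return;
        }
        started = true;
        int c;
        do {
            c = in.read();
        } while (c != -1 && c != '<');
        pending = c; // '<', or -1 if the input had no markup at all
    }

    @Override
    public int read() throws IOException {
        skipJunk();
        if (pending != -1) {
            int c = pending;
            pending = -1;
            return c;
        }
        return in.read();
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        skipJunk();
        if (len == 0) {
            return 0;
        }
        if (pending != -1) {
            buf[off] = (char) pending; // deliver the saved '<' first
            pending = -1;
            return 1;
        }
        return in.read(buf, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) { // go through read() so pending is honoured
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset on the wrapped reader would bypass pending
    }

    /** Drains a reader into a String (used by the demo). */
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader r = new SkipToMarkupReader(new StringReader("\n\n   <weather/>"));
        System.out.println(drain(r)); // prints <weather/>
    }
}
```

With SAX, the wrapper would be handed to the parser in place of the original reader, e.g. parser.parse(new InputSource(new SkipToMarkupReader(reader)), handler).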

Bill
I just realized that for this simple problem it would be easier to use the existing classes PushbackInputStream or PushbackReader to read up to the first <, then let the parser handle the rest.
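Bill's simpler variant, sketched for the byte-stream case (the character-based counterpart is java.io.PushbackReader, which has the same read/unread pair). This is an illustrative sketch, not the code posted in the thread; the class and method names are hypothetical, and scanning raw bytes for '<' assumes an ASCII-compatible encoding such as UTF-8 (it would not work for UTF-16 input):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SkipToFirstTag {

    /**
     * Reads bytes until the first '<', pushes that byte back, and returns the
     * PushbackInputStream positioned so that its next byte is the '<'.
     * Assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1.
     */
    static InputStream skipLeadingJunk(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 1);
        int b;
        while ((b = in.read()) != -1) {
            if (b == '<') {
                in.unread(b); // put the '<' back so the parser sees the start tag
                break;
            }
        }
        return in; // the parser must read from this wrapper, not from raw
    }

    /** Parses the cleaned stream and returns the qualified name of the root element. */
    static String rootName(InputStream raw) throws Exception {
        String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(skipLeadingJunk(raw), handler);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "\r\n  \r\n<weather><temp>72</temp></weather>".getBytes(StandardCharsets.UTF_8);
        System.out.println(rootName(new ByteArrayInputStream(doc))); // prints weather
    }
}
```

A pushback buffer of one byte is enough here, because only the single '<' ever needs to be returned to the stream.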

zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

Hello guys, based on your suggestion I tried this:



However, I am still getting the same error. Am I doing this the way you suggested, or am I doing it incorrectly? I am posting the error for your reference; please take a look:

03-28 18:39:56.149: WARN/System.err(5439): org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: syntax error

Thanks & Regards,
Zoheb
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12682


No! That is the original stream; you want the modified stream, which has read past the junk.
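The reason the original stream fails comes down to where the '<' ends up. Finding the '<' means reading it, and unread(b) then stores that byte only in the PushbackInputStream's own buffer, so the original stream now starts at weather/> with the beginning of the root tag gone. A parser handed that would be expected to complain right at the start, which fits an "At line 1, column 0: syntax error" message. A small demonstration with hypothetical class and method names; it assumes Java 9 or later for InputStream.readAllBytes():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class WhichStream {

    /** Reads bytes up to the first '<' and pushes that '<' back into in. */
    static void skipToFirstTag(PushbackInputStream in) throws IOException {
        int b;
        do {
            b = in.read();
        } while (b != -1 && b != '<');
        if (b == '<') {
            in.unread(b); // stored in the wrapper's buffer, NOT returned to the raw stream
        }
    }

    /** What a parser would see if handed the ORIGINAL stream after skipping. */
    static String seenViaRaw(String doc) throws IOException {
        InputStream raw = new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8));
        skipToFirstTag(new PushbackInputStream(raw, 1));
        return new String(raw.readAllBytes(), StandardCharsets.UTF_8);
    }

    /** What a parser would see if handed the pushback wrapper after skipping. */
    static String seenViaWrapper(String doc) throws IOException {
        PushbackInputStream wrapper = new PushbackInputStream(
                new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)), 1);
        skipToFirstTag(wrapper);
        return new String(wrapper.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String doc = "  \n<weather/>";
        System.out.println(seenViaRaw(doc));     // prints weather/>  (the '<' is missing)
        System.out.println(seenViaWrapper(doc)); // prints <weather/>
    }
}
```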



Bill
zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

Hey Bill, I made the changes you suggested,


but it still ends up returning the same error I reported in the previous post.
I am, however, thankful for your interest in this problem.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18150

You read through the stream until you have read a "<" character, and then you pass the rest of the document to the parser. To me it's pretty clear why that's wrong, so perhaps you just haven't taken the time to think about it.

Consider this question: why are you using a PushbackInputStream?
zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

I realized where I was goofing up and made the following changes to the code. It should work, but I ran into a problem again, this time a different one:


But the error I get is this:

03-29 00:03:41.898: WARN/System.err(307): org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: unbound prefix

The XML I wish to parse is this:



I fail to understand what the issue is.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12682

So we got past the syntax error, hurrah. Now for the unbound prefix.

A Google search for "unbound prefix xml parser" found this forum thread, which I suspect will lead you to a solution.

Bill
zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

I understand what unbound prefix means: the prefix is not bound to a namespace. But the document does contain the required namespace declaration.


So if the file is well-formed, why does the parser throw an error?
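An unbound prefix means the parser met a prefixed name such as aws:weather while no xmlns:aws declaration was in scope at that point in the text it actually received. The check below contrasts a document whose aws prefix is declared with one where it is not, using the JDK's SAX parser in namespace-aware mode. It is a sketch: the class name is hypothetical and urn:example:aws is a placeholder, not the weather service's real namespace URI.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class PrefixCheck {

    /** Parses namespace-aware and returns "ok" or the parser's error message. */
    static String check(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // prefixes are only resolved in namespace-aware mode
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return "ok";
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Prefix declared on the root element: fine.
        System.out.println(check(
                "<aws:weather xmlns:aws=\"urn:example:aws\"><aws:temp>72</aws:temp></aws:weather>"));
        // Same prefix with no xmlns:aws declaration in scope: unbound prefix error.
        System.out.println(check("<aws:weather><aws:temp>72</aws:temp></aws:weather>"));
    }
}
```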
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18150

Frankly I would try processing that document through another parser. The parser you're using gives really useless error messages. Who knows whether it is even working correctly?
g tsuji
Ranch Hand

Joined: Jan 18, 2011
Posts: 464

[0] If I take the XML listed in the 3:29:44 post at face value, I would be surprised that the weather forecast site's service serves the document without a DOCTYPE defining the entity &deg; and with blanks before the root element aws:weather. But suppose it really happens. In that case, the way to salvage it is to supply your own entity definition.

[1] Then the SAXParserFactory should set NamespaceAware to true so that the content handler receives the local name correctly, in case the handler makes specific use of it.

[2] I would suggest something of this kind so that you can test it properly. (It seems the site cannot post entities literally, so I put an underscore after the &, which should not be there. Watch out.)
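Points [0] and [1] might be combined roughly as follows. This is a sketch, not g tsuji's own code: it assumes the document has no XML declaration of its own (a DOCTYPE placed in front of one would make the document ill-formed), and the DOCTYPE string, class name, and urn:example:aws namespace URI are all hypothetical. Prepending an internal subset that declares deg supplies the missing entity, and setNamespaceAware(true) makes localName available to the handler:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SupplyEntity {

    // Hypothetical internal subset declaring the entity the service did not define.
    // &#176; is the degree sign (U+00B0).
    private static final String DOCTYPE =
            "<!DOCTYPE aws:weather [ <!ENTITY deg \"&#176;\"> ]>";

    /** Prepends the DOCTYPE, parses namespace-aware, and returns "rootLocalName: text". */
    static String describe(String body) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is "weather", not empty
        StringBuilder text = new StringBuilder();
        String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // text may arrive in several chunks
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(DOCTYPE + body)), handler);
        return root[0] + ": " + text;
    }

    public static void main(String[] args) throws Exception {
        String body = "\n  <aws:weather xmlns:aws=\"urn:example:aws\">"
                + "<aws:temp>72&deg;F</aws:temp></aws:weather>";
        System.out.println(describe(body)); // prints weather: 72°F
    }
}
```

Leading blanks after the prepended DOCTYPE are harmless, since whitespace is allowed between the DOCTYPE and the root element.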
zoheb hassan
Ranch Hand

Joined: Apr 01, 2009
Posts: 146

Hey guys, great news: the thing started working and is working well now. I don't understand why the errors began in the first place, but now everything seems to work just great. Thanks for the support, though; I learned a great deal about parsing, XML, and especially PushbackInputStream. A great relief, though it kind of gave me sleepless nights. But all's well that ends well.

P.S.: I will be back.