Problems parsing XML if an "&" occurs

Mark Mescher
Ranch Hand

Joined: Oct 25, 2004
Posts: 34
Hi out there,
I need to parse an XML stream. I am using DocumentBuilderFactory for this, and normally everything works fine. Sometimes, though, a "&" occurs in the content of some tags. I have no way to encode this data before the XML stream is created. When the parser finds this symbol, it throws a SAXParseException. Is there any way to parse this symbol without encoding it properly?
Thx.
Bye
Mark
Pradeep Kadambar
Ranch Hand

Joined: Oct 18, 2004
Posts: 148
As far as I know, unless you convert the value you insert into the XML to UTF-8 encoding, the SAX parser will fail to validate the XML.

If possible, try converting the value you write to UTF-8.

Arun Prasath
Ranch Hand

Joined: Sep 17, 2003
Posts: 192
A SAX parser, or any XML parser, can only parse well-formed XML.
If the XML contains a bare & inside an element, it is not well-formed. Ideally the & should be replaced with &amp;
You need to make sure the XML is well-formed before you parse it.
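
A minimal sketch of the point above, assuming the JDK's built-in JAXP parser: the same element is parsed once with a bare & and once with &amp;. The class name `AmpersandWellFormedness`, the helper `rootText`, and the sample data are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AmpersandWellFormedness {
    // Hypothetical helper: returns the text of the root element,
    // or null if the parser rejects the input as not well-formed.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            // A bare '&' ends up here (as a SAXParseException).
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText("<name>Smith & Sons</name>"));     // null
        System.out.println(rootText("<name>Smith &amp; Sons</name>")); // Smith & Sons
    }
}
```

The default error handler may also print a "[Fatal Error]" line to stderr for the first input; that is the parser reporting the same problem.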


SCJP 1.4, SCDJWS, SCJA
I can do ALL things through CHRIST who strengthens me.
Mark Mescher
Ranch Hand

Joined: Oct 25, 2004
Posts: 34
The problem is that the XML isn't generated by me but by another piece of software, so I am not able to encode the "&" before parsing. I could convert the stream to a string, replace every "&" with the correct encoding, and then run the parser. But isn't there an easier way?
Bye
Mark
Horatio Westock
Ranch Hand

Joined: Feb 23, 2005
Posts: 221
You could tell the vendor of the other software that they are producing invalid XML, and ask them to patch their software.

In the meantime, you could write a stream filter that searches for and replaces the invalid characters before they reach the XML parser.
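
One way such a pre-parse fix might look is sketched below. It is a simpler whole-string variant (closer to the "convert the stream to a string" idea) rather than a true streaming filter, and it assumes the only defect is bare & characters. The names `BareAmpersandFilter`, `escapeBareAmpersands`, and `parseLenient` are hypothetical, as is the sample data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BareAmpersandFilter {
    // Matches an '&' that does NOT start one of the five predefined
    // entities or a numeric character reference such as &#38; or &#x26;.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Escapes bare ampersands, then hands the repaired text to the parser.
    static Document parseLenient(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(escapeBareAmpersands(xml))));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Input is still not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<order><shop>Smith & Sons</shop><note>5 &lt; 6 &amp; 7</note></order>";
        Document doc = parseLenient(raw);
        System.out.println(doc.getElementsByTagName("shop").item(0).getTextContent()); // Smith & Sons
    }
}
```

Two limits of this sketch: it would also rewrite an & that sits inside a CDATA section, and it treats references to entities declared in a DTD as bare ampersands. If the incoming documents use either, the pattern needs adjusting.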
Mark Mescher
Ranch Hand

Joined: Oct 25, 2004
Posts: 34
Yes, I think I have to do this. It seems this will not work without a filter.
OK, thanks for your helpful replies!
Bye
Mark
Mark Mescher
Ranch Hand

Joined: Oct 25, 2004
Posts: 34
Hi once more:-)
A little question: What is the correct UTF-8 encoding of &? Is there an easy way to encode a complete String to UTF-8?
Bye
Mark
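
A short sketch on the UTF-8 part of the question, using only standard JDK calls: in UTF-8 the character & is the single byte 0x26, the same as in ASCII, so character encoding on its own does not change how the parser sees it. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8VersusEscaping {
    // Encodes a complete String to UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back to a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] amp = toUtf8("&");
        // '&' is U+0026, which UTF-8 stores as one byte.
        System.out.println(amp.length + " byte, value 0x" + Integer.toHexString(amp[0])); // 1 byte, value 0x26

        // Non-ASCII characters take more than one byte: u-umlaut needs two.
        System.out.println(toUtf8("M\u00FCller").length); // 7
    }
}
```

Escaping the & for XML is a separate step from choosing a character encoding.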
Rene Larsen
Ranch Hand

Joined: Oct 12, 2001
Posts: 1179

You could also try wrapping the data in a 'CDATA' section in your elements, or convert '&' into '&amp;'.
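
As a rough illustration of the two options, assuming the JDK's built-in parser, the sketch below parses the same made-up value once inside a CDATA section and once with the entity. `CdataExample` and `textOf` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CdataExample {
    // Parses the XML and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inside a CDATA section the parser does not treat '&' as markup.
        System.out.println(textOf("<name><![CDATA[Smith & Sons]]></name>")); // Smith & Sons
        // The entity form yields the same text.
        System.out.println(textOf("<name>Smith &amp; Sons</name>"));         // Smith & Sons
    }
}
```

One thing to watch with CDATA: the data itself must not contain the sequence ]]>, because that ends the section.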



Rene
[ March 21, 2005: Message edited by: Rene Larsen ]

Regards, Rene Larsen
Mark Mescher
Ranch Hand

Joined: Oct 25, 2004
Posts: 34
Hi,
The CDATA section (<![CDATA[ ... ]]>) works! I can define the XML structure in the third-party software, just not the data it contains, so I don't have to encode manually.
Thx a lot!
Mark