
Problems parsing XML if an "&" occurs

 
Mark Mescher
Ranch Hand
Posts: 34
Hi out there,
I need to parse an XML stream. I am using DocumentBuilderFactory for this, and normally everything works fine. In some tags, however, an "&" sometimes occurs, and I have no way to encode this data before the XML stream is created. When the parser finds this symbol it throws a SAXParseException. Is there any way to parse this symbol without it being encoded correctly?
Thx.
Bye
Mark
 
Pradeep Kadambar
Ranch Hand
Posts: 148
As far as I know, the SAX parser will fail to validate the XML unless the values you insert into it are converted to UTF-8 encoding.

If possible, try converting the values you write to UTF-8.

:roll:
 
Arun Prasath
Ranch Hand
Posts: 192
A SAX parser, or any other XML parser, can only parse well-formed XML.
If an element contains a bare &, the XML is not well-formed. The character should be written as &amp; instead.
You need to make sure the XML is well-formed before you parse it.
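
To illustrate the point, here is a minimal sketch that feeds the same element to DocumentBuilderFactory twice, once with a bare & and once with &amp;. The class name WellFormedCheck, the helper isWellFormed and the sample data are made up for this example; they are not from the original poster's code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {

    // Returns true if the parser accepts the text as well-formed XML,
    // false if it rejects it with a SAXException (SAXParseException is a subclass).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for an in-memory string with a default factory.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bare '&' in character data: rejected by the parser.
        System.out.println(isWellFormed("<name>Smith & Sons</name>"));     // false
        // Same data with the '&' written as an entity reference: accepted.
        System.out.println(isWellFormed("<name>Smith &amp; Sons</name>")); // true
    }
}
```

Note that the default error handler may also print a "[Fatal Error]" line to stderr for the first input.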
 
Mark Mescher
Ranch Hand
Posts: 34
The problem is that the XML isn't generated by me but by another piece of software, so I am not able to encode the "&" before the XML is created. I could convert the stream to a string, replace every "&" with the correct encoding, and then run the parser. But isn't there an easier way?
Bye
Mark
 
Horatio Westock
Ranch Hand
Posts: 221
You could tell the vendor of the other software that they are producing malformed XML, and ask them to patch their software.

In the meantime, you could write a stream filter that searches for and replaces the invalid characters before they reach the XML parser.
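
A rough sketch of such a filter, assuming the document is small enough to read into a String first (as Mark suggested). It escapes every & that does not already start an entity or character reference and then hands the result to the parser. The class AmpersandFilter and its method names are invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical pre-processing filter, for illustration only.
class AmpersandFilter {

    // Matches an '&' that is NOT the start of a named reference (&amp;),
    // a decimal reference (&#38;) or a hex reference (&#x26;).
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare '&' with "&amp;" and leaves existing references alone.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Cleans the raw text, then parses it with the standard DOM parser.
    static Document parseLenient(String rawXml) {
        try {
            String cleaned = escapeBareAmpersands(rawXml);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(cleaned)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML still not parseable", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseLenient("<name>Tom & Jerry</name>");
        // The parser turns &amp; back into a plain '&' in the text content.
        System.out.println(doc.getDocumentElement().getTextContent()); // Tom & Jerry
    }
}
```

This is only a workaround. It would also rewrite an & inside an existing CDATA section or comment, and a reference to an undeclared entity (for example &nbsp; without a DTD) would still be rejected by the parser.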
 
Mark Mescher
Ranch Hand
Posts: 34
Yes, I think I have to do this. It seems this will not work without a filter.
OK, thanks for your helpful replies!
Bye
Mark
 
Mark Mescher
Ranch Hand
Posts: 34
Hi once more :-)
A little question: What is the correct UTF-8 encoding of &? Is there an easy way to encode a complete String to UTF-8?
Bye
Mark
 
Rene Larsen
Ranch Hand
Posts: 1179
Eclipse IDE Mac OS X
You could also try wrapping the data in a 'CDATA' section inside your elements, or convert each '&' into '&amp;'.
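
A small sketch of the CDATA variant, assuming the generating software lets you put fixed text around the data. The class name CdataDemo, the helper textOf and the element name are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, for illustration only.
class CdataDemo {

    // Parses the XML text and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inside <![CDATA[ ... ]]> the parser treats '&' and '<' as plain characters.
        String xml = "<name><![CDATA[Smith & Sons]]></name>";
        System.out.println(textOf(xml)); // Smith & Sons
    }
}
```

The one thing the data must not contain is the sequence ]]>, because that ends the CDATA section.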



Rene
[ March 21, 2005: Message edited by: Rene Larsen ]
 
Mark Mescher
Ranch Hand
Posts: 34
Hi,
the CDATA section (<![CDATA[ ... ]]>) works! I can define the XML structure in the third-party software, just not the data it contains, so I don't have to encode anything manually.
Thx a lot!
Mark
 