
Parsing problem of XML due to special character

 
Rahul Ba
Ranch Hand
Posts: 206
I have this XML:

<DATA>
<COMPLETEDATA>
<data name="title"><![CDATA[Viva’s Strategy for Growths]></data>
</COMPLETEDATA>
</DATA>

When I open this XML I get an "Invalid character found" error. This is due to a single quote that is not in the proper format, but I cannot control that, because the string comes from the DB. Hence I used CDATA, which should solve my problem, but it does not.

Please suggest a remedy for this.

Thanks
 
Rob Spoor
Sheriff
Posts: 20527
CDATA ends with ]]>, not just ]>.
 
Rahul Ba
Ranch Hand
Posts: 206
Yes, it's a typo, but my problem is still not resolved...

<DATA>
<COMPLETEDATA>
<data name="title"><![CDATA[Viva’s Strategy for Growths]]></data>
</COMPLETEDATA>
</DATA>
It still says "Invalid character data".
 
David Newton
Author
Rancher
Posts: 12617
What character is causing the problem? How is it encoded in the DB?
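One way to answer the first question is to list every non-ASCII character in the string as it arrives from the DB. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the title has already been read into a Java String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread: reports every non-ASCII character in a string.
class NonAsciiFinder {

    /** Returns one entry per non-ASCII code point, formatted like "index 4: U+2019". */
    static List<String> findNonAscii(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                hits.add(String.format("index %d: U+%04X", i, cp));
            }
            // Step by one or two chars so surrogate pairs are counted once.
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        // U+2019 is the right single quotation mark that word processors insert.
        System.out.println(findNonAscii("Viva\u2019s Strategy for Growths"));
        // prints [index 4: U+2019]
    }
}
```

If the report shows something like U+2019 rather than U+0027, the character is a typographic quote and not a plain apostrophe.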
 
Rahul Ba
Ranch Hand
Posts: 206
<DATA>
<COMPLETEDATA>
<data name="title"><![CDATA[Viva’s Strategy for Growths]]></data>
</COMPLETEDATA>
</DATA>

You can see a kind of single quote; that is what is causing the problem. There is no encoding in the DB. Any guesses?
 
David Newton
Author
Rancher
Posts: 12617
I find it improbable that a single quote in XML causes any sort of problem; XML contains single quotes all the time. Are you sure the data wasn't cut-and-pasted from a word processing program that used some sort of smart quote rather than an actual apostrophe?
 
Rahul Ba
Ranch Hand
Posts: 206
Yes, I think the same thing. The user might have cut and pasted it from somewhere, but now that data is in the DB and I have to generate XML from it. How should I handle such situations? Are there any alternatives for this kind of problem?

 
David Newton
Author
Rancher
Posts: 12617
I'd first try escaping it with something like Commons' StringEscapeUtils.
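As a rough illustration of what escaping means here, the sketch below uses only the standard library. It is a stand-in and not the Commons implementation; the class and method names are hypothetical. It replaces the XML markup characters with entities and every non-ASCII character with a numeric character reference. Note that character references are not expanded inside a CDATA section, so the escaped text would have to be written as ordinary element content instead of being wrapped in CDATA.

```java
// Hypothetical stand-in for a library escaper; not the Commons StringEscapeUtils code.
class XmlTextEscaper {

    /** Escapes &, < and > and turns every non-ASCII code point into a numeric reference. */
    static String escapeForXmlText(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (cp > 0x7F) {
                        // Decimal numeric character reference, e.g. &#8217; for U+2019.
                        out.append("&#").append(cp).append(';');
                    } else {
                        // Plain ASCII is copied as-is; control characters that XML 1.0
                        // forbids are NOT filtered in this sketch.
                        out.appendCodePoint(cp);
                    }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String title = escapeForXmlText("Viva\u2019s Strategy for Growths");
        System.out.println("<data name=\"title\">" + title + "</data>");
        // prints <data name="title">Viva&#8217;s Strategy for Growths</data>
    }
}
```

The output is pure ASCII, so it parses the same way whichever encoding the file is later read with.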
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13061
If there is any chance of users cutting and pasting those ghastly MS Word "smart" punctuation characters, you need to protect the entire application by cleaning any input which may contain them. This is a well-known problem.

Bill
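A cleaning step along those lines could look like the sketch below. The class name and the choice of replacements are assumptions for illustration; it only covers the most common typographic characters (curly quotes, en and em dashes, the ellipsis) and maps each to a plain ASCII equivalent.

```java
// Hypothetical input cleaner; the set of characters handled here is an assumption.
class SmartPunctuationCleaner {

    /** Replaces common word-processor punctuation with plain ASCII equivalents. */
    static String clean(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark
                    out.append('\'');
                    break;
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    out.append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                    out.append('-');
                    break;
                case '\u2026': // horizontal ellipsis
                    out.append("...");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("Viva\u2019s Strategy for Growths"));
        // prints Viva's Strategy for Growths
    }
}
```

Calling something like this at the point where user input enters the application means the DB never stores the problem characters in the first place. Whether silently altering the punctuation is acceptable depends on the application.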
 
Benjamin Hiner
Greenhorn
Posts: 12
There is even a FAQ about it on JavaRanch.
 
Rob Spoor
Sheriff
Posts: 20527
A horribly misspelled FAQ...
 
Benjamin Hiner
Greenhorn
Posts: 12
Rob Prime wrote: A horribly misspelled FAQ...

And virtually un-google-able as a result, since Google 'fixes' your spelling for you.
 
David Newton
Author
Rancher
Posts: 12617
Completely google-able if you know to spell it wrong; quote the string. (First, and only, hit if the entire phrase is quoted.)

Maybe we should fix that.
 
Rob Spoor
Sheriff
Posts: 20527
Maybe we should.
 
Ulf Dittmer
Rancher
Posts: 42967
I always figured that was an AE vs. BE thing, but a bit of searching seems to indicate that "wierd" isn't correct at all - correct? If so, we should definitely create a new page.
 
Paul Clapham
Sheriff
Posts: 21107
Yes, "wierd" is a common misspelling of "weird" (probably based on the "rule" commonly stated as "I before E except after C") but it is a misspelling in all varieties of English.
 
Ulf Dittmer
Rancher
Posts: 42967
OK, a new page now exists: http://faq.javaranch.com/java/WeirdWordCharacters
 
David Newton
Author
Rancher
Posts: 12617
Oh, wierd.
 