Parsing problem of XML due to special character

Rahul Ba
Ranch Hand

Joined: Oct 01, 2008
Posts: 205
I have this XML:

<DATA>
<COMPLETEDATA>
<data name="title"><![CDATA[Viva’s Strategy for Growths]></data>
</COMPLETEDATA>
</DATA>

When I open this XML I get an "Invalid character found" error. It is caused by the single quote, which is not in the proper format, but I cannot control that because the string comes from the DB. Hence I used CDATA, which should solve my problem, but it does not.

Please suggest a remedy for this.

Thanks
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19696
    
CDATA ends with ]]>, not just ]>.
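
A minimal sketch of the point about the terminator, using only the JDK's DOM parser. The class and method names (`CdataCheck`, `titleOf`) are made up for illustration. It parses from a Java `String`, so no byte decoding is involved; under that assumption the curly apostrophe (U+2019) inside a correctly closed CDATA section is accepted, while the `]>` variant is rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not code from the thread.
class CdataCheck {

    // Parses the XML text and returns the content of the first <data> element.
    // Any parser failure is rethrown as IllegalArgumentException.
    static String titleOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("data").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // \u2019 is the right single quotation mark from the original post.
        String good = "<DATA><COMPLETEDATA><data name=\"title\">"
                + "<![CDATA[Viva\u2019s Strategy for Growths]]>"
                + "</data></COMPLETEDATA></DATA>";
        System.out.println(titleOf(good));

        // Same document with the CDATA section closed by "]>" only.
        String bad = good.replace("]]>", "]>");
        try {
            titleOf(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the well-formed version still fails when read from a file or stream, the character itself is probably not the culprit; the bytes and the declared encoding would be the next thing to compare.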


Rahul Ba
Ranch Hand

Joined: Oct 01, 2008
Posts: 205
Yes, that was a typo, but my problem is still not resolved:

<DATA>
<COMPLETEDATA>
<data name="title"><![CDATA[Viva’s Strategy for Growths]]></data>
</COMPLETEDATA>
</DATA>
It still says "Invalid character data".
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

What character is causing the problem? How is it encoded in the DB?
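
One way to answer the first question is to dump the code points of the string as Java sees it after reading it from the DB. This is a rough sketch; `CodePointDump` and `nonAscii` are hypothetical names, and the sample title is hard-coded here instead of coming from a real query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not code from the thread.
class CodePointDump {

    // Returns one entry per non-ASCII character: its index and code point.
    static List<String> nonAscii(String s) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                found.add(String.format("index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in for the value read from the DB.
        String title = "Viva\u2019s Strategy for Growths";
        for (String line : nonAscii(title)) {
            System.out.println(line); // prints "index 4: U+2019"
        }
    }
}
```

A result of U+2019 would point to a word-processor apostrophe; something like U+0092 or U+FFFD would suggest the text was already decoded with the wrong charset on its way in or out of the DB.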
Rahul Ba
Ranch Hand

Joined: Oct 01, 2008
Posts: 205
<DATA>
<COMPLETEDATA>
<data name="title"><![CDATA[Viva’s Strategy for Growths]]></data>
</COMPLETEDATA>
</DATA>

You can see a kind of single quote; that is what causes the problem. There is no encoding in the DB. Any guesses?
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

I find it improbable that a single quote in XML causes any sort of problem; XML contains single quotes all the time. Are you sure the data wasn't cut-and-pasted from a word processing program that used some sort of smart quote rather than an actual apostrophe?
Rahul Ba
Ranch Hand

Joined: Oct 01, 2008
Posts: 205
Yes, I think the same thing. The user might have cut and pasted it from somewhere, but that data is now in the DB and I have to generate XML from it. How do I handle such situations? Are there any alternatives?

David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

I'd first try escaping it with something like Commons' StringEscapeUtils.
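
Commons Lang has `StringEscapeUtils.escapeXml`, but how it treats non-ASCII characters has differed between releases, so check the version you use. As a rough illustration of the idea only, here is a JDK-only stand-in (`XmlEscapeSketch` is a hypothetical name, not the Commons implementation) that writes every non-ASCII character as a numeric character reference, which keeps the generated XML pure ASCII.

```java
// Stand-in sketch using only the JDK; not the Commons Lang implementation.
class XmlEscapeSketch {

    // Escapes the five XML special characters and writes every non-ASCII
    // character as a decimal character reference such as &#8217;.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "Viva\u2019s Strategy for Growths";
        // Escaped text goes directly into the element, not inside CDATA.
        System.out.println("<data name=\"title\">" + escapeXml(title) + "</data>");
    }
}
```

Note that character references are not expanded inside a CDATA section, so escaped text has to replace the CDATA wrapper, not sit inside it. The sketch also does nothing about control characters that XML 1.0 forbids outright.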
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12788
    
If there is any chance of users cutting and pasting those ghastly MS Word "smart" punctuation characters, you need to protect the entire application by cleaning any input which may contain them. This is a well-known problem.

Bill
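
A minimal sketch of that kind of cleaning, assuming the input has already been decoded correctly into a Java `String`. `SmartPunctuationCleaner` is a hypothetical name, and the list of characters is only the usual handful of curly quotes, dashes, and the ellipsis, not a complete inventory.

```java
// Hypothetical helper, not code from the thread.
class SmartPunctuationCleaner {

    // Replaces common word-processor punctuation with plain ASCII equivalents.
    static String clean(String s) {
        return s
                .replace('\u2018', '\'')   // left single quotation mark
                .replace('\u2019', '\'')   // right single quotation mark
                .replace('\u201C', '"')    // left double quotation mark
                .replace('\u201D', '"')    // right double quotation mark
                .replace('\u2013', '-')    // en dash
                .replace('\u2014', '-')    // em dash
                .replace("\u2026", "...");  // horizontal ellipsis
    }

    public static void main(String[] args) {
        System.out.println(clean("Viva\u2019s Strategy for Growths"));
        // prints "Viva's Strategy for Growths"
    }
}
```

Running this at the point where user input enters the application would keep the characters out of the DB in the first place; running it when generating the XML is the fallback for data that is already stored.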
Benjamin Hiner
Greenhorn

Joined: Feb 27, 2009
Posts: 12
There is even a FAQ about it on JavaRanch.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19696
    
A horribly misspelled FAQ...
Benjamin Hiner
Greenhorn

Joined: Feb 27, 2009
Posts: 12
Rob Prime wrote: A horribly misspelled FAQ...

And virtually un-google-able as a result, since Google 'fixes' your spelling for you.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Completely google-able if you know to spell it wrong; quote the string. (First, and only, hit if the entire phrase is quoted.)

Maybe we should fix that.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19696
    
Maybe we should.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41873
    
I always figured that was an AE vs. BE thing, but a bit of searching seems to indicate that "wierd" isn't correct at all. Is that right? If so, we should definitely create a new page.


Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    
Yes, "wierd" is a common misspelling of "weird" (probably based on the "rule" commonly stated as "I before E except after C") but it is a misspelling in all varieties of English.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41873
    
OK, a new page now exists: http://faq.javaranch.com/java/WeirdWordCharacters
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Oh, wierd.
 
It is sorta covered in the JavaRanch Style Guide.
 