Illegal XML Characters???

Naresh Babu
Greenhorn

Joined: Mar 20, 2001
Posts: 29
Hi,

I have run into a strange problem. Our application converts data retrieved from the database into an XML
document and passes that document to the client.
At the client, we get the error
Illegal XML Characters : & # x 1 f ;
In the real message there is no whitespace between the characters, i.e. it reads &#x1f;
This happens only for a few records in the database, and as far as I can see
those records do not contain any illegal XML characters like the one shown in the error.
So why are illegal characters reported when they are not
present in the database? Please advise me where the
problem in the XML could be.
Thanks a lot,
Naresh

[This message has been edited by Naresh Babu (edited June 25, 2001).]
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 11862
That looks to me like the ASCII control code 0x1F (the unit separator), one of the non-printing control codes like ctrl-Z (0x1A), which is used as an end-of-file mark in some applications, especially older ones. Most of the codes in the 0 - 0x1F range are illegal in XML 1.0; only tab (0x09), line feed (0x0A) and carriage return (0x0D) are allowed.
I had a problem like that with text written by Microsoft Word.
I don't know of any simple cure; you will have to filter out those characters somehow.
Bill

Naresh Babu
Greenhorn

Joined: Mar 20, 2001
Posts: 29
Thanks for the reply! But I still don't understand why this exception
is thrown for particular records only, since as far as I can see there
are no special characters in the record data. Please
explain in detail what causes this error, and suggest where I could find good documentation on filtering such characters.

Thanks a lot.
Naresh.
raj betapudi
Greenhorn

Joined: Jun 26, 2001
Posts: 3
Remember, an XML document is read as Unicode, and the parser assumes UTF-8 (or UTF-16) unless the document declares another encoding, whereas other systems may store text in a different encoding. I wrote a simple Java program that reads each character (of database records or text files) and checks whether it is an isUnicodeIdentifierPart character, a space, and so on. It prints out everything that is a special character. That way, I know what I am getting into.
Also, you can use an online UTF converter, which I found very useful: http://members.home.net/markdavis34/unicode/convert.html
You can also use the native2ascii program that is part of Sun's JDK.
I hope this information helps.
Raj
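A scan like the one Raj describes might look roughly like the sketch below. It is not his program: the class name XmlCharScan, the method names and the sample record are made up for illustration, and instead of isUnicodeIdentifierPart it tests each code point against the Char production of the XML 1.0 specification, which is what the parser actually enforces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread: reports characters that XML 1.0 forbids.
class XmlCharScan {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns one line per illegal code point, giving its index and hex value.
    static List<String> findIllegal(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                hits.add(String.format("index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed sample record with an invisible unit separator between the names.
        String record = "Smith\u001fJohn";
        findIllegal(record).forEach(System.out::println); // prints "index 5: U+001F"
    }
}
```

Running something like this over each column value read from the database should show which records carry the invisible characters.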
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 11862
"Thanks! for the reply.But, i couldnt get why this exception
is thrown for particular records only.And since , there
are no special characters in the record data. "
I'll bet you that there are special characters - you just can't see them because they are non-printing control codes.
When this happened to me I had to haul out UltraEdit-32 (a great programmer's editor by the way) to see what MS Word was dumping into what I thought was clean text.
Bill
Naresh Babu
Greenhorn

Joined: Mar 20, 2001
Posts: 29
Thanks a lot for the quick replies! I was able to solve the problem based on William's explanation. This is a
great place to improve our knowledge.
Thanks
Naresh
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 11862
Did you find out how the weird characters were getting into your database entries in the first place?
curious
Naresh Babu
Greenhorn

Joined: Mar 20, 2001
Posts: 29
Originally posted by William Brogden:
Did you find out how the weird characters were getting into your database entries in the first place?
curious

Sorry for the delay in replying. We had this problem with an older version of an application in the production environment, and I was not able to replicate the error. I would definitely like to
replicate it. Can you suggest how?
Thanks
Naresh
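One way to reproduce this kind of error outside production is to hand a parser a small document that contains the character, either as the reference &#x1f; or as the raw control code. The sketch below assumes the JAXP parser bundled with the JDK; the class name ReproduceIllegalChar and the sample documents are invented for illustration, and the exact wording of the error message depends on the parser in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical reproduction, not the application from the thread.
class ReproduceIllegalChar {

    // Returns true if the text parses as XML, false if the parser rejects it.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. because of an illegal character.
            System.out.println("Parser rejected document: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Clean record: expected to parse.
        System.out.println(parses("<record>Smith John</record>"));
        // Character reference to U+001F: expected to be rejected.
        System.out.println(parses("<record>Smith&#x1f;John</record>"));
        // Raw U+001F in the content: expected to be rejected as well.
        System.out.println(parses("<record>Smith\u001fJohn</record>"));
    }
}
```

If the second or third document fails in the same way as the production data, that suggests the records really do contain the control character, even though it is invisible in most editors.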
Naresh Babu
Greenhorn

Joined: Mar 20, 2001
Posts: 29
Hi,
Though I was able to solve the problem temporarily, it is now
recurring for other records as well. Does it also depend
on the database, i.e. on whether the database uses something other than UTF-8, or does it depend purely
on the application? Please suggest how to catch and filter these illegal characters.

Thanks
Naresh.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 11862
I have no idea where those characters are coming from, so I can't help you there. If the only weird character you are seeing is 0x1F (the unit separator control code), you could simply patch every String retrieved from the database with the String replace method.
myString = myString.replace( '\u001f', ' '); // substitute space
(I think I remembered that method right)
Bill
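If other control characters turn up as well, the single replace call could be widened to cover every control code that XML 1.0 forbids. The following is only a sketch under that assumption: XmlCharFilter and sanitize are made-up names, and unpaired surrogates, which XML 1.0 also forbids, are not handled here.

```java
import java.util.regex.Pattern;

// Hypothetical helper that generalizes the one-character replace.
class XmlCharFilter {

    // Control codes below 0x20 except tab (0x09), line feed (0x0A) and
    // carriage return (0x0D), plus the non-characters U+FFFE and U+FFFF.
    private static final Pattern ILLEGAL =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F\\uFFFE\\uFFFF]");

    // Substitutes a space for each illegal character, as in the replace call.
    static String sanitize(String text) {
        return ILLEGAL.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String fromDatabase = "Smith\u001fJohn"; // assumed sample value
        System.out.println(sanitize(fromDatabase)); // prints "Smith John"
    }
}
```

Substituting a space keeps word boundaries intact; whether that or dropping the character entirely is right depends on what the data means.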
 