I have a strange problem. Our application converts data retrieved from the database into an XML document and passes that document to the client. At the client, we get the error "Illegal XML Characters: &#x1f;" (there is no whitespace between the characters in the actual message). This happens only for a few records in the database, and as far as I can see those records contain no illegal XML characters like the one shown in the error. So why is it reporting illegal characters that are not present in the database? Please advise me where the problem in the XML could be. Thanks a lot, Naresh
[This message has been edited by Naresh Babu (edited June 25, 2001).]
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11862
That looks to me like the ASCII control code 0x1F (the unit separator), a close relative of ctrl-Z (0x1A), which some applications, especially older ones, use as an end-of-file mark. Most of the codes in the 0 - 0x1F range are illegal in XML 1.0; only tab (0x09), line feed (0x0A), and carriage return (0x0D) are allowed. I had a problem like that with text written by Microsoft Word. I don't know any simple cure; you will have to filter out those characters somehow. Bill
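One way to do the filtering Bill mentions is to keep only the characters that the XML 1.0 specification allows (the Char production) and drop everything else. The following is a minimal sketch, not code from the original application; the class and method names (XmlCharFilter, isLegalXmlChar, strip) are made up for illustration.

```java
final class XmlCharFilter {

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of s with every character that is illegal in XML 1.0 dropped. */
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // handles surrogate pairs as one code point
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return out.toString();
    }
}
```

Calling XmlCharFilter.strip on each String read from the database, before it goes into the XML document, would remove a stray 0x1F while leaving tabs and line breaks alone.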
Thanks for the reply! But I couldn't work out why this exception is thrown only for particular records, since there are no special characters in the record data. Please explain in detail what causes this error, and suggest where I could find good documentation on filtering such characters.
Thanks A lot. Naresh.
raj betapudi
Greenhorn
Joined: Jun 26, 2001
Posts: 3
Remember, XML defaults to UTF-8 (or UTF-16) unless the document declares another encoding, whereas other systems may use a different encoding. I wrote a simple Java program that reads each character (of database records or text files) and checks whether it is an isUnicodeIdentifierPart character, a space, and so on, and prints out everything that is a special character. That way, I know what I am getting into. Also, you can use an online UTF converter, which I found very useful: http://members.home.net/markdavis34/unicode/convert.html You can also use the native2ascii program that is part of Sun's JDK. I hope this information helps. Raj
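Raj's program is not shown in the thread, so here is a rough sketch of the same idea under a different test: instead of isUnicodeIdentifierPart it uses Character.isISOControl, which flags the non-printing control codes. Note that this reports slightly more than XML 1.0 forbids (0x7F to 0x9F are legal in XML), which is fine for a diagnostic. The names ControlCharReport and findSuspects are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class ControlCharReport {

    /** Lists the position and code of every control character except tab, LF, and CR. */
    static List<String> findSuspects(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (Character.isISOControl(c) && !allowedWhitespace) {
                hits.add(String.format("index %d: U+%04X", i, (int) c));
            }
        }
        return hits;
    }
}
```

Running findSuspects over each column value of a failing record and printing the result should show exactly where an invisible character is hiding.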
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11862
"Thanks! for the reply.But, i couldnt get why this exception is thrown for particular records only.And since , there are no special characters in the record data. " I'll bet you that there are special characters - you just can't see them because they are non-printing control codes. When this happened to me I had to haul out UltraEdit-32 (a great programmer's editor by the way) to see what MS Word was dumping into what I thought was clean text. Bill
Naresh Babu
Greenhorn
Joined: Mar 20, 2001
Posts: 29
Thanks a lot for the immediate replies. I was able to solve the problem based on William's explanation. This is a great place to improve our knowledge. Thanks, Naresh
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11862
Did you find out how the weird characters were getting into your database entries in the first place? curious
Naresh Babu
Greenhorn
Joined: Mar 20, 2001
Posts: 29
Originally posted by William Brogden: Did you find out how the weird characters were getting into your database entries in the first place? curious
Sorry for the delay in replying. We had this problem with an older version of the application in the production environment, and I was not able to replicate the error. I would definitely like to replicate it; can you suggest how? Thanks, Naresh
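A possible way to replicate the error outside production is to hand a parser a tiny document that contains the offending character, either raw or as the character reference &#x1f; seen in the error message. This sketch assumes the client uses a standard JAXP parser, which the thread does not confirm; IllegalCharRepro and parseError are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class IllegalCharRepro {

    /** Returns the parser's error message, or null if the document parsed cleanly. */
    static String parseError(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;                 // well-formed
        } catch (SAXException e) {
            return e.getMessage();       // not well-formed, e.g. illegal character
        }
    }

    public static void main(String[] args) throws Exception {
        // Character reference to 0x1F, as in the reported error.
        System.out.println(parseError("<r>&#x1f;</r>"));
        // The same character embedded raw in the text.
        System.out.println(parseError("<r>a\u001fb</r>"));
        // A clean document for comparison; prints null.
        System.out.println(parseError("<r>ok</r>"));
    }
}
```

If the first two calls fail and the third succeeds, inserting a test record containing 0x1F into a copy of the database should trigger the same failure in the application.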
Naresh Babu
Greenhorn
Joined: Mar 20, 2001
Posts: 29
Hi, Although I was able to solve the problem temporarily, it is recurring for other records as well. Does it also depend on the database, i.e. whether the database uses an encoding other than UTF-8, or is it purely dependent on the application? Please suggest how to catch and filter these illegal characters.
Thanks Naresh.
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11862
I have no idea where those characters are coming from, so I can't help you there. If the only weird character you are seeing is 0x1F (the unit separator; control-Z is 0x1A), you could simply patch every String retrieved from the database with the String replace method. myString = myString.replace( '\u001f', ' '); // substitute space (I think I remembered that method right) Bill
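If other control codes turn up besides 0x1F, the same patch can be widened with a regular expression. This is only a sketch of that variation: it replaces every code in the 0 - 0x1F range except tab, line feed, and carriage return with a space. The names ControlCharPatch and patch are hypothetical.

```java
final class ControlCharPatch {

    /** Replaces each control code below 0x20, other than tab, LF, and CR, with a space. */
    static String patch(String s) {
        return s.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", " ");
    }
}
```

Substituting a space rather than deleting the character keeps words from running together when the control code was acting as a separator.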