All, I have a rather curious problem that's bothering me... The following code snippet
works OK for a few pages and bombs out for others. As far as I am aware, all the content being served is UTF-8, so I am a bit bemused. I tried removing the UTF-8 setting and still got a malformed IO exception. Beats me... I am using WebSphere 4.0.6, if that's of any help.
"As far as i am aware, all the content that is being attempted to be served is UTF-8." Well, maybe that's not the case. You probably need to gather more info. How about using getContentType() to find out whether whatever you're connecting to has told you anything more about the encoding? Typically this may be something like "text/html; charset=UTF-8", so you can try parsing the charset out of this field. If none is provided, the default is supposed to be ISO-8859-1. In practice it is sadly not that unusual for servers to fail to specify these fields correctly. The next line of defense is to initially assume the encoding is ISO-8859-1, use that to interpret the subsequent HTML, and look for a meta tag which has the real encoding in it, e.g. <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">. Naturally, servers that fail to properly specify their encodings are EVIL. But we may have to deal with them nonetheless...
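The fallback chain described above (header charset first, then a meta tag read as ISO-8859-1, then the ISO-8859-1 default) could be sketched roughly as follows. This is only an illustration under some assumptions: the class name `CharsetSniffer`, its method names, and the 4096-byte scan limit are made up for this example, and it uses `java.util.regex` and `StandardCharsets`, which need a newer JDK than the one bundled with WebSphere 4.0.6.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class CharsetSniffer {
    // Matches charset=XYZ, with optional quotes around the value.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"'>]+)", Pattern.CASE_INSENSITIVE);
    // Matches a whole <meta ...> tag.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in a Content-Type value, or null if none is given. */
    public static String fromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the charset named in the first meta tag that declares one, or null. */
    public static String fromMetaTag(String html) {
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String cs = fromContentType(tag.group());
            if (cs != null) {
                return cs;
            }
        }
        return null;
    }

    /** Header first, then meta tag (body read as ISO-8859-1), then the default. */
    public static String detect(String contentTypeHeader, byte[] body) {
        String cs = fromContentType(contentTypeHeader);
        if (cs != null) {
            return cs;
        }
        // Assumed limit: only the first 4096 bytes are scanned for a meta tag.
        int len = Math.min(body.length, 4096);
        String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
        cs = fromMetaTag(head);
        return cs != null ? cs : "ISO-8859-1";
    }
}
```

With a `URLConnection conn`, one would pass `conn.getContentType()` and the raw bytes read from `conn.getInputStream()` to `detect(...)`, then decode the full body with the charset name it returns.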
Jim, many thanks for your response. I am going to try this at work tomorrow. For completeness, the application server is WebSphere. The approach you have outlined could be useful. Let me try it and get back to you.
Jim... I realised I have some sample data with which I could try this from where I am now. This is what the content type comes out as: "text/html;charset=Cp1252". So there is a reason why it is failing, but I think I need to go into a bit more detail to describe the problem properly. The data that I am trying to retrieve is an HTML document that is stored in the database. The HTML was stored using the following setter method:
The database (which is DB2) has a codeset of UTF-8. I know that this gets reflected as Cp1252. So the question is: is there a confusion or conflict in the way the data is stored and retrieved, or is this still an issue which the application server should be capable of handling?
Best Regards,
Nagendra Prasad.
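To illustrate the kind of store/retrieve conflict asked about above: if bytes written as Cp1252 are later read as UTF-8, any non-ASCII character is likely to be rejected as malformed input. The following is a minimal sketch, not taken from the original application; the class name `Utf8Check` and the sample bytes are assumptions chosen for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class Utf8Check {
    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                // Report bad input instead of silently substituting a replacement character.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "café!" as Cp1252 bytes: é is the single byte 0xE9.
        byte[] cp1252 = {'c', 'a', 'f', (byte) 0xE9, '!'};
        // "café!" as UTF-8 bytes: é is the two bytes 0xC3 0xA9.
        byte[] utf8 = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, '!'};
        System.out.println(isValidUtf8(cp1252));
        System.out.println(isValidUtf8(utf8));
    }
}
```

Running a check like this over the bytes actually read back from the database is one way to tell whether the stored data is really UTF-8 before blaming the application server.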