XML has rules for determining the encoding of a document; you will find them in Appendix F of the XML Recommendation. As Bill suggests, part of the algorithm involves the encoding declaration (the "encoding" pseudo-attribute) in the document's XML declaration.
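To illustrate what Appendix F describes, here is a simplified Java sketch (not any parser's actual code) that guesses an encoding from the first bytes of a document. It checks byte order marks first, then how "<?" is laid out in bytes, and only then reads the encoding declaration. The class and method names are hypothetical, and the UCS-4 and EBCDIC cases from the appendix are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlEncodingSniffer {

    // encoding="..." or encoding='...' inside an XML declaration at the very start.
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Simplified Appendix F: returns a best-guess encoding name for the first bytes
    // of a document. UCS-4 and EBCDIC layouts are not handled.
    static String guessEncoding(byte[] head) {
        int b0 = byteAt(head, 0), b1 = byteAt(head, 1), b2 = byteAt(head, 2), b3 = byteAt(head, 3);

        // 1. Byte order marks (FE FF 00 00 / FF FE 00 00 would be UCS-4, so skip those).
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF && !(b2 == 0 && b3 == 0)) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE && !(b2 == 0 && b3 == 0)) return "UTF-16LE";

        // 2. No BOM: "<?" in a 16-bit encoding. A fuller version would read the
        //    declaration here to tell UTF-16 from other 16-bit encodings.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";

        // 3. "<?xm" in an ASCII-compatible encoding: read the declaration as Latin-1.
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) {
            Matcher m = ENCODING_DECL.matcher(new String(head, StandardCharsets.ISO_8859_1));
            if (m.find()) return m.group(1);
        }

        // 4. No BOM and no encoding declaration: the Recommendation requires UTF-8.
        //    (Unhandled layouts such as UCS-4 or EBCDIC also end up here.)
        return "UTF-8";
    }

    private static int byteAt(byte[] a, int i) {
        return i < a.length ? a[i] & 0xFF : -1;
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guessEncoding(latin1)); // ISO-8859-1
        System.out.println(guessEncoding("<a/>".getBytes(StandardCharsets.UTF_8))); // UTF-8
    }
}
```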
However, you should never need to apply these rules yourself. Just get an InputStream (not a Reader) that reads the document, and pass that to your XML parser. The parser knows the rules and will handle the encoding accordingly.
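A minimal sketch of that advice using the JDK's built-in DOM parser (`DocumentBuilder.parse(InputStream)`). The point of handing over raw bytes is that the parser applies the detection rules itself; if it is given a character stream instead, it reads the characters as already decoded and disregards the encoding declaration. The class name `ParseFromBytes` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseFromBytes {
    // Hand the parser raw bytes; it works out the encoding from the BOM/declaration.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // ISO-8859-1 bytes with a matching declaration; e-acute is the single byte 0xE9.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Jos\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1));
        System.out.println("Jos\u00e9".equals(doc.getDocumentElement().getTextContent())); // true
    }
}
```

In real code the `InputStream` would typically come from a `FileInputStream` or `Files.newInputStream(...)` on the file in question.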
My problem is that the current program generates XML files without using any parser; the files are written out as flat text. The program does not write any encoding information into the generated XML files, yet they could be in any encoding. So is there any way I can determine the encoding of an XML file?
InputStreamReader has a getEncoding() method. Would using it solve my problem? I am new to character encodings.
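For anyone with the same question, a small sketch shows why `InputStreamReader.getEncoding()` cannot help here: it reports the charset the reader was *constructed* with (or the platform default if none was given), usually under its historical name such as `UTF8`. It never inspects the bytes. The class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetEncodingDemo {
    // Returns whatever charset the reader was built with; the data is not examined.
    static String readerEncoding(byte[] data, Charset cs) throws IOException {
        try (InputStreamReader r = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            return r.getEncoding(); // evaluated before the reader is closed
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "Jos\u00e9".getBytes(StandardCharsets.UTF_8);
        // Same UTF-8 bytes, two readers: each simply echoes its constructor argument.
        System.out.println(readerEncoding(utf8, StandardCharsets.UTF_8));
        System.out.println(readerEncoding(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

The second call happily reports an ISO-8859-1 name for data that is actually UTF-8, which is exactly the situation the question is trying to detect.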
Originally posted by Chetan Parekh: They are not writing any encoding information in generated XML file. But it can have any encoding. So is there any way by which I can determine the encoding of the XML file?
Then "they" may not be doing it correctly. If "they" don't declare an encoding in the XML document, then they must encode the document as UTF-8 (or as UTF-16 beginning with a byte order mark). This is not optional; it is required by the XML Recommendation.
So if they are not doing that, it is not your responsibility to fix the problem. It is their problem.
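If "they" do want to fix it, a flat-file generator only has to do two things consistently: encode the text with a known charset and write an XML declaration naming that same charset. A minimal sketch, assuming the body is already well-formed, escaped XML; `FlatXmlWriter` and `writeXml` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FlatXmlWriter {
    // Writes an already-built XML body using the given charset, preceded by an
    // XML declaration that names that same charset.
    static void writeXml(OutputStream out, String body, Charset cs) throws IOException {
        Writer w = new OutputStreamWriter(out, cs);
        w.write("<?xml version=\"1.0\" encoding=\"" + cs.name() + "\"?>\n");
        w.write(body);
        w.flush(); // push buffered bytes out; the caller owns and closes the stream
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeXml(buf, "<name>Jos\u00e9</name>", StandardCharsets.UTF_8);
        System.out.println(buf.toString("UTF-8")
                .startsWith("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")); // true
    }
}
```

UTF-8 is the safest choice since every XML parser must support it. With a narrower charset such as ISO-8859-1, `OutputStreamWriter` silently replaces characters it cannot encode (typically with `?`), so the declaration alone does not guarantee the content survived.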
However, it is possible that they are not competent to fix the problem. In that case a person will have to determine the actual encoding of the file; there is no reliable automated way of doing it. [ November 06, 2006: Message edited by: Paul Clapham ]
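While the encoding cannot be reliably detected, a file with no BOM and no declaration can at least be checked against the rule above: it must be UTF-8, so a strict UTF-8 decode will flag files that break that rule. Passing the check does not prove the file is UTF-8 (plain ASCII, for instance, is valid in many encodings); failing it only tells you the producer got it wrong and a person needs to look. `Utf8Check` and `isValidUtf8` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // True if the bytes decode as UTF-8 with no malformed or unmappable sequences.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(data)); // throws on the first bad sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<name>Jos\u00e9</name>";
        System.out.println(isValidUtf8(xml.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(isValidUtf8(xml.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```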
subject: Determine Character Set of XML file using Java