Determine Character Set of XML file using Java

Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
I have an XML file. I don't know which character set it was written in. How can I determine the character set of the file using Java?


My blood is tested +ve for Java.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12676
    
Is there an "encoding" attribute in the <?xml declaration?
Bill
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

Moving to our XML forum...


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18124
    

XML has rules for determining the encoding of a document. You will find them in Appendix F of the XML Recommendation. As Bill suggests, part of the algorithm involves the "encoding" attribute of the document's prolog.

However, it should never be necessary for you to do that yourself. Just get an InputStream (not a Reader) that reads the document, and pass that to your XML parser. The parser should know the rules and deal with the encoding accordingly.
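A minimal sketch of that advice, assuming the standard JAXP DOM parser that ships with the JDK. The class name ParseFromStream and the sample document are made up for illustration. The point is only that the parser receives raw bytes, so it can apply the XML encoding rules itself:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: hands raw bytes to the parser and lets it pick the encoding.
final class ParseFromStream {
    private ParseFromStream() {}

    static Document parse(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // An InputStream (not a Reader) leaves the byte-to-character decoding to the parser.
            return factory.newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document that declares ISO-8859-1 and really is encoded that way.
        byte[] bytes = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><greeting>caf\u00e9</greeting>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(bytes));
        System.out.println(doc.getDocumentElement().getTextContent());
        // Encoding named in the XML declaration, or null if there was none.
        System.out.println(doc.getXmlEncoding());
    }
}
```

If the bytes were wrapped in a Reader first, the Reader's charset would be used for decoding and the declaration in the file would no longer decide anything.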
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
My problem is that the current program generates the XML files without using any parser. The files are written as flat files, and no encoding information is written into them. But they can be in any encoding. So is there any way I can determine the encoding of an XML file?

There is a getEncoding() method in InputStreamReader. If I use it, will it solve my problem? I am new to encodings.

http://java.sun.com/j2se/1.4.2/docs/api/java/io/InputStreamReader.html
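On the getEncoding() question, a small sketch may help. InputStreamReader.getEncoding() reports the charset the reader was constructed with (or the platform default); it does not inspect the bytes. The class and method names below are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: getEncoding() echoes the charset we chose, whatever the bytes are.
final class GetEncodingDemo {
    private GetEncodingDemo() {}

    static String reportedEncoding(byte[] data, Charset assumed) {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(data), assumed)) {
            // May be a historical name such as "UTF8" rather than "UTF-8".
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // These bytes are ISO-8859-1, but we tell the reader they are UTF-8.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(reportedEncoding(latin1, StandardCharsets.UTF_8));
    }
}
```

So the method tells you what the reader assumes, not what the file actually contains.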
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18124
    

Originally posted by Chetan Parekh:
They are not writing any encoding information in generated XML file. But it can have any encoding. So is there anyway by which I can determine the encoding of the XML file?
Then "they" may not be doing it correctly. If "they" don't declare an encoding in the XML document, then they must encode the document as UTF-8 or UTF-16. This is not optional; it is required by the XML Recommendation.

So if they are not doing that, it is not your responsibility to fix the problem. It is their problem.

However it is possible that they are not competent to fix the problem. In that case some human agent will have to determine the actual encoding of the file. There is no automated way of doing it.
[ November 06, 2006: Message edited by: Paul Clapham ]
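For files that do follow the rule above, the first few bytes are enough to tell UTF-8 from UTF-16. Here is a rough sketch of that part of the Appendix F rules, under some assumptions: the class XmlEncodingSniffer is hypothetical, it ignores the UCS-4 and EBCDIC cases, and it does not read the encoding declaration. When nothing matches it returns UTF-8 because that is the default the Recommendation requires, not because anything was detected:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical, partial sketch of the byte-pattern checks for an undeclared encoding.
final class XmlEncodingSniffer {
    private XmlEncodingSniffer() {}

    static Charset sniff(byte[] head) {
        // Byte order marks.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        // No byte order mark: "<?" encoded as two 16-bit units.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return StandardCharsets.UTF_16LE;
        // Default for a document without a declaration; a Latin-1 file lands here too.
        return StandardCharsets.UTF_8;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(sniff(sample));
    }
}
```

A file written in some other single-byte encoding with no declaration looks the same as UTF-8 to this check, which is why a human has to step in for that case.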
 