
Determine Character Set of XML file using Java

 
Chetan Parekh
Ranch Hand
Posts: 3640
I have an XML file, but I don't know which character set it was written in. How can I determine the file's character set using Java?
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056
Is there an "encoding" attribute in the <?xml declaration?
Bill
 
Paul Sturrock
Bartender
Posts: 10336
Moving to our XML forum...
 
Paul Clapham
Sheriff
Posts: 20768
XML has rules for determining the encoding of a document. You will find them in Appendix F of the XML Recommendation. As Bill suggests, part of the algorithm involves the "encoding" attribute of the document's prolog.
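
For anyone curious what the first step of that detection looks like, here is a rough sketch of the byte-pattern check described in Appendix F. It is only an illustration, not a full implementation: the class and method names are made up, the EBCDIC case is left out, and the UCS-4 cases are reported under the modern UTF-32 names, which is my assumption. A real parser goes on to read the encoding declaration once it knows the encoding family.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses the encoding family from the first bytes of an
// XML document, loosely following Appendix F of the XML Recommendation.
final class XmlEncodingSniffer {

    static String sniff(byte[] b) {
        // Byte order marks. The 4-byte marks are tested before the 2-byte ones
        // because FF FE 00 00 also starts with FF FE.
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";

        // No byte order mark: look at how "<" or "<?" is laid out.
        if (startsWith(b, 0x00, 0x00, 0x00, 0x3C)) return "UTF-32BE";
        if (startsWith(b, 0x3C, 0x00, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";

        // ASCII-compatible start (or something unrecognized): provisionally
        // UTF-8, subject to whatever the encoding declaration says.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(sniff(sample));
    }
}
```

Note that this only narrows things down to a family. It cannot tell ISO-8859-1 from UTF-8, for example, because both start with the same bytes for "<?xml".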

However, you should never need to do that yourself. Just get an InputStream (not a Reader) that reads the document, and pass that to your XML parser. The parser should know the rules and deal with the encoding accordingly.
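
A minimal sketch of that approach, assuming the standard JAXP DOM parser and Java 5 or later (Document.getXmlEncoding() is DOM Level 3, so it is not in 1.4.2). The class name and the sample document are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: hands raw bytes to the parser and lets it work
// out the encoding from the XML declaration.
final class ParseFromStream {

    static Document parse(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // An InputStream, not a Reader: the parser sees the bytes and applies
        // the XML encoding rules itself.
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(new ByteArrayInputStream(bytes));
        // The encoding named in the XML declaration, or null if none was given.
        System.out.println("declared encoding: " + doc.getXmlEncoding());
        System.out.println("text: " + doc.getDocumentElement().getTextContent());
    }
}
```

If you wrap the stream in a Reader first, you have already chosen an encoding and the parser will not get the chance to detect it.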
 
Chetan Parekh
Ranch Hand
Posts: 3640
My problem is that the current program generates the XML files without using any parser; they are written out as flat files. They are not writing any encoding information in the generated XML file. But it can have any encoding. So is there any way by which I can determine the encoding of the XML file?

There is a getEncoding() method in InputStreamReader. If I use it, will it solve my problem? I am new to encodings.

http://java.sun.com/j2se/1.4.2/docs/api/java/io/InputStreamReader.html
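
On the getEncoding() question: as far as the API documentation goes, that method reports the encoding the reader was created with (or the platform default), and does not inspect the data. A small sketch to illustrate, with a made-up class name; it uses java.nio.charset.StandardCharsets, so it assumes Java 7 or later rather than 1.4.2.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that InputStreamReader.getEncoding() echoes
// the charset the reader was constructed with, whatever the bytes contain.
final class GetEncodingDemo {

    static Charset reportedCharset(byte[] data, Charset assumed) throws IOException {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(data), assumed)) {
            // getEncoding() may return a historical name such as "UTF8";
            // Charset.forName resolves it to the canonical charset.
            return Charset.forName(reader.getEncoding());
        }
    }

    public static void main(String[] args) throws IOException {
        // Bytes that are really ISO-8859-1 ...
        byte[] latin1 = "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
        // ... read through a reader that was told to assume UTF-8.
        System.out.println(reportedCharset(latin1, StandardCharsets.UTF_8));
    }
}
```

So getEncoding() tells you what the reader assumes, not what the file actually contains.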
 
Paul Clapham
Sheriff
Posts: 20768
Originally posted by Chetan Parekh:
They are not writing any encoding information in the generated XML file. But it can have any encoding. So is there any way by which I can determine the encoding of the XML file?
Then "they" may not be doing it correctly. If "they" don't declare an encoding in the XML document, then they must encode the document as UTF-8 or UTF-16. This is not optional; it is required by the XML Recommendation.
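
Whether a file at least satisfies the UTF-8 half of that requirement can be checked mechanically. The sketch below (class and method names are invented) uses a strict CharsetDecoder that rejects malformed input instead of silently replacing it. A failure shows the file is not UTF-8; it says nothing about what the encoding actually is, and a pass does not prove the producer intended UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: tests whether a byte array is well-formed UTF-8.
final class Utf8Check {

    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                // REPORT makes the decoder throw instead of substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "<name>caf\u00e9</name>".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes valid:      " + isValidUtf8(utf8));
        System.out.println("ISO-8859-1 bytes valid: " + isValidUtf8(latin1));
    }
}
```

Pure ASCII content passes this check too, since ASCII is a subset of UTF-8.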

So if they are not doing that, it is not your responsibility to fix the problem. It is their problem.

However, it is possible that they are not competent to fix the problem. In that case some human agent will have to determine the actual encoding of the file. There is no automated way of doing it.
[ November 06, 2006: Message edited by: Paul Clapham ]
 