

JavaRanch » Java Forums » Certification » Product and Other Certifications

XML and UTF-8

Anand Gondhiya
Ranch Hand

Joined: Feb 24, 2004
Posts: 155
Hi All,

1. The following text is from an XML file that is supposed to be read as UTF-8, so I opened it in IE to view it. I tried changing the encoding to UTF-8 so that I would see bullets (•) instead of the weird characters. Am I doing this the right way?

It is expected that the incumbent meets the following selection criteria:

• A postgraduate degree, preferably Ph.D, in a relevant field such as economics, trade, competitiveness, industrial organization, private sector development. A multi-disciplinary background is an advantage.

• At least 12 years (15 with Master’s degree) relevant experience in trade and competitiveness.


2. The text above is part of a CDATA section. As mentioned above, when read as UTF-8 these weird characters actually represent bullets. My Java code reads this text, copies the CDATA section including the <![CDATA[ marker, and writes it to the output XML file. When I write the output file, I convert the strings to UTF-8, expecting that the output file will show bullets since the text has already been converted to UTF-8.

Can anyone tell me what I am doing wrong here? I don't see bullets when I open either the input file or the output file.
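The "weird characters" are most likely the bullet's UTF-8 bytes being decoded with a single-byte Windows charset. The sketch below assumes windows-1252 (the thread never says which charset IE was actually using): the bullet U+2022 is encoded in UTF-8 as the three bytes E2 80 A2, and windows-1252 reads those bytes as the three characters â€¢. The class and method names are hypothetical, for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8, then (wrongly) decode those bytes as windows-1252.
    // windows-1252 is an assumption; it is the usual suspect on Windows.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String bullet = "\u2022"; // BULLET, encoded in UTF-8 as the bytes E2 80 A2
        String garbled = garble(bullet);
        // garbled holds U+00E2 U+20AC U+00A2 (a-circumflex, euro sign, cent sign)
        System.out.println(garbled.equals("\u00e2\u20ac\u00a2")); // true

        // Decoding the same bytes as UTF-8 gives the bullet back unchanged.
        String restored = new String(bullet.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(restored.equals(bullet)); // true
    }
}
```

In other words, the file can be perfectly correct UTF-8 while the viewer is the part guessing the wrong charset.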

Thanks
-Anand
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42278
    
Which software and which font are you using to view the XML? Does the software understand Unicode, and does it use UTF-8 when opening the file, and does the font include the character you're missing?

If you look at the output file with a hex editor, does it still have the correct character codes?
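The hex-editor check can also be done from Java. This is a minimal sketch with a hypothetical class name and output.xml as an assumed default path. It prints a file's bytes in hex and reports whether the UTF-8 bullet sequence E2 80 A2 is present. If the bullet had instead been mis-decoded as windows-1252 and then re-encoded as UTF-8, you would see C3 A2 E2 82 AC C2 A2 in its place.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexDump {
    // Format bytes as space-separated two-digit uppercase hex values.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    // True if the UTF-8 encoding of U+2022 BULLET (E2 80 A2) occurs anywhere in data.
    static boolean containsUtf8Bullet(byte[] data) {
        for (int i = 0; i + 2 < data.length; i++) {
            if (data[i] == (byte) 0xE2 && data[i + 1] == (byte) 0x80 && data[i + 2] == (byte) 0xA2) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to output.xml (assumed); pass another file name as the first argument.
        byte[] data = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "output.xml"));
        System.out.println(toHex(data));
        System.out.println("UTF-8 bullet present: " + containsUtf8Bullet(data));
    }
}
```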


Anand Gondhiya
Ranch Hand

Joined: Feb 24, 2004
Posts: 155
Ulf,

I figured out that if I open input.xml in Firefox, I can see the bullets. If I open the file in IE, IE won't let me change the encoding, but with View Source I can see the bullets.
In short, I now know the input file is correct and I can view it correctly.

I also wrote the following Java code to convert the text of input.xml to output.xml:



If I open output.xml in Firefox, it DOESN'T show the bullets. If I open it in IE and use View Source, it doesn't show the bullets there either. So my challenge is to convert the text to UTF-8 correctly in Java.

This is getting really interesting. Let me know if you have any input. Thanks for your post!!

- Anand
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42278
    
You don't need all those calls to getBytes - just wrap the FileOutputStream in an OutputStreamWriter with an encoding of "UTF-8" and the conversion will happen automatically.
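A sketch of that advice, assuming input.xml really is UTF-8: an InputStreamReader decodes the input as UTF-8 and an OutputStreamWriter around the FileOutputStream encodes the output, with no getBytes calls in between. The poster's own code isn't shown in the thread, and the class and method names here are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Copy {
    // Copy a text file, decoding it as UTF-8 and re-encoding it as UTF-8.
    static void copyUtf8(File in, File out) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream(in), StandardCharsets.UTF_8));
             Writer writer = new BufferedWriter(
                 new OutputStreamWriter(new FileOutputStream(out), StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                // The writer turns these chars into UTF-8 bytes; no manual conversion needed.
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copyUtf8(new File("input.xml"), new File("output.xml"));
    }
}
```

Between the reader and the writer the text exists only as Java chars, so the charset is named exactly twice: once for decoding, once for encoding.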

Make sure you're also specifying UTF-8 as the encoding in the XML file.
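If the output is produced with the standard JAXP APIs rather than by hand, the Transformer can write both the encoding declaration and matching UTF-8 bytes. A sketch under the assumption that the CDATA content sits in a single root element; the element name criteria and the class name are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CdataWriter {
    // Builds a document whose root holds one CDATA section and writes it to `out` as UTF-8.
    static void writeCdata(String text, File out) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("criteria"); // hypothetical element name
        root.appendChild(doc.createCDATASection(text));
        doc.appendChild(root);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Controls both the encoding="..." in the XML declaration and the bytes written.
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (OutputStream os = new FileOutputStream(out)) {
            // A byte stream, not a Writer, so the transformer performs the encoding itself.
            t.transform(new DOMSource(doc), new StreamResult(os));
        }
    }

    public static void main(String[] args) throws Exception {
        writeCdata("\u2022 A postgraduate degree, preferably Ph.D, in a relevant field.", new File("output.xml"));
    }
}
```

Handing the Transformer a byte stream matters: given a Writer, it still writes encoding="UTF-8" in the declaration, but the actual bytes depend on the Writer's charset.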
Anand Gondhiya
Ranch Hand

Joined: Feb 24, 2004
Posts: 155
Hi,

I wrote the following code, but it still shows the special characters. Any more ideas?




- Anand
moe Mans
Greenhorn

Joined: Dec 02, 2009
Posts: 1
Hello,

How can I add to an XML document instead of overwriting it?

Any help would be much appreciated. Thanks in advance.

I have managed to get my code to write an XML file with data from the input fields of a JSP page. Now I need to add newly entered details from the JSP page to the existing XML file instead of rewriting it every time. My sample code, which currently rewrites the XML file, is as follows:
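The poster's code isn't shown, but one common approach with the standard DOM APIs is to parse the existing file, append a new element under the root, and write the whole document back. Simply opening the file in append mode won't work, because XML allows only one root element, so anything written after its closing tag leaves the file not well-formed. A minimal sketch; the file layout (an entries root holding entry elements with name and email children) and all names are hypothetical stand-ins for the JSP form fields.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlAppender {
    // Adds one <entry> to `file`, creating the file with an <entries> root if it is missing or empty.
    static void appendEntry(File file, String name, String email) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc;
        if (file.exists() && file.length() > 0) {
            doc = builder.parse(file); // load what is already there
        } else {
            doc = builder.newDocument();
            doc.appendChild(doc.createElement("entries"));
        }

        Element entry = doc.createElement("entry");
        Element nameEl = doc.createElement("name");
        nameEl.setTextContent(name);
        Element emailEl = doc.createElement("email");
        emailEl.setTextContent(email);
        entry.appendChild(nameEl);
        entry.appendChild(emailEl);
        doc.getDocumentElement().appendChild(entry); // new data goes inside the existing root

        // Rewrite the whole (now larger) document as UTF-8.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (OutputStream os = new FileOutputStream(file)) {
            t.transform(new DOMSource(doc), new StreamResult(os));
        }
    }

    public static void main(String[] args) throws Exception {
        appendEntry(new File("entries.xml"), "Jane Doe", "jane@example.com");
    }
}
```

For large files or concurrent JSP requests, re-reading and rewriting the whole document on every submission gets expensive and needs synchronization; this sketch ignores both.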



 