Problem with reading non-standard characters from XML using SAX

 
Greenhorn
Posts: 16
I am using SAX to deserialise some objects from an XML input. The problem is that one of the names, André Gonçalves, is not read correctly by the parser (or at least that is the point where I first notice the problem).
In fact, when I print the output either to the console, or to a GUI text component, it appears like this: Marcos Andr� Gon�alves

Can I do something to correct this? It is really annoying...
 
Sheriff
Posts: 22783
Do you get the same problem (or other problems) when you open the file in Internet Explorer? Perhaps the encoding is simply incorrect.
 
Marshal
Posts: 79177
Please check what happens if you give the output to a Java object. Try javax.swing.JOptionPane.showMessageDialog(null, "André Gonçalves"); and see what happens. The Windows console is bad at displaying non-ASCII characters.
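One way to take the display out of the equation altogether is to print the code points of the suspect String instead of the String itself. The sketch below is only an illustration: the class name `CodePointDump` and the method `dump` are made up for this example. If the output shows U+00E9 for the é, the String is intact and the display is at fault; if it shows U+FFFD, the character was already lost before it reached the display.

```java
class CodePointDump {
    // Hypothetical helper: renders every code point of s as U+XXXX,
    // so the result contains only ASCII and survives any console.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // \u00E9 is é, written as an escape so the source file encoding does not matter.
        System.out.println(dump("Andr\u00E9"));
    }
}
```

Calling `dump` on the field taken from the deserialised object, rather than on a literal, shows what the parser actually delivered.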
 
Konstantinos Vasileiou
Greenhorn
Posts: 16

Rob Prime wrote:Do you get the same problem (or other problems) when you open the file in Internet Explorer? Perhaps the encoding is simply incorrect.


The encoding is set to ISO-8859-1 and the XML is displayed correctly from Firefox.
 
Konstantinos Vasileiou
Greenhorn
Posts: 16

Campbell Ritchie wrote:Please check what happens if you give the output to a Java object. Try javax.swing.JOptionPane.showMessageDialog(null, "André Gonçalves"); and see what happens. The Windows console is bad at displaying non-ASCII characters.



I tested it and it appears perfectly well!
By the way, I am using Ubuntu 9.04, so it is not related to the Windows console.

I really believe it has to do with the SAX parser that deserialises the entities... Is there some option I should have set to do it? It cannot be that difficult but still it is a very annoying little bug!
 
Marshal
Posts: 28193
SAX parsers work perfectly well with all Unicode characters.

However your problem description is now confusing. It appears that you tested by outputting data from SAX to your console from XML, and had a problem there. Then you tested displaying a constant value into a GUI component, and that worked successfully. I don't see the test where you output data from SAX to a GUI component, and so it's still possible that your console is not a good testing tool for non-ASCII characters.

It's also possible that you are doing something like passing a Reader with the wrong encoding to the SAX parser, but you haven't posted any code so that's just speculation too.
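To illustrate the Reader speculation, here is a minimal sketch. It is not the poster's code, which has not been shown; the class name `SaxEncodingDemo`, its helper methods and the one-element sample document are all invented for the example, and it assumes the data really is ISO-8859-1 as declared. Given an InputStream, the parser reads the encoding declaration itself. Given a Reader, the bytes have already been decoded with whatever charset the Reader was built with, and the declaration is disregarded.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEncodingDemo {
    // Made-up sample document; \u00E9 is é and \u00E7 is ç.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
        + "<name>Andr\u00E9 Gon\u00E7alves</name>";

    // The document as it would sit on disk: ISO-8859-1 bytes.
    static byte[] latin1Bytes() {
        return XML.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the source and returns all character data concatenated.
    static String parse(InputSource source) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    // Byte stream: the parser detects ISO-8859-1 from the XML declaration.
    static String parseFromStream() throws Exception {
        return parse(new InputSource(new ByteArrayInputStream(latin1Bytes())));
    }

    // Reader built with the wrong charset: the accented bytes are invalid
    // UTF-8, so they are replaced before the parser ever sees them.
    static String parseFromMismatchedReader() throws Exception {
        Reader reader = new InputStreamReader(
            new ByteArrayInputStream(latin1Bytes()), StandardCharsets.UTF_8);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseFromStream());
        System.out.println(parseFromMismatchedReader());
    }
}
```

If something like the second method turns out to be in play, handing the parser the File or InputStream directly, or constructing the Reader with the charset the file was actually written in, would be the likely fix.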
 
Konstantinos Vasileiou
Greenhorn
Posts: 16

Paul Clapham wrote:SAX parsers work perfectly well with all Unicode characters.

However your problem description is now confusing. It appears that you tested by outputting data from SAX to your console from XML, and had a problem there. Then you tested displaying a constant value into a GUI component, and that worked successfully. I don't see the test where you output data from SAX to a GUI component, and so it's still possible that your console is not a good testing tool for non-ASCII characters.

It's also possible that you are doing something like passing a Reader with the wrong encoding to the SAX parser, but you haven't posted any code so that's just speculation too.


First, I did not output the problematic data from SAX to the console - I created my objects first, using a SAX parser, and then printed the suspicious field of the object under discussion - and it appeared as I describe above. Outputting the data from the object to a TextArea still has the same problem for this name!

Hmmm. Maybe some code will be clarifying.
I have an XML file that contains data about some objects of my system. I deserialise the file into object instances with a class that uses the SAX parser. I use a CharArrayWriter for reading from the XML

and then


If the problem is not there, then the initialisation might be wrong?...
 
Paul Clapham
Marshal
Posts: 28193

Konstantinos Vasileiou wrote:Hmmm. Maybe some code will be clarifying.



Yes. But the clarifying code would be the code where you pass a File or an InputStream or something like that into the parser.
 
Konstantinos Vasileiou
Greenhorn
Posts: 16

Paul Clapham wrote:

Konstantinos Vasileiou wrote:Hmmm. Maybe some code will be clarifying.



Yes. But the clarifying code would be the code where you pass a File or an InputStream or something like that into the parser.


Sorry.... Here it is:


I tested some more things and probably it is not a mistake of the SAX parser after all: I print the list of names the parsing module returns and the name is printed correctly. If I also print the String "André Gonçalves" to the GUI text components, it appears correctly as well. For some reason, the String loses the extra encoding information somewhere in the process? Is that possible?
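On that last question: a Java String holds Unicode characters and carries no encoding information of its own, so nothing can be lost while it is merely passed around. Damage can occur only where bytes are turned into characters, or characters back into bytes, with a charset that does not match. The sketch below is an illustration under that assumption; the class name `CharsetBoundaryDemo` and the method `roundTrip` are made up for the example and do not come from the code in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetBoundaryDemo {
    // \u00E9 is é and \u00E7 is ç.
    static final String NAME = "Andr\u00E9 Gon\u00E7alves";

    // Hypothetical helper: encodes s to bytes with one charset and
    // decodes those bytes with another, as happens at any I/O boundary.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Matching charsets: the name survives.
        System.out.println(roundTrip(NAME, StandardCharsets.ISO_8859_1,
                                     StandardCharsets.ISO_8859_1).equals(NAME)); // true
        // Mismatched charsets: the accented letters are replaced.
        System.out.println(roundTrip(NAME, StandardCharsets.ISO_8859_1,
                                     StandardCharsets.UTF_8).equals(NAME));      // false
    }
}
```

Every place where the name is written to or read from a stream, a file, a socket or a byte array between the parser and the GUI is a candidate for this kind of mismatch, particularly any call that relies on the platform default charset.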
 