Reading Polish Characters from URL

 
Chris Mack
Greenhorn
Posts: 13
I am using an HttpURLConnection to request a page that contains Polish text from a server.
For example, the page contains "Sprawdzenie nośności z opiekunem", but when I print the response to the console I get "Sprawdzenie no?no?ci z opiekunem".

This is how I am making the request:



The response page is encoded in ISO-8859-2 (Latin-2), a charset that covers Polish. This is how I am reading the response:
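One thing worth checking at this step is whether the reader uses the charset the server declares or silently falls back to the platform default. The following is a minimal sketch, not the poster's code: the class `ContentTypeCharsetSketch` and the helper `charsetFrom` are hypothetical names, and the header value in `main` is an assumed example of what `URLConnection.getContentType()` might return for such a page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharsetSketch {
    // Extracts the charset parameter from a Content-Type header value.
    // Returns the fallback when the header is missing, has no charset
    // parameter, or names a charset this JVM does not know.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Assumed header value, for illustration only.
        Charset cs = charsetFrom("text/html; charset=ISO-8859-2", StandardCharsets.ISO_8859_1);
        System.out.println(cs.name());
    }
}
```

The resulting charset could then be handed to the reader, for example `new InputStreamReader(connection.getInputStream(), cs)`, where `connection` stands for the already opened HttpURLConnection.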




Any help or suggestions would be greatly appreciated.
Please let me know if you need any more information (chris.mack@centimark.com)

Also, I have tried using java.nio.charset.CharsetDecoder to decode the page. I read the stream in as bytes and placed the bytes into a ByteBuffer, but that didn't work either.
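For comparison, here is a minimal sketch of the CharsetDecoder route described above. It is not the code that was tried; `DecoderSketch` and `decodeLatin2` are hypothetical names. One common pitfall with a manually filled ByteBuffer is forgetting `flip()` before decoding, which leaves the decoder with nothing to read, though whether that was the problem here is only a guess.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class DecoderSketch {
    // Decodes raw bytes as ISO-8859-2 and fails loudly on bad input
    // instead of silently substituting replacement characters.
    static String decodeLatin2(byte[] raw) {
        CharsetDecoder decoder = Charset.forName("ISO-8859-2")
                .newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = ByteBuffer.allocate(raw.length);
        buf.put(raw);
        // Switch the buffer from writing to reading. Without this call the
        // position equals the limit and the decoder sees zero bytes.
        buf.flip();
        try {
            return decoder.decode(buf).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("bytes are not valid ISO-8859-2", e);
        }
    }

    public static void main(String[] args) {
        // 0xB6 is s-acute in ISO-8859-2; the escape keeps this source ASCII-only.
        byte[] raw = {'n', 'o', (byte) 0xB6, 'n', 'o', (byte) 0xB6, 'c', 'i'};
        String text = decodeLatin2(raw);
        System.out.println(text.equals("no\u015Bno\u015Bci"));
    }
}
```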

Thanks,

Chris

 
Carey Evans
Ranch Hand
Posts: 225
You may be reading the data correctly (although you could simply write new InputStreamReader(istream, "ISO-8859-2")) but failing to display it on the console. Try displaying the data you receive in a GUI instead:
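A minimal sketch of that suggestion follows. It is an illustration rather than the listing originally posted here: `Latin2ReaderSketch`, `readAll`, `escapeNonAscii`, and the `--gui` argument are hypothetical, and a byte array stands in for the connection's input stream. The escaped output gives a check that does not depend on the console encoding, and the dialog shows the text with Swing's own font rendering.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import javax.swing.JOptionPane;

class Latin2ReaderSketch {
    // Reads the whole stream as text, decoding bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Renders every non-ASCII character as <U+XXXX> so the decoded text
    // can be inspected on a console with any encoding.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("<U+%04X>", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the connection's stream; 0xB6 is s-acute in ISO-8859-2.
        byte[] raw = {'n', 'o', (byte) 0xB6, 'n', 'o', (byte) 0xB6, 'c', 'i'};
        String text = readAll(new ByteArrayInputStream(raw), Charset.forName("ISO-8859-2"));
        System.out.print(escapeNonAscii(text));
        if (args.length > 0 && args[0].equals("--gui")) {
            JOptionPane.showMessageDialog(null, text);
        }
    }
}
```

If the escaped output shows U+015B where the ś should be, the data was decoded correctly and any remaining ? is a display problem.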
 
Chris Mack
Greenhorn
Posts: 13
I would just like to say thank you for your reply. I appreciate it.

I am using RAD as my IDE, and I put a breakpoint in the code right before the string buffer is printed to the console. When execution stops at the breakpoint, I check the contents of the string buffer. It also has the ? in the Polish text. I suspect that when the text from the in.readLine() method is assigned to the String inputLine, the text is being converted to UTF-8 instead of keeping its original charset encoding.

Any other suggestions?

Thanks again,

Chris
 
Carey Evans
Ranch Hand
Posts: 225
There won’t be any character conversion happening when assigning strings, since Java only copies a reference to the String object, and all Java strings are encoded in UTF-16 anyway. The InputStreamReader does the initial conversion from ISO-8859-2 to UTF-16, and the System.out.println() converts from UTF-16 back to the encoding in the file.encoding system property.

I wrote a short test program, and I can’t reproduce your problem. Can you see whether this works for you?

In this case, I get ? instead of ś and ż on my console, because its encoding is Cp1252, but JOptionPane displays the string correctly.
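The effect described above can be reproduced without a network connection. The sketch below is not the test program from this post; `ConsoleEncodingSketch` and `roundTrip` are hypothetical names. It encodes a correctly decoded string into a target charset and decodes it again, which approximates what happens when a string is printed to a console that uses that charset. Characters the charset cannot represent come back as its replacement, which is ? for windows-1252 (Cp1252).

```java
import java.nio.charset.Charset;

class ConsoleEncodingSketch {
    // Encodes the string into the named charset and decodes it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's default replacement bytes.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] encoded = text.getBytes(cs);
        return new String(encoded, cs);
    }

    public static void main(String[] args) {
        // Escapes keep this source ASCII-only; \u015B is s-acute.
        String text = "Sprawdzenie no\u015Bno\u015Bci z opiekunem";
        System.out.println("default charset: " + Charset.defaultCharset());
        // Cp1252 has no s-acute, so the text is damaged on the way out.
        System.out.println(roundTrip(text, "windows-1252"));
        // ISO-8859-2 and UTF-8 can both represent the text unchanged.
        System.out.println(roundTrip(text, "ISO-8859-2").equals(text));
        System.out.println(roundTrip(text, "UTF-8").equals(text));
    }
}
```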
 
Chris Mack
Greenhorn
Posts: 13
I was able to get the characters to display in my RAD console by changing the JVM encoding to UTF-8 and switching the console font to one that includes the Polish characters.
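For readers who cannot change the JVM-wide encoding, a programmatic alternative is to wrap the output stream in a PrintStream with an explicit encoding. This is a minimal sketch of that alternative, not the fix applied above; `Utf8ConsoleSketch` and `utf8Stream` are hypothetical names. It only helps if the console itself interprets its input as UTF-8 and uses a font that has the glyphs.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8ConsoleSketch {
    // Wraps a stream so that printed text is encoded as UTF-8 regardless
    // of the platform default encoding. Auto-flush is enabled.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream console = utf8Stream(System.out);
        // Escapes keep this source ASCII-only; \u015B is s-acute.
        console.println("Sprawdzenie no\u015Bno\u015Bci z opiekunem");
    }
}
```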

Thanks for your replies, much appreciated!

 