Reading from URL, problems with encoding

 
Alex Gli
Greenhorn
Posts: 4
I am trying to write an application that reads a file from the internet (www.example.com/file.html), does some editing and then writes it to a file on my disk. The problem is that Central European characters are not shown correctly in the file on my disk. I know that the web page uses iso-8859-2. I tried a few things but was not successful. How should I modify my code to get the proper result?

[ October 11, 2003: Message edited by: Alex Gli ]
 
Peter den Haan
author
Posts: 3252
The trick is to get encodings right.

Mistake #1 is the InputStreamReader: it needs to know which encoding it is reading. Pull that from the HTTP response headers or, slightly uglier, hardcode iso-8859-2. Check the javadoc API for the appropriate constructor.

The second problem is certainly a cardinal sin in internationalised Java. You are using the write(int) method of OutputStream, which will just chop off the top 8 bits of your char and write out a single byte. This ignores whatever encoding is being used and will only ever work properly for 7-bit ASCII. What you need to do is use FileWriter instead of FileOutputStream; this will write Strings directly using your default encoding. Alternatively, if the default encoding won't do, simply wrap your FileOutputStream inside an OutputStreamWriter; you can use the latter's constructor to ask for any encoding that takes your fancy, as long as it is supported by your JRE, of course.
- Peter
[ October 11, 2003: Message edited by: Peter den Haan ]
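
For later readers, here is a minimal sketch of the two fixes described above: decode the bytes with an explicit charset on the way in, and encode through an OutputStreamWriter on the way out. The class and method names (Transcoder, transcode) are hypothetical and are not taken from the original poster's code, which is not preserved in this thread. It also assumes a Java version with try-with-resources.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
final class Transcoder {

    // Copies text from 'in' to 'out', decoding with sourceCharset and
    // encoding with targetCharset. Both streams are closed when done.
    static void transcode(InputStream in, Charset sourceCharset,
                          OutputStream out, Charset targetCharset) throws IOException {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, sourceCharset));
             Writer writer = new BufferedWriter(new OutputStreamWriter(out, targetCharset))) {
            char[] buffer = new char[4096];
            int count;
            // Work in chars, never in raw bytes: the Reader and Writer
            // do the byte <-> char conversion for us.
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
    }
}
```

In the situation from the question, the input would be something like the stream obtained from the URL, with Charset.forName("ISO-8859-2") as the source charset, and the output a FileOutputStream for the local file. Any editing of the text would happen on the chars between the read and the write.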
 
Alex Gli
Greenhorn
Posts: 4
Thanks for your suggestions, Peter; I kind of got it working. Now, can you help with some code that would get the encoding of a particular file on the internet? Is there a method, or do I have to check the <meta> tag to get the proper encoding?
Thanks in advance.
 
Peter den Haan
author
Posts: 3252
The character set used is returned as part of the HTTP headers, not necessarily in the actual response body. For instance, I looked at the headers this JavaRanch page arrived with at my browser (courtesy of Mozilla Firebird with the Live HTTP Headers plugin). It's the Content-Type header that (optionally) supplies you with the encoding being used on the web page. To get at the HTTP headers, don't open the input stream from the URL object but open the connection explicitly.

Hope this helps,
- Peter
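
For later readers, a minimal sketch of the approach described in the post above: open the URLConnection explicitly, read the Content-Type header, and use its charset parameter (if any) to build the reader. The names HeaderCharset, fromContentType and open are hypothetical, and the fallback charset is an assumption you have to supply yourself, since the header may carry no charset at all. This is not the code from the original post, which is not preserved here.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, for illustration only.
final class HeaderCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-2". Returns 'fallback' when the header
    // is missing, has no charset parameter, or names an unsupported charset.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = param.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Opens the connection explicitly so the headers can be inspected
    // before the body is decoded.
    static Reader open(URL url, Charset fallback) throws IOException {
        URLConnection connection = url.openConnection();
        Charset charset = fromContentType(connection.getContentType(), fallback);
        return new InputStreamReader(connection.getInputStream(), charset);
    }
}
```

If the server sends no charset in the header, the sketch simply uses the fallback; checking a <meta> tag in the body would be a separate step that this example does not attempt.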
 