Problem with java reading a webpage

Colin A Thompson

Joined: Dec 08, 2009
Posts: 2
I am having a problem copying a webpage to a text file. Whenever I run my program against one particular site, the copied text contains Asian characters, even though there are no Asian characters on the page.

I tried my code on other websites and it works fine. Are there security measures that prevent a web page from being copied?

The website I am having problems with is public information and I am not reselling anything of theirs.
Paul Clapham

Joined: Oct 14, 2005
Posts: 19719

Possibly you are using the wrong charset to convert the downloaded data from bytes to chars. That's just my first guess, though; there could be dozens of other things wrong. You haven't given us many details to comment on.
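To illustrate the charset guess, here is a minimal sketch, not the original poster's code (which was never shown). It decodes the same bytes twice: once as UTF-16, which turns plain ASCII bytes into CJK characters, and once with the charset named in a Content-Type header value. The class name CharsetSketch and the helper charsetFromContentType are made up for this example. In a real program the header value would likely come from something like URLConnection.getContentType(), and some pages declare their charset only in an HTML meta tag, which this sketch does not handle.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {

    // Hypothetical helper: pull the charset out of a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback if none is usable.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" prefix (8 characters).
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Stand-in for bytes downloaded from a page that is really UTF-8.
        byte[] raw = "Hell".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each pair of ASCII bytes becomes one CJK character.
        String wrong = new String(raw, StandardCharsets.UTF_16);

        // Charset taken from an assumed header value, with a fallback.
        Charset cs = charsetFromContentType("text/html; charset=UTF-8",
                StandardCharsets.ISO_8859_1);
        String right = new String(raw, cs);

        System.out.println("wrong has " + wrong.length() + " chars");
        System.out.println("right is " + right);
    }
}
```

If the decoded text only looks wrong for one site, comparing that site's declared charset with the one the program uses is a cheap first check.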
Campbell Ritchie

Joined: Oct 13, 2005
Posts: 43885
Also, some text editors or terminal windows may only be able to display ASCII or Latin-1 characters.
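One way to tell a display problem from a decoding problem is to print the code points instead of the characters themselves, since hex digits show up correctly in any editor or terminal. The following is only a sketch; the class name CodePointDump and the method describe are invented for this example.

```java
import java.util.stream.Collectors;

final class CodePointDump {

    // Render each code point of the string as U+XXXX, separated by spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // 'H' followed by e-acute, written as an escape so the source file's
        // own encoding does not matter.
        System.out.println(describe("H\u00E9"));
    }
}
```

If the code points are the ones you expect, the data is fine and the viewer is at fault; if they fall in the CJK ranges, the bytes were decoded with the wrong charset.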