
Error while parsing html page in java on linux

Rahul Dhaware
Greenhorn

Joined: Aug 13, 2008
Posts: 22
I am parsing an HTML page with an HTML parsing utility. I am using cobra.jar and js.jar for that.

The source contains some unreadable special characters like ' � '. When I compile my program on Windows, it compiles properly and runs fine.

But when I compile it on Linux, it gives me the following warning:
unmappable character for encoding UTF8
String stateZipArray[] = stateZip.trim().split(" � ");

Then, when accessing elements of stateZipArray, it throws an ArrayIndexOutOfBoundsException.

In the InputStreamReader constructor I am using 'ISO-8859-1' as the charset name.

Can anyone please tell me what the problem is and how I can resolve it?

Thanks in advance.
Martijn Verburg
author
Bartender

Joined: Jun 24, 2003
Posts: 3274
    

It sounds like you're mixing and matching your encoding types. Try using UTF-8 in your InputStreamReader and also read this article
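To illustrate what mixing encodings can do, here is a minimal sketch that decodes the same bytes once as ISO-8859-1 and once as UTF-8 and then splits on the same delimiter. The sample bytes, the class name EncodingMismatchDemo, and the choice of U+00A0 (non-breaking space) as the special character are assumptions for illustration only; the thread does not say which character is actually on the page.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Decode raw bytes as ISO-8859-1, where every byte maps to one character.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Decode raw bytes as UTF-8; malformed sequences become U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASSUMPTION: hypothetical sample data. 0xA0 is a non-breaking space
        // in ISO-8859-1, but a lone 0xA0 byte is not valid UTF-8.
        byte[] raw = { 'N', 'Y', (byte) 0xA0, '1', '0', '0', '0', '1' };

        String latin1 = decodeLatin1(raw);
        String utf8 = decodeUtf8(raw);

        // The delimiter is found only when the bytes were decoded correctly.
        System.out.println(latin1.split("\u00A0").length); // prints 2
        System.out.println(utf8.split("\u00A0").length);   // prints 1
    }
}
```

When the delimiter is not found, split returns a one-element array, so reading index 1 fails in the way described in the question.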


Rahul Dhaware
Greenhorn

Joined: Aug 13, 2008
Posts: 22
I have tried using UTF-8 in the constructor of InputStreamReader.
It does not work; it gives me the same error.
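One thing that may be worth checking: "unmappable character for encoding" is reported by javac about the source file itself, so the charset passed to InputStreamReader at run time would not affect it. A possible workaround, sketched below, is to write the delimiter as a Unicode escape so the literal no longer depends on how the .java file is encoded (javac also accepts an -encoding option for the same purpose). The class name StateZipSplitter and the choice of U+00A0 are assumptions; substitute the code point of the character that really appears on the page.

```java
public class StateZipSplitter {

    // ASSUMPTION: the special character is a non-breaking space (U+00A0)
    // with a regular space on each side. The thread does not confirm this.
    static final String DELIMITER = " \u00A0 ";

    // Split "state <delimiter> zip" and fail with a clear message when the
    // delimiter is missing, instead of an ArrayIndexOutOfBoundsException later.
    static String[] splitStateZip(String stateZip) {
        String[] parts = stateZip.trim().split(DELIMITER);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Delimiter not found in: " + stateZip);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical input, for illustration only.
        String[] parts = splitStateZip("NY \u00A0 10001");
        System.out.println(parts[0] + " / " + parts[1]); // prints NY / 10001
    }
}
```

The length check does not fix a wrong delimiter, but it makes the failure point at the split rather than at the later array access.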
Martijn Verburg
author
Bartender

Joined: Jun 24, 2003
Posts: 3274
    

Have you read the article link I posted? It gives you vital understanding of these sorts of problems...
 
 