
UTF-8 query problem

Sai Narasimha Reddy
Greenhorn

Joined: Dec 13, 2006
Posts: 23
I'm developing a web-based setup wizard for my application. It reads a UTF-8 encoded file and executes the queries in the file.

my reading code...



When I read the file using the above code, I get a '?' symbol as the very first character. Because of that, I get an exception saying ..


[ July 27, 2008: Message edited by: Scott Selikoff ]

Sai Narasimha
Sai Narasimha Reddy
Greenhorn

Joined: Dec 13, 2006
Posts: 23
When I display the first line read from the file in my Windows command prompt, the weird '?' is displayed there too.

Does that mean all JDBC queries are ANSI encoded? I know that's a stupid thing to say.

But I urgently need a solution for this problem.

See this tutorial's output. The users there also posted that they're having the same problem!
http://www.roseindia.net/java/example/java/io/ReadUTF8.shtml
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18902
    

Perhaps your file has a BOM (Byte Order Mark) at the beginning? Look at it with a hex editor to find out. And if so, skip over it before trying to use the contents of the file.
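For the case where the file is known to be UTF-8, skipping the BOM could look roughly like the sketch below. It relies on the fact that Java's UTF-8 decoder hands the BOM bytes (EF BB BF) to the program as the single character U+FEFF rather than dropping them. The class and method names are made up for illustration, and `StandardCharsets` assumes Java 7 or later; on older versions the charset name "UTF-8" can be passed as a string instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class BomSkippingReader {

    /** Wraps the stream in a UTF-8 reader and drops a leading U+FEFF if there is one. */
    public static BufferedReader open(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                  // remember the start of the text
        if (reader.read() != '\uFEFF') {
            reader.reset();              // first char was real content (or EOF): rewind
        }
        return reader;
    }
}
```

The returned reader can then be used as usual, for example `BomSkippingReader.open(new FileInputStream(path)).readLine()`, where `path` stands for the query file.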
Sai Narasimha Reddy
Greenhorn

Joined: Dec 13, 2006
Posts: 23
Why doesn't Java filter out that BOM before giving me the file contents? I've read the file using the "utf-8" character encoding only.



Did anybody else face the same problem?

Is there any way out of this without my having to skip the BOM manually? Does this problem occur only on Windows, or does it occur on Linux too? If it doesn't occur on Linux, how do I make my code portable?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18902
    

Well, it doesn't. And no, you aren't the only one in the world to experience this problem. And I expect it's the same in Unix as it is in Windows, although it wouldn't be that hard to just try it if that wasn't just a rhetorical question.

If you know in advance that there's going to be a BOM then it's just a couple of lines of code to skip over it. If you don't, it's a little bit more complicated (and a PushbackInputStream can help).
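When it isn't known whether the file starts with a BOM, the PushbackInputStream idea might be sketched as follows: read up to three bytes, and push them back unless they are exactly the UTF-8 BOM (EF BB BF). This only handles the UTF-8 BOM, not the UTF-16 or UTF-32 ones, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of any library.
public class BomStripper {

    /** Returns a stream positioned after a UTF-8 BOM if one is present, else at the start. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read may return fewer bytes than requested, so loop until 3 or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return pushback;
    }
}
```

The result can be wrapped in an `InputStreamReader` with the UTF-8 charset as before. Because the check is done on raw bytes, it behaves the same on Windows and Linux.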
Sai Narasimha Reddy
Greenhorn

Joined: Dec 13, 2006
Posts: 23
Thanks, Paul, for the PushbackInputStream suggestion!
 