Help with reading text containing non-ascii character

Sara Ku
Greenhorn

Joined: May 03, 2011
Posts: 3
Hi guys,

I need some help with reading text containing non-ASCII characters from an Excel file.

For example, I want to be able to detect the "©" symbol in the text below, convert it into the Unicode escape sequence \u00A9, and store that in the database. The end consumer of this text is a Web browser, so I believe this conversion is needed.

Copyright © 2005-2009

I have been trying different ideas to get this working, but I always end up with an unreadable character in place of the symbol.

Thanks in advance!

Sara

Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18987
    

You need to read this article: Character Conversions from Browser to Database.
Sara Ku
Greenhorn

Joined: May 03, 2011
Posts: 3
Thanks, the article was a good read. But my question was more along the lines of which parameters to set (if any) while using I/O streams in Java to read the file. Right now my focus is on reading, storing, and retrieving the text correctly.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18987
    

Sara Ku wrote: But my question was more along the lines of which parameters to set (if any) while using I/O streams in Java to read the file.


I don't understand what you mean by "parameters". And I thought you were reading an Excel document? Perhaps you could explain how you're doing that right now.

Also...

Sara Ku wrote: I want to be able to detect the "©" symbol in the text below, convert it into the Unicode escape sequence \u00A9, and store that in the database.


Don't do that. Java understands Unicode. Your database understands Unicode. Your web application understands Unicode and so do the browsers that use it. So just use Unicode characters as is. Converting them to something else is going to be wasteful and error-prone. Converting them to Java source code Unicode escapes is especially so.
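
To make that concrete, here is a minimal sketch, not anyone's actual code from this thread. It assumes the text has already been extracted into a plain UTF-8 byte stream (an in-memory array stands in for the file; a binary Excel workbook would need a spreadsheet library instead), and the class and method names are made up for illustration. It shows that once the bytes are decoded with the right charset, "©" is already the single character U+00A9 in the String, and that decoding the same bytes with the wrong charset is one way to end up with an unreadable character.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is an assumption for this sketch.
final class CopyrightRoundTrip {

    // Decode the first line of the given bytes using an explicitly named charset,
    // rather than relying on the platform default.
    static String readLine(byte[] bytes, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "\u00A9" in Java source is just another way to type the © character.
        String original = "Copyright \u00A9 2005-2009";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Matching charset: the String comes back unchanged, no escaping needed.
        String good = readLine(utf8, StandardCharsets.UTF_8);
        System.out.println(good.equals(original));   // true
        System.out.println(good.indexOf('\u00A9'));  // 10

        // Mismatched charset: the two UTF-8 bytes of © become two separate characters.
        String bad = readLine(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(bad.equals(original));    // false
        System.out.println(bad.length() - original.length()); // 1
    }
}
```

If the same kind of mismatch happens between the application and the database connection or column encoding, the symptom looks the same, so that is worth checking at each hop.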
Sara Ku
Greenhorn

Joined: May 03, 2011
Posts: 3
You are right, I was not very clear. I took a second look at my code and see that there is no issue with reading from the file. Something gets messed up in the process of storing the text in the database and retrieving it. I will look deeper.

Thanks for the help.
 