Converting special characters to hex code

Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125
I have an Excel sheet, which I am reading using Apache POI.
After the Java utility reads the data from the Excel file, it is applied to some objects in Documentum.

The content in the Excel sheet can contain some special characters.

The requirement is that if any special character is encountered, it needs to be converted to its corresponding hex code, like 㑟

Suppose I have to consider everything within the ranges 20A0-20CF and 2200-22FF as special characters.

I am fetching the value of a cell from the Excel sheet as, say

String val = excl.getCell(1);

Now this "val" variable may contain some (special) character that is within the range 20A0-20CF or 2200-22FF. How do I get that character replaced with a hex code?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41089
You'd have to iterate through all characters of that string, and do something like this:
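
A minimal sketch of such a loop, assuming Java 5 or later (for codePointAt and StringBuilder) and the two ranges from the question. The class and method names are illustrative only, and the raw hex digits are appended without any surrounding notation.

```java
final class SpecialCharHex {
    // True if the value lies in one of the two ranges named in the question.
    // Both ranges are inside the BMP, so supplementary code points never match.
    static boolean isSpecial(int codepoint) {
        return (codepoint >= 0x20A0 && codepoint <= 0x20CF)
            || (codepoint >= 0x2200 && codepoint <= 0x22FF);
    }

    // Copies the string, replacing each special character with its raw hex value.
    static String toHex(String val) {
        StringBuilder sb = new StringBuilder(val.length());
        for (int i = 0; i < val.length(); i++) {
            int codepoint = val.codePointAt(i);
            if (isSpecial(codepoint)) {
                sb.append(Integer.toHexString(codepoint));
            } else {
                // Copy the original char unchanged (surrogates pass through as-is).
                sb.append(val.charAt(i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // EURO SIGN (U+20AC) and LESS-THAN OR EQUAL TO (U+2264) are both in range.
        System.out.println(toHex("5 \u20AC \u2264 6"));
    }
}
```

Integer.toHexString produces lowercase digits without padding, which is enough here because every value in these ranges already has four hex digits.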



Instead of the raw hex string, you probably have some special notation to use.

(Note that this doesn't account for characters that are not in the BMP, i.e., codepoints beyond 65535).
[ June 26, 2007: Message edited by: Ulf Dittmer ]

Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125

(Note that this doesn't account for characters that are not in the BMP, i.e., codepoints beyond 65535).


Thanks a lot for replying.

I have the following characters to support

Basic Latin (0000-007F)
Latin-1 Supplement (0080-00FF)
Latin Extended-A & B (0100-024F)
General Punctuation (2000-206F)
Currency Symbols (20A0-20CF)
Mathematical Operators (2200-22FF)
Combining Diacritical Marks (0300-036F)

Does it mean that the code which you have posted will not work for all these character ranges?

int codepoint = val.codePointAt(i)

Also, the ranges which I have are in the form of hex codes (as posted above). How can I find the corresponding codepoint, so that I know the ranges in codepoints (to make the above code work).
Thanks
[ June 26, 2007: Message edited by: Thomas Greene ]
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41089
Does it mean that the code which you have posted will not work for all these character ranges?

No, it will work for all of them. Hex 22xx is way below 65535 (it is 8000-something in decimal), so you're good.
How can I find the corresponding codepoint, so that I know the ranges in codepoints (to make the above code work).

You don't need to. Wherever you want to use integers, you can write them in hexadecimal notation, preceded by a zero and an 'x'. E.g. "036F" would become "0x36f", and can be assigned or compared to an int (so-called hexadecimal literals).
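
A small sketch of hexadecimal literals used as range bounds. The class, constant, and method names are made up for illustration; Integer.parseInt with radix 16 is shown only for the case where the bounds arrive as text rather than as literals.

```java
final class HexRanges {
    // A hex literal is an ordinary int: 0x22FF and 8959 are the same value.
    static final int MATH_OPERATORS_START = 0x2200;
    static final int MATH_OPERATORS_END = 0x22FF;

    // A char widens to int automatically, so it can be compared directly.
    static boolean isMathOperator(int c) {
        return c >= MATH_OPERATORS_START && c <= MATH_OPERATORS_END;
    }

    // Converts a bound given as text, such as "036F", to its int value.
    static int parseHex(String hex) {
        return Integer.parseInt(hex, 16);
    }
}
```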
Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125
Thanks for replying

I am using J2SE 1.4.

So instead of StringBuilder, I can use StringBuffer.
But what should I use in place of codePointAt()?
Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125
I did something like this to convert characters to hex code



But the problem is that I get the hex value "003f" for both the first and the last character. How can two different characters have the same hex value?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41089
Are you sure that those characters are properly encoded in that string? When I run it with String s="\u221AAbc\u220F", I get the desired result.
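
One plausible way to end up with "003f": 3F is the code of the question mark, which is what Java substitutes when a character cannot be represented in the target charset. The sketch below reproduces that effect on purpose with US-ASCII. The class and method names are illustrative, and StandardCharsets needs Java 7 or later, so this is a demonstration only, not code for a 1.4 project.

```java
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {
    // Encodes to US-ASCII and decodes again. Characters that US-ASCII
    // cannot represent are replaced by '?' during encoding.
    static String throughAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Formats a char as four lowercase hex digits, zero-padded on the left.
    static String hex4(char c) {
        String h = Integer.toHexString(c);
        return "0000".substring(h.length()) + h;
    }

    public static void main(String[] args) {
        String damaged = throughAscii("\u221AAbc\u220F");
        System.out.println(damaged);
        System.out.println(hex4(damaged.charAt(0)));
    }
}
```

Once a character has been replaced this way, the original value is gone, so two different characters can both report 003f.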
Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125
Originally posted by Ulf Dittmer:
Are you sure that those characters are properly encoded in that string? When I run it with String s="\u221AAbc\u220F", I get the desired result.


How do I make sure that they are properly encoded? I copied these characters into my Java file from a web page.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41089
The Java compiler assumes that source files are encoded in the platform default encoding. If you are using something else (e.g. UTF-8), then you need to tell the compiler about it. That's what the javac -encoding switch does; see here for an example.
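
With the switch, compilation would look something like `javac -encoding UTF-8 Converter.java`, where the file name is just a placeholder. An alternative that avoids the question entirely is to keep the source file pure ASCII and write the characters as Unicode escapes, which the compiler handles the same way whatever the source encoding. A minimal sketch, with an illustrative class name:

```java
final class EscapedLiterals {
    // SQUARE ROOT (U+221A), then "Abc", then N-ARY PRODUCT (U+220F),
    // written with ASCII characters only.
    static final String SAMPLE = "\u221AAbc\u220F";

    public static void main(String[] args) {
        // Prints the hex value of each char in the sample string.
        for (int i = 0; i < SAMPLE.length(); i++) {
            System.out.println(Integer.toHexString(SAMPLE.charAt(i)));
        }
    }
}
```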
[ June 27, 2007: Message edited by: Ulf Dittmer ]
Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125
Thanks a lot.
It works fine.

The final result I have to achieve is this: read the values from an Excel file using POI (these values contain special characters), convert the special characters to hex codes, and write them back to another Excel file (using POI).
In this scenario, how do I manage encoding? Will Excel take care of encoding?
Will the encoding be lost when the Java program reads the values using POI?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41089
Will Excel take care of encoding? Will the encoding be lost when the Java program reads the values using POI?


You'll probably just have to try and find out. At the very least, Excel seems to store Unicode characters as such (I can see the Unicode value in the file if I look at it in a hex editor), so POI should have a good chance to retrieve that.
Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125
It is working fine

This is the code I have


There are a couple of problems with this:
1. It is very inefficient, since a lot of String objects are created. I may have, say, 20000 characters that need to be checked. This code will create a huge number of objects.
2. I want the conversion of special characters to hex code to happen only if the character being checked is not within the range 0000-007F.
So I need something like

Please let me know how to do this
Thank You.
[ June 28, 2007: Message edited by: Thomas Greene ]
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41089
If you're concerned about object creation, just create fewer of them. Something like this:
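
A sketch along those lines, restricted to calls that exist in J2SE 1.4 (charAt and StringBuffer): one buffer for the whole string, and no intermediate String per character. The class name, method name, and the choice of four lowercase hex digits are assumptions made for illustration.

```java
final class NonAsciiToHex {
    // Lookup table, so that no temporary String is needed per character.
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Copies Basic Latin (0000-007F) unchanged and replaces every other
    // char with its four-digit hex value.
    static String convert(String val) {
        StringBuffer sb = new StringBuffer(val.length() + 16);
        for (int i = 0; i < val.length(); i++) {
            char c = val.charAt(i);
            if (c <= 0x007F) {
                sb.append(c);
            } else {
                // Append the four hex digits, most significant first.
                sb.append(HEX[(c >> 12) & 0xF]);
                sb.append(HEX[(c >> 8) & 0xF]);
                sb.append(HEX[(c >> 4) & 0xF]);
                sb.append(HEX[c & 0xF]);
            }
        }
        return sb.toString();
    }
}
```

For a cell value of 20000 characters this allocates one StringBuffer (which may grow a few times) and one result String, instead of several objects per character.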
Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125
Originally posted by Ulf Dittmer:
If you're concerned about object creation, just create fewer of them. Something like this:

Thanks
I am using J2SE 1.4, so charCodeAt() won't work.
What can be used instead?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41089
I am using J2SE 1.4, so charCodeAt() won't work. What can be used instead?


Sorry about that, just a typo (charCodeAt is not a Java method in any JDK) - I meant charAt.
Thomas Greene
Ranch Hand

Joined: Aug 09, 2004
Posts: 125
Thanks a lot Ulf
 