I have an Excel sheet that I am reading with Apache POI. After the Java utility reads the values from the Excel file, they are applied to some objects in Documentum.
The content in the Excel sheet can contain some special characters.
The requirement is that if any special character is encountered, it needs to be converted to its corresponding hex code, like 㑟
Suppose I have to consider everything in the ranges 20A0-20CF and 2200-22FF as special characters.
I am fetching the value of a cell from excel sheet as, say
Now this "val" variable may contain some (special) character that is within the range 20A0-20CF or 2200-22FF. How do I get that character replaced with a hex code?
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35253
You'd have to iterate through all characters of that string, and do something like this:
Instead of the raw hex string, you probably have some special notation to use.
(Note that this doesn't account for characters that are not in the BMP, i.e., codepoints beyond 65535.)
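The code from this post did not survive in the thread, so the following is only a minimal sketch of the kind of loop described, not the original. The names `SpecialCharHex`, `isSpecial` and `toHex` are made up for illustration, and the output is the bare hex value; substitute whatever notation the target system expects. It steps through the string one `char` at a time, so the BMP caveat applies here as well.

```java
final class SpecialCharHex {

    // Hypothetical helper: true for the two ranges named in the question.
    static boolean isSpecial(int codepoint) {
        return (codepoint >= 0x20A0 && codepoint <= 0x20CF)
            || (codepoint >= 0x2200 && codepoint <= 0x22FF);
    }

    // Replaces every "special" character with its hex code, e.g. U+221A becomes "221a".
    static String toHex(String val) {
        String result = "";
        for (int i = 0; i < val.length(); i++) {
            int codepoint = val.codePointAt(i);
            if (isSpecial(codepoint)) {
                result += Integer.toHexString(codepoint);
            } else {
                result += val.charAt(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toHex("5 \u20AC \u221A x"));  // prints 5 20ac 221a x
    }
}
```

The string concatenation keeps the sketch short; it creates a new String on every pass, which matters for long inputs.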
Thanks a lot for replying.
I have the following characters to support:
Basic Latin (0000-007F)
Latin-1 Supplement (0080-00FF)
Latin Extended-A & B (0100-024F)
General Punctuation (2000-206F)
Currency Symbols (20A0-20CF)
Mathematical Operators (2200-22FF)
Combining Diacritical Marks (0300-036F)
Does it mean that the code which you have posted will not work for all these character ranges?
int codepoint = val.codePointAt(i)
Also, the ranges I have are given as hex codes (as posted above). How can I find the corresponding codepoints, so that I know the ranges as codepoints and can make the above code work? Thanks.
Ulf Dittmer
Does it mean that the code which you have posted will not work for all these character ranges?
No. Hex 22xx is way below 65000 (8000-something in fact), so you're good.
How can I find the corresponding codepoint, so that I know the ranges in codepoints (to make the above code work).
You don't need to. Wherever you want to use integers, you can write them in hexadecimal notation, preceded by a zero and an 'x'. E.g. "036F" would become "0x36f", which can be assigned to or compared with an int (a so-called hexadecimal literal).
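To illustrate the point about hexadecimal literals, here is a small sketch; the class, constant and method names are invented for the example. The block boundaries are written exactly as they appear in the list of ranges, just prefixed with `0x`.

```java
final class HexLiteralDemo {

    // Boundaries of the Mathematical Operators block, written as hex literals.
    static final int MATH_START = 0x2200;
    static final int MATH_END = 0x22FF;

    static boolean isMathOperator(int codepoint) {
        return codepoint >= MATH_START && codepoint <= MATH_END;
    }

    public static void main(String[] args) {
        System.out.println(0x36f == 879);            // prints true
        System.out.println(MATH_START);              // prints 8704
        System.out.println(isMathOperator(0x221A));  // prints true
        System.out.println(isMathOperator('A'));     // prints false
    }
}
```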
I did something like this to convert characters to hex codes:
But the problem is that I get the hex value "003f" for the first and last values. How can two different characters have the same hex value?
Ulf Dittmer
Are you sure that those characters are properly encoded in that string? When I run it with String s="\u8730Abc\u8719", I get the desired result.
Thomas Greene
Ranch Hand
Joined: Aug 09, 2004
Posts: 125
Originally posted by Ulf Dittmer: Are you sure that those characters are properly encoded in that string? When I run it with String s="\u8730Abc\u8719", I get the desired result.
How can I make sure that they are properly encoded? I copied these characters into my Java file from a web page.
Ulf Dittmer
The Java compiler assumes that source files are encoded in the platform default encoding. If you are using something else (e.g. UTF-8), then you need to tell the compiler about it. That's what the javac -encoding switch does.
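One plausible source of a result like "003f": when a character cannot be represented in the encoding in use, it is typically replaced by a question mark, and '?' is U+003F. The sketch below reproduces that effect with an explicit round trip through US-ASCII, chosen here purely for illustration; the class and method names are made up. Writing the characters as Unicode escapes in the source avoids depending on the source-file encoding at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingLossDemo {

    // Formats a char as a four-digit hex string, e.g. '?' becomes "003f".
    static String hex4(char c) {
        String h = Integer.toHexString(c);
        return "0000".substring(h.length()) + h;
    }

    // Encodes and decodes with the same charset; unmappable chars come back as '?'.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String s = "\u221A";  // SQUARE ROOT, written as an escape
        System.out.println(hex4(s.charAt(0)));                                         // prints 221a
        System.out.println(hex4(roundTrip(s, StandardCharsets.US_ASCII).charAt(0)));   // prints 003f
        System.out.println(hex4(roundTrip(s, StandardCharsets.UTF_8).charAt(0)));      // prints 221a
    }
}
```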
Thomas Greene
Thanks a lot. It works fine.
The final result I have to achieve is this: read the values from an Excel file using POI (these values contain special characters), convert the special characters to hex codes, and write them back to another Excel file (using POI). In this scenario, how do I manage encoding? Will Excel take care of encoding? Will the encoding be lost when the Java program reads the values using POI?
Ulf Dittmer
Will Excel take care of encoding? Will the encoding be lost when the Java program reads the values using POI?
You'll probably just have to try and find out. At the very least, Excel seems to store Unicode characters as such (I can see the Unicode value in the file if I look at it in a hex editor), so POI should have a good chance to retrieve that.
Thomas Greene
It is working fine.
This is the code I have:
There are a couple of problems with this:
1. It is very inefficient, since a lot of String objects are created. I may have, say, 20000 characters that need to be checked, and this code will create a huge number of objects.
2. I want the conversion of special characters to hex codes to happen only if the character being checked is not within the range 0000-007F. So I need something like
Please let me know how to do this. Thank you.
Ulf Dittmer
If you're concerned about object creation, just create fewer of them. Something like this:
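The code posted here is missing from the thread. As a rough sketch of the idea (not the original), the version below appends into a single buffer instead of concatenating strings, and copies everything in 0000-007F unchanged. `HexEscaper` and `escape` are invented names, and the four-digit zero-padded output is an assumption about the desired format. `StringBuilder` requires Java 5; on older versions `StringBuffer` offers the same `append` methods.

```java
final class HexEscaper {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Converts every char above 0x7F to a four-digit hex code; ASCII is copied as-is.
    static String escape(String val) {
        StringBuilder sb = new StringBuilder(val.length() + 16);
        for (int i = 0; i < val.length(); i++) {
            char c = val.charAt(i);
            if (c <= 0x7F) {
                sb.append(c);
            } else {
                // Append the four hex digits directly, without temporary strings.
                sb.append(HEX[(c >> 12) & 0xF])
                  .append(HEX[(c >> 8) & 0xF])
                  .append(HEX[(c >> 4) & 0xF])
                  .append(HEX[c & 0xF]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\u221Ab\u00E9"));  // prints a221ab00e9
    }
}
```

Apart from the buffer and the final result, no objects are allocated per call, regardless of how many characters need converting.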
Thomas Greene
Originally posted by Ulf Dittmer: if you're concerned about object creation, just create fewer of them. Something like this:
Thanks. I am using J2SE 1.4, so charCodeAt() won't work. What can be used instead?
Ulf Dittmer
I am using J2SE 1.4, so charCodeAt() won't work. What can be used instead?
Sorry about that, just a typo (charCodeAt is not a Java method in any JDK) - I meant charAt.
Thomas Greene
Thanks a lot, Ulf.
subject: Converting special characters to hex code