Converting special characters to hex code

 
Thomas Greene
Ranch Hand
Posts: 132
I have an Excel sheet that I am reading with Apache POI.
After the Java utility reads it from the Excel file, the content is applied to some objects in Documentum.

The content in the Excel sheet can contain special characters such as currency symbols and mathematical operators.

The requirement is that if any special character is encountered, it needs to be converted to corresponding hex codes, like 㑟

Suppose I have to treat everything in the ranges 20A0-20CF and 2200-22FF as special characters.

I fetch the value of a cell from the Excel sheet like this:

String val = excl.getCell(1);

Now this "val" variable may contain some (special) character that is within the range 20A0-20CF or 2200-22FF. How do I get that character replaced with a hex code?
 
Ulf Dittmer
Rancher
Posts: 43081
You'd have to iterate through all characters of that string, and do something like this:
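
Roughly, as a minimal sketch (the class and method names are placeholders I made up, the two ranges are the ones from your question, and it appends the raw hex digits with no prefix or padding):

```java
class HexCodeSketch {

    // True for the two "special" ranges named in the question.
    static boolean isSpecial(int codePoint) {
        return (codePoint >= 0x20A0 && codePoint <= 0x20CF)
            || (codePoint >= 0x2200 && codePoint <= 0x22FF);
    }

    // Copies val, replacing each special character with its hex code.
    static String replaceSpecial(String val) {
        StringBuilder sb = new StringBuilder();
        // Steps one char at a time, which is enough for these BMP ranges.
        for (int i = 0; i < val.length(); i++) {
            int codePoint = val.codePointAt(i);
            if (isSpecial(codePoint)) {
                sb.append(Integer.toHexString(codePoint));
            } else {
                sb.append(val.charAt(i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Less-than-or-equal sign (2264) and euro sign (20AC).
        System.out.println(replaceSpecial("x \u2264 5 \u20AC")); // prints "x 2264 5 20ac"
    }
}
```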



Instead of the raw hex string, you probably have some special notation to use.

(Note that this doesn't account for characters that are not in the BMP, i.e., codepoints beyond 65535).
[ June 26, 2007: Message edited by: Ulf Dittmer ]
 
Thomas Greene
Ranch Hand
Posts: 132


(Note that this doesn't account for characters that are not in the BMP, i.e., codepoints beyond 65535).



Thanks a lot for replying.

I have the following characters to support

Basic Latin (0000-007F)
Latin-1 Supplement (0080-00FF)
Latin Extended-A & B (0100-024F)
General Punctuation (2000-206F)
Currency Symbols (20A0-20CF)
Mathematical Operators (2200-22FF)
Combining Diacritical Marks (0300-036F)

Does it mean that the code which you have posted will not work for all these character ranges?

int codepoint = val.codePointAt(i)


Also, the ranges which I have are in the form of hex codes (as posted above). How can I find the corresponding codepoint, so that I know the ranges in codepoints (to make the above code work).
Thanks
[ June 26, 2007: Message edited by: Thomas Greene ]
 
Ulf Dittmer
Rancher
Posts: 43081

Does it mean that the code which you have posted will not work for all these character ranges?


No. Hex 22FF is 8959 in decimal, far below 65535, so you're good.

How can I find the corresponding codepoint, so that I know the ranges in codepoints (to make the above code work).


You don't need to. Wherever you want to use integers, you can write them in hexadecimal notation, preceded by a zero and an 'x'. E.g. "036F" becomes 0x36f, which can be assigned to or compared with an int (a so-called hexadecimal literal).
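
To illustrate, a small sketch (class and method names are invented for the example) that uses the ranges from your list directly as hex literals:

```java
class HexLiteralDemo {

    // Inclusive range check on plain ints; hex and decimal literals mix freely.
    static boolean inRange(int value, int low, int high) {
        return value >= low && value <= high;
    }

    // Currency Symbols (20A0-20CF) or Mathematical Operators (2200-22FF).
    static boolean isSpecial(char c) {
        return inRange(c, 0x20A0, 0x20CF) || inRange(c, 0x2200, 0x22FF);
    }

    public static void main(String[] args) {
        System.out.println(0x36f);               // prints 879
        System.out.println(0x22FF);              // prints 8959
        System.out.println(isSpecial('\u221A')); // prints true (square root sign)
        System.out.println(isSpecial('A'));      // prints false
    }
}
```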
 
Thomas Greene
Ranch Hand
Posts: 132
Thanks for replying.

I am using J2SE 1.4.

So instead of StringBuilder, I can use StringBuffer.
But what should I use in place of codePointAt()?
 
Thomas Greene
Ranch Hand
Posts: 132
I did something like this to convert characters to hex codes:



But the problem is that I get the hex value "003f" for the first and last characters. How can two different characters have the same hex value?
 
Ulf Dittmer
Rancher
Posts: 43081
Are you sure that those characters are properly encoded in that string? When I run it with String s="\u221AAbc\u220F" (the square root and product signs, decimal 8730 and 8719), I get the desired result.
 
Thomas Greene
Ranch Hand
Posts: 132

Originally posted by Ulf Dittmer:
Are you sure that those characters are properly encoded in that string? When I run it with String s="\u221AAbc\u220F" (the square root and product signs, decimal 8730 and 8719), I get the desired result.



How can I make sure that they are properly encoded? I copied these characters into my Java file from a web page.
 
Ulf Dittmer
Rancher
Posts: 43081
77
The Java compiler assumes that source files are encoded in the platform default encoding. If you are using something else (e.g. UTF-8), then you need to tell the compiler about it. That's what the javac -encoding switch does; see here for an example.
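
As for the "003f" you saw: that is the code of '?', which is a common substitute when a character can't be represented in some encoding along the way. Here is a small sketch of that effect. It assumes a Java version that has StandardCharsets (Java 7 or later, so not J2SE 1.4), the class and method names are made up, and it shows only one way a '?' can sneak in, not necessarily what happened to your file:

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {

    // Encodes to US-ASCII and decodes again; characters that US-ASCII
    // cannot represent are replaced by '?' during encoding.
    static String roundTripAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String original = "\u221AAbc\u220F";
        String damaged = roundTripAscii(original);
        System.out.println(damaged);                                 // prints "?Abc?"
        System.out.println(Integer.toHexString(damaged.charAt(0)));  // prints "3f"
        System.out.println(Integer.toHexString(original.charAt(0))); // prints "221a"
    }
}
```

Writing the characters as Unicode escapes in the source, as in the string above, sidesteps the question of which encoding the compiler assumes for the file.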
[ June 27, 2007: Message edited by: Ulf Dittmer ]
 
Thomas Greene
Ranch Hand
Posts: 132
Thanks a lot.
It works fine.

The final result I have to achieve is this: read the values from an Excel file using POI (these values contain special characters), convert the special characters to hex codes, and write them back to another Excel file (using POI).
In this scenario, how do I manage encoding? Will Excel take care of encoding?
Will the encoding be lost when the Java program reads the values using POI?
 
Ulf Dittmer
Rancher
Posts: 43081

Will Excel take care of encoding? Will the encoding be lost when the Java program reads the values using POI?



You'll probably just have to try and find out. At the very least, Excel seems to store Unicode characters as such (I can see the Unicode value in the file if I look at it in a hex editor), so POI should have a good chance to retrieve that.
 
Thomas Greene
Ranch Hand
Posts: 132
It is working fine

This is the code I have


There are a couple of problems with this:
1. It is very inefficient, since a lot of String objects get created. I may have, say, 20000 characters that need to be checked, so this code will create a huge number of objects.
2. I want the conversion of special characters to hex codes to happen only if the character being checked is not within the range 0000-007F.
So I need something like

Please let me know how to do this
Thank You.
[ June 28, 2007: Message edited by: Thomas Greene ]
 
Ulf Dittmer
Rancher
Posts: 43081
If you're concerned about object creation, just create fewer of them. Something like this:
 
Thomas Greene
Ranch Hand
Posts: 132

Originally posted by Ulf Dittmer:
if you're concerned about object creation, just create fewer of them. Something like this:


Thanks.
I am using J2SE 1.4, so charCodeAt() won't work.
What can be used instead?
 
Ulf Dittmer
Rancher
Posts: 43081

i am using J2se 1.4, so charCodeAt() won't work. What can be used instead?



Sorry about that, just a typo (charCodeAt is not a Java method in any JDK) - I meant charAt.
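
With charAt, a sketch of the idea that sticks to J2SE 1.4 APIs could look like the following. The class and method names are invented, the four-digit lowercase output is my assumption based on the "003f" you reported, and it leaves everything in 0000-007F untouched while building the result in a single StringBuffer:

```java
class SpecialCharEscaper {

    // Lookup table so no intermediate String is created per character.
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String escape(String val) {
        // One buffer for the whole result.
        StringBuffer sb = new StringBuffer(val.length() + 16);
        for (int i = 0; i < val.length(); i++) {
            char c = val.charAt(i);
            if (c <= 0x007F) {
                // Basic Latin: copy unchanged.
                sb.append(c);
            } else {
                // Everything else: append exactly four hex digits.
                sb.append(HEX[(c >> 12) & 0xF]);
                sb.append(HEX[(c >> 8) & 0xF]);
                sb.append(HEX[(c >> 4) & 0xF]);
                sb.append(HEX[c & 0xF]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u221AAbc\u220F")); // prints "221aAbc220f"
    }
}
```

If only some of your ranges should be converted, the c <= 0x007F test can be swapped for range comparisons against hex literals.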
 
Thomas Greene
Ranch Hand
Posts: 132
Thanks a lot Ulf
 