
Need help parsing Japanese Shift_JIS characters in Java

 
Deepak Lal
Ranch Hand
Hi Ranchers,
I'm back with another problem. Yes, again it has to do with parsing a file that contains Japanese Shift_JIS characters.

Please find the code below.



Why is Java turning the Japanese Shift_JIS characters into junk values, as shown in the program above?

Please help.

 
Christophe Verré
Sheriff
You're talking about reading Shift-JIS files, so why are you setting the input stream's encoding to UTF-8?
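To illustrate why the UTF-8 setting produces junk, here is a minimal sketch (not the original poster's code, which isn't shown) that encodes a short Japanese word as Shift_JIS and then decodes those same bytes twice: once as UTF-8 and once as Shift_JIS. The class and method names are hypothetical; the sample text is written with Unicode escapes so the source file's own encoding doesn't matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the same Shift_JIS bytes with two different charsets.
final class WrongCharsetDemo {
    // "Shift_JIS" is the canonical Java charset name; it ships with standard JDK builds.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // The Japanese word "nihongo" (code points U+65E5 U+672C U+8A9E).
    static final String SAMPLE = "\u65E5\u672C\u8A9E";

    // Bytes as they would appear in a file saved in Shift_JIS.
    static byte[] shiftJisBytes() {
        return SAMPLE.getBytes(SHIFT_JIS);
    }

    // new String(byte[], Charset) silently replaces malformed input with U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] raw = shiftJisBytes();
        // Wrong charset: these Shift_JIS byte sequences are not valid UTF-8.
        System.out.println("as UTF-8:     " + decode(raw, StandardCharsets.UTF_8));
        // Right charset: the original text comes back.
        System.out.println("as Shift_JIS: " + decode(raw, SHIFT_JIS));
    }
}
```

When running this, keep in mind that the console itself must be able to display Japanese; otherwise even the correctly decoded line may look garbled.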
 
Deepak Lal
Ranch Hand
Please let me know what should be done.
I'm stuck on this. Which charset needs to be set for the Japanese Shift-JIS scenario?

Please help me.
I'm stuck.

 
Christophe Verré
Sheriff
The character set is named "Shift_JIS".
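A minimal sketch of using that name when reading an uploaded stream line by line. It assumes the upload arrives as an InputStream (for a file on disk, Files.newBufferedReader(path, charset) does the same job); the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads text lines from a stream whose bytes are Shift_JIS.
final class ShiftJisLineReader {
    // Charset names are not case-sensitive, and registered aliases are accepted too.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        // The charset is given to the Reader, so the bytes are decoded once, with the right charset.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, SHIFT_JIS))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Wrapped so callers of this sketch are not forced to handle a checked exception.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The important design point is that the charset is chosen at the moment bytes become characters; there is no later step that "converts" a String to Shift_JIS.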
 
Deepak Lal
Ranch Hand
How do I use it in my code?

Should I first convert it to UTF-8, as shown in the pasted code, and then use the SHIFT-JIS charset?

The reason is that we are uploading UTF-8 files as well. So the code first checks for UTF-8 encoding, and that is where it converts the text to junk characters. Then I'm using

String newValue = new String(fieldValue.getBytes(en), "SHIFT-JIS");
row.set(fieldName, newValue);

so newValue also ends up containing junk characters.

How should I handle this so that both UTF-8 and Shift_JIS work?

Please advise.
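The getBytes(...) round trip can't repair the text, because the damage happens at the first decode: bytes that aren't valid UTF-8 are replaced with U+FFFD, and the original bytes are gone. A minimal sketch showing this (hypothetical names; it simulates the flow described above rather than reproducing the actual code):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: decoding with the wrong charset first is lossy and cannot be undone.
final class LossyRoundTrip {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Simulates "read as UTF-8, then getBytes() and reinterpret as Shift_JIS".
    static String misreadThenRepair(byte[] original) {
        String misread = new String(original, StandardCharsets.UTF_8); // invalid bytes become U+FFFD
        byte[] reEncoded = misread.getBytes(StandardCharsets.UTF_8);   // encodes the U+FFFD chars, not the original bytes
        return new String(reEncoded, SHIFT_JIS);
    }

    // The working approach: decode the untouched original bytes with the right charset.
    static String decodeOnce(byte[] original) {
        return new String(original, SHIFT_JIS);
    }

    public static void main(String[] args) {
        byte[] original = "\u65E5\u672C\u8A9E".getBytes(SHIFT_JIS);
        byte[] afterMisread = new String(original, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes preserved? " + Arrays.equals(original, afterMisread)); // prints "bytes preserved? false"
        System.out.println("decode once:     " + decodeOnce(original));
    }
}
```

In other words, the fix is to keep the raw uploaded bytes and decode them once with the correct charset, rather than re-encoding a String that was already decoded wrongly.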

 
Deepak Lal
Ranch Hand
Hi all,

The flow is:

I'm uploading a file with Japanese Shift_JIS characters, with encoding type UTF-8.

Then we have the logic below.

It parses the file, and then we have an encodeRow method that does the following:

so newValue also ends up containing junk characters.

Please help and advise. I need a solution such that when I upload a file (Japanese/Chinese/Korean) in UTF-8 format, it does not show junk characters. I want to achieve this through code.

Any help will be highly appreciated.

Deepak
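If uploads may arrive either as UTF-8 or as Shift_JIS, one common heuristic (an assumption here, not something this thread settled on) is to try a strict UTF-8 decode on the raw bytes and fall back to Shift_JIS only if that fails. Shift_JIS text containing Japanese characters is rarely valid UTF-8, but the check is not a guarantee; having the uploader state the encoding is more reliable. A sketch with hypothetical names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode raw upload bytes as UTF-8 if they are valid UTF-8, else as Shift_JIS.
final class UploadDecoder {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    static String decodeUtf8OrShiftJis(byte[] raw) {
        // A fresh decoder per call: CharsetDecoder instances are not thread-safe.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // REPORT makes invalid UTF-8 throw instead of being replaced with U+FFFD.
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException notUtf8) {
            // Not valid UTF-8: assume the file was saved as Shift_JIS.
            return new String(raw, SHIFT_JIS);
        }
    }
}
```

Pure ASCII input decodes the same way under both charsets, so it is unaffected. A UTF-8 file saved with a byte-order mark will come back with a leading U+FEFF, which you may want to strip.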


 
Deepak Lal
Ranch Hand
I still need help on this, please.

 
Deepak Lal
Ranch Hand
Hi Ranchers,
I still need help on this.
Deepak
 
Deepak Lal
Ranch Hand
Please help me with this. Please suggest a solution.
 
Campbell Ritchie
Sheriff
Don't know, sorry, but this looks too difficult for the beginners' forum. Moving.
 
Deepak Lal
Ranch Hand
Please, I need help from the Java experts.

 
Steve Luke
Bartender
Hi Deepak,

You should not be converting between character sets the way you are: in memory, using the bytes returned from String directly. You need to take more care to convert and preserve the bytes, including byte order and the number of bytes per character.

Anyway, the best way to do it in memory is to use the java.nio CharsetEncoder and CharsetDecoder classes. Their APIs are documented here:
java.nio.charset.CharsetEncoder
java.nio.charset.CharsetDecoder

When you first read the String, you are best off using the java.nio tools: for example, access the values via a ByteBuffer, then use a CharsetDecoder created from the Charset the source data is in to convert the ByteBuffer into the 16-bit Unicode (UTF-16) CharBuffer and Strings that Java uses. Then, when it is time to convert to another format, use a CharsetEncoder for the destination Charset and encode the CharBuffer (String) into a ByteBuffer in the new encoding.
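The attachment mentioned below isn't reproduced here, so the following is only a sketch of the decode-then-encode strategy just described, not Steve's code. It configures the decoder and encoder with CodingErrorAction.REPORT so that bad input, or a character with no equivalent in the target charset, raises an error instead of silently becoming junk; the class name and the choice to wrap the checked exception are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical transcoder: source bytes -> UTF-16 chars (decoder) -> target bytes (encoder).
final class NioTranscoder {
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        CharsetDecoder decoder = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetEncoder encoder = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            CharBuffer chars = decoder.decode(ByteBuffer.wrap(input)); // step 1: bytes in 'from' -> Java chars
            ByteBuffer out = encoder.encode(chars);                    // step 2: Java chars -> bytes in 'to'
            byte[] result = new byte[out.remaining()];                 // the backing array may be larger
            out.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Malformed input, or a character the target charset cannot represent.
            throw new IllegalArgumentException("cannot convert from " + from + " to " + to, e);
        }
    }
}
```

For example, transcode(utf8Bytes, StandardCharsets.UTF_8, Charset.forName("Shift_JIS")) converts UTF-8 bytes to Shift_JIS bytes. Korean Hangul has no Shift_JIS mapping, so with this strict encoder such input fails loudly instead of turning into question marks.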

An example of what the code might look like is attached. Note that this is 100% untested.
 
Steve Luke
Bartender
This is cross-posted (and answered) over on the Sun Forums.

Hi Deepak, when you cross-post, please let us (on both forums) know, so that we don't waste time duplicating effort on posts that may already be solved. I know this one went a while without an answer, but the appropriate thing would have been to post a link to your thread on the Sun Forums so we could review what they were saying and combine efforts.

Anyway, I notice a few of your other active posts are also cross-posted to both forums. Could you please update them all with links?

Thanks,
Steve

<edit: Sorry, I had the wrong thread linked. >
 
Deepak Lal
Ranch Hand
Hi Steve and Ranchers,
Thanks for the information and the solution. I will test it and try it out, and if I need any clarification, I will get back to you.

Sorry for cross-posting to other forums.

I will not make this mistake again.
Apologies.

Deepak
 
Rob Spoor
Sheriff
Deepak Lal wrote:
Sorry for cross posting in another forums

Will not make this mistake again
Apologies for the same.

There's no problem with cross-posting, but please read http://faq.javaranch.com/java/BeForthrightWhenCrossPostingToOtherSites
 
Deepak Lal
Ranch Hand
Hi Steve,
I have a question: what parameters do I need to pass to test the ConversionUtility snippet? Could you please provide some inputs,
for example for a conversion from UTF-8 to SHIFT_JIS?

Deepak Lal
 
Steve Luke
Bartender
Deepak Lal wrote:
Hi Steve,
I had a Clarification,What paramters i need to pass to test the ConversionUtility snippet.Could you please provide inputs ?
In case of conversion from UTF-8 to SHIFT_JIS,??

Deepak Lal


Hi Deepak,

Sorry, I cut the Javadoc comments out for brevity.
Here are the docs for each method:

I also added one more method so you can convert directly from an NIO ByteBuffer:
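That extra method isn't shown in the thread either. As a hedged sketch only: if the data is already in a ByteBuffer (for example, read from a FileChannel), the Charset.decode and Charset.encode convenience methods can do the conversion. Unlike a decoder configured with CodingErrorAction.REPORT, these convenience methods always replace malformed or unmappable input rather than reporting it. The names below are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

// Hypothetical ByteBuffer-based conversion using the Charset convenience methods.
final class ByteBufferConversion {
    static ByteBuffer convert(ByteBuffer source, Charset from, Charset to) {
        // duplicate() shares the content but has its own position, so the caller's buffer is not consumed.
        CharBuffer chars = from.decode(source.duplicate()); // malformed input -> replacement characters
        return to.encode(chars);                            // unmappable chars -> target's replacement bytes
    }

    // Copies the readable bytes out of a buffer, e.g. before writing them to a stream.
    static byte[] toArray(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.duplicate().get(bytes);
        return bytes;
    }
}
```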
 
Deepak Lal
Ranch Hand
Hi Steve,
Thanks a lot for all your replies, but I'm confused about which one to invoke first.
I mean, if I had to write a main method in Java, which method should be invoked first, and what parameters do I have to supply in this scenario?
That is, for the conversion from UTF-8 to SHIFT_JIS?

Thanks a lot in advance,

Deepak Lal
 
Steve Luke
Bartender
The three methods are meant for three different purposes. So which one you want to call (or call first) depends on what you have.

1) Say you have bytes that you read from the input, and you want to convert them to a String for display. Then call getString(String, byte[]).
2) Say you have a String, and you want to convert it to bytes so you can store them in a specific format. Then call convertToEncoding(String, String).
3) Say you have bytes in one character set, and you need to store those bytes in another character set. Then call convertBetweenEncodings(byte[], String, String).
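The three methods themselves aren't reproduced in the thread, so their exact code and parameter order are unknown. The sketch below maps the three cases onto plain JDK calls (hypothetical method names) and shows what a main method for the UTF-8 to Shift_JIS scenario, which is case 3, might look like. These String/getBytes calls replace bad input silently; a CharsetDecoder/CharsetEncoder configured with CodingErrorAction.REPORT is safer when errors must be reported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical mapping of the three cases onto standard JDK calls.
final class EncodingCases {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Case 1: bytes read from input, in a known charset -> String for display.
    static String bytesToString(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Case 2: String -> bytes in a specific charset for storage.
    static byte[] stringToBytes(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Case 3: bytes in one charset -> bytes in another (case 1 followed by case 2).
    static byte[] convertBytes(byte[] bytes, Charset from, Charset to) {
        return stringToBytes(bytesToString(bytes, from), to);
    }

    public static void main(String[] args) {
        // Stand-in for bytes read from an uploaded UTF-8 file.
        byte[] utf8Bytes = "\u65E5\u672C\u8A9E".getBytes(StandardCharsets.UTF_8);
        // UTF-8 -> Shift_JIS is case 3, so that is the only call needed here.
        byte[] sjisBytes = convertBytes(utf8Bytes, StandardCharsets.UTF_8, SHIFT_JIS);
        // prints "9 UTF-8 bytes -> 6 Shift_JIS bytes"
        System.out.println(utf8Bytes.length + " UTF-8 bytes -> " + sjisBytes.length + " Shift_JIS bytes");
    }
}
```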

 
Steve Luke
Bartender
Also, I want to repeat this:
"Note that this is 100% un-tested."

I don't claim that this WILL work; I claim this is the strategy to use. I still haven't tested it. If you find that it works, please report back. If not, it may need to be tweaked a bit. Without a lot of time, or the inputs needed to test it thoroughly, I doubt I will be able to do so.
 