
Need Help in parsing Japanese SHIFT JIS Characters in Java

Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Hi Ranchers,
I'm back with another problem, and yes, it again has to do with parsing a file that contains Japanese Shift-JIS characters.

Please find the code below.



Why is Java changing the Japanese Shift-JIS characters to junk values, as shown in the program above?

Please help.



Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687

You're talking about reading Shift-JIS files, so why are you setting the input stream's encoding to UTF-8?


Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Please let me know what should be done.
I'm stuck on this. What charset needs to be set for the Japanese Shift-JIS scenario?

Please help me.

Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687

The character set is named "Shift_JIS".
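
To make that concrete, here is a minimal sketch of reading a Shift_JIS file with the right charset on the reader. The class name ShiftJisReader and the method readLines are made up for illustration, and the sketch assumes every uploaded file really is Shift_JIS:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not code from this thread.
final class ShiftJisReader {
    // Look the charset up once by its canonical name.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    /** Reads all lines of a file whose bytes are encoded as Shift_JIS. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // The charset is given to the reader, so bytes become chars correctly
        // at the moment they are read, not in a later "fix-up" step.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), SHIFT_JIS))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ShiftJisReader <path-to-shift-jis-file>
        for (String line : readLines(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Whether the console shows the characters properly depends on the console's own encoding, which is a separate matter from reading the file correctly.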
Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

How do I use it in my code?

Should I first convert it to UTF-8, as shown in the pasted code, and then use the SHIFT-JIS charset?

The reason is that we upload UTF-8 files as well, so the code first checks for UTF-8 encoding, and that is where the characters are converted to junk. Then I'm using

String newValue = new String(fieldValue.getBytes(en), "SHIFT-JIS");
row.set(fieldName, newValue);

and so newValue also contains junk characters.

How should I handle this so that both UTF-8 and Shift_JIS work?

Please advise.

Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Hi All

The flow is:

I'm uploading a file with Japanese Shift-JIS characters, with the encoding type set to UTF-8.

Then we have the logic below.


It parses the file, and then we have an encodeRow method where we do the following.



As a result, newValue also contains junk characters.

Please help and advise. I need a solution such that I can upload a file (Japanese/Chinese/Korean) in UTF-8 format and it does not show junk characters. I want to achieve this through code.

Any help will be highly appreciated.

Deepak


Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Please, I need help with this.

Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Hi Ranchers,
I need help with this.
Deepak
Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Please help me with this. Please suggest a solution.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38363
Don't know, sorry, but this looks too difficult for the beginners' forum. Moving.
Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Java experts, I need help with this, please.

I'm uploading a file with Japanese Shift-JIS characters, with the encoding type set to UTF-8.

Then we have the logic below.



It parses the file, and then we have an encodeRow method where we do the following.





As a result, newValue also contains junk characters.

Please help and advise. I need a solution such that I can upload a file (Japanese/Chinese/Korean) in UTF-8 format and it does not show junk characters. I want to achieve this through code.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4176

Hi Deepak,

You should not be converting between character sets the way you are doing it: in memory, by re-interpreting the bytes returned from a String directly. You need to take more care to convert properly and to preserve the bytes, the byte order, and the number of bytes per character.
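
To illustrate why the "repair" step cannot work, here is a small sketch. It assumes the flow described earlier in the thread: the file bytes are Shift_JIS, they are first decoded as UTF-8, and the result is then pushed through getBytes and the Shift_JIS charset. The class and method names are made up, and the encoding variable `en` from the earlier post is assumed to be UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not code from this thread.
final class WrongCharsetDemo {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    /** Mimics the failing flow: decode as UTF-8 first, then try to repair. */
    static String decodeAsUtf8ThenRepair(byte[] shiftJisBytes) {
        // Bytes that are not valid UTF-8 are replaced with U+FFFD here,
        // so the original byte values are lost at this point.
        String wrong = new String(shiftJisBytes, StandardCharsets.UTF_8);
        // Re-encoding yields the bytes of the replacement characters,
        // not the original file bytes, so this cannot restore the text.
        return new String(wrong.getBytes(StandardCharsets.UTF_8), SHIFT_JIS);
    }

    /** Decodes the same bytes once, with the charset they were written in. */
    static String decodeAsShiftJis(byte[] shiftJisBytes) {
        return new String(shiftJisBytes, SHIFT_JIS);
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E"; // "Japanese language" in kanji
        byte[] fileBytes = original.getBytes(SHIFT_JIS);
        System.out.println(original.equals(decodeAsUtf8ThenRepair(fileBytes))); // false
        System.out.println(original.equals(decodeAsShiftJis(fileBytes)));       // true
    }
}
```

The damage happens in the first decode, so no later conversion of the String can undo it.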

Anyway, the best way to do it in memory is to use the java.nio CharsetEncoder and CharsetDecoder classes. Their API can be found here:
java.nio.charset.CharsetEncoder
java.nio.charset.CharsetDecoder

When you first read the data, you are best off using the java.nio tools: for example, access the values via a ByteBuffer, then use a CharsetDecoder created from the Charset the source data is in to convert the ByteBuffer to the 16-bit Unicode CharBuffer and Strings that Java uses. Then, when it is time to convert to another format, use a CharsetEncoder for the destination Charset and encode the CharBuffer (String) into a ByteBuffer in the new encoding.

An example of what the code might look like is attached. Note that this is 100% un-tested.
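
The following is a separate minimal sketch of that decode-then-encode strategy, not the attachment itself. The names CharsetTranscoder, decode, encode and transcode are made up for illustration, and the choice to report errors rather than replace bad input is an assumption about what is wanted:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the code attached to the original post.
final class CharsetTranscoder {

    /** Decodes bytes from the source charset; throws on invalid input. */
    static CharBuffer decode(byte[] bytes, Charset source) throws CharacterCodingException {
        return source.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
    }

    /** Encodes chars into the target charset; throws if a char has no mapping. */
    static byte[] encode(CharBuffer chars, Charset target) throws CharacterCodingException {
        ByteBuffer out = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    /** Converts bytes in one charset to bytes in another, via Java's Unicode chars. */
    static byte[] transcode(byte[] bytes, Charset source, Charset target)
            throws CharacterCodingException {
        return encode(decode(bytes, source), target);
    }

    public static void main(String[] args) throws CharacterCodingException {
        Charset shiftJis = Charset.forName("Shift_JIS");
        byte[] sjis = "\u65E5\u672C\u8A9E".getBytes(shiftJis);
        byte[] utf8 = transcode(sjis, shiftJis, StandardCharsets.UTF_8);
        System.out.println("Shift_JIS bytes: " + sjis.length + ", UTF-8 bytes: " + utf8.length);
    }
}
```

Because errors are reported, decoding Shift_JIS bytes with the UTF-8 charset fails with an exception instead of quietly producing junk, which also gives the caller a way to notice that the declared encoding is wrong.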


Steve
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4176

This is crossposted (and answered) over on the Sun Forums.

Hi Deepak, when you cross-post, please let us (on both forums) know, so we don't waste time duplicating effort on posts that may already be solved. I know this one went a while without being answered, but the appropriate thing would have been to put a link to your post over on the Sun Forums so we could review what they are saying and combine efforts.

Anyway, I notice a few of your other active posts are also cross-posted to both forums. Could you please update them all with links?

Thanks,
Steve

<edit: Sorry, I had the wrong thread linked. >
Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Hi Steve and Ranchers,
Thanks for the information and the solution. I will test it and try it out. If I need any clarification, I will get back to you.

Sorry for cross-posting in other forums.

I will not make this mistake again.
My apologies.

Deepak
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19670

Deepak Lal wrote: Sorry for cross-posting in other forums.

I will not make this mistake again.
My apologies.

There's no problem with cross-posting, but please be forthright about it: http://faq.javaranch.com/java/BeForthrightWhenCrossPostingToOtherSites


Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Hi Steve,
I need a clarification: what parameters do I need to pass to test the ConversionUtility snippet? Could you please provide sample inputs,
for the case of conversion from UTF-8 to Shift_JIS?

Deepak Lal
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4176

Deepak Lal wrote: Hi Steve,
I need a clarification: what parameters do I need to pass to test the ConversionUtility snippet? Could you please provide sample inputs,
for the case of conversion from UTF-8 to Shift_JIS?

Deepak Lal


Hi Deepak,

Sorry, I cut the JavaDoc comments out for brevity.
Here are the docs for each method:


I also added one more method so you can convert directly from an NIO ByteBuffer:
Deepak Lal
Ranch Hand

Joined: Jul 01, 2008
Posts: 507

Hi Steve,
Thanks a lot for all your replies, but I'm getting confused about which method to invoke first.
I mean, if I had to write a main method in Java, which method should be invoked first, and what parameters do I have to supply in this scenario,
that is, the conversion from UTF-8 to Shift_JIS?


Thanks a lot in advance.

Deepak Lal
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4176

The three methods are meant for three different purposes. So which one you want to call (or call first) depends on what you have.

1) Say you have bytes that you read from the input, and you want to convert them to a String for display. Then call getString(String, byte[]).
2) Say you have a String, and you want to convert it to bytes so you can store them in a specific format. Then call convertToEncoding(String, String).
3) Say you have bytes in one character set, and you need to store those bytes in another character set. Then call convertBetweenEncodings(byte[], String, String).
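
The three cases above can be sketched roughly as follows. The method names and parameter types are the ones mentioned in this thread, but the bodies, the parameter order (charset name first in getString; source charset before target charset in convertBetweenEncodings) and the error handling are assumptions, since the original snippet is not shown here:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// A guess at the shape of the utility discussed in the thread, not the original code.
final class ConversionUtility {

    /** Case 1: bytes read from input -> String. Assumed order: charset name, then bytes. */
    static String getString(String charsetName, byte[] bytes) throws CharacterCodingException {
        return Charset.forName(charsetName).newDecoder()
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Case 2: String -> bytes in the named charset, ready to be stored. */
    static byte[] convertToEncoding(String text, String charsetName)
            throws CharacterCodingException {
        ByteBuffer encoded = Charset.forName(charsetName).newEncoder()
                .encode(CharBuffer.wrap(text));
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    /** Case 3: bytes in one charset -> bytes in another. Assumed order: source, then target. */
    static byte[] convertBetweenEncodings(byte[] bytes, String sourceCharset, String targetCharset)
            throws CharacterCodingException {
        return convertToEncoding(getString(sourceCharset, bytes), targetCharset);
    }

    public static void main(String[] args) throws CharacterCodingException {
        // The UTF-8 to Shift_JIS scenario asked about above: start from UTF-8 bytes.
        byte[] utf8 = "\u65E5\u672C\u8A9E".getBytes(StandardCharsets.UTF_8);
        byte[] sjis = convertBetweenEncodings(utf8, "UTF-8", "Shift_JIS");
        System.out.println(getString("Shift_JIS", sjis));
    }
}
```

For the UTF-8 to Shift_JIS case, the main method shows the third method being called first, with the bytes as read from the upload, the name of the charset they are actually in, and the name of the charset to store them in.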

Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4176

Also, I want to repeat this:
"Note that this is 100% un-tested."

I don't claim that this WILL work. I claim this is the strategy to use. I still haven't tested it. If you find it does work, please report back. If not, then it may need to be tweaked a bit. Without a lot of time, or the inputs needed to test it thoroughly, I doubt I will be able to do so myself.
 
 