How to convert äöü to aou with BufferedReader?

benny rusli
Ranch Hand

Joined: Jan 15, 2005
Posts: 72
Hello,

I have tried to convert the German letters äöü into aou, but without success.

I used new BufferedReader(new InputStreamReader(System.in)); to read a String from the console and convert the special letters äöü into aou. Example: hällo höw are yoü -> hallo how are you. Can someone give me a solution to this problem? I appreciate any help.
Stuart Ash
Ranch Hand

Joined: Oct 07, 2005
Posts: 637
Get the original text as a String and then use String.replace()?
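
A minimal sketch of that suggestion, assuming the input has already been read into a String. The class and method names (ReplaceChainDemo, stripUmlauts) are made up for illustration. The umlauts are written as Unicode escapes so that the encoding of the source file cannot interfere.

```java
class ReplaceChainDemo {

    // Chains String.replace(char, char), one call per letter to be replaced.
    // \u00e4 = a-umlaut, \u00f6 = o-umlaut, \u00fc = u-umlaut.
    static String stripUmlauts(String input) {
        return input.replace('\u00e4', 'a')
                    .replace('\u00f6', 'o')
                    .replace('\u00fc', 'u');
    }

    public static void main(String[] args) {
        // prints "hallo how are you"
        System.out.println(stripUmlauts("h\u00e4llo h\u00f6w are yo\u00fc"));
    }
}
```

Strings are immutable, so the result of replace() has to be used or assigned; calling it and ignoring the return value leaves the original text unchanged.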


ASCII silly question, Get a silly ANSI.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Mmmm, I'd actually prefer to loop through each character in the input, checking each character to see if it needs replacement, and then adding either the original character or the replacement to the output. The check can be done with a switch statement or a HashMap. For output, you probably want to either append to a StringBuffer/StringBuilder, or write to a Writer. The advantage of looping this way is you only have to do it once. With replace(), you effectively loop through the input once for each replacement. For long input that can become prohibitively expensive. This may seem to some like premature optimization, but to me it seems quite easy to write the loop, so why not favor the algorithm which will result in better performance? The difference is negligible for small inputs, but notable for long ones.
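
The single-pass loop described above could be sketched roughly like this, using the switch variant and a StringBuilder. The names SinglePassReplacer and replaceUmlauts are hypothetical, and only the three lowercase letters from the question are handled.

```java
class SinglePassReplacer {

    // Walks the input once, appending either the original character
    // or its replacement to the output buffer.
    static String replaceUmlauts(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break; // a-umlaut
                case '\u00f6': out.append('o'); break; // o-umlaut
                case '\u00fc': out.append('u'); break; // u-umlaut
                default:       out.append(c);          // keep everything else
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "hallo how are you"
        System.out.println(replaceUmlauts("h\u00e4llo h\u00f6w are yo\u00fc"));
    }
}
```

The input is scanned once no matter how many letters are mapped, whereas each additional replace() call adds another pass.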

On a non-programming note, arifin, isn't it usually more appropriate to replace ä with ae, ö with oe, ü with ue? That's the typical convention in places that don't normally support umlauts. Maybe that wouldn't make sense in whatever application you're working on - but maybe it does. Just a thought.


"I'm not back." - Bill Harding, Twister
Stuart Ash
Ranch Hand

Joined: Oct 07, 2005
Posts: 637
That apart, I wonder if there is some program somewhere on the net which strips letters of their accents, some kind of de-accenter.
benny rusli
Ranch Hand

Joined: Jan 15, 2005
Posts: 72
Hello,

thanks for responding to my question. I agree with Jim Yingst, but there is another problem. I did what you suggested and the return value is not correct. It gives me hällo -> hällo. I used a StringBuffer, the toCharArray() method, a for loop and a switch statement to pick the appropriate letter.

To Stuart Ash: "Get the original text as a String and then use String.replace()?" I have tried this method too, but without success; the return value is still the same as the input. Did you mean I should use the readLine() method from BufferedReader?

Any help will be appreciated.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

You need something like new InputStreamReader(System.in, encoding), where encoding is the charset of your console. If you're using Windows then type CHCP at the command line and it will tell you what its code page is. If the code page is 850 then you should put "Cp850" for encoding. If it's 437 you should put "Cp437" and so on. If you're using some other operating system then you need to find out in some other way what the console's charset is.
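
As a rough sketch of that advice, the reader can be built with an explicit charset name. The class and method names here (ConsoleReaderDemo, openReader) are hypothetical, and the "Cp850" default is only an assumption; use whatever CHCP reports on your machine.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class ConsoleReaderDemo {

    // Wraps a byte stream in a reader that decodes with the named charset
    // instead of the platform default.
    static BufferedReader openReader(InputStream in, String encoding) throws IOException {
        return new BufferedReader(new InputStreamReader(in, encoding));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the console code page is passed as the first argument,
        // e.g. "Cp850" or "Cp437"; "Cp850" is just a fallback for this sketch.
        String encoding = args.length > 0 ? args[0] : "Cp850";
        BufferedReader reader = openReader(System.in, encoding);
        String line = reader.readLine();
        if (line != null) {
            // Show the code point of each character, which sidesteps
            // any encoding problems on the output side.
            for (int i = 0; i < line.length(); i++) {
                System.out.println(Integer.toHexString(line.charAt(i)));
            }
        }
    }
}
```

If the charset matches the console, typing ä should show the code point e4; any other value suggests the bytes were decoded with the wrong charset.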
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Hmmm... if you're reading from the console, then using new InputStreamReader(System.in) should use the same default encoding that the console uses. So I'm skeptical this is the place where the problem occurs. However it's hard to say for sure - there are a lot of things that could be going on.

Arifin, why don't you show us the code you have so far?
[ November 24, 2005: Message edited by: Jim Yingst ]
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

[Jim]: using new InputStreamReader(System.in) should use the same default encoding that the console uses.

No, it uses the "default charset", which is the system property "file.encoding". That's the default charset used to read and write files. In Windows systems it's commonly cp1252 whereas the encoding used by the console is something else.
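
To see what the JVM considers its default, something like the following can be run. DefaultCharsetDemo and describeDefaults are made-up names, and the values printed depend entirely on the machine and JVM settings.

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {

    // Reports the file.encoding system property and the charset that
    // readers and writers use when none is given explicitly.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Comparing this with the code page reported by CHCP shows whether the console and the JVM default disagree.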
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Ah, I see they've revised some of the API language to use "charset" more consistently nowadays than they did in the past. OK. More interesting though is your statement that console charset might be different from the file charset. I hadn't previously encountered this situation. How does one determine the console charset? Any ideas? So far, googling isn't getting me anywhere.

[Stuart]: That apart, I wonder if there is some program somewhere on the net which strips letters of their accents, some kind of de-accenter.

Mmmm, could be, but I've never seen one. (And can't google one up easily.) Would be easy to write one yourself, except for the mildly tedious task of going through the code charts to determine what Unicode values should be mapped to what. The Collator class may have some of this info already. It knows how to view a and ä as equivalent when sorting, for example. But I can't see a way to get it to convert ä to a. Unless I'm missing something, I think someone would need to figure out that mapping for themselves.
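
For what it's worth, JDKs from Java 6 onward ship java.text.Normalizer, which was not available when this was written and can do most of the mapping work. A sketch of a de-accenter built on it follows; AccentStripper and stripAccents are hypothetical names.

```java
import java.text.Normalizer;

class AccentStripper {

    // Decomposes each accented letter into base letter + combining mark (NFD),
    // then removes the combining marks (Unicode category M).
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // prints "hallo how are you"
        System.out.println(stripAccents("h\u00e4llo h\u00f6w are yo\u00fc"));
    }
}
```

This only helps for letters that have a canonical decomposition; characters such as ß have none and would still need an explicit mapping.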

Incidentally I previously mentioned using a switch statement or HashMap for the replacement mapping. On further thought a String array is probably best, where the index is the Unicode value of the character to be replaced, and most entries just have null to indicate no replacement is necessary; just use the original.
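
A small sketch of that array idea, here using the ae/oe/ue convention mentioned earlier, since a String entry can hold more than one character. The names LookupTableReplacer and transliterate are made up, and limiting the table to 256 entries (Latin-1) is an assumption made to keep the example short.

```java
class LookupTableReplacer {

    // Index = Unicode value of the character; null = no replacement needed.
    // Assumption: only characters below U+0100 are ever replaced.
    private static final String[] REPLACEMENTS = new String[256];

    static {
        REPLACEMENTS['\u00e4'] = "ae"; // a-umlaut
        REPLACEMENTS['\u00f6'] = "oe"; // o-umlaut
        REPLACEMENTS['\u00fc'] = "ue"; // u-umlaut
    }

    static String transliterate(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Characters outside the table are always copied unchanged.
            String replacement = c < REPLACEMENTS.length ? REPLACEMENTS[c] : null;
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "haello hoew are youe"
        System.out.println(transliterate("h\u00e4llo h\u00f6w are yo\u00fc"));
    }
}
```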
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

[Jim]: How does one determine the console charset?

On Windows you type CHCP at the command line and it tells you that your code page is 437, or some other number. Stick a "cp" on the front of that and you've got your console charset. Try code like this to see if it works:

The first line output should be wrong (something other than the accented letter) but the second should be right.
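
The snippet from this post is not preserved in this copy of the thread. A test along the lines described might look roughly like the following, which is a guess and not the original code: it prints the same text once through plain System.out and once through a PrintStream created with the console charset. ConsoleOutputDemo, printWith and the "Cp437" fallback are assumptions.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleOutputDemo {

    // Writes one line to the target stream, encoded with the named charset.
    static void printWith(OutputStream target, String charset, String text)
            throws UnsupportedEncodingException {
        PrintStream out = new PrintStream(target, true, charset);
        out.println(text);
        out.flush();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: pass "Cp" + the number reported by CHCP, e.g. Cp437.
        String consoleCharset = args.length > 0 ? args[0] : "Cp437";
        String text = "\u00e4\u00f6\u00fc"; // a-, o- and u-umlaut

        // First line: encoded with the JVM default charset.
        System.out.println(text);
        // Second line: encoded with the console's charset.
        printWith(System.out, consoleCharset, text);
    }
}
```

If both lines look the same, the JVM default and the console charset probably already agree on that machine.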
benny rusli
Ranch Hand

Joined: Jan 15, 2005
Posts: 72
Hello,

thanks again for responding to my question. I finally got the right answer from Paul Clapham and Jim Yingst. As you requested, Jim, I am posting the code I have written; could you look at it and correct any mistakes I have made? By the way, why doesn't ReplaceDemo1.java give the right return value (example: äöü -> a:�), while the other programs show the right letters (the right return value or output)? And why does the new API, I mean InputStreamReader, need a charset to read äöü or other special characters? In summary, I can only say that reading äöü succeeds only if we split the string into a character array using toCharArray(). Any ideas for improving the code would be appreciated.


ReplaceDemo1.java




ReplaceDemo2.java




ReplaceDemo3.java
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Paul, thanks for the info - that's good to know.

[arifin]: By the way, why the code ReplaceDemo1.java doesn't give right return value, example äöü -> a:�

It looks like in ReplaceDemo1.java, you're only replacing ä. If you add lines for the other two letters you'd like to replace, you should get the expected results.
benny rusli
Ranch Hand

Joined: Jan 15, 2005
Posts: 72
Hello,

thanks again, Jim. When I write too much logic code, I forget how to fix the small mistakes. It should look like this.
Helper method that calls the replace method:
[Code]
// Applies my replace(String, String, String) helper once per umlaut.
public static String replaceall(String input)
{
    input = replace(input, "ä", "a");
    input = replace(input, "ö", "o");
    input = replace(input, "ü", "u");
    return input;
}
[/Code]

[Jim]: It looks like in ReplaceDemo1.java, you're only replacing ä. If you add lines for the other two letters you'd like to replace, you should get the expected results.

By the way, should I move to the Intermediate forum if I want to ask about MVC? Should I start a new topic for that, or could I continue discussing MVC in this topic? Thanks for answering my question.
 