
regex utf-8 characters

 
mark goking
Ranch Hand
Posts: 155
Hi. I would like some input.



My program replaces space characters with diamond symbols and passes the text to a JTextPane for display. That part works. My problem is reverting the diamonds back to space characters.

I used a regex via String's replaceAll() method, but it does not work. If I System.out the string value, the diamond symbol prints as \u9830 ?, which is weird since the Unicode character in my code is \u2666. I'm confused why System.out shows \u9830 ? (with a space and a question mark).

Is there something I need to do before the regex will work?
 
Paul Clapham
Sheriff
Posts: 21126
32
Eclipse IDE Firefox Browser MySQL Database
Not using a regex would be a better idea. If you want to replace all instances of one character by instances of a second character, the replace(char, char) method of String does exactly that without requiring regex.
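A minimal sketch of that suggestion, assuming the diamond is U+2666 as stated in the first post. The class and method names are made up for illustration and are not from the original program.

```java
class DiamondSwap {
    // U+2666 BLACK DIAMOND SUIT, the marker character named in the first post.
    static final char DIAMOND = '\u2666';

    // Replace every space with the diamond marker (no regex involved).
    static String showSpaces(String text) {
        return text.replace(' ', DIAMOND);
    }

    // Replace every diamond marker with a space again.
    static String hideSpaces(String text) {
        return text.replace(DIAMOND, ' ');
    }

    public static void main(String[] args) {
        String marked = showSpaces("a b c");
        String restored = hideSpaces(marked);
        System.out.println(restored.equals("a b c")); // prints true
    }
}
```

Note that the round trip is only lossless if the original text contains no diamonds of its own.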
 
mark goking
Ranch Hand
Posts: 155
Ah, OK, thanks. Will check that out.
 
mark goking
Ranch Hand
Posts: 155
I'm guessing that when I retrieve the JTextPane's Document object, it reads its contents as ISO-8859 data, and since there are UTF-8 characters in there, it displays them as ?.



I'll have to change part of the code so that it reads the contents as UTF-8.
 
Paul Clapham
Sheriff
Posts: 21126
32
Eclipse IDE Firefox Browser MySQL Database
That could be. It all depends on the class whose "write" method you're calling, of course, and there's no way to tell what that class is by reading your post.

If it were me I would avoid converting the data to bytes unless (1) you are forced to do that by an external requirement and (2) you can control the charset used in the conversion.
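To illustrate point (2), here is a small sketch of what happens when the diamond is pushed through bytes with two different charsets. The class and method names are hypothetical; only the standard String and StandardCharsets calls are real.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode the text to bytes and decode it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "a\u2666b";

        // UTF-8 can represent the diamond, so the text survives.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // prints true

        // ISO-8859-1 has no diamond, so getBytes substitutes '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1)); // prints a?b
    }
}
```

Keeping the text as a String (or char data) the whole way avoids the question entirely.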
 
mark goking
Ranch Hand
Posts: 155
The class with the write method is javax.swing.text.rtf.RTFEditorKit.
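That may explain the \u9830 ? output from the first post. 9830 is simply the decimal value of hex 2666, so both spellings name the same character. Java source escapes are hexadecimal, while RTF's Unicode control word takes a decimal number followed by a fallback character (here ?) for readers that cannot handle Unicode. So the printed string is probably RTF markup rather than a mangled character, which would also be why a replace on the diamond finds nothing. A quick check of the arithmetic (class and method names are illustrative only):

```java
class CodePointCheck {
    // Decimal value of a char, the form that appears in the printed output.
    static int decimalOf(char c) {
        return c;
    }

    // Hexadecimal value of a char, the form used by Java source escapes.
    static String hexOf(char c) {
        return Integer.toHexString(c);
    }

    public static void main(String[] args) {
        char diamond = '\u2666';
        System.out.println(decimalOf(diamond)); // prints 9830
        System.out.println(hexOf(diamond));     // prints 2666
    }
}
```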
 
Paul Clapham
Sheriff
Posts: 21126
32
Eclipse IDE Firefox Browser MySQL Database
Then use its other write method -- check the docs, there are two -- and write to a StringWriter.
 
mark goking
Ranch Hand
Posts: 155
I'm going to post my running code in a while. BRB.
 
mark goking
Ranch Hand
Posts: 155



This is what I have so far. I figure the problem lies in getDocumentString().

Try typing some letters with spaces, then press Enter. When you press Enter, the symbols will appear. The problem is that when you start typing again, the diamond symbols do not get changed back to space characters.
 
mark goking
Ranch Hand
Posts: 155
I double-checked by using replace() on plain String objects and it worked. It seems that the DefaultStyledDocument in the getDocumentString() method is the culprit; it never gets retrieved as UTF-8. Dunno why.
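One possible way around this, assuming getDocumentString() currently serializes the document through RTFEditorKit (the thread does not show that method): read the text straight from the Document with getText(), which returns Java chars and involves no byte encoding. This is a sketch only; the class and method names below are hypothetical and the document is built in place just to keep the example self-contained.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Document;

class DocumentText {
    // Return the document's plain text, without any RTF markup or charset step.
    static String plainText(Document doc) {
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            // Cannot happen for the range 0..getLength(), so rethrow unchecked.
            throw new IllegalStateException(e);
        }
    }

    // Put the given text into a fresh styled document and read it back
    // with the diamonds turned into spaces.
    static String restoreSpaces(String shownText) {
        Document doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, shownText, null);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return plainText(doc).replace('\u2666', ' ');
    }

    public static void main(String[] args) {
        System.out.println(restoreSpaces("a\u2666b")); // prints a b
    }
}
```

In the real program the Document would come from the JTextPane (getDocument()) instead of being created here.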
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15356
39
Android IntelliJ IDE Java Scala Spring
A side note: This line of code looks unnecessarily complicated. Why first create a Character object and then use String.valueOf() on it?
mark goking wrote:

The following line does exactly the same:

 
mark goking
Ranch Hand
Posts: 155
Hi. Yeah, don't worry, I was experimenting. I could just use new Character('\u2666').toString() right away.
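For comparison, a few ways to get a one-character String without creating a Character object first (the Character(char) constructor has been deprecated since Java 9). This is only an illustration with made-up names, not the line quoted earlier in the thread.

```java
class DiamondString {
    // Three equivalent ways to build the one-character string.
    static String viaValueOf() {
        return String.valueOf('\u2666');
    }

    static String viaCharacterToString() {
        return Character.toString('\u2666');
    }

    static String viaLiteral() {
        return "\u2666";
    }

    public static void main(String[] args) {
        System.out.println(viaValueOf().equals(viaLiteral()));           // prints true
        System.out.println(viaCharacterToString().equals(viaLiteral())); // prints true
    }
}
```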
 