
regex utf-8 characters

mark goking
Ranch Hand

Joined: Aug 18, 2009
Posts: 155
Hi. I would like some input.



My program replaces space characters with diamond symbols and passes the text to a JTextPane for display. That part works. My problem is converting the diamonds back to space characters.

I used a regex via String's replaceAll() method, but it does not work. When I print the string with System.out, the diamond symbol comes out as \u9830 ? (with a space and a question mark), which is weird since the Unicode character in my code is \u2666. I'm confused about why System.out shows \u9830 ? instead.

Is there something I need to do before the regex will work?


Website/Java Games: http://www.chitgoks.com
Tech Blog: http://tech.chitgoks.com
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18167

Not using a regex would be a better idea. If you want to replace all instances of one character by instances of a second character, the replace(char, char) method of String does exactly that without requiring regex.
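A minimal sketch of that suggestion, assuming the diamond is U+2666 as in the original post. The class and method names (DiamondSwap, showSpaces, hideSpaces) are made up for illustration and are not from the poster's program:

```java
final class DiamondSwap {
    // BLACK DIAMOND SUIT, the character the original post uses as a visible space marker.
    static final char DIAMOND = '\u2666';

    // Replace every space with the diamond marker (no regex involved).
    static String showSpaces(String text) {
        return text.replace(' ', DIAMOND);
    }

    // Reverse the substitution: every diamond becomes a space again.
    static String hideSpaces(String text) {
        return text.replace(DIAMOND, ' ');
    }

    public static void main(String[] args) {
        String marked = showSpaces("a b c");
        System.out.println(hideSpaces(marked).equals("a b c")); // prints "true"
    }
}
```

String.replace(char, char) compares characters directly, so there is no pattern syntax or escaping to get wrong.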
mark goking
Ranch Hand

Joined: Aug 18, 2009
Posts: 155
Ah, OK, thanks. I will check that out.
mark goking
Ranch Hand

Joined: Aug 18, 2009
Posts: 155
I'm guessing that when I retrieve the JTextPane's document object, it reads its contents as ISO-8859 data, and since there are UTF-8 characters in there, it displays them as ?.



I'll have to change part of the code so that it reads the contents as UTF-8.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18167

That could be. It all depends on the class whose "write" method you're calling, of course, and there's no way to tell what that class is by reading your post.

If it were me I would avoid converting the data to bytes unless (1) you are forced to do that by an external requirement and (2) you can control the charset used in the conversion.
mark goking
Ranch Hand

Joined: Aug 18, 2009
Posts: 155
The class with the write method is javax.swing.text.rtf.RTFEditorKit.
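An RTF writer would also explain the "\u9830 ?" seen earlier: 9830 is simply the decimal form of hex 2666, and RTF escapes non-ASCII characters as backslash-u plus a decimal number followed by a fallback character. The sketch below checks the arithmetic and decodes such an escape. The class name RtfUnicodeEscape and its decode method are hypothetical, and the decoder is a simplification for illustration, not a real RTF parser:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RtfUnicodeEscape {
    // A backslash, the letter u, a decimal number, then an optional space and "?" fallback.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u(-?\\d+) ?\\??");

    // Turn each decimal escape back into the character it stands for.
    static String decode(String rtfText) {
        Matcher m = ESCAPE.matcher(rtfText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(1));
            // RTF stores the number as a signed 16-bit value, so mask it to 16 bits.
            char c = (char) (value & 0xFFFF);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println((int) '\u2666');            // prints 9830
        System.out.println(Integer.toHexString(9830)); // prints 2666
    }
}
```

So the string being printed is RTF markup rather than the document's text, which is why a replacement that looks for the diamond character finds nothing.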
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18167

Then use its other write method -- check the docs, there are two -- and write to a StringWriter.
mark goking
Ranch Hand

Joined: Aug 18, 2009
Posts: 155
I'm going to post my running code in a while. BRB.
mark goking
Ranch Hand

Joined: Aug 18, 2009
Posts: 155



This is what I have so far. I figure the problem lies in getDocumentString().

Try typing some letters with spaces and then press Enter. When you press Enter, the symbols will appear. The problem is that when you start typing again, the diamond symbols do not get changed back to space characters.
mark goking
Ranch Hand

Joined: Aug 18, 2009
Posts: 155
I double-checked by using replace() on String objects and it worked. It seems that the DefaultStyledDocument in the getDocumentString() method is the culprit: its contents never get retrieved as UTF-8. I don't know why.
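One way to sidestep the byte and charset question entirely is to ask the document for its text as a String and run replace() on that. The sketch below assumes getDocumentString() only needs the plain text; the class DocumentText and method plainText are hypothetical stand-ins, since the posted code is not shown here. Document.getText(int, int) is the standard Swing call. It may also be worth verifying whether RTFEditorKit's Writer-based write overload is usable at all, since as far as I know it refuses with an IOException in at least some JDK versions.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Document;

final class DocumentText {
    static final char DIAMOND = '\u2666';

    // Hypothetical stand-in for getDocumentString(): returns the document's
    // characters directly, with no conversion to bytes and no RTF markup.
    static String plainText(Document doc) throws BadLocationException {
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws BadLocationException {
        Document doc = new DefaultStyledDocument();
        doc.insertString(0, "a" + DIAMOND + "b", null);

        // The diamond survives as a real char, so a plain replace() finds it.
        String restored = plainText(doc).replace(DIAMOND, ' ');
        System.out.println(restored.equals("a b")); // prints "true"
    }
}
```

This drops styling information, so it only fits if the goal is to inspect or transform the text rather than to save it as RTF.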
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13884

A side note: This line of code looks unnecessarily complicated. Why first create a Character object and then use String.valueOf() on it?
mark goking wrote:

The following line does exactly the same:


mark goking
Ranch Hand

Joined: Aug 18, 2009
Posts: 155
Hi. Yeah, don't worry, I was experimenting. I could just use new Character('\u2666').toString() right away.
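For comparison, here are a few ways to get a one-character String without constructing a Character object at all. This is an illustrative sketch, not the code from the posts above; the class name DiamondString is made up. Note that the Character(char) constructor has been deprecated since Java 9.

```java
final class DiamondString {
    // Convert the char straight to a String.
    static String viaValueOf() {
        return String.valueOf('\u2666');
    }

    // Static helper on Character; no boxing needed.
    static String viaCharacter() {
        return Character.toString('\u2666');
    }

    // Simplest of all: a String literal with the same escape.
    static String viaLiteral() {
        return "\u2666";
    }

    public static void main(String[] args) {
        System.out.println(viaValueOf().equals(viaLiteral()));   // prints "true"
        System.out.println(viaCharacter().equals(viaLiteral())); // prints "true"
    }
}
```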
 