JavaRanch » Java Forums » Java » Java in General

Parse Smart Quotes to regular quotes

Lisa Modglin
Ranch Hand

Joined: Oct 28, 2003
Posts: 46
I have an application where the user copies text from a MS Word document and pastes it into a form field. The data is stored in a database. The smart quotes are being stored in the database as question marks.

I want to search and replace the form input field before saving it to the database. I'm just not sure what I want to search for. I used the following:

String textString = request.getParameter( "test" );
// Print each character next to its numeric value to see what the quotes really are
for( int i = 0; i < textString.length(); i++ )
{
out.println( "<br />Char(" + i + "): " + textString.charAt( i ) + "-" + (int)textString.charAt( i ) );
}

Where the smart quotes appear, I get the values 147 and 148.

I want to replace all of the smart quotes with replaceAll, but can't get it to work:

out.println( "New String: " + textString.replaceAll( "\\p147", "\"" ) );

I also tried this from another forum:

out.println( "New String: " + textString.replaceAll( "\\u201c", "\"" ) );

Exactly what value am I searching the string for? The Unicode value, the ASCII value, what?

Any comments are appreciated!

Lisa
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18669
    

I think you should back up a step and read this article by John O'Conner. You may need to change some other things before that code will work.
Lisa Modglin
Ranch Hand

Joined: Oct 28, 2003
Posts: 46
It's true that the only place where I'm experiencing problems is at the database level. That's why I simply want to search the string for the smart quotes and replace them with regular quotes.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    

Originally posted by Lisa Modglin:

Where the smart quotes appear, I get the values 147 and 148.

I also tried this from another forum:

out.println( "New String: " + textString.replaceAll( "\\u201c", "\"" ) );

Exactly what value am I searching the string for? The Unicode value, the ASCII value, what?


This one should work, but you converted 147 and 148 to \u201c instead of \u0093 and \u0094, so the code should be:
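A minimal sketch of that replacement, assuming the quotes really do arrive as characters 147 and 148 (U+0093 and U+0094), as the debugging loop in the first post shows. The class and method names are hypothetical. Because replaceAll takes a regular expression, the Java literal "\\u0093" hands the regex engine the escape \u0093, which matches that single character. Each result is reassigned because replaceAll returns a new String.

```java
public class StraightenQuotes {

    // Hypothetical helper: swaps the mis-decoded curly quotes
    // (U+0093 and U+0094) for a plain ASCII double quote.
    static String straighten(String text) {
        // replaceAll returns a new String, so each result is reassigned.
        text = text.replaceAll("\\u0093", "\"");
        text = text.replaceAll("\\u0094", "\"");
        return text;
    }

    public static void main(String[] args) {
        // Simulates what request.getParameter("test") returned: chars 147 and 148.
        String input = (char) 147 + "hello" + (char) 148;
        System.out.println(straighten(input)); // prints "hello" (with straight quotes)
    }
}
```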



Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    

Oops... I forgot that strings are immutable, so the code should be:
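To illustrate the point about immutability, here is a small sketch (hypothetical class and method names) contrasting a call whose result is discarded with one whose result is assigned back to the variable. A Java String cannot be changed in place, so replaceAll always returns a new String and leaves the original untouched.

```java
public class ImmutableStrings {

    // Bug: the String returned by replaceAll is thrown away.
    static String forgetsResult(String text) {
        text.replaceAll("\\u0093", "\"");
        return text;
    }

    // Fix: keep the new String that replaceAll returns.
    static String keepsResult(String text) {
        text = text.replaceAll("\\u0093", "\"");
        return text;
    }

    public static void main(String[] args) {
        String input = (char) 147 + "quoted";
        // The original character 147 is still there.
        System.out.println((int) forgetsResult(input).charAt(0)); // 147
        // The replacement took effect.
        System.out.println(keepsResult(input));                   // "quoted
    }
}
```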



Henry
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18669
    

The value \u201c is the official Unicode value of one of the curly quotes. But 147 (0x93) is its encoding in the Windows-1252 charset. So when you get the data from the client, it has already been damaged. The article I linked to explains how you can avoid that. And if you avoid that, you might not need to change the curly quotes to straight quotes at all. If your database is configured to accept all Unicode characters, for example, you could leave them as is.
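Paul's explanation can be demonstrated with the standard charset APIs. The sketch below assumes the server decoded the browser's Windows-1252 bytes as ISO-8859-1, which maps each byte 0x00-0xFF to the character with the same value; under that assumption, byte 0x93 becomes character 147. Re-encoding as ISO-8859-1 recovers the original bytes, and decoding them as Windows-1252 yields the real curly quotes. The class and method names are hypothetical, and "windows-1252" is not one of the charsets every Java platform is required to support, although standard JDKs include it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {

    // Assumption: the text was decoded as ISO-8859-1 when it should have
    // been decoded as Windows-1252. ISO-8859-1 maps bytes one-to-one onto
    // U+0000-U+00FF, so re-encoding with it recovers the original bytes.
    static String redecode(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String damaged = (char) 147 + "hi" + (char) 148;
        String fixed = redecode(damaged);
        System.out.println((int) fixed.charAt(0)); // 8220 (0x201C)
        System.out.println((int) fixed.charAt(3)); // 8221 (0x201D)
    }
}
```

This is a repair after the fact. Decoding the request correctly up front (for example by calling setCharacterEncoding on the request before the first getParameter call, with a charset matching the form page) avoids the damage in the first place.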
Lisa Modglin
Ranch Hand

Joined: Oct 28, 2003
Posts: 46
Henry,

Thanks so much! I knew how to get the 147 and 148 values, but I just didn't know what to do with them. How did you convert 147 to 93?

Paul,

I understand what you are saying, but sometimes changing the database isn't an option. Our DBA would have none of that!

Lisa
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18669
    

Originally posted by Lisa Modglin:
I understand what you are saying, but sometimes changing the database isn't an option. Our DBA would have none of that!
I was suggesting that your database might already be able to handle a curly quote if it were represented properly, i.e. as \u2019. The reason you see ? in the database is that what you are getting from the client is character 147 (\u0093), a control character with no printable meaning in Unicode. Try a little experiment: write a string containing \u2019 into your database and see what happens.
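What that experiment shows depends on the character set the text is encoded into on its way to storage. A minimal sketch of the mechanism, assuming the driver or database encodes the string into the column's charset: characters the charset cannot represent are replaced, and Java's String.getBytes uses the charset's default replacement, which for ISO-8859-1 is '?'. The roundTrip helper is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Hypothetical helper: encodes text into a charset and decodes it back,
    // approximating storage in a column that uses that charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String curly = "\u2019"; // right single quotation mark

        // ISO-8859-1 has no U+2019, so getBytes substitutes '?'.
        System.out.println(roundTrip(curly, StandardCharsets.ISO_8859_1)); // ?

        // Windows-1252 (byte 0x92) and UTF-8 can both represent it.
        System.out.println(roundTrip(curly, Charset.forName("windows-1252")).equals(curly)); // true
        System.out.println(roundTrip(curly, StandardCharsets.UTF_8).equals(curly));          // true
    }
}
```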
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    

Originally posted by Lisa Modglin:
Henry,

Thanks so much! I knew how to get the 147 and 148 values, but I just didn't know what to do with them. How did you convert 147 to 93?

Lisa


The "\\u" regular expression escape must be followed by a four-digit hexadecimal number, so 147 in decimal becomes 0093 in hexadecimal.
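The conversion can be checked with the standard Integer methods. A small sketch; the regexEscape helper is hypothetical and simply formats a character code as the four-digit escape the regex expects.

```java
public class DecimalToHex {

    // Hypothetical helper: formats a char code as regex escape text,
    // e.g. 147 becomes the six characters backslash, u, 0, 0, 9, 3.
    static String regexEscape(int code) {
        return String.format("\\u%04x", code);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(147));   // 93
        System.out.println(Integer.parseInt("93", 16)); // 147
        System.out.println(regexEscape(147));           // \u0093

        // The generated escape works directly as a replaceAll pattern.
        String input = "x" + (char) 147;
        System.out.println(input.replaceAll(regexEscape(147), "\"")); // x"
    }
}
```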

BTW, you can also replace both quotes in one method call.
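One way to do that in a single call, sketched under the same assumption that the quotes arrive as U+0093 and U+0094, is a regex character class listing both characters. The class and method names are hypothetical.

```java
public class OneCallReplace {

    // Hypothetical helper: one character class matches either
    // mis-decoded curly quote (U+0093 or U+0094).
    static String straighten(String text) {
        return text.replaceAll("[\\u0093\\u0094]", "\"");
    }

    public static void main(String[] args) {
        String input = (char) 147 + "both" + (char) 148;
        System.out.println(straighten(input)); // prints "both" (with straight quotes)
    }
}
```

If the form data were decoded correctly, the quotes would instead be U+201C and U+201D, and the class would need to be "[\\u201c\\u201d]" (or list all four characters to cover both cases).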



Henry
 
 