Parse Smart Quotes to regular quotes

 
Lisa Modglin
Ranch Hand
Posts: 46
I have an application where the user copies text from an MS Word document and pastes it into a form field. The data is stored in a database. The smart quotes are being stored in the database as question marks.

I want to search and replace the form input field before saving it to the database. I'm just not sure what I want to search for. I used the following:

String textString = request.getParameter( "test" );
for( int i = 0; i < textString.length(); i++ )
{
    out.println( "<br />Char(" + i + "): " + textString.charAt( i ) + "-" + (int)textString.charAt( i ) );
}

Where the smart quotes appear, I get the value of 147 and 148.

I want to replace all of the smart quotes, but can't get it to work:

out.println( "New String: " + textString.replaceAll( "\\p147", "\"" ) );

I also tried this from another forum:

out.println( "New String: " + textString.replaceAll( "\\u201c", "\"" ) );

Exactly what value am I searching the string for? The Unicode value, the ASCII value, what?

Any comments are appreciated!

Lisa
 
Paul Clapham
Sheriff
Posts: 20196
I think you should back up a step and read this article by John O'Conner. You may need to change some other things before that code will work.
 
Lisa Modglin
Ranch Hand
Posts: 46
It's true that the only place where I'm experiencing problems is at the database level. That's why I simply want to search the string for the smart quote and replace it with a regular quote.
 
Henry Wong
author
Marshal
Posts: 20836
Originally posted by Lisa Modglin:

Where the smart quotes appear, I get the value of 147 and 148.

I also tried this from another forum:

out.println( "New String: " + textString.replaceAll( "\\u201c", "\"" ) );

Exactly what value am I searching the string for? The Unicode value, the ASCII value, what?


This one should work, but it searches for x201c, whereas 147 and 148 in decimal are x0093 and x0094 in hexadecimal... so the code should be:
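
A minimal sketch of such a search, assuming the form text really does contain the code points 147 and 148 (\u0093 and \u0094). The class and method names are hypothetical and are not taken from the original post.

```java
public class SmartQuoteSearch {
    // Replaces code points 147 (0x93) and 148 (0x94) with a straight double quote.
    // The "\\u0093" literal reaches the regex engine as a hexadecimal escape.
    static String straighten(String text) {
        return text.replaceAll("\\u0093", "\"")
                   .replaceAll("\\u0094", "\"");
    }

    public static void main(String[] args) {
        // Sample input built by hand; a real value would come from the request.
        String textString = "He said \u0093hello\u0094";
        System.out.println("New String: " + straighten(textString));
    }
}
```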



Henry
 
Henry Wong
author
Marshal
Posts: 20836
Oops... forgot that strings are not mutable, so code should be...
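
To illustrate the immutability point: replaceAll returns a new string and leaves the receiver alone, so the result has to be assigned. This is only a sketch with hypothetical names, assuming the same \u0093 code point discussed above.

```java
public class ImmutableReplaceDemo {
    // Returns the text with code point 147 (0x93) replaced by a straight quote.
    static String replaceOpeningQuote(String textString) {
        // Strings are immutable: this call builds a new string and the result is thrown away.
        textString.replaceAll("\\u0093", "\"");

        // Assigning the result back is what makes the change visible.
        textString = textString.replaceAll("\\u0093", "\"");
        return textString;
    }

    public static void main(String[] args) {
        String original = "\u0093quoted";
        String fixed = replaceOpeningQuote(original);
        // The original string object is untouched; only the returned one differs.
        System.out.println("Unchanged original? " + (original.indexOf('\u0093') >= 0));
        System.out.println("New String: " + fixed);
    }
}
```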



Henry
 
Paul Clapham
Sheriff
Posts: 20196
The value \u201c is the official Unicode value of one of the curly quotes. But the 147 is its encoding in a Windows charset (most likely windows-1252). So when you get the data from the client, it has already been damaged. The article I linked to goes into how you can avoid that happening. And if you avoid that, you might not need to change the curly quotes to straight quotes at all. If you have your database configured to accept all Unicode characters, for example, you could leave them as is.
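
A small sketch of how the same byte can turn into either value. It assumes the browser sent the quote as the single byte 0x93 (windows-1252) and that the server decoded the request as ISO-8859-1; neither is confirmed in this thread, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuoteDecodingDemo {
    // Decodes the bytes with the given charset and returns the first code point.
    static int decodeFirstCodePoint(byte[] bytes, Charset charset) {
        return new String(bytes, charset).codePointAt(0);
    }

    public static void main(String[] args) {
        // Assumption: Word's opening curly quote arrives as the byte 0x93 (decimal 147).
        byte[] wordQuote = { (byte) 0x93 };

        // windows-1252 is not one of the six charsets every JVM must support,
        // but it is shipped with common JDK builds.
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.printf("windows-1252: U+%04X%n",
                decodeFirstCodePoint(wordQuote, cp1252));
        System.out.printf("ISO-8859-1:   U+%04X%n",
                decodeFirstCodePoint(wordQuote, StandardCharsets.ISO_8859_1));
    }
}
```

Decoded as windows-1252 the byte becomes U+201C, the real curly quote; decoded as ISO-8859-1 it becomes U+0093, which matches the 147 seen in the loop output.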
 
Lisa Modglin
Ranch Hand
Posts: 46
Henry,

Thanks so much! I knew how to get the 147 and 148 values, but I just didn't know what to do with them. How did you convert 147 to 93?

Paul,

I understand what you are saying, but sometimes there isn't an option of changing the database. Our DBA would have none of that!

Lisa
 
Paul Clapham
Sheriff
Posts: 20196
Originally posted by Lisa Modglin:
I understand what you are saying, but sometimes there isn't an option of changing the database. Our DBA would have none of that!
I was suggesting that your database might already be able to handle a curly quote if it were represented properly, i.e. if you had \u2019. The reason you see ? in the database is that what you are getting from the client is not a printable character in Unicode (147 is \u0093, a control code). Try a little experiment and see what happens when you write a string including \u2019 into your database.
 
Henry Wong
author
Marshal
Posts: 20836
Originally posted by Lisa Modglin:
Henry,

Thanks so much! I knew how to get the 147 and 148 values, but I just didn't know what to do with them. How did you convert 147 to 93?

Lisa


The "\\u" regular expression escape must be followed by a hexadecimal number, and 147 in decimal converts to 93 in hexadecimal.
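
The conversion can be checked in code. This is a minimal sketch; the class and method names are hypothetical.

```java
public class HexEscapeDemo {
    // Formats a decimal code point as a four-digit hexadecimal regex escape.
    static String toUnicodeEscape(int codePoint) {
        return String.format("\\u%04x", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(147)); // hexadecimal digits of 147
        System.out.println(toUnicodeEscape(147));     // escape text usable in a regex
        System.out.println(toUnicodeEscape(148));
    }
}
```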

BTW, you can also replace both quotes in one method call.
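
One way to do that is a regex character class that lists both code points. This is only a sketch with hypothetical names, assuming the two values are \u0093 and \u0094 as discussed above.

```java
public class SmartQuoteReplacer {
    // A character class matches either code point, so one call handles both quotes.
    static String toStraightQuotes(String text) {
        return text.replaceAll("[\\u0093\\u0094]", "\"");
    }

    public static void main(String[] args) {
        // Sample input built by hand; a real value would come from the request.
        String textString = "\u0093pasted from Word\u0094";
        textString = toStraightQuotes(textString);
        System.out.println("New String: " + textString);
    }
}
```

If the request is later decoded correctly, the class would need \u201c and \u201d added to keep matching.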



Henry
 