Last edited 14 July 2009

Weird Word Characters   

When a segment of text is copied-and-pasted from Word (or other word processing packages), frequently the quote characters will not appear correctly in web pages. That is because these programs do not use ASCII quote characters, but rather "smart quotes" where the opening and closing quote characters are not the same.

If you are the one doing the copying-and-pasting, you can either turn off this feature in Word or replace the "bad" characters by hand.

But if you need to deal with this issue programmatically, William Brogden has supplied the following code fragment, where s is the String to be "fixed":

  // 145-150 are the Windows-1252 codes; 8211-8221 are the Unicode code points.
  s = s.replace( (char)145, '\'' );
  s = s.replace( (char)8216, '\'' ); // left single quote
  s = s.replace( (char)146, '\'' );
  s = s.replace( (char)8217, '\'' ); // right single quote
  s = s.replace( (char)147, '"' );
  s = s.replace( (char)148, '"' );
  s = s.replace( (char)8220, '"' ); // left double quote
  s = s.replace( (char)8221, '"' ); // right double quote
  s = s.replace( (char)8211, '-' ); // en dash (the em dash is 8212)
  s = s.replace( (char)150, '-' );  // en dash

William notes that a faster solution is possible, but this example sacrifices that for readability.
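As one illustration of what a faster version might look like, the sketch below walks the String once and writes into a StringBuilder, instead of scanning it ten times with replace. This is not William's code: the class name SmartQuoteFixer and the method name fix are hypothetical, and it handles only the same ten characters as the fragment above.

```java
// A sketch only; SmartQuoteFixer and fix are hypothetical names.
final class SmartQuoteFixer {

    // Returns a copy of s in which Word-style quotes and en dashes
    // are replaced by their plain ASCII equivalents, in a single pass.
    static String fix(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case 145:   // left single quote (Windows-1252 code)
                case 146:   // right single quote (Windows-1252 code)
                case 8216:  // left single quote (Unicode)
                case 8217:  // right single quote (Unicode)
                    out.append('\'');
                    break;
                case 147:   // left double quote (Windows-1252 code)
                case 148:   // right double quote (Windows-1252 code)
                case 8220:  // left double quote (Unicode)
                case 8221:  // right double quote (Unicode)
                    out.append('"');
                    break;
                case 150:   // en dash (Windows-1252 code)
                case 8211:  // en dash (Unicode)
                    out.append('-');
                    break;
                default:    // everything else is copied unchanged
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

It would be called as s = SmartQuoteFixer.fix(s); in place of the ten replace calls. Whether the difference matters depends on how much text is being processed; for short strings the readable version above is likely good enough.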


Copyright © 1998-2014 Paul Wheaton