Remove bad characters from strings

Surendra Kumar
Ranch Hand

Joined: Jul 04, 2006
Posts: 87
Hi,

I am trying to remove bad characters from strings.
Some of our users copy and paste strings from mainframes and other sources, which brings in unwanted special characters (like A with ^ on top or two dots on top). Later, when we try to process such strings, some of our applications fail. These characters can't even be typed.

So, can someone please tell me how to remove these characters?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 16483
I can type those characters. And if it were up to me I would fix the parts of my application which failed when faced with such characters. Speaking of which, why does the application fail?

You might also want to read this article to make sure it's not your fault that those characters appear in your inputs.

However if you want to "cleanse" your data then do this:

1. Make a list of all the characters that you choose to accept.

2. Apply some method that removes any other characters from your inputs. Perhaps a regular expression could be used (I don't know much about them) or just a little filter that moves only valid characters into a validated string.
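The two steps above can be sketched as a plain loop, without regular expressions. This is only an illustration: the class name `WhitelistFilter`, the method `keepAllowed`, and the contents of `ALLOWED` are assumptions, and the real list of accepted characters depends on the application.

```java
// Hypothetical helper; names and the ALLOWED list are illustrative only.
class WhitelistFilter {

    // Step 1: the characters the application chooses to accept (an assumed list).
    static final String ALLOWED =
        "abcdefghijklmnopqrstuvwxyz"
        + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        + "0123456789"
        + " .,;:!?'\"()-_/\n\t";

    // Step 2: copy only accepted characters into a new, validated string.
    static String keepAllowed(String input) {
        // StringBuffer rather than StringBuilder so this also compiles on Java 1.4.
        StringBuffer out = new StringBuffer(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (ALLOWED.indexOf(c) >= 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input contains A with a circumflex and a with two dots, written as escapes.
        System.out.println(keepAllowed("Total\u00C2 amount: 42\u00E4"));
    }
}
```

Listing what is accepted, rather than what is rejected, means any unexpected character is dropped by default.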
Naseem Khan
Ranch Hand

Joined: Apr 25, 2005
Posts: 809
Check this thread

Searching a large text file...


Surendra Kumar
Ranch Hand

Joined: Jul 04, 2006
Posts: 87
Thanks, Paul.

I'm looking for some sample code that removes these unwanted characters.
The application should accept only ASCII characters; all other characters should be removed.

Could you please provide me with an example to do this in Java 1.4?
Thanks a lot.
Chris Rutkowski
Greenhorn

Joined: Jun 26, 2006
Posts: 8
"A with ^ on top or two dots on top", and other characters like them, are perfectly legal ASCII characters. On a normal U.S. keyboard they can only be typed by using ALT and then a number combination on the number pad, but many foreign keyboards are set up differently and if a language uses these characters, they have their own key just like any other letter. ANY printable character has an associated ASCII value.
Surendra Kumar
Ranch Hand

Joined: Jul 04, 2006
Posts: 87
So, they come under the old 128-character (7-bit) ASCII code?
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 16692
Try...



Henry


Surendra Kumar
Ranch Hand

Joined: Jul 04, 2006
Posts: 87
Hi Henry,

Thanks for the code.

If you don't mind, could you please let me know what exactly this code does?
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 16692
Originally posted by Surendra Nichenametla:
Hi Henry,

Thanks for the code.

If you don't mind, could you please let me know what exactly this code does?


It's a string substitution. The first parameter is the pattern to match, and the second is what to replace each match with.

In this case, the second parameter specifies that the string should be replaced with nothing. As for the first parameter, it is a Regular Expression that defines what you want.

Henry
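The snippet being discussed is not preserved in this copy of the thread. As a minimal sketch of the kind of call described, assuming `String.replaceAll` (available since Java 1.4) and a pattern that matches anything outside the 7-bit ASCII range: the class name `AsciiOnly`, the method `stripNonAscii`, and the pattern itself are assumptions, not necessarily what was originally posted.

```java
// Hypothetical example; not the original snippet from the thread.
class AsciiOnly {

    static String stripNonAscii(String input) {
        // First argument: a regular expression matching any character outside 0x00-0x7F.
        // Second argument: the replacement, here an empty string, so matches are removed.
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Input contains e with acute, A with circumflex, and o with two dots, as escapes.
        System.out.println(stripNonAscii("caf\u00E9 \u00C2ngstr\u00F6m"));
    }
}
```

The `^` inside the brackets negates the range, so the pattern selects everything that is not ASCII. ASCII control characters such as tabs and newlines are kept by this pattern.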
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24057
Originally posted by Chris Rutkowski:
"A with ^ on top or two dots on top", and other characters like them, are perfectly legal ASCII characters.


No, actually. They're perfectly lovely characters, essential to many languages, but ASCII is a very restricted set of 128 specific characters (codes 0 through 127) that do not include umlauts, accents, etc. -- only those characters used in the English language are included, along with control characters and punctuation (see here).

I'm not arguing that applications that only accept ASCII are appropriate in Java -- they're certainly not. I'm just being pedantic about our terms.


Surendra Kumar
Ranch Hand

Joined: Jul 04, 2006
Posts: 87
Thanks, Henry.

Hi Ernest,

Our app is trying to convert a text file to a PDF document, and this is where it's failing. It fails with the message: "1 non-printable character. failed to create pdf..."
So I wanted to remove all these non-printable characters.

Now I am trying to test this app with the code snippet given by Henry.
Can you please let me know how to type those extended ASCII chars using a conventional English keyboard?
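For testing, the characters do not have to be typed at all: Java source accepts Unicode escapes (a backslash, the letter u, and four hex digits) inside string literals. The sketch below builds a sample string that way and counts the characters above code 127. The class name `TestInput`, the sample text, and the assumption that the PDF converter rejects exactly these characters are all illustrative.

```java
// Hypothetical test helper; names and sample data are illustrative only.
class TestInput {

    // A with circumflex (U+00C2) and A with two dots (U+00C4), written as escapes.
    static final String SAMPLE = "R\u00C2W D\u00C4TA";

    // Counts characters whose code is above 127, i.e. outside ASCII.
    static int countNonAscii(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("length: " + SAMPLE.length());
        System.out.println("non-ASCII: " + countNonAscii(SAMPLE));
    }
}
```

Writing the sample to a text file in the encoding the application expects gives a repeatable input for the conversion step.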
Chris Rutkowski
Greenhorn

Joined: Jun 26, 2006
Posts: 8
Originally posted by Ernest Friedman-Hill:


No, actually. They're perfectly lovely characters, essential to many languages, but ASCII is a very restricted set of 128 specific characters (codes 0 through 127) that do not include umlauts, accents, etc. -- only those characters used in the English language are included, along with control characters and punctuation (see here).

I'm not arguing that applications that only accept ASCII are appropriate in Java -- they're certainly not. I'm just being pedantic about our terms.


Thanks for the info, Ernest. My mistake.
 