
Is this a valid way to remove control characters: "{cntrl}"

Nigel Shrin
Ranch Hand

Joined: May 18, 2009
Posts: 137
I found this in an old Sun article. Is this still the best way to remove all control characters from text?



I found a similar posting on JavaRanch, but it only concerned carriage returns:
http://www.coderanch.com/t/426877/Beginning-Java/java/remove-carriage-return-from-string

Thanks!


Nigel
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18120
    


It would be better if you actually used the regex for a control character.

Henry


Nigel Shrin
Ranch Hand

Joined: May 18, 2009
Posts: 137
Thanks, Henry.
Is there a way to strip out all possible control characters, or would you have to list them individually?

Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18120
    

Nigel Shrin wrote:
Is there a way to strip out all possible control characters


Sure... but you need to actually have a regex that specifies control characters.

Henry
Nigel Shrin
Ranch Hand

Joined: May 18, 2009
Posts: 137
Sorry, I don't know how to do that. Are you able to give me an example? I've only used quite straightforward regex examples and am not sure how to search for control characters. Or can you point me in the direction of a resource? (The {cntrl} did not work, by the way.)

Thanks
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18120
    

Nigel Shrin wrote: (The {cntrl} did not work by the way.)


This is because, as already mentioned, it isn't exactly the regex for control characters. It's close, but it isn't right. Maybe it would be a good idea to go back to where you saw "{cntrl}", and this time, cut-n-paste it correctly.

Henry
Nigel Shrin
Ranch Hand

Joined: May 18, 2009
Posts: 137
Hello Henry, the article containing the code is this one:
http://java.sun.com/developer/technicalArticles/releases/1.4regex/
The article is dated 2001, updated 2002. I'm running 1.6; perhaps that is why it throws an error?

The block of code in the article is this:

Removing Control Characters from a File


I am running the code in Eclipse, and have only changed the file path lines to new text files I have created.
File fin = new File("C:/Documents and Settings/Nigel/Desktop/f1.txt"); // f1.txt contains the above program, so plenty of tabs and newlines
File fout = new File("C:/Documents and Settings/Nigel/Desktop/f2.txt");

The exception msg is as follows:

Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
{cntrl}

I've searched the web for an explanation of this exception and found an answer saying that the braces should be escaped, i.e. \\{ and \\}.
With the braces escaped, the exception goes away, but the input file does not appear to be changed at all:


Is it a POSIX, US-ASCII-only search?
If so, I have tried another text file containing the following, and \\{cntrl\\} has not removed anything:


Thank you
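
For readers following along, both behaviours described in this post can be reproduced with a small sketch. It assumes nothing beyond java.util.regex; the class name BracePatterns and the helper compiles are made up for illustration and are not from the Sun article. An unescaped "{" is read as the start of a repetition count such as {2,5}, which is why "{cntrl}" is rejected, and escaping the braces makes the pattern match the literal seven characters "{cntrl}" rather than any control character.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BracePatterns {
    // Hypothetical helper: true if the regex compiles, false if it is rejected.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unescaped braces are parsed as a repetition, so this does not compile.
        System.out.println(compiles("{cntrl}"));

        // Escaped braces compile, but they only match the literal text "{cntrl}".
        String sample = "a\tb {cntrl} c";
        String result = sample.replaceAll("\\{cntrl\\}", "");

        // The tab is still present; only the literal "{cntrl}" was removed.
        System.out.println(result.contains("\t"));
    }
}
```

So a file that contains tabs and newlines but never the text "{cntrl}" comes out unchanged, which matches what was observed.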

Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18120
    


Wow. An eight-year-old article that is just wrong, and no one caught it. Take a look at the JavaDoc...

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

Or you can use any other regex document instead. See the section about POSIX character classes. One of the classes is for control characters.

Henry
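
A minimal sketch of what the POSIX character classes section of the Pattern JavaDoc points to: the control-character class is written \p{Cntrl}, which becomes "\\p{Cntrl}" inside a Java string literal. The class name StripControlChars and the method strip are hypothetical names used only for this example. By default the class covers the US-ASCII control characters \x00-\x1F and \x7F.

```java
import java.util.regex.Pattern;

public class StripControlChars {
    // \p{Cntrl} is the POSIX (US-ASCII) control class: \x00-\x1F plus \x7F.
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    // Hypothetical helper: returns the text with every control character removed.
    static String strip(String text) {
        return CONTROL.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Tab, bell, carriage return, line feed and DEL are all control characters.
        String dirty = "col1\tcol2\u0007\r\nnext line\u007F";
        System.out.println(strip(dirty)); // prints "col1col2next line"
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex for every line of a large file.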
Nigel Shrin
Ranch Hand

Joined: May 18, 2009
Posts: 137
Yes, that's where I found the POSIX reference.
Prithvi Sehgal
Ranch Hand

Joined: Oct 13, 2009
Posts: 774
Hi,

Try this pattern:

Pattern p = Pattern.compile("\\p{Cntrl}");

Let us know if it works.

Hope it helps,


Prithvi,
Prithvi Sehgal
Ranch Hand

Joined: Oct 13, 2009
Posts: 774
Even better,

there is a tool called RegexBuddy that you can download and use to work out the correct regex.

Hope this helps,
Nigel Shrin
Ranch Hand

Joined: May 18, 2009
Posts: 137
Thank you, Prithvi. The Pattern p = Pattern.compile("\\p{Cntrl}"); syntax removed tabs but not newlines.
I've made a note of RegexBuddy; it could well be useful in future.
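
One possible explanation for the surviving newlines, offered as an assumption because the program itself is not shown in this thread: if the file is read with BufferedReader.readLine() and each line is written back followed by a line separator, the regex never sees the line terminators. readLine() strips them before the match runs, and the write step puts one back. The sketch below contrasts that with a single pass over the whole text. NewlineCheck, stripPerLine and stripWhole are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class NewlineCheck {
    // Line-by-line copy: readLine() drops the terminator and one is re-added per line.
    static String stripPerLine(String text) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.append(line.replaceAll("\\p{Cntrl}", "")).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Whole-text pass: the newline characters are part of the input the regex sees.
    static String stripWhole(String text) {
        return text.replaceAll("\\p{Cntrl}", "");
    }

    public static void main(String[] args) {
        String text = "a\tb\nc\td\n";
        // Tabs are gone but the text still spans two lines.
        System.out.println(stripPerLine(text).equals("ab\ncd\n"));
        // Tabs and newlines are both gone.
        System.out.println(stripWhole(text).equals("abcd"));
    }
}
```

If the newlines should go as well, the whole-text form (or writing the lines back without a separator) is the variant to use.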
Prithvi Sehgal
Ranch Hand

Joined: Oct 13, 2009
Posts: 774
Hi,

You are welcome Nigel.

Best Regards,
 