
Is this a valid way to remove control characters: "{cntrl}"

 
Nigel Shrin
Ranch Hand
Posts: 140
I found this in an old Sun article. Is this still the best way to remove all control chars from text?



I found a similar posting on JavaRanch, but it only concerned carriage returns:
http://www.coderanch.com/t/426877/Beginning-Java/java/remove-carriage-return-from-string

Thanks!
 
Henry Wong
author
Marshal
Posts: 21016

It would be better if you actually used the regex for a control character.

Henry
 
Nigel Shrin
Ranch Hand
Posts: 140
Thanks, Henry.
Is there a way to strip out all possible control characters, or would you have to list them individually?

 
Henry Wong
author
Marshal
Posts: 21016
Nigel Shrin wrote:
Is there a way to strip out all possible control characters


Sure... but you need to actually have a regex that specifies control characters.

Henry
 
Nigel Shrin
Ranch Hand
Posts: 140
Sorry, I don't know how to do that. Are you able to give me an example? I've only used quite straightforward regex examples, and I'm not sure how to search for control characters. Or can you point me in the direction of a resource? (The {cntrl} did not work by the way.)

thanks
 
Henry Wong
author
Marshal
Posts: 21016
Nigel Shrin wrote: (The {cntrl} did not work by the way.)


This is because, as already mentioned, it isn't exactly the regex for control characters. It's close, but it isn't right. Maybe it would be a good idea to go back to where you saw "{cntrl}", and this time, cut-n-paste it correctly.

Henry
 
Nigel Shrin
Ranch Hand
Posts: 140
Hello Henry - the article containing the code is this one:
http://java.sun.com/developer/technicalArticles/releases/1.4regex/
The article is dated 2001, updated 2002. I'm running 1.6; perhaps that is why it errors?

The block of code in the article is this:

Removing Control Characters from a File

I am running the code in Eclipse, and have only changed the file path lines to new text files I have created.
File fin = new File("C:/Documents and Settings/Nigel/Desktop/f1.txt"); // f1.txt contains the above program, so plenty of tabs and newlines
File fout = new File("C:/Documents and Settings/Nigel/Desktop/f2.txt");

The exception msg is as follows:

Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
{cntrl}
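
The exception can be reproduced in isolation. In Java regex syntax an opening brace starts a repetition quantifier such as {2,5}, so a pattern that begins with {cntrl} fails to compile. A minimal sketch follows; the class and method names are made up for illustration and are not from the article.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class IllegalRepetitionDemo {
    // Returns the compile error description, or null if the regex compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The pattern from the article: rejected at compile time.
        System.out.println(compileError("{cntrl}"));
        // The POSIX control-character class: compiles, so this prints null.
        System.out.println(compileError("\\p{Cntrl}"));
    }
}
```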

I've searched the web for an explanation of this exception, and found an answer saying that the braces should be escaped, i.e. \\{ and \\}.
However, this does not appear to change the input file at all:


Is it a POSIX US-ASCII-only search?
If so, I have tried another text file containing the following, and \\{cntrl\\} has not removed anything:


Thank you
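
On the escaped-brace attempt: escaping the braces makes the pattern compile, but it then matches the seven literal characters {cntrl} rather than any control character, which would explain why the file came out unchanged. A minimal sketch, with a class name and sample strings made up for illustration:

```java
class EscapedBracesDemo {
    // With the braces escaped, the regex matches the literal text "{cntrl}".
    static String strip(String text) {
        return text.replaceAll("\\{cntrl\\}", "");
    }

    public static void main(String[] args) {
        // The tab is a control character, but this pattern does not touch it.
        System.out.println(strip("a\tb").equals("a\tb")); // true
        // Only the literal text "{cntrl}" is removed.
        System.out.println(strip("x{cntrl}y"));           // xy
    }
}
```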

 
Henry Wong
author
Marshal
Posts: 21016

Wow. An eight-year-old article that is just wrong -- and no one caught it. Take a look at the JavaDoc...

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

Or you can use any other regex document instead. See the section about POSIX character classes. One of the classes is for control characters.
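
For reference, the Pattern JavaDoc lists \p{Cntrl} under "POSIX character classes (US-ASCII only)" and defines it as [\x00-\x1F\x7F]. A minimal sketch of using it on a string follows; the class name, method name and sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

class PosixCntrlDemo {
    // \p{Cntrl} is the POSIX control-character class: [\x00-\x1F\x7F].
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    static String stripControl(String text) {
        return CONTROL.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Tab (0x09), BEL (0x07) and DEL (0x7F) are all in the class.
        System.out.println(stripControl("tab\there\u0007bell\u007Fdel")); // tabherebelldel

        // With default flags the class is US-ASCII only, so a C1 control
        // such as U+0085 is left in place.
        String withC1 = "a" + (char) 0x85 + "b";
        System.out.println(stripControl(withC1).equals(withC1)); // true
    }
}
```

If non-ASCII control characters also need to go, the Unicode general category \p{Cc} is one option to consider.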

Henry
 
Nigel Shrin
Ranch Hand
Posts: 140
Yes, that's where I found the POSIX reference.
 
Prithvi Sehgal
Ranch Hand
Posts: 774
Hi,

Try to use this pattern:

Pattern p = Pattern.compile( "\\p{Cntrl}");

Let us know if it works.

Hope it helps,
 
Prithvi Sehgal
Ranch Hand
Posts: 774
Even better, there is a tool called RegexBuddy; you can download it and try to work out the correct regex there.

Hope this helps,
 
Nigel Shrin
Ranch Hand
Posts: 140
Thank you, Prithvi. The "Pattern p = Pattern.compile( "\\p{Cntrl}");" syntax removed tabs but not newlines.
I've made a note of RegexBuddy; that could well be useful in future.
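
One possible explanation for tabs disappearing while newlines survive, assuming the program reads the file with BufferedReader.readLine() and writes a line separator after each result (the loop itself is not shown in this thread, so that is an assumption): readLine() strips the line terminator before the regex ever sees it, and the writer then puts one back. A rough sketch comparing that with matching against the whole text, using made-up class and method names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

class LineVsWholeText {
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    // Line-by-line: readLine() drops the terminator, so the pattern never
    // sees it. Appending '\n' stands in for a writer's line separator.
    static String stripPerLine(String text) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.append(CONTROL.matcher(line).replaceAll("")).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Whole text: \r and \n are control characters too, so they are removed.
    static String stripWhole(String text) {
        return CONTROL.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "a\tb\r\nc\td\n";
        System.out.println(stripPerLine(text).equals("ab\ncd\n")); // true
        System.out.println(stripWhole(text).equals("abcd"));       // true
    }
}
```

Whether removing newlines is actually desirable depends on the file; for most text files the line-by-line behaviour is probably what is wanted.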
 
Prithvi Sehgal
Ranch Hand
Posts: 774
Hi,

You are welcome, Nigel.

Best Regards,
 