
Remove special characters of any type from the file

 
Krish Yeruva
Ranch Hand
Posts: 58
Hi,
I have a file that contains unpredictable special characters; please see the attachment.
Can anyone help me out? Is there a script that finds and removes these unpredictable special characters?






[Attachment: Spl.PNG, "Special Characters"]
 
Richard Tookey
Bartender
Pie
Posts: 1166
17
Java Linux Netbeans IDE
Sounds to me like the application you are using to view the file is using the wrong character encoding. What is the application?
 
Krish Yeruva
Ranch Hand
Posts: 58
Hi Richard,
I am not viewing these files. My application has a lot of Unix jobs that use these files, and if a file contains special characters like these, the job fails. So I need a Unix command that finds and removes any type of special character from the files. Can you please help me out?
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
Do these characters belong in the files? If so, you'll need to handle them properly. If not, the easiest may be to not put them there in the first place.
 
Richard Tookey
Bartender
Pie
Posts: 1166
17
Java Linux Netbeans IDE
Krishnareddy Yeruva wrote:
I am not viewing these files.


You must be! The PNG file you attached shows the characters, so you are viewing the file. So what was used to view it?
 
Krish Yeruva
Ranch Hand
Posts: 58
After the job failed because of those bad characters, I opened the file in EditPlus and Notepad. I can see the special characters in both editors.
 
Richard Tookey
Bartender
Pie
Posts: 1166
17
Java Linux Netbeans IDE
Krishnareddy Yeruva wrote:After the job got failed because of those bad characters, I have opened the file in EditPlus and notepad. I am able to see those special chars in both the editors.


I assume you mean 'notepad' as in Windows Notepad. Try opening the file in Notepad and changing the character encoding (I don't use Notepad, but I know there is an option to set the character encoding). If you can find out what the character encoding should be, then it is easy enough to convert the file to the encoding assumed in your Linux scripts.
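
A minimal sketch of that conversion step, assuming the file turns out to be ISO-8859-1 and the Linux scripts expect UTF-8. Both encodings are assumptions for illustration, and `to_utf8` is a hypothetical helper name; `iconv` is the standard utility for this kind of conversion.

```shell
#!/bin/sh
# to_utf8 SOURCE_ENCODING: convert stdin from the given encoding to UTF-8.
# The helper name and the ISO-8859-1 example are assumptions, not from the thread.
to_utf8() {
    iconv -f "$1" -t UTF-8
}

# Demo: byte 0xE9 is "e with acute accent" in ISO-8859-1;
# in UTF-8 it becomes the two bytes 0xC3 0xA9.
printf 'caf\351\n' | to_utf8 ISO-8859-1
```

For a real file this would look something like `to_utf8 ISO-8859-1 < input.txt > output.txt`, where the file names are placeholders.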
 
Krish Yeruva
Ranch Hand
Posts: 58
Hi Ulf,

As Ulf mentioned:
Do these characters belong in the files? If so, you'll need to handle them properly. If not, the easiest may be to not put them there in the first place.

These characters don't belong in these files; they are getting into them by mistake. If there is a Unix script that removes these special characters, we could validate each file and strip such characters before the job runs.
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
Look into the Unix/Linux "sed" utility; it can be used to remove characters from a file according to some regexp (assuming you can find a regexp that matches precisely the extraneous characters and no others).
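
As a rough sketch of the sed approach, assuming the "extraneous" characters are everything other than printable ASCII and whitespace. That definition is an assumption, not something the thread establishes, and `strip_unwanted` is a hypothetical name.

```shell
#!/bin/sh
# strip_unwanted: delete every byte that is not printable ASCII or whitespace.
# LC_ALL=C makes sed treat the input as single bytes, so bytes >= 0x80
# are matched by the negated bracket expression and removed as well.
strip_unwanted() {
    LC_ALL=C sed 's/[^[:print:][:space:]]//g'
}

# Demo: the control byte 0x01 and the high byte 0xE9 are dropped.
printf 'ab\001c\351d\n' | strip_unwanted
```

Note that this regexp also deletes legitimate accented or non-Latin characters, so it only fits files that are supposed to be plain ASCII.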
 
Krish Yeruva
Ranch Hand
Posts: 58
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,
The main thing here is that these txt files are processed by Unix jobs. If we had a Unix script that finds these special characters, we could remove them from the txt files.

Thanks in advance
 
Richard Tookey
Bartender
Pie
Posts: 1166
17
Java Linux Netbeans IDE
Krishnareddy Yeruva wrote:Hi Ulf,

As Ulf mentioned:
Do these characters belong in the files? If so, you'll need to handle them properly. If not, the easiest may be to not put them there in the first place.

These characters doesn't belong to these files. But by mistake these chars are getting placed inside these files. So if we have any unix script to remove these special chars, so that we can validate and remove such chars from that file before the job is going to execute.


The only thing that makes these characters 'special' is that you don't want them in your files, and this is where your problem starts. You have to define either the set of characters you want to remove or the set you can accept. Either way, you need to know the character encoding of the file so that you can convert its bytes into characters before filtering. Once you know the character encoding, it is fairly straightforward to write a small program, in almost any language you will find on the Linux box, to filter the content of the file. But I stress: you need to know the character encoding to make this safe.
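
To illustrate the "set you can accept" idea, here is a hedged sketch of a pre-job check. It assumes the files are meant to be plain ASCII, so the accepted set is printable ASCII plus whitespace; that assumption and the names `is_clean` and `list_unwanted` are hypothetical. The `-a` option (treat input as text) is a GNU/BSD grep extension.

```shell
#!/bin/sh
# is_clean FILE: succeed only if FILE holds nothing but printable ASCII and whitespace.
is_clean() {
    [ -r "$1" ] || return 2
    ! LC_ALL=C grep -a -q '[^[:print:][:space:]]' "$1"
}

# list_unwanted FILE: print the numbered lines that contain an unaccepted byte.
list_unwanted() {
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}

# Demo on a throwaway file whose second line contains the byte 0xE9.
sample=$(mktemp)
printf 'good line\nbad\351line\n' > "$sample"
if is_clean "$sample"; then
    echo "clean"
else
    echo "unwanted bytes found"
fi
rm -f "$sample"
```

A job wrapper could call `is_clean` first and refuse to run, or send the file back to its supplier, when the check fails.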

As Ulf says, the best approach is not to put the 'specials' there in the first place and in your position this would be my first point of attack. I would go back to the people who provided the files and ask them to provide clean files with a known character encoding.



 
Tim Holloway
Saloon Keeper
Pie
Posts: 17633
39
Android Eclipse IDE Linux
Also, assuming that these really are "special characters" and not simply some sort of binary artefacts, the "tr" Unix/Linux utility can be used to translate them into something more meaningful.
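
A small sketch of how tr might be used here, assuming the accepted bytes are tab, newline, carriage return, and printable ASCII. The accepted set, the `?` replacement, and the function names are assumptions for illustration.

```shell
#!/bin/sh
# Accepted bytes (octal): tab \11, newline \12, carriage return \15,
# and printable ASCII \40 through \176.

# delete_unwanted: drop every byte outside the accepted set.
delete_unwanted() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# replace_unwanted: turn every byte outside the accepted set into '?'.
replace_unwanted() {
    LC_ALL=C tr -c '\11\12\15\40-\176' '?'
}

# Demo input contains the control byte 0x01 and the high byte 0xE9.
printf 'ab\001c\351d\n' | delete_unwanted
printf 'ab\001c\351d\n' | replace_unwanted
```

Replacing rather than deleting keeps the byte count of each line unchanged, which may matter if the jobs read fixed-width records.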
 