
Remove special characters of any type from the file

 
Krish Yeruva
Ranch Hand
Posts: 60
Hi,
I have a file that contains unpredictable special characters. Please see the attachment.
Can anyone please help me out? Is there a script to find and remove those unpredictable special characters?






[Attachment: Spl.PNG, "Special Characters"]
 
Richard Tookey
Bartender
Posts: 1166
Sounds to me like the application you are using to view the file is using the wrong character encoding. What is the application?
 
Krish Yeruva
Ranch Hand
Posts: 60
Hi Richard,
I am not viewing these files. My application has a lot of Unix jobs that use these files. If a file contains special characters like this, the job fails. So I need a Unix command to find and remove any type of special character in the files. Can you please help me out?
 
Ulf Dittmer
Rancher
Posts: 43081
Do these characters belong in the files? If so, you'll need to handle them properly. If not, the easiest may be to not put them there in the first place.
 
Richard Tookey
Bartender
Posts: 1166

Krishnareddy Yeruva wrote:
I am not viewing these files.



You must be! The PNG file you attached shows the character, so you are viewing it. So what was used to view the character?
 
Krish Yeruva
Ranch Hand
Posts: 60
After the job failed because of those bad characters, I opened the file in EditPlus and Notepad. I can see those special characters in both editors.
 
Richard Tookey
Bartender
Posts: 1166

Krishnareddy Yeruva wrote: After the job failed because of those bad characters, I opened the file in EditPlus and Notepad. I can see those special characters in both editors.



I assume you mean 'notepad' as in Windows Notepad. Try opening the file in Notepad and changing the character encoding (I don't use Notepad, but I know there is an option to set the character encoding). If you can find out what the character encoding should be, then it is easy enough to convert the file to the encoding assumed by your Linux scripts.
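
For the conversion step, a minimal sketch using `iconv` might look like the following. The helper name `to_utf8` is made up for illustration, UTF-8 as the target is an assumption about what the Linux scripts expect, and ISO-8859-1 in the usage comment is only a placeholder, since the real encoding of the file is not known from this thread.

```shell
#!/bin/sh
# to_utf8 is a hypothetical helper, not a standard command.
# Usage: to_utf8 SOURCE_ENCODING INPUT_FILE OUTPUT_FILE
# Assumption: the jobs that read the file expect UTF-8.
# iconv exits non-zero when it meets a byte sequence that is invalid
# in SOURCE_ENCODING, which can reveal a wrong guess for multi-byte
# encodings. Single-byte encodings accept almost any input, so a
# successful run does not prove the guess was right.
to_utf8() {
    iconv -f "$1" -t UTF-8 "$2" > "$3"
}

# Example call (assumed encoding, adjust once the real one is known):
# to_utf8 ISO-8859-1 input.txt input.utf8.txt
```

The output goes to a separate file on purpose, so the original stays available if the guessed encoding turns out to be wrong.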
 
Krish Yeruva
Ranch Hand
Posts: 60
Hi Ulf,

As Ulf mentioned:
Do these characters belong in the files? If so, you'll need to handle them properly. If not, the easiest may be to not put them there in the first place.


These characters don't belong in these files, but they are getting placed there by mistake. Is there a Unix script to remove these special characters, so that we can validate the file and remove such characters before the job executes?
 
Ulf Dittmer
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Look into the Unix/Linux "sed" utility; it can be used to remove characters from a file according to some regexp (assuming you can find a regexp that matches precisely the extraneous characters and no others).
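
As a rough sketch of the sed approach, assuming the unwanted characters are everything outside printable ASCII plus tab (that set is an assumption; the thread does not say which characters are legitimate), and with `strip_nonprintable` as a made-up helper name:

```shell
#!/bin/sh
# strip_nonprintable is a hypothetical helper name.
# Usage: strip_nonprintable FILE > CLEANED_FILE
# LC_ALL=C makes sed treat the input as single bytes, so bytes above
# 0x7F count as "not printable" instead of causing decoding errors.
# [:print:] covers printable ASCII including space; [:blank:] adds tab.
# Everything else on each line is deleted, including carriage returns.
strip_nonprintable() {
    LC_ALL=C sed 's/[^[:print:][:blank:]]//g' "$1"
}
```

Note that this also deletes legitimate non-ASCII text such as accented letters, so it is only safe if the files are supposed to be plain ASCII.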
 
Krish Yeruva
Ranch Hand
Posts: 60
Hi,
The main thing here is that these txt files are processed by the Unix jobs. If there is a Unix script to find these special characters, we can remove them from the txt files.

Thanks in advance
 
Richard Tookey
Bartender
Posts: 1166

Krishnareddy Yeruva wrote:Hi Ulf,

As Ulf mentioned:
Do these characters belong in the files? If so, you'll need to handle them properly. If not, the easiest may be to not put them there in the first place.


These characters don't belong in these files, but they are getting placed there by mistake. Is there a Unix script to remove these special characters, so that we can validate the file and remove such characters before the job executes?



The only thing that makes these characters 'special' is that you don't want them in your files. This is where your problem starts. You either have to define the set of characters you want to remove or define the set you can accept, but either way you need to know the character encoding of the file, so that you know how to convert the bytes of the file into characters before filtering. Once you know the character encoding, it is fairly straightforward to write a small program, in almost any language you will find on the Linux box, to filter the content of the file. But I stress: you need to know the character encoding to make this safe.
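
To illustrate the "define the set you can accept" route, here is a minimal sketch of a check that could run before the job starts. It assumes the file is in an ASCII-compatible single-byte view and that the accepted set is printable ASCII plus tab; both are assumptions, and `has_unexpected_bytes` and `check_input` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers; the accepted set below is an assumption.
# has_unexpected_bytes FILE
#   succeeds (exit 0) if FILE contains any byte outside printable
#   ASCII and tab. Carriage returns also count as unexpected.
has_unexpected_bytes() {
    LC_ALL=C grep -q '[^[:print:][:blank:]]' "$1"
}

# check_input FILE
#   gate to call before the job: returns 1 and prints a message on
#   stderr if the file needs cleaning, returns 0 otherwise.
check_input() {
    if has_unexpected_bytes "$1"; then
        echo "unexpected bytes found in $1" >&2
        return 1
    fi
}
```

A job wrapper could then call `check_input data.txt || exit 1` (file name made up) and refuse to run on a dirty file rather than silently altering it.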

As Ulf says, the best approach is not to put the 'specials' there in the first place and in your position this would be my first point of attack. I would go back to the people who provided the files and ask them to provide clean files with a known character encoding.



 
Saloon Keeper
Posts: 27762
Also, assuming that these really are "special characters" and not simply some sort of binary artefacts, the "tr" Unix/Linux utility can be used to translate them into something more meaningful.
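
A small sketch of both uses of `tr`, deleting and translating. The helper names are made up, and the byte 0x92 (octal 222) is only an assumed example: it is the right single quotation mark in Windows-1252, which may or may not be what is in the file.

```shell
#!/bin/sh
# Hypothetical helpers; which bytes to keep or map is an assumption.

# keep_ascii FILE
#   -c complements the set and -d deletes, so every byte is removed
#   except tab (\11), line feed (\12), carriage return (\15) and
#   printable ASCII (\40 through \176).
keep_ascii() {
    LC_ALL=C tr -cd '\11\12\15\40-\176' < "$1"
}

# fix_quote FILE
#   translates one byte into another: octal 222 (assumed to be a
#   Windows-1252 right single quote) becomes a plain apostrophe.
fix_quote() {
    LC_ALL=C tr '\222' "'" < "$1"
}
```

Both helpers write to standard output, so a typical call would be `keep_ascii input.txt > cleaned.txt` (file names made up).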
 