I have developed a transformation application which reads data from Excel sheets and then creates an XML document. However, during the operation certain unidentified characters creep in. These characters look like spaces, but calling trim() does not remove them, and comparing them against " " never matches either!
For example, here's a character which is wreaking quite a bit of havoc --> � There's another one which looks like a small square.
Can anyone suggest how to deal with such occurrences and filter them out from a string using Java?
Doing it manually would be too much of an overhead. TIA!
I understand that the problem presented here could be solved rather conveniently if it were possible to fix it at the source. However, as matters stand, the Excel sheets can be created by anyone around the world. The people we are targeting in our case dump database tables into these sheets. Considering that the dumps usually run to thousands of rows, I don't think it would be feasible for us to ask them to keep an eye out for such funny characters.
Ulf, your suggestion is more like a prevention but in my case the problem already manifests itself. I need to "solve" it.
The trim() method will only remove whitespace characters at the beginning or end of the String - removing multiple whitespace chars if present. If the characters are embedded elsewhere in the string, trim() will have no effect.
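To illustrate, here is a small sketch (the class and method names are made up for the demo). One detail worth knowing: trim() only strips leading and trailing characters whose code is U+0020 or lower, so a non-breaking space (U+00A0) survives even at the ends of the string. Whether that is what is happening in your data is only a guess until you have inspected the characters.

```java
class TrimDemo {
    // Wraps the trimmed string in brackets so leftover characters are visible.
    static String show(String s) {
        return "[" + s.trim() + "]";
    }

    public static void main(String[] args) {
        // Ordinary spaces at the ends are removed.
        System.out.println(show("  abc  "));                   // [abc]

        // Whitespace inside the string is left alone.
        System.out.println(show("a  b"));                      // [a  b]

        // Non-breaking spaces (U+00A0) are above U+0020, so trim() keeps them.
        System.out.println(show("\u00A0abc\u00A0").length());  // 7
    }
}
```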
Anirvan, you can find out exactly what these characters are by casting each char to int and printing it out:
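A minimal sketch of such a loop follows. The class name, the dump method and the sample string are placeholders for illustration; feed it one of your own problem strings instead. It prints the decimal value and also the hex form, since Unicode charts are indexed in hex.

```java
class CharDump {
    // Returns one line per char: the char in quotes, its decimal value,
    // and the same value in U+XXXX notation.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append("'").append(c).append("' = ")
               .append((int) c)
               .append(" (").append(String.format("U+%04X", (int) c)).append(")\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String suspect = "ab\u00A0c"; // hypothetical input containing a non-breaking space
        System.out.print(dump(suspect));
    }
}
```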
Here the quotes are useful since some characters are not visible, and others may cause you to skip a line.
Once you know the numeric value of the characters, you can find out exactly what each character is by looking up the value in a Unicode chart (such as the ones at unicode.org). Once you know what it is, it will be easier to develop a sensible strategy for dealing with it.
Not unless you're concerned with only "printable" characters. Beyond 127 what you get is a whole set of Latin letters, the non-breaking space, and a wide bunch of zookie characters from a wider bunch of worldly languages. Of course, the implementation above would be a nightmare for someone developing an application with localization in mind. It's only meant for the nice characters one can see printed on a standard keyboard.
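For comparison, here is a rough sketch of a more targeted filter that leaves accented and non-Latin letters alone. The list of offenders is an assumption on my part (non-breaking space, the U+FFFD replacement character, and control characters); the real list should come from inspecting the actual data first. The Cleaner class and its clean method are hypothetical names.

```java
class Cleaner {
    // Removes or replaces a small, explicit set of troublesome characters
    // and keeps everything else, including non-ASCII letters.
    static String clean(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\u00A0') {
                out.append(' ');   // non-breaking space becomes an ordinary space
            } else if (c == '\uFFFD') {
                // U+FFFD replacement character: dropped
            } else if (Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r') {
                // control characters other than tab, LF and CR: dropped
            } else {
                out.append(c);     // everything else is kept as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: NBSP, replacement char, a control char, and an accented letter.
        String dirty = "a\u00A0b\uFFFDc\u0001d\u00E9";
        System.out.println(clean(dirty).length()); // 6
    }
}
```

After this pass, trim() and comparisons against " " behave as expected for the replaced spaces. Note that a dropped U+FFFD usually means the original character was already lost by a wrong decoding step upstream, so removing it hides the symptom rather than recovering the data.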