aspose file tools*
The moose likes JDBC and the fly likes how to avoid special character in database? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Soft Skills this week in the Jobs Discussion forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "how to avoid special character in database?" Watch "how to avoid special character in database?" New topic
Author

how to avoid special character in database?

ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Sorry for my ignorance about SQL and Database.

Is there any provision in SQL or in Database or in JDBC API to avoid special characters in tables/database?

Thanks.
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

What do you mean by "special character"?


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Originally posted by Paul Sturrock:
What do you mean by "special character"?



A computer-representable character that is not alphabetic, numeric, or blank.

Non-alphabetic or non-numeric character, such as @, #, $, %, &, * and +.


I hope you get it now.

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42906
    
  69
While there are some ways of doing this in the database, it would be a much cleaner design not to store them there in the first place. What prevents you from removing those characters before you store them?
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Originally posted by Ulf Dittmer:
While there are some ways of doing this in the database, it would be a much cleaner design not to store them there in the first place. What prevents you from removing those characters before you store them?


Definitely, I would like to remove them before storing. But is there any ohter way of doing this apart from using RegEx? I mean, say any special kind of statement or something of that nature.

Thanks.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42906
    
  69
But is there any ohter way of doing this apart from using RegEx? I mean, say any special kind of statement or something of that nature.


No. Anyone's definition of 'special characters' probably differs from everyone else's, so it would be hard to know what to remove.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
What's the real problem you're trying to solve? What happens when "bad" characters get into the database? Believe me, I've hit dozens of ways that unexpected user input can crash an application, so I appreciate the problem. But just removing them may not be the answer.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Originally posted by Stan James:
What's the real problem you're trying to solve? What happens when "bad" characters get into the database? Believe me, I've hit dozens of ways that unexpected user input can crash an application, so I appreciate the problem. But just removing them may not be the answer.


The thing is, we don't want to make user flexible enough to enter *anything* he/she likes.

I guess, I should do that test both at client side and server side with my code only (JavaScript & Java respectively).
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

What does happen when "bad" characters enter the database? My understanding is it doesn't really matter what ends up in a text filed of a database, unless you use the value in some other technology unchecked. You mention JavaScript - is this the real problem?
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Altering what the user enters seems downright rude. I'd be more inclined to properly escape strings on whatever platform I have to handle them. The first rookie error we hit was building an insert statement as a String and somebody entered a name of O'Malley and the ' terminated the value in the SQL string and Malley gave a syntax error. After we managed to get ' in and out of the database, it blew up some JavaScript written by a JSP the same way - terminated a string.

See the Commons StringEscapeUtils for a start on this journey.
Paul Campbell
Ranch Hand

Joined: Oct 06, 2007
Posts: 338
Really the question is not how to avoid entering special characters in a database... it is how to avoid upserting characters that are not supported by the database's configuration. It isn't just special characters, carriage returns, etc. If the character is not supported by your database's configuration, it becomes unprintable, unreadable, and most of all... it makes that column virtually useless in a where clause, causes odd results in group by, order by, etc. As a result, if you don't eliminate the bad characters prior to committing them to the database, you (or somebody) will have to clean them up later. I was the person that had to clean up the bad data the other day when I had to create a package to remove unprintable/unreadable characters not supported by an Oracle installation using US7ASCII... in this case the user had loaded thousands of rows of data containing "curly" quotes from an excel spreadsheet to the database.

So it is a valid concern and no, I don't think you can do it in a single statement. In my case, the package is now part of the upsert sql and strips the characters as part of the sql statement.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Interesting notes about upsetting the database itself. I doing my first direct hands-on time work with databases since the 80s right now. Using a prepared statement with setString() I'm able to insert and retrieve all the characters on the keyboard, which is more than the last system I worked on could do. I'll have to play with copy paste from Word and Excel and see what happens!
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

Originally posted by Paul Campbell:
Really the question is not how to avoid entering special characters in a database... it is how to avoid upserting characters that are not supported by the database's configuration. It isn't just special characters, carriage returns, etc. If the character is not supported by your database's configuration, it becomes unprintable, unreadable, and most of all... it makes that column virtually useless in a where clause, causes odd results in group by, order by, etc. As a result, if you don't eliminate the bad characters prior to committing them to the database, you (or somebody) will have to clean them up later. I was the person that had to clean up the bad data the other day when I had to create a package to remove unprintable/unreadable characters not supported by an Oracle installation using US7ASCII... in this case the user had loaded thousands of rows of data containing "curly" quotes from an excel spreadsheet to the database.

So it is a valid concern and no, I don't think you can do it in a single statement. In my case, the package is now part of the upsert sql and strips the characters as part of the sql statement.


Rather than "cleaning up the data", why did you not convert your Oracle instance to support Unicode? Is it a pre-8i version or something? OK, migration from one character set to another is non-trivial, but if users are trying to store characters outside your current character set, is this not something you need to do? Surely your character stripping package is corrupting the data?
Paul Campbell
Ranch Hand

Joined: Oct 06, 2007
Posts: 338
It is 10g/RAC... all our databases are 10g or being converted to 10g. Not all are RAC. 8i has been out of support for a long time, 9i is in extended support by Oracle. You never want to be in an out-of-support situation with your corporate data stores.

Well for one. It isn't my call. Even if it was... it isn't the right decision. That is an architecture decision driven by the corporation as a whole. We are currently going through a core application conversion and consolidation.

One of the "features" of upgrading the character code used by your database is that some systems (most notably the ETL jobs) are dependent on the particular character returned to the job in order to determine "no data found". When you change character codes on your database, you change the character that will be returned when "no data found" occurs for these jobs.

Your suggestion is valid. In this situation, there just isn't enough ROI to support a decision to convert/upgrade hundreds of Oracle databases (all our databases are configured similarly) and testing/modification of the thousand+ ETL jobs that use these databases because a single excel spreadsheet upsert process inserted unsupported characters into a single database.

Also, once the unsupported characters are inserted to the database, they are lost. Converting to different character code will not make that data usable.

I'm writing in a hurry as I go to work so if I didn't make a lot of sense, please let me know.

[ October 08, 2007: Message edited by: Paul Campbell ]

[ October 08, 2007: Message edited by: Paul Campbell ]

[ October 08, 2007: Message edited by: Paul Campbell ]
[ October 08, 2007: Message edited by: Paul Campbell ]
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: how to avoid special character in database?