How to insert HTML into database?

Michael Cropper
Ranch Hand

Joined: Sep 30, 2009
Posts: 137
Hi,

I am trying to insert HTML into a database but I am having a little difficulty doing this due to the illegal characters.

The reason I am wanting to do this is that I am having a play around with scraping content, so I want to be able to put the full HTML into a database before I mess around trying to parse it.

Has anyone got any ideas on how to get around illegal characters in the HTML so that I can insert the data into the database?

I don't mind having to do a bit of pre-processing on the HTML to remove the offending characters, but I haven't been able to find a definitive list of illegal characters anywhere.

Any thoughts?

Michael
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61760
    

"Illegal characters"? What the heck are those? When inserting text into a DB there is no such thing as "illegal characters" -- at least none that are valid HTML.

I suspect you're having some other difficulty and are blaming the wrong thing. Why not explain what's happening?


Michael Cropper
Ranch Hand

Joined: Sep 30, 2009
Posts: 137
Possibly so...

The issue is when trying to update data with a string such as (this is a snippet of the html)



The issue I am receiving is

MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near e's bla, blablablabla (one of blablabla' at line 1


Seems to be an issue with the ' in this instance.

Thanks
Michael

Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61760
    

So, not an illegal character, but illegal syntax.

Without even looking at your code I can bet that you are not using a PreparedStatement, but building up a SQL statement with string concatenation for the values. Am I right?

If so, switch to a PreparedStatement immediately. Do not pass GO, do not collect $200.
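For anyone following along, here is a minimal sketch of why concatenation breaks on an apostrophe. The table name `pages`, the column `html`, and the helper `buildUpdate` are made up for illustration; the original code was not posted in this form.

```java
public class ConcatDemo {
    // Builds the statement the fragile way: the value is pasted into the SQL text.
    // Table and column names are hypothetical.
    static String buildUpdate(String html) {
        return "UPDATE pages SET html = '" + html + "' WHERE id = 1";
    }

    public static void main(String[] args) {
        // The apostrophe in the data ends the SQL string literal early,
        // so the database sees stray text after it and reports a syntax error.
        String html = "<p>one's bla</p>";
        System.out.println(buildUpdate(html));
        // prints: UPDATE pages SET html = '<p>one's bla</p>' WHERE id = 1
    }
}
```

The statement now contains three single quotes, so the literal closes after `one` and the rest is parsed as SQL.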
Michael Cropper
Ranch Hand

Joined: Sep 30, 2009
Posts: 137
Until earlier I wasn't using a prepared statement, but now I am. :-)

Here is the code I am using


Switching over didn't solve the problem, though; it still throws the error. Unless I have misunderstood how to implement the prepared statement?

Thanks
Michael
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61760
    

Yes. Find a tutorial.

You don't put the data into the SQL string. You use ? placeholders and supply the data with methods. This not only eliminates silly syntax problems like you are having, it also helps to protect against SQL injection attacks.
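A rough sketch of what that looks like, assuming a hypothetical table `scraped_pages` with columns `url` and `html`, and a `Connection` you have already obtained from your driver. The class and method names are invented for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class HtmlStore {
    // The SQL text contains only ? placeholders, never the data itself.
    // Table and column names are assumptions for this sketch.
    static final String INSERT_SQL =
            "INSERT INTO scraped_pages (url, html) VALUES (?, ?)";

    // Inserts one page and returns the number of rows affected.
    static int savePage(Connection conn, String url, String html) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, url);   // parameter indexes start at 1
            ps.setString(2, html);  // quotes in the HTML need no special handling here
            return ps.executeUpdate();
        }
    }
}
```

Calling `HtmlStore.savePage(conn, pageUrl, rawHtml)` passes the HTML as a parameter value, so an apostrophe in it never becomes part of the SQL syntax.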
Michael Cropper
Ranch Hand

Joined: Sep 30, 2009
Posts: 137
Ok thanks, will have a good read up on prepared statements and check back if I have any more issues.

Thanks
Michael
Michael Cropper
Ranch Hand

Joined: Sep 30, 2009
Posts: 137
Hi BearBibeault,

Just been reading up about PreparedStatements and they seem straightforward enough, so I will go and implement those.

But I couldn't see any explanations as to 'why' they are more secure and help prevent SQL Injection attacks. Have you come across any info on how they prevent SQL Injection attacks? The way I am currently looking at it is that even if you replace the "?" with a String, then the String could still contain text such as - ';drop table x;

I am guessing that when data is added to the PreparedStatement object via the .setString method, then something is happening in the background which strips out any dodgy characters?

When looking through the Java books I have on my shelf, I am extremely surprised that only 1 of the 4 I looked through mentions PreparedStatements at all.

Michael
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18986
    

The database driver doesn't just do a naive string concatenation. It does the escaping of characters and the formatting of data which is required by the database it is designed to work with. For example in most databases you escape a quote character (like the one which did you in) by replacing it with two quote characters. So you can write a method which does that, and litter your code with calls to that method, or you can let the database driver do it for you.

Your method would have to deal with other things besides quotes, and it would be database-dependent, because different databases do it differently. Especially in the case of dates and timestamps. And so all of that ugly business is encapsulated in the database driver, which applies it to the parameters of PreparedStatement objects. That's a huge bonus to the programmer.
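To make the quote-doubling idea concrete, here is an illustration-only sketch of the kind of hand-written method described above. The names `escapeQuotes` and `asLiteral` are invented for this example. It handles single quotes and nothing else, so treat it as a picture of the concept rather than something to use in place of a PreparedStatement; real drivers deal with more cases, and some send parameter values to the server separately from the SQL text instead of escaping them at all.

```java
public class QuoteEscaping {
    // Doubles every single quote so it is read as data, not as the end of a literal.
    // Illustration only: this ignores everything else a real driver must handle.
    static String escapeQuotes(String value) {
        return value.replace("'", "''");
    }

    // Wraps the escaped value in quotes to form a SQL string literal.
    static String asLiteral(String value) {
        return "'" + escapeQuotes(value) + "'";
    }

    public static void main(String[] args) {
        System.out.println(asLiteral("one's bla"));        // prints 'one''s bla'
        System.out.println(asLiteral("';drop table x;"));  // prints ''';drop table x;'
    }
}
```

In the second case the injected text stays inside one string literal, which is why it ends up stored as data rather than executed as a statement.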

And I agree: I think the tutorials and books should start with PreparedStatement, and relegate Statement to the lunatic fringe, the place where you have to do something which is highly database-specific and which can't be done with a PreparedStatement.
Michael Cropper
Ranch Hand

Joined: Sep 30, 2009
Posts: 137
Thanks for the great explanation Paul, it all makes sense now :-)

Now all I need to do is re-code an awful lot of code to use PreparedStatements instead (damn those books / tutorials I read!). Shouldn't take me long :-s

Thanks again
 