wood burning stoves 2.0*
The moose likes JDBC and the fly likes PreparedStatements (need review) Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Android Security Essentials Live Lessons this week in the Android forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "PreparedStatements (need review)" Watch "PreparedStatements (need review)" New topic
Author

PreparedStatements (need review)

Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
I am building a web app with a single (not pooled) full time (jdbc) connection between static classes and the database. This is expected to be a low traffic site and the static methods are synchronized. In an effort to speed up access to product information, I am trying PreparedStatements for the first time. As I test, on localhost, sure that I'm the only one running the app., it seems clear to me that my the prepared statements are slower than the unprepared statements I use earlier in the process. This may not be a fair comparison. The unprepared statements get a single result set from one table and it's done. As you can see from the code below, what I'm doing with prepared statements involves three tables and multiple queries. But since this is my first time, I would appreciate review and comment. This does actually work; i.e. all data is retrieved as expected.

The first method below (initialize()) is called once from a servlet init() method when the application is first started. The second method (getItemBatch()) retrieves information about as many product items as match a product name (Titel). My little development / test database has less than 100 (total) items and only 1-3 items matching each name; most often only 1. The server app and database are on the same machine and I'm accessing from a browser via localhost. I'm surprised by the consistent perceptible wait for this detailed product data compared to fetching a master product list (all items) mentioned above.





Correlation does not prove causality.
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30138
    
150

THe first thing I notice is that you don't close the PreparedStatements which is a resource leak.

As far as comparing speed, you can't compare Statement and PreparedStatement speed unless you are using them to run the same query. Most of the time in a query comes from executing it and returning the data. You also have to run in a loop to compare times since PreparedStatement benefits come on subsequent calls.

For tuning what you have now, try running the SQL through explain to see if the database is using an index. Also, look if you can use a join to avoid so many trips to the database.


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
Thanks for the tips, Jeanne.

I'm closing the prepared statements via the servlet's destroy() method; the same servlet that calls the method "initialize()" above in order to create them. The idea is to create them once (and only once) and have them available forever for all users (i.e. until the app or server is shut down). I'm going for a kind-of a pseudo-stored procedure. I would just use stored procedures straight out, but the database exists to support another application (their in-house sales system) and I'm going to use it for Internet sales read-only ... avoiding any potential conflict with their maintenance efforts or agreements etc. by not doing anything to change their database set-up. I have suggested that my app use a new account limited to read-only privileges. And come to think of it, I haven't tested whether I can use Prepared Statements in that mode; just seems like it should work.

Each result set is closed in the finally block in the method where they are created. I guess that's ok? I reuse the same RS variable name for multiple result sets (on the second two) but close only once. But wait! Do I even need that? The ResultSets are declared within the scope of the method. If resetting them without closing the old ones doesn't cause leaks, then exiting the method should do just as well on its own. It is a static method, but the ResultSets will be reset every time it's used (at the very least). So, at most, there would just be a single set of ResultSet handles available for reuse; not run-away "leakage."

I'll do some googling to find out about "explain," but it might help if you could give me a hint. I'll also look at "join" (yes, that one I've heard of ... lol). I'm also wondering if I can send the two select requests in the loop both at the same time, simply by turning them into one prepared statement separated by ';'. Or just found "MultipleActiveResultSets=True"; document allowing SQL Server to process multiple transaction requests on a single connection ... still investigating this. Finally, I might get a boost from using connection pooling, which I haven't done yet. It's low priority in my project right now, but I might have to do it before we go online.


Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30138
    
150

While it's ok to keep connections open a long time (although you really should use a connection pool instead of doing it yourself), you still should close the PreparedStatement. The overhead isn't in creating the prepared statement object for each query.

Closing the result set isn't strictly necessary if you close the prepared statement, but it is a good idea in case the driver doesn't.
Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
Jeanne Boyarsky wrote:While it's ok to keep connections open a long time (although you really should use a connection pool instead of doing it yourself), you still should close the PreparedStatement. The overhead isn't in creating the prepared statement object for each query.

Closing the result set isn't strictly necessary if you close the prepared statement, but it is a good idea in case the driver doesn't.


I did discover today that I don't need to close the result sets at all. (My concern was initialized by someone asking what would happen with SQLException while trying to close them.) After reading about their closure in the Java 7 API doc, it looks like there's no chance of run-away leakage. They close on their own. The prepared statements aren't closed, but they are reused; which closes any old related result sets that may be open.

RE: the PreparedStatements, it seems like there is some overhead to preparing the statements. In fact from what I've read, it's not worth using prepared statements when the preparation is done on each demand for data unless you fetch a lot of data ... i.e. there's a big enough number of loops on the prepared statements. I don't know how many loops I'll have. It could be fairly high, but often, it will be only once ... for one item. As I was testing today, I noticed that the first instance of fetching a data set was slower than the rest. I fixed that by setting con.setAutoCommit(true) in initialize(). (I know it's the default, but just want to be sure.) Then the first instance became just as fast as the rest.

Another thing I'm looking into is the possibility of using batch processing. With Statements, it's said you can't do that with SELECT. Perhaps it's just an oversight, but the Java 7 PreparedStatement doc doesn't say that.
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30138
    
150

Roger F. Gay wrote:
I did discover today that I don't need to close the result sets at all. (My concern was initialized by someone asking what would happen with SQLException while trying to close them.) After reading about their closure in the Java 7 API doc, it looks like there's no chance of run-away leakage. They close on their own. The prepared statements aren't closed, but they are reused; which closes any old related result sets that may be open.

No. The resources are only closed if you are using try with resources syntax. Your code uses the pre-Java 7 coding style so you don't get the auto closure.
Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
Jeanne Boyarsky wrote:
Roger F. Gay wrote:
.... They close on their own. ....

No. The resources are only closed if you are using try with resources syntax. Your code uses the pre-Java 7 coding style so you don't get the auto closure.


You make a good point. I could be sure of 100% closure that way. I'm using Java 7 but have never used the Try with resource. Nice one. I'd also not have to worry about what happens if the close fails in the finally statement (pre-7 style). Here's the piece of Java 7 API I was referring to earlier. Same thing in in Java 6 which does not implement the AutoClosure interface.
close()
...
Note: A ResultSet object is automatically closed by the Statement object that generated it when that Statement object is closed, re-executed, or is used to retrieve the next result from a sequence of multiple results.


It seems like it could leave the last ResultSets hanging, but it would be limited to one each, and they'd always be closed on next reuse. The ResultSets are defined within the scope of the method, but I don't think that helps with closure in this case since the method is static. As I clean things up, the need to do it the way you suggest may be coming into focus. Yesterday, I tried restarting the database system to create a connection loss. When the connection is not valid, I'll have to reinitialize the connection and the PreparedStatements. Not sure though. The new PreparedStatements will no longer be associated with the hanging ResultSets. But through the process of creating the new PreparedStatements, the old PreparedStatements will surely be closed? And that would cause any related ResultSets to be closed as well. I'm also reusing the same ResultSet variables, defined within the method - not creating new instances via new ResultSet ... Which means that old rs, rs1, and rs2 must, at the very least be sent to the garbage dump. That's right isn't it? Or would they be immediately destroyed / replaced merely by reuse of the rs, rs1, and rs2 variable names within the same scope?
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3606
    
  60

Roger F. Gay wrote:RE: the PreparedStatements, it seems like there is some overhead to preparing the statements. In fact from what I've read, it's not worth using prepared statements when the preparation is done on each demand for data unless you fetch a lot of data ... i.e. there's a big enough number of loops on the prepared statements. I don't know how many loops I'll have. It could be fairly high, but often, it will be only once ... for one item.

That ultimately depends on the database, but for most databases this is not true. Databases generally share the prepared statement across different database sessions, so even if you just prepare the statement and run in once, assuming you do this in many sessions, (or several times in the same session, but not in a loop), the overall performance will be a lot higher than with just using a statement. There are many people on the net who tried to compare performance between the PreparedStatement and Statement; most of them have it wrong (measure the wrong thing), but even those who do the measurements right usually miss the wider perspective point - the PreparedStatement reusability across sessions. (Some of these people then run everything using Statements and then complain that RDBMS don't scale and start exploring NoSQL, which irks me a lot. Sorry for the excursion. )

Edited to clarify: yes, there might be some overhead for the initial run of the PreparedStatement, that much is true. But the other factors I've described usually more than make up for this penalty. Besides, the PreparedStatement prevents SQL injection attacks, and this reason alone should be more than enough to make the decision, but we're discussing strictly performance here.

As I was testing today, I noticed that the first instance of fetching a data set was slower than the rest. I fixed that by setting con.setAutoCommit(true) in initialize(). (I know it's the default, but just want to be sure.) Then the first instance became just as fast as the rest.

As you've noted, autocommit is the default. I'd guess that you were seeing the speedup because the database has cached some data after the first run; again, this is common oversight when assessing DB performance.

Another thing I'm looking into is the possibility of using batch processing. With Statements, it's said you can't do that with SELECT. Perhaps it's just an oversight, but the Java 7 PreparedStatement doc doesn't say that.

The addBatch() method doesn't return a resultset, so unless the Java 7 makes it accessible somehow, I don't see any possibility to obtain the results of that select. My guess (again) is that the JDBC allows you to run SELECTs this way if you don't mind you cannot read their results. You might want to do that with SELECT FOR UPDATE queries.
Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
Martin Vajsar wrote:The addBatch() method doesn't return a resultset, so unless the Java 7 makes it accessible somehow, I don't see any possibility to obtain the results of that select. My guess (again) is that the JDBC allows you ro eun SELECTs this way if you don't mind you cannot read their results. You might want to do that with SELECT FOR UPDATE queries.


Finally, an explanation! It's been irking me since I started this. Seemed like it would be an obvious benefit to batch selects (see my code ... just want to select data from two different tables at the same time). But there would be quite a bit more complexity to a ResultSet returning an array of ResultSets or whatever from multiple runs through the loop. What I thought to achieve was to reduce the number of round trips. I don't want rs1 and rs2 to each take a separate request response cycle. I was hoping to send them both at the same time and have the database process them with no confusion about which is which; even if they returned data with the same name fields. (They don't in my program.)

Martin Vajsar wrote:As you've noted, autocommit is the default. I'd guess that you were seeing the speedup because the database has cached some data after the first run; again, this is common oversight when assessing DB performance.


Not sure I understand that. Maybe I do. Like, it would create handles to the particular tables of interest?
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3606
    
  60

Roger F. Gay wrote:Not sure I understand that. Maybe I do. Like, it would create handles to the particular tables of interest?

As I've mentioned before, the database usually reuses statements upon execution. First run of a statement (depending on the database) causes the statement to be parsed and a plan for that statement to be created. This takes some time, but I'd say for simple queries it is too short to measure it reliably.

Much more pronounced is the effect of the plain old caching: the database loads the data it processes into a cache; subsequent run of the same command that touches the same data can be much faster, because physical IO (reading of the data from disc to memory) is avoided. The more data the query touches, the more obvious this effect will be.

Some databases (Oracle 11g, for example) are even able to track whether data returned by a query has changed since its last execution, and if they didn't, it can return the last result, skipping the execution of the statement altogether. This is a new 11g feature and I don't have much experience with it - didn't ever try to explore, but if you're on a database which does this kind of things, you need to take it into account when measuring performance.
Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
OK, Martin; I think I see the practical implications anyway. If I do, this should make sense. I just started the db system (SQL Server) and Tomcat and gave them plenty of time to "warm up." The first product fetch was just as fast as before. I mean ... like I reported before. The first one is not noticeably slower than later fetches. It's quite fast actually. There are other variables, but at least I've demonstrated, by seeing the first one go fast, that the current coding is not causing long delays first time around.

So here's really the big burning question at this point. Does anyone see any way for me to speed things up even further? That loop is still bugging me. Is it possible to do that more efficiently?
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3606
    
  60

It's hard to tell with certainty without more knowledge about your data model. But if rs1 and rs2 are expected to contain one row each, then an inner join could probably be used to join all three tables together and return them as a single resultset. That would of course decrease the number of DB roundtrips significantly. (If you take a look, you'll find that Jeanne has advised you to use joins in her first response.)

If you don't know how a join looks like, I'd suggest reading an SQL tutorial at this point. You'll need that knowledge if you want to develop database applications. I can't recommend any particular SQL tutorial, but I'd say t should be possible to find several using Google and just start there. Feel free to come back and ask questions if you get stuck!
Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
Martin Vajsar wrote:It's hard to tell with certainty without more knowledge about your data model. But if rs1 and rs2 are expected to contain one row each, then an inner join could probably be used to join all three tables together and return them as a single resultset. That would of course decrease the number of DB roundtrips significantly. (If you take a look, you'll find that Jeanne has advised you to use joins in her first response.)

If you don't know how a join looks like, I'd suggest reading an SQL tutorial at this point. You'll need that knowledge if you want to develop database applications. I can't recommend any particular SQL tutorial, but I'd say t should be possible to find several using Google and just start there. Feel free to come back and ask questions if you get stuck!


Thanks. I hadn't forgotten Jeanne's advice on that point. I worked straight through the weekend, long hours, mostly on the client side, making the product display slick and interactive and carrying through to data loaded into the shopping cart page. Presentation tomorrow!

I think your plan is a good one. As soon as I can focus a block of time back on this, I'll hit the tutorials again focused on JOIN and post a new question if needed. I've also run across a couple of references stating that you can create a multi-part Statement simply separating the parts with a semi-colon. You have to set something when making the MySQL connection, but it's default for SQL Server. Now I've forgotten what it's called: allowMultiSomething=true;

Thanks Martin and Jeanne!
Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
Wasn't hard to guess it: allowMultiQueries, as in "jdbc:mysql://localhost:3306/testdb?allowMultiQueries=true"; example given, search for allowmultiqueries
Roger F. Gay
Ranch Hand

Joined: Feb 16, 2007
Posts: 349
Successfully joined all three tables. Thanks again guys and gals!
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: PreparedStatements (need review)
 
Similar Threads
how to delete the row on click of a checkbox instantly ?
multiple processes... only want one
Table having large volume of data
Are there any problems using the same statement for multiple queries?
BLOB insertion into the database hangs