I'm dealing with the rather classical problem to run a query against a DBMS and fetch ResultSet's resulting data into Java classes. I know that this problem may be solved using JPA or other ORMs framework, but for some reasons - it would be long to explain them - i cannot use ORMs or JPA... moreover, I'm really interested to work directly with JDBC, so even if I'm going to rehinvent the wheel, well, this time really does not matter....
So, basically I have to fetch each row in a JavaBean. For sake of semplicity, I'm going to use Java objects with perfectly reflect my table structures, so that if i have a table with column "Customer", I'll always have a "Customer" field in my Java Object. No need or support for aliasing of names.
As first approach, I thought to use reflection: I enumerate all declared fields of my class, the basing upon type I will call resultset.getXXX(fieldName).
This approach works, but I'd like to know if there is a faster approach. I tried to use JavaAssist (i've read that it is used internally in Hibernate) to build "on the fly" a custom class fetcher, avoidind reflection. Even this approach works well, but running time with these two different approaches is really nearly the same. I can fetch about 5K rows of 100 columns for row in about 15 seconds, and I don't think is a bad time, but I would like to run faster.
Which approach I can considerate ? What approach do ORMs framework use ? I'd like, with your precious help, to find some answers to these questions...
Do I understand it right that reflection does not seem to be the bottleneck? In that case, you should probably play with the fetch size a bit (see the setFetchSize() method in the Statement interface). If your goal is to read the whole resultset as quickly as possible, try really high fetch size. One word of warning though: I've got exceptions with high fetch sizes in Oracle (the problem seemed to be related to the total amount of data fetched at once, so narrow tables could have higher fetch size than wide ones), so test the setting thoroughly and probably make them configurable too.
If you want to avoid reflection for whatever reason, I might suggest an approach I'm using on one project where we also avoided ORM frameworks. I've written a utility which analyzes table creation SQL script (with syntax tightened up to make the parsing easier) and creates POJO and DAO classes and interfaces for around a hundred tables based on the SQL scripts. Our schema changes a lot and my primary objective was to have a single authoritative source of schema structure so I wouldn't have to propagate changes manually to Java sources. (Simpler approach might employ some XML to describe the schema and generate the table creation scripts from this XML as well. I've missed on this.)
I believe similar functionality is available in various ORM frameworks, ie. you could have these classes generated from your schema. I investigated this briefly, but then decided for homebrewed solution as we also create schema documentation this way.
Edit: one more thought: ORM does not employ some magic here, so there is no reason why pure JDBC solution could not be as fast as ORM. To the contrary, as ORM is a generic solution, JDBC gives you more leeway and at least in some cases will allow you to do things faster than ORM (probably). If ORM seems faster to you, it is probably because it has fine-tuned some parameters, such as fetch-size, against a database you're using.
Also, make sure to use read-only, forward-only resultset. They might not be faster than other flavors, but certainly they won't be any slower.
ORM does not employ some magic here, so there is no reason why pure JDBC solution could not be as fast as ORM. To the contrary, as ORM is a generic solution, JDBC gives you more leeway and at least in some cases will allow you to do things faster than ORM (probably). If ORM seems faster to you, it is probably because it has fine-tuned some parameters, such as fetch-size, against a database you're using.
I totally agree. To be honest, I've never used an ORM since I deal with a legacy database structure I cannot modify, which lacks of primary keys, autogenerated keys, versionID columns and more... So I decided to do persistence my way. Since - as you stated, and I think you're right - there is no magic in an ORM and an ORM have to reflect data from resulset to a Java class, no matter how, I would like to learn how this problem has been addressed.
The fact that reflection seems to be as fast as a custom fetcher sounded strange to me. Maybe, with moder processors, and modern JRE of course, reflection overhead is negligible.
Claude Moore wrote:The fact that reflection seems to be as fast as a custom fetcher sounded strange to me. Maybe, with moder processors, and modern JRE of course, reflection overhead is negligible.
My personal experience is that network affects the performance much more than any reflection overhead. After all, you can easily measure how long it takes to use reflection to set 100 fields 5 thousand times in a row (and compare that to using setters instead of reflection).
I'd first say that if your performance meets your needs, then do not optimize. Premature optimization is the root of all evil. Optimization comes at the cost of complexity, and over optimizing means that you get the cons of the complexity without reaping any benifits
First of all, you need to find out where your bottlenecks are?. Is it in the database or your code? As Martin said, the database itself can be slower than reflection.
Secondly, check if you are doing a lot of reflection lookups in your code. AFAIK, calling a method using reflection is not significantly slower than calling the method directly (IIRC, it takes around double the time or so) . But, looking up a method is very slow (IIRC, the lookup methods add an overhead of about 10 times). So, you need to check if you can reduce the number of times you call the lookup method. If you reduce the lookup calls, you can get significant boost in your perfroamnce
Third, if you really want to avoid reflection completely, you can always use ASM to generate code. Basically, you can generate a utility class that populates a List of your bean from result set at runtime. At startup, for each type of bean that you need to map to database, you do the code generation only once and use reflection to generate an instance of the class and keep it in memory. Everytime you want to convert, you call the utility object that you have created. So, you eliminate all reflection calls at runtime. THe startup would be slower than usual, but not significantly so. Obviously, this solution adds a lot more complication over using a solution that is reflection based.
AS previosuly stated, I did not notice a great difference between reflection and "direct populating" my beans via a Java class which populates my beans using ResultSet and avoids any reflection. I did not used ASM, I used JavaAssist but I think that they should be the same thing, as long as we're talking of building on the fly an utility class which populates a bean. I knew that reflection was slower that normal access to method or properties, but mine wasn't the case - I was filling a list of 5,000+ rows from my db, and each row has more than 100 columns. Considerating I'm doing some test from remote, and I'm not in LAN with my db server, it's likely that I was experiencing network latency... I will do some test in LAN, of course, but the real question is which approach I should follow...
Well, if you are pretty sure that the issue is network latency, there's not much you can do in code except try to reduce the number of database calls you make. Make sure you are using connection pooling, because creating connections over a high latency network can be very slow. Also, you might want to try playing around with fetchSize parameter if you are iterating over large result sets. Generally, if you are showing only 30-50 records to the user, you don;t really need your fetch size to be larger than the number of records that you need to show, so increasing fetchSize may not help
If your connected to your DB server over a WAN, IMO, you will get most bang for your buck by tuning TCP at the OS layer to work well over WAN. This is not something that most programmers know how to do. I've been programming for 17 years, and I have never done this myself. You need to call in people who know enough about tuning the OS and network topology.