I'm relatively new to Java, and I am writing a server-side Java app that parses data from fixed-length ASCII files and loads it into tables in our DB. I want to keep my app as purely object-oriented as possible, but performance is more important. So here goes: Which is more efficient (i.e. faster)? (1) Create a bunch of child classes from a base class. During construction, each child object passes its unique attributes as parameters to the constructor of the base class, which stores them in private variables. These attributes are later used by the base class when executing commonly used methods called by the child object. This decreases the number of parameters that must be passed in each call to a base class method. OR..... (2) Ditch the base class and convert all the commonly used methods to static methods. That will require a larger number of parameters in each method call than in scenario (1), but it doesn't have the overhead of parent-child object interaction, and the static methods would be faster. Related question: Would it be faster to convert the methods into functions and have the results returned by the function calls, or to keep them as methods and have them change either protected variables (scenario 1) or public static variables (scenario 2), which can be accessed later by the calling method? I hope my question is clear. Thanks for any suggestions you all might have.
Can you provide any more details about how these objects are being used and what your performance requirements are?
Joined: Jun 16, 2003
Thanks, David. Here is a more detailed explanation of what I am trying to do: I use a BufferedReader object to read the flat file line by line. Before I start reading the file, however, I create an ArrayList of what I call "DataField" objects. Each DataField object corresponds to a column on the DB table that I want to load from the flat file I am about to read. Each class of DataField object returns a particular data type, depending on the data type of the DB column to which it corresponds, and it contains attributes (starting position, field length, data type, default value, left trim character, etc.) that describe how to parse the line of text and extract the data. So, when I get a line from the BufferedReader, I send it to a routine which loops through the ArrayList of DataField objects, passing the line of data to each one. Each DataField object extracts its data and loads it into a parameter of a PreparedStatement object. When the entire ArrayList has been traversed, I have loaded the data for one row, so I insert it into the DB table via the PreparedStatement. Common to all DataField objects is a base class which contains methods that are used by all the various types of DataField objects. So when I instantiate a DataField object, I pass to the base class constructor all the parsing parameters (these never change) and it loads them into private fields of the base class. Since the base class already has all the parsing information it needs, all the child DataField object has to do when it gets an input line from the BufferedReader is pass the line to the appropriate base class method, which extracts the data type it needs. Of course, this routine traverses the ArrayList of DataField objects for each line of data, sometimes processing millions of records in a file. Currently, my process can import a 190MB file containing over 1,600,000 records in about 48 minutes. I'm trying to squeeze as much performance out of my app as possible.
I know that calling a static method is much faster than calling methods across class hierarchies. But by using a static method, I would have to keep passing the same parsing attributes as parameters in each call to the static method that extracts the data from the input line. Would this negate any increase in performance obtained from using a static method? I know that the usual answer to this type of question is "It depends", but is there a general rule that applies? Thanks again.
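To make the two scenarios concrete, here is a minimal sketch of what they might look like side by side. The names below (IntField, TextField, FieldParsers, the constructor parameters) are made up for illustration and are not the poster's actual code, and the fields return their value instead of binding it to a PreparedStatement to keep the example short. The instance method simply delegates to the static helper, so the only difference between the scenarios is who carries the parsing attributes.

```java
/** Scenario 2: a static helper that needs every parsing attribute on every call. */
final class FieldParsers {
    private FieldParsers() {}

    static String extract(String line, int start, int length, char leftTrim) {
        String raw = line.substring(start, start + length);
        int i = 0;
        // Skip leading pad characters, then strip surrounding whitespace.
        while (i < raw.length() && raw.charAt(i) == leftTrim) {
            i++;
        }
        return raw.substring(i).trim();
    }
}

/** Scenario 1: parsing attributes are stored once, at construction (hypothetical class). */
abstract class DataField {
    private final int start;      // zero-based offset of the field in the line
    private final int length;     // fixed width of the field
    private final char leftTrim;  // leading pad character to strip

    protected DataField(int start, int length, char leftTrim) {
        this.start = start;
        this.length = length;
        this.leftTrim = leftTrim;
    }

    /** Shared extraction logic; callers pass only the line. */
    protected final String extract(String line) {
        return FieldParsers.extract(line, start, length, leftTrim);
    }

    abstract Object parse(String line);
}

final class IntField extends DataField {
    private final int defaultValue; // used when the field is blank after trimming

    IntField(int start, int length, char leftTrim, int defaultValue) {
        super(start, length, leftTrim);
        this.defaultValue = defaultValue;
    }

    @Override
    Object parse(String line) {
        String text = extract(line);
        return text.isEmpty() ? defaultValue : Integer.parseInt(text);
    }
}

final class TextField extends DataField {
    TextField(int start, int length, char leftTrim) {
        super(start, length, leftTrim);
    }

    @Override
    Object parse(String line) {
        return extract(line);
    }
}

class DataFieldDemo {
    public static void main(String[] args) {
        String line = "0004217SMITH     ";
        System.out.println(new IntField(0, 5, '0', 0).parse(line));       // 42
        System.out.println(new TextField(7, 10, ' ').parse(line));        // SMITH
        System.out.println(FieldParsers.extract(line, 5, 2, '0'));        // 17
    }
}
```

Either way the per-line work is dominated by the substring and parse calls, not by how the method is dispatched, which is why measuring is more useful than guessing here.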
Joined: Jul 27, 2001
Hmmm. As you say, it sounds like an "it depends" sort of question. Some people around here have investigated the relative performance of static vs. final vs. non-final, etc. I don't know how the cost of passing parameters fits into it all. The performance gain from playing with modifiers probably pales in comparison to the gain from using smarter algorithms and using the profiler to identify and optimize the most frequently executed lines, but I suppose it doesn't hurt to give it a shot. First of all, it doesn't hurt to be overly-liberal with the "final" modifier. Also, if you're using Sun's JVM it's probably a good idea to run the server version of the JIT if you aren't doing that already. If all the logic is really in the base class, then are the sub-classes of DataField necessary in the first place? It's possible that a few factory methods and a switch statement here or there would be enough. Maybe I misunderstand what the sub-classes are doing. I imagine that 50 minutes of text processing can involve a lot of constructing and garbage collecting of objects. The people at Sun keep saying that their latest JIT makes object pooling unnecessary while the people who write articles keep saying that Sun isn't quite there, so I don't know how far you'd get by avoiding object allocation. I'm just babbling random, incoherent thoughts, which is a sign that I probably ought to be going to bed. Goodnight, world!
Yes, there are some general rules to apply here: First, don't trust micro-benchmarks. Even if such a benchmark shows that a static method call might be ten times faster than a non-static one, that doesn't actually mean much. Your program probably spends *much* more time *in* methods than calling methods. Additionally, your code's structure has a huge impact on which optimizations the HotSpot engine will apply, and in a real program those will certainly differ from what it does in the micro-benchmark. Second, object-oriented design is *not* generally at odds with performance. In fact, OOP makes it easier to avoid duplicated logic and coupling. This makes it easier for both you and the HotSpot engine to optimize the code. So, what should you do? First, create the best design you can think of, applying OO principles where appropriate. Once the system works, run it under a profiler to find the bottlenecks and try to remove them. Do this with small experiments: make a change, profile again. If the change didn't improve performance considerably, undo the change. Repeat until performance is acceptable.
The first rule of coding for performance is: don't. Code the best OO design you can, and then, if you have performance issues, look at where those issues are and improve your code only in the places where you would get the biggest gain. Most applications would get a better performance improvement by checking the indexes on their database than they would by worrying about making methods static or final.
Your design sounds pretty cool. Are you batching your inserts? That can make a big difference. Maybe you could call addBatch() for every row and executeBatch() every 10 or 100 or 1000 rows. I've only had the opportunity to use that once, but it made a very big difference.
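As a rough illustration of the batching idea, here is a minimal sketch using the standard JDBC calls PreparedStatement.addBatch() and executeBatch(). The BatchLoader class, the insertAll method and the single-column binding are assumptions made up for the example; real code would bind one parameter per DataField and would usually also manage the transaction (for instance by turning off auto-commit and committing after each batch).

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical helper showing the addBatch()/executeBatch() pattern. */
final class BatchLoader {
    private BatchLoader() {}

    /**
     * Queues one row per input line and sends the queued rows to the
     * database every batchSize rows. Returns the number of rows queued.
     */
    static int insertAll(PreparedStatement ps, List<String> lines, int batchSize)
            throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int pending = 0;
        int total = 0;
        for (String line : lines) {
            // Placeholder binding: a real loader would set one parameter per field.
            ps.setString(1, line.trim());
            ps.addBatch();
            pending++;
            total++;
            if (pending == batchSize) {
                ps.executeBatch(); // one round trip for the whole batch
                pending = 0;
            }
        }
        if (pending > 0) {
            ps.executeBatch(); // flush the final, partially filled batch
        }
        return total;
    }
}
```

The best batch size depends on the driver and the database, so it is worth trying a few values and timing each run.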
Joined: Jun 16, 2003
Thanks to everyone for your suggestions. Sorry that I was slow in responding, but I was out of the office this morning. David, thanks for your suggestion about the server version of the JIT. I'll check into that. Ilja and Thomas, you made some very good points on how to approach the whole issue of OOP vs "coding for performance". I'll take your suggestions to heart if I have to tweak my app for better performance in the future. Stan, thanks for your suggestion about batching. At first I didn't batch my updates, but when I finally did, it cut the time by almost half. So you're right, batching was a big plus. Right now, I'm just fiddling around with the batch size to optimize performance. Let me just say that I think Java Ranch is fantastic. This is one great source of info/encouragement for new code hands like me. The Java Ranch community is great! I appreciate everyone's help! Happy code punching!