Maximus Moothiringus

Ranch Hand
since Jun 07, 2008
Maximus likes: Eclipse IDE, Tomcat Server, Java
A developer in the making.

Trying my hand at different open source projects. Planning to roll one out myself (Yeah!! Reinventing the wheel with style!!).

Have a little experience programming in C. Jumped to Java in my second year of college, when Data Structures was taught to us in Java. Not much experience in C++, but I fell in love with Java.

Web application developer, to a large extent.

Well versed in PHP-MySQL development.

Starting to explore the world of Java Web Development.

Oh!! And I like changing my display name. Kinda uncomfortable with that!!! ;-).

@Sheriffs : Do tell me if the name is too bad to be out of your borders

Recent posts by Maximus Moothiringus

This one is lovely.. And the naming of characters.. Awesome..
11 years ago
Hi All,

I put in some time to change the code. Now the HashMap stores a String array as the value, which means I don't have to call split() for every check I perform, on every input line. Once I did this, performance improved greatly.
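As a rough sketch of that change (the ";" delimiter and the column layout are invented here), each row gets split exactly once while the map is built:

```java
import java.util.HashMap;
import java.util.Map;

public class PreSplitDemo {
    // Split each database row exactly once, at load time, and store the
    // resulting columns; the rules then index into the array directly.
    static Map<String, String[]> buildIndex(String[] rows) {
        Map<String, String[]> index = new HashMap<>();
        for (String row : rows) {
            String[] cols = row.split(";");   // one split per row, ever
            index.put(cols[0], cols);         // assumed: key is column 0
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String[]> index =
                buildIndex(new String[] { "K1;alpha;10", "K2;beta;20" });
        System.out.println(index.get("K1")[1]); // prints "alpha" - no split() per check
    }
}
```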

I changed all the string concatenation to use StringBuffer appends. This also contributed to the performance, because less memory was used.
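The difference comes from how + on Strings copies all the accumulated text on every pass, while a StringBuffer (StringBuilder is the unsynchronized equivalent) appends into one buffer - a minimal sketch:

```java
public class AppendDemo {
    // Building output with String += copies everything accumulated so far on
    // each iteration (O(n^2) total work); StringBuffer appends in place.
    static String join(String[] parts, char sep) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(sep);
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] { "a", "b", "c" }, ';')); // prints "a;b;c"
    }
}
```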

I also increased the heap size when running, which might also have helped cut the time.

Now the whole processing takes place within a minute - including all checks on the inputs and printing out the error messages. This is a really great improvement.

I think if I use intern() and other considerations that you put forward, I might be able to bring it down even further.
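For reference, intern() canonicalises equal strings into one shared instance, which can help when many rows repeat the same column values - a small sketch:

```java
public class InternDemo {
    // intern() returns one pooled instance per distinct value, so thousands of
    // rows holding the same status string can share a single object.
    static boolean sameInstance(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String a = new String("STATUS_OK");  // two separate objects...
        String b = new String("STATUS_OK");
        System.out.println(a == b);              // prints "false"
        System.out.println(sameInstance(a, b));  // prints "true" - one pooled copy
    }
}
```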

Lessons learnt:

  • Profile your code - find the slower functions, find the memory leaks
  • Check how the slower functions can be improved - and if they can't, how to call them less often
  • Check how memory use can be minimised

Thanks everyone for your support!! I think the problem for me is now resolved!!
11 years ago

    William Brogden wrote:It occurs to me that you should be building your extract from the database as a Serializable HashMap in the first place - skipping any creation and parsing of strings and never writing that text file.

    Well, I think I mentioned that the class I am writing is actually going to be called by another piece of software, which already does all of the above. The jar I am creating receives all those database rows and the input file as strings; this class is meant only for the checks I have to perform on that input. The calling software collects all the inputs from the user, queries the databases, reads each row as a string, and passes everything the jar requires to it as strings. From what I know, the software is pretty good at this - the timings I received show all of that is done in less than 2 seconds, after which it hands over to the class I am writing.

    As for the output, I just have to pass back an output string, and the software decides what to do with it. I think it just writes it out to a file; I am not sure. So what I have is strings as input and strings as output, with lots of checks in between.

    For ease of explanation, I described my test scenario, where a test main function reads all the database files and passes the strings into the class in question.

    I am sorry if that part of the system confused everyone.

    William Brogden wrote:Let the order of items in the String[] be that of the database columns and you essentially have high speed in memory lookup of a row.

    Yes, it is so currently. I changed the system to reflect that.

    Java serialization is another area I have not experimented with much. Does it mean that I serialize each HashMap, store it somewhere, and refer to it later? Would I be able to serialize just these data structures?
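Roughly, yes - the map can be written out once with an ObjectOutputStream after each database refresh and read straight back later, skipping the text parsing entirely. A minimal sketch (file name invented):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class SerializeDemo {
    // Write the whole map after each database refresh...
    static void save(HashMap<String, String[]> map, File f) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(map);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // ...and read it straight back, with no string parsing at all.
    @SuppressWarnings("unchecked")
    static HashMap<String, String[]> load(File f) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (HashMap<String, String[]>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, String[]> map = new HashMap<>();
        map.put("K1", new String[] { "K1", "alpha" });
        File f = new File("rows.ser");
        save(map, f);
        System.out.println(load(f).get("K1")[1]); // prints "alpha"
        f.delete();
    }
}
```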

    11 years ago
    Yes, increasing the memory did improve the performance a bit. I see that changing from String to StringBuffer makes a lot of difference.

    I am still hung up on Pat's comment about what is running in O(n^2). I am profiling the code to find out what's happening.

    Will let you know more as and when improvements occur..

    Yes, putting String[] in the HashMap instead of String makes a lot of sense, and I don't have to change much to get that. It ought to reduce the split()s I am doing inside the loop, which could be one factor behind Pat's inference.

    Would you recommend putting a String array or special holder objects inside the HashMap? Which would you recommend, keeping in mind both access and memory considerations?

    11 years ago

    William Brogden wrote:If we really have to optimize the memory used, lets look at the need for Strings in the first place. If all your characters are in the ASCII set, ie one byte, perhaps your hash value could hold a byte[] instead of a String, using half the memory. The keys would still be String but the value would only be turned into a String as needed.

    Never thought of things like this. I understand that heavy String usage is bad here. But would a byte array be better than holder objects? If I store a byte array, I have to convert it back to a String and then split it before doing anything. What would you suggest?
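William's byte[] idea might look like the sketch below (ASCII data assumed; the ";" delimiter is invented): the row lives in the map as bytes and only becomes a String when a rule actually needs it.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ByteRowDemo {
    // ASCII rows stored at 1 byte per character, instead of 2 for char data.
    static final Map<String, byte[]> index = new HashMap<>();

    static void put(String key, String row) {
        index.put(key, row.getBytes(StandardCharsets.US_ASCII));
    }

    // Only the rows a rule actually touches are turned back into Strings.
    static String[] columns(String key) {
        byte[] raw = index.get(key);
        if (raw == null) return null;
        return new String(raw, StandardCharsets.US_ASCII).split(";");
    }

    public static void main(String[] args) {
        put("K1", "K1;alpha;10");
        System.out.println(columns("K1")[1]); // prints "alpha"
    }
}
```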

    William Brogden wrote:
    How big is the file of extracted database rows? That should give an idea of maximum memory use.

    Of all the tables, only two are really big; the others would be about 400 rows each at most. The biggest (a HashMap, by the way) is about 60000 rows, and it is accessed by only one or two rules.

    11 years ago
    Yep, will do that.
    I will have to try that out on a friend's PC. Mine is a pretty underpowered one - very little processing power.

    11 years ago

    fred rosenberger wrote:do you need to read the whole file into memory? Can you read one line, process it, then read the next?

    Reading the file and the databases is done by another piece of software, which passes the strings to the Java code I am writing. That cannot be changed, as far as I understand - business reasons.

    Kees Jan Koster wrote:Dear Max,

    How much memory are you giving this application? How much of that is used? A simple test might be to double the heap space for the app and see if that brings down the GC times.

    Kees Jan

    Never fiddled with that one. Also, as I said, I might again pass the result on to another system. From what I understand, the Java code most probably goes into a middleware system which calls the function with the required inputs. I am not fully aware of the whole architecture, but I'll try changing the heap size and see the output.


    - update
    I am now working on changing all Strings to StringBuffers.
    I will also change all the string manipulations.
    I will also try to use objects inside the HashMap.
    11 years ago

    fred rosenberger wrote:You seem to imply that your hash is working, and that things are fairly well spread out.

    Okay, I understand that. When I said my hash is working fine, I was basing that on the JProfiler output for the code: it says the get function takes only 0.something % (a very meagre amount) of the whole time. I don't know whether that is just because split and the other string manipulations take a lot of time, and thus a larger percentage of the whole.

    11 years ago

    William Brogden wrote:

    there are no database connections that remain open

    Do I understand you to say you are opening a new DB connection for each line and then closing it?


    Nope, you got it all wrong. There are, as I said, about 16000 lines which I need to process, and for each line I need to verify the data against data from the databases. Since a JDBC query inside the loop for each input would be a huge headache, I do all the querying before I start processing the input file.

    The data in the database is practically constant (it changes only once every 5-6 months, whereas an input file arrives every 10-15 minutes), and it is used only to verify the input data. I know what conditions I have to use when querying, and I know the primary keys of the tables. So every time the database is updated, I query it, get all the rows, turn each into a string, and write them to a file - which I then read every time I process an input file. The lines of that file go into a HashMap (each line an entry, mapped to its key), and I use get() to fetch the whole row. How I process the rest is still to be decided.

    So, to answer your question: I use the DB before the input file is processed, meaning I am not opening a DB connection while I am processing the input file.

    11 years ago
    Hey All,

    an update.

    I profiled the code. The HashMap is functioning pretty well - the hashes seem to be good and unique, and the HashMap gets are pretty fast.

    Like Peter said, String manipulation is eating up most of the time. I have to work on that and change the code to implement the holder-object idea that Peter suggested.

    Yes, I found some places with a lot of calls to string append and split, which combined eat up a lot of time. I have found a way to reduce them through StringBuffer and the like.

    Yes, JProfiler says GC is running at 60% most of the time. I think the String churn allocates and frees a lot of memory, which makes the GC run again and again. I intend to use other functions and objects to reduce that.

    Will implement these changes and profile and post more updates.

    Tim Holloway wrote:One thing that's worth looking at is how "perfect" your hashes are. If the Hashtables are sub-optimal, you'll spend more time chasing overflow links.

    Could you please expand on this? Or maybe provide links which I can read through.

    Tim Holloway wrote:Hopefully, you have some way to compile your rules. If you have to interpret them each time you use them, that can be VERY expensive.

    This one too.
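On Tim's first point, a rough way to see whether keys spread evenly is to count how many keys land in each bucket for a given table capacity. The sketch below mimics HashMap's power-of-two bucket indexing; the keys are invented:

```java
import java.util.ArrayList;
import java.util.List;

public class BucketSpreadDemo {
    // Worst-case bucket occupancy for a power-of-two capacity; anything much
    // above 2-3 means lookups are chasing overflow links.
    static int worstBucket(Iterable<String> keys, int capacity) {
        int[] counts = new int[capacity];
        int worst = 0;
        for (String k : keys) {
            int h = k.hashCode();
            int bucket = (h ^ (h >>> 16)) & (capacity - 1); // HashMap-style spread
            worst = Math.max(worst, ++counts[bucket]);
        }
        return worst;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add("KEY" + i);
        System.out.println(worstBucket(keys, 2048)); // a small number = good spread
    }
}
```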

    11 years ago

    William Brogden wrote:How many database connections do you have open "at one time"?
    Are you using a connection pool?

    As I said, I query the database before the input file is processed. All the results are made into strings and passed into the code I am referring to. So, to answer your questions: there are no database connections that remain open while the processing is happening. The most that can happen is some file reads.

    11 years ago
    I'll try doing that also..

    One other question that I had came up while I was experimenting with the code.

    I haven't changed any part of the code up to now, but I experimented with different input sizes.

    I ran the whole code with an input of 1000 lines. It ran in a minute or so.
    I ran the code again with 1000 more lines added to the input. I can see clearly that the increase in time is not linear. How does this happen? Is it because of the HashMap or the split? Or is it both?


    Oh!! I forgot!! That's what profiling is for!! Back to profiling.
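One classic cause of this kind of super-linear growth is building output via String concatenation in a loop, which redoes all the earlier copying on every line - a sketch of the effect (line counts invented):

```java
public class GrowthDemo {
    // Every += copies all the text accumulated so far, so total work grows as
    // roughly n^2 in the number of lines, not n.
    static long concatNanos(int lines) {
        long start = System.nanoTime();
        String out = "";
        for (int i = 0; i < lines; i++) out += "line " + i + "\n";
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Doubling the input far more than doubles the time.
        System.out.println("1000 lines: " + concatNanos(1000) + " ns");
        System.out.println("2000 lines: " + concatNanos(2000) + " ns");
    }
}
```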
    11 years ago
    Yes, I am starting off with some profiling. Just finished installing JProfiler.

    Also, Peter, how do you think I can minimise the split() calls that I am doing?

    Here's what I am doing right now.
    I am getting the string from the HashMap and then splitting it to get each column.

    Do you suggest that if, instead of strings, I put in objects with getters and setters that mirror the database design, access will be easier?
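The holder-object version of that idea might look like this (the column names are invented - the real schema's columns would go here); the split still happens, but only once per row, at load time:

```java
import java.util.HashMap;
import java.util.Map;

public class RowObjectDemo {
    // A plain holder whose fields mirror the table's columns.
    static class Row {
        final String id, name, value;
        Row(String line) {
            String[] c = line.split(";"); // one split, at load time
            id = c[0]; name = c[1]; value = c[2];
        }
        String getName() { return name; }
    }

    public static void main(String[] args) {
        Map<String, Row> table = new HashMap<>();
        Row r = new Row("K1;alpha;10");
        table.put(r.id, r);
        // Rules read fields directly - no per-check split() or index arithmetic.
        System.out.println(table.get("K1").getName()); // prints "alpha"
    }
}
```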

    11 years ago
    Hmm... That seems to be a decent explanation. I know it's a lot of memory, but I have to get the data processed, and I am sure querying the database every time won't do much good.

    I feel that the current code runs reasonably well. Profiling is something I have not done much, but yes, off to that then.

    I will get back with some reports on that.

    11 years ago

    Hey all,

    I need some help in improving the performance of this program that I am writing.

    I have an input file containing about 20000 lines. Each line has 12 fields which represent different data. I need to read each line and apply about 80 rules to it, which make sure the information in the line is correct (and correct it if not) before passing it on. All these rules (except a handful) query one of about 25 databases to perform the required check. I need the processing to complete within 10-12 minutes, which is when I receive another file for processing.

    Here is what I have done upto now.

    I read the file line by line and store the lines in an array. For each line, I split it into the 12 fields and then apply all the rules to the line one by one.

    Since I figured doing a JDBC query every time would be a bad idea, I do all the JDBC querying before I start processing the input and pass the data in as Strings. I store them in a HashMap with the table's primary key as the key and the row content (taken as a string separated by a delimiter) as the value. I have 15 tables stored as HashMaps like this, and I fetch values from them using the get() function, which does the job pretty well. The only thing I am not able to do is complete the operation within 10-12 minutes.
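A bare-bones sketch of that flow (delimiter, field layout, and the sample rule are all invented): reference rows sit in a HashMap keyed by primary key, and each input line is split once and checked against its matching row.

```java
import java.util.HashMap;
import java.util.Map;

public class RuleCheckDemo {
    // Pre-loaded reference data: primary key -> expected value.
    static final Map<String, String> reference = new HashMap<>();

    static String check(String inputLine) {
        String[] f = inputLine.split(";", 12);  // the 12 fields of one line
        String expected = reference.get(f[0]);  // O(1) lookup by primary key
        if (expected == null) return f[0] + ": unknown key";
        return expected.equals(f[1]) ? "OK" : f[0] + ": corrected to " + expected;
    }

    public static void main(String[] args) {
        reference.put("K1", "alpha");
        System.out.println(check("K1;alpha;rest")); // prints "OK"
        System.out.println(check("K1;beta;rest"));  // prints "K1: corrected to alpha"
    }
}
```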

    I ran the application with about 1500 records; this took about 1 minute to complete. I also printed inside the loop so that I can see where the program is at any moment (a bad way, I know!!). It was running fast enough.

    When the input is, say, 15000 lines (one extra zero, for those who didn't notice), the application takes about 40 minutes to complete. The print inside the loop also behaved weirdly, and I can't figure out why: after about 2000-3000 lines, the printing seems to halt for about a second every 10-15 input lines. This means the application is taking a break every 15 records. Does this have anything to do with the HashMap I am using?

    Also, I did one more test just to convince myself that the HashMap is a good option. There are 15 or so tables I need to query. As I said, I run the queries, turn the output into delimited strings, and put them in a HashMap. I measured the time taken to put all the data into a HashMap, iterate through it with an Iterator, and print it. The whole thing took only about 2-3 seconds for all the tables, which comes to about 72000 records.

    Can you suggest any improvements I can make to this whole program? Can you suggest any other way in which the same processing could be done?

    Would it help to change hashCode(), or maybe write my own get function for the HashMap? Please suggest.

    Please do ask if you need any more clarifications on what I am doing.

    Thanks in advance

    11 years ago