wood burning stoves 2.0*
The moose likes JDBC and the fly likes Efficient java code to process large db set and parse them Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Soft Skills this week in the Jobs Discussion forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Efficient java code to process large db set and parse them " Watch "Efficient java code to process large db set and parse them " New topic
Author

Efficient java code to process large db set and parse them

Abagesh Kumar
Greenhorn

Joined: Mar 26, 2012
Posts: 9

hello:


I have a stored procedure sp1 which

1. Selects all records from table ( 5 million records)


select * from table1 where recstatus=0


I need to use this sp1 and write a java client


1. Java client calls sp1 and gets result set

2. For every record read:
parse the contents of a the long string into an object
String = ^0609|8|2829.49673|07704.82322|13.627|130.59|140312075012|2|0|A|1|0|0|0|3.015600|12.230000|2748.26|76373.00|352024
I need to parse this string using delimiter ( it is | here , it can be anything) and pack them as individual entities ( take care eg:140312075012 is date time and should be packed as 03/14/2012 04:00) put them in a way so that they can be sent to another stored procedure sp 2

end loop

procedure sp2
@param1 as varchar(50)=NULL,
@param2 as varchar(20)=NULL,
@param3 as varchar(20)=NULL,
..
...
...
...


There is a VB client which is doing this , but the app is very slow. What is the best way to write this app ( using DAO for db connection pool) and good parser/packer and , getting the result set ) - keeping performance as a big need? Any guidance will be of great help .

Thanks

Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3611
    
  60

The best way would be to do all this processing in the database:

1) If possible, get rid of stored procedures and write this as a single SQL statement (INSERT INTO ... AS SELECT ...). This assumes that the first procedure is a simple select and the second procedure does an insert. In most databases this could be rewritten as an SQL statement (splitting up the string can be done in virtually all databases, especially if the number of items in the string is known beforehand). That's going to be the fastest way possible.

2) If there is more complicated processing involved, write a third stored procedure to do the processing you're currently doing in VB. This is the second fastest way, certainly much faster than downloading millions of records from the database to the client, splitting them up and sending the same data once more via the network back to the database.

I'd strongly advice against any other way of doing this. Rewriting current VB code in Java won't probably lead into a measurable performance improvement, unless the VB code itself contain performance issues (that is possible, of course).

In any case, you should consider storing the data in the database already in parsed format, to avoid having to convert them later.
Abagesh Kumar
Greenhorn

Joined: Mar 26, 2012
Posts: 9
Thanks.

The first procedure is a simple select ( to get all the records ),


But the reason for dumping the raw data as it is pooled without parsing is because we dont want to loose any data. So as data packets are recieved they are dumped into the db and read later to be processed.

The number of items in the string and the delimiter and its positions vary. The second storedprocedure calls many more stored procedures and does a lot of processing on the values ( for every record) passed before it does an insert/ update on a different table ( using modifed values)

So, each record in the first table must be read and processed simultaneously ( threads) and the db contention must be managed too. So I was thinking of using a Java client for this purpose.

Thanks




Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3611
    
  60

In this case, go for the stored procedure approach.

I don't understand this:
So, each record in the first table must be read and processed simultaneously ( threads) ...

Why couldn't one record be completely processed by one thread? (I'd understand if you wanted to distribute input records evenly to different threads, so that each record would be processed by one thread only.)

Anyway, chances are that by moving the processing to the database it becomes fast enough and you won't have to parallelize it. If you do need to make it parallel after all, it is certainly possible with the stored procedure approach too.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1871
    
  16

Martin is right: do it in the database. Databases are good at this sort of thing and you don't need to move tons of data around the network or manage all your own DB processes.

If you're using Oracle and you can't do it in pure SQL (the best solution), look at using bulk PL/SQL processing to speed things up without blowing your DB memory.


No more Blub for me, thank you, Vicar.
 
 
subject: Efficient java code to process large db set and parse them