
Efficient Java code to process and parse a large DB result set

 
Abagesh Kumar
Greenhorn
Posts: 9

Hello,


I have a stored procedure sp1 which:

1. Selects all records from a table (5 million records):


select * from table1 where recstatus=0


I need to use sp1 and write a Java client that:


1. Calls sp1 and gets the result set.

2. For every record read:
parses the contents of the long string into an object
String = ^0609|8|2829.49673|07704.82322|13.627|130.59|140312075012|2|0|A|1|0|0|0|3.015600|12.230000|2748.26|76373.00|352024
I need to split this string on a delimiter (it is | here, but it can be anything) and pack the pieces as individual fields (with conversions, e.g. 140312075012 is a date/time and should be packed as 03/14/2012 04:00), in a form that can be sent to another stored procedure, sp2.

end loop

procedure sp2
@param1 as varchar(50)=NULL,
@param2 as varchar(20)=NULL,
@param3 as varchar(20)=NULL,
..
...
...
...
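For the split-and-convert step, this is roughly what I have in mind (a sketch only: the timestamp format ddMMyyHHmmss is my assumption, and the 04:00 in the example above suggests an extra conversion, perhaps a timezone shift, that I have left out):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

public class RecordParser {

    // Split on a configurable delimiter; -1 keeps trailing empty fields
    public static String[] parse(String raw, char delimiter) {
        return raw.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    // Convert e.g. 140312075012 (assumed to be ddMMyyHHmmss) to MM/dd/yyyy HH:mm
    public static String toDisplayDate(String ts) {
        // SimpleDateFormat is not thread-safe, so create instances per call
        SimpleDateFormat in = new SimpleDateFormat("ddMMyyHHmmss");
        SimpleDateFormat out = new SimpleDateFormat("MM/dd/yyyy HH:mm");
        try {
            Date d = in.parse(ts);
            return out.format(d);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad timestamp: " + ts, e);
        }
    }
}
```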


There is an existing VB client that does this, but it is very slow. What is the best way to write this app in Java (using DAOs with a DB connection pool, a good parser/packer, and efficient result-set handling), with performance as the main requirement? Any guidance would be of great help.
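Roughly, the client I am picturing looks like this (a sketch only: the raw-string column position, the three sp2 parameters, and the fetch/batch sizes are placeholders, and whether batched procedure calls pay off depends on the JDBC driver):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BulkTransfer {

    // Pure helper so the parsing step can be tested without a database
    static String[] splitRecord(String raw) {
        return raw.split("\\|", -1);   // -1 keeps trailing empty fields
    }

    // Streams sp1's rows and batches calls to sp2 (sp2 shown with 3 params for brevity)
    public static void transfer(Connection con) throws SQLException {
        con.setAutoCommit(false);                       // commit in controlled chunks
        try (PreparedStatement read = con.prepareStatement(
                 "select * from table1 where recstatus = 0");
             CallableStatement write = con.prepareCall("{call sp2(?, ?, ?)}")) {
            read.setFetchSize(1000);                    // stream rows instead of buffering 5M
            try (ResultSet rs = read.executeQuery()) {
                int pending = 0;
                while (rs.next()) {
                    String[] f = splitRecord(rs.getString(1)); // assumes raw string is column 1
                    write.setString(1, f[0]);
                    write.setString(2, f[1]);
                    write.setString(3, f[6]);           // the timestamp field, converted as needed
                    write.addBatch();
                    if (++pending % 1000 == 0) {
                        write.executeBatch();           // one round trip per 1000 calls
                        con.commit();
                    }
                }
                write.executeBatch();                   // flush the final partial batch
                con.commit();
            }
        }
    }
}
```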

Thanks

 
Martin Vajsar
Sheriff
Posts: 3747
The best way would be to do all this processing in the database:

1) If possible, get rid of the stored procedures and write this as a single SQL statement (INSERT INTO ... SELECT ...). This assumes that the first procedure is a simple select and the second procedure does an insert. In most databases this could be rewritten as one SQL statement (splitting up the string can be done in virtually all databases, especially if the number of items in the string is known beforehand). That's going to be the fastest way possible.

2) If there is more complicated processing involved, write a third stored procedure to do the processing you're currently doing in VB. This is the second fastest way, certainly much faster than downloading millions of records from the database to the client, splitting them up and sending the same data once more via the network back to the database.

I'd strongly advise against any other way of doing this. Rewriting the current VB code in Java probably won't lead to a measurable performance improvement, unless the VB code itself contains performance issues (which is possible, of course).

In any case, you should consider storing the data in the database in parsed form in the first place, to avoid having to convert it later.
 
Abagesh Kumar
Greenhorn
Posts: 9
Thanks.

The first procedure is a simple select (to get all the records).


But the reason for dumping the raw data as it arrives, without parsing, is that we don't want to lose any data. So as data packets are received they are dumped into the DB and read later to be processed.

The number of items in the string, the delimiter, and its positions vary. The second stored procedure calls many more stored procedures and does a lot of processing on the values passed (for every record) before it does an insert/update on a different table (using the modified values).

So, each record in the first table must be read and processed simultaneously (threads), and the DB contention must be managed too. So I was thinking of using a Java client for this purpose.

Thanks




 
Martin Vajsar
Sheriff
Posts: 3747
In this case, go for the stored procedure approach.

I don't understand this:
So, each record in the first table must be read and processed simultaneously (threads) ...

Why couldn't one record be completely processed by one thread? (I'd understand if you wanted to distribute input records evenly to different threads, so that each record would be processed by one thread only.)

Anyway, chances are that by moving the processing to the database it becomes fast enough and you won't have to parallelize it. If you do need to make it parallel after all, it is certainly possible with the stored procedure approach too.
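To illustrate what I mean by distributing records: a rough sketch (names are hypothetical) in which each input record is submitted as one task, so exactly one thread ever touches it:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelDispatch {

    static final AtomicInteger processed = new AtomicInteger();

    // Stand-in for "parse the record and call sp2"; counting replaces the real work
    static void process(String record) {
        processed.incrementAndGet();
    }

    // Distributes records across a fixed pool so no record is handled by two threads
    public static void dispatch(String[] records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String record : records) {
            pool.submit(() -> process(record));   // one record -> one task -> one thread
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```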
 
chris webster
Bartender
Posts: 2358
Martin is right: do it in the database. Databases are good at this sort of thing and you don't need to move tons of data around the network or manage all your own DB processes.

If you're using Oracle and you can't do it in pure SQL (the best solution), look at using bulk PL/SQL processing to speed things up without blowing your DB memory.