
processing huge file in multithreaded env

 
JH Harrison
Greenhorn
Posts: 4
Hi all,
The functionality I am trying to implement is:
1. Read a file.
2. Validate each record (line).
3. Store the record in the DB.

I want the record processing to happen in parallel.
What I mean is: thread A reads the file and hands each line (record) to a sub-thread, which validates it and stores it in the DB. While the sub-thread is busy validating and storing, thread A continues to read the file.
Basically, what I don't want is to read a record, validate it, and store it in a strictly sequential pattern.

My initial sketch is something like this:
1. Create a pool of threads.
2. Create a job queue.

As the main thread reads the file, it puts every record it fetches into the queue. As records become available in the queue, the second part of the process takes a record from the queue, validates it, and stores it, then picks the next available record and continues until the queue is empty.

Is this the right way of doing this, or is there a better way?
If so, can someone here please suggest one?
Any tools or open-source libraries with this kind of functionality are also welcome.
A bit of a code snippet to get started with would be much appreciated.

Please help.

Thanks
 
Henry Wong
author
Marshal
Posts: 20881
Sounds good.

BTW, thread pools and queues are built into Java 5, so this would be easier to implement there, without external libraries or tools.
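
A minimal sketch of the reader-plus-worker-pool idea, assuming the Java 5 `java.util.concurrent` API. The class name `FileLoadSketch`, the `isValid` rule, and the in-memory `stored` list standing in for the database are all hypothetical, chosen only to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class FileLoadSketch {
    // Stand-in for the real DB table (assumption); synchronized so workers can append safely.
    static final List<String> stored = Collections.synchronizedList(new ArrayList<String>());

    // Hypothetical validation rule: a record is valid when it is not blank.
    static boolean isValid(String record) {
        return record.trim().length() > 0;
    }

    // Reads lines on the calling thread and hands each one to the pool.
    // Returns the number of lines submitted.
    static int load(BufferedReader in, int workers) throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        int submitted = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                final String record = line;
                pool.execute(new Runnable() {
                    public void run() {
                        if (isValid(record)) {
                            stored.add(record); // real code would run the INSERT here
                        }
                    }
                });
                submitted++;
            }
        } finally {
            pool.shutdown(); // accept no new tasks; already queued ones still run
        }
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the workers to drain the queue
        return submitted;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new StringReader("a\n\nb\nc\n"));
        int n = load(in, 4);
        System.out.println(n + " read, " + stored.size() + " stored");
    }
}
```

Note that `Executors.newFixedThreadPool` queues tasks without a bound, so on a huge file a fast reader can get far ahead of slow workers. Building a `ThreadPoolExecutor` with a bounded queue and `ThreadPoolExecutor.CallerRunsPolicy` is one way to make the reader slow down instead.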

Henry
 
JH Harrison
Greenhorn
Posts: 4
Thanks Henry,
We are using JDK 1.4.2 and not JDK 5, hence I am looking for good libraries / APIs.
Can you guide me with a little code snippet to get started? I am getting confused.

Thanks
J Harrison.
 
Henry Wong
author
Marshal
Posts: 20881
Originally posted by JH Harrison:
Thanks Henry,
We are using JDK 1.4.2 and not JDK 5, hence I am looking for good libraries / APIs.
Can you guide me with a little code snippet to get started? I am getting confused.

Thanks
J Harrison.


The concurrency library in Java 5 is basically a port of the library that was developed by Doug Lea. That library runs on earlier platforms, and is available here.

Henry
[ July 06, 2006: Message edited by: Henry Wong ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
This is an interesting problem and one that might have very surprising results. You might consider two thread pools so you can observe queue sizes and concurrent threads.

main thread reads a record, creates a massage task, puts it in queue

process thread gets a massage task, massages data a bit, creates an insert task, puts it in queue

update thread gets an insert task, executes SQL against the database

I'd be interested to see if the update threads keep up, or if database locks are roughly equivalent to synchronizing the update method. The update thread might get up to "n" tasks and build a batch update statement and then fire it off. It would also have to know when we're all done to fire the last partial batch and commit.
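
The three stages and the "batch of n, then flush the last partial batch" idea could be sketched roughly as follows. This is an illustration under simplifying assumptions, not a tuned implementation: it uses one thread per stage rather than two pools, a hypothetical `DONE` marker object to signal end of input, and it collects batches in a list where real code would call `PreparedStatement.addBatch()` / `executeBatch()` and commit. All class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PipelineSketch {
    // End-of-input marker, compared by identity so no real record can match it.
    static final String DONE = new String("DONE");

    // Process stage: takes raw records, "massages" them, passes them on.
    static Thread massager(final BlockingQueue<String> in, final BlockingQueue<String> out) {
        return new Thread(new Runnable() {
            public void run() {
                try {
                    for (String r = in.take(); r != DONE; r = in.take()) {
                        out.put(r.trim().toUpperCase());
                    }
                    out.put(DONE); // tell the next stage we are finished
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // Update stage: gathers up to batchSize records, then "fires" the batch.
    static Thread updater(final BlockingQueue<String> in, final int batchSize,
                          final List<List<String>> fired) {
        return new Thread(new Runnable() {
            public void run() {
                try {
                    List<String> batch = new ArrayList<String>();
                    for (String r = in.take(); r != DONE; r = in.take()) {
                        batch.add(r);
                        if (batch.size() == batchSize) {
                            fired.add(batch); // real code: executeBatch()
                            batch = new ArrayList<String>();
                        }
                    }
                    if (!batch.isEmpty()) {
                        fired.add(batch); // last partial batch, then commit
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // Main thread plays the reader: feeds records in, then waits for both stages.
    static List<List<String>> run(String[] records, int batchSize) throws InterruptedException {
        BlockingQueue<String> raw = new LinkedBlockingQueue<String>(100);   // bounded queues
        BlockingQueue<String> ready = new LinkedBlockingQueue<String>(100); // give back-pressure
        List<List<String>> fired = new ArrayList<List<String>>();
        Thread m = massager(raw, ready);
        Thread u = updater(ready, batchSize, fired);
        m.start();
        u.start();
        for (int i = 0; i < records.length; i++) {
            raw.put(records[i]);
        }
        raw.put(DONE);
        m.join();
        u.join();
        return fired;
    }
}
```

With several threads per stage, each consumer would need its own end marker (or some other shutdown signal), and `raw.size()` / `ready.size()` can be logged to see which stage falls behind.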

Let us know how this runs!
 
JH Harrison
Greenhorn
Posts: 4
Hello,
Thanks for your help and suggestions. I downloaded backport-util-concurrent for JDK 1.4 and am trying out some of the API. What I have done so far is:
1. Parse a file.
2. Submit each line to a sub-task for further processing.
Below is the code snippet that does this.


I just wanted some feedback to know whether I am heading in the right direction. I have never used these APIs before, and I have never worked with threading before.

I also wanted to know: how do I terminate or empty the queue if there is a problem or exception in the validation process?
The reason I ask is that our requirement says that if there are any errors in validation, file processing should stop and all the DB contents for that particular file should be rolled back.

Is this achievable? If so, can you please suggest how?
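
One possible shape for the stop-on-first-error requirement, sketched with the Java 5 `java.util.concurrent` names (backport-util-concurrent is intended to mirror this API for JDK 1.4, without generics or the for-each loop). The class `AbortOnErrorSketch`, the `validate` rule, and the boolean return value are hypothetical. The sketch assumes the inserts for one file run in a single JDBC transaction with auto-commit off, so that `Connection.rollback()` can undo them all; the commit and rollback calls appear only as comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AbortOnErrorSketch {
    // Hypothetical rule: a record is invalid when it contains "BAD".
    static String validate(String record) {
        if (record.indexOf("BAD") >= 0) {
            throw new IllegalArgumentException("invalid record: " + record);
        }
        return record;
    }

    // Returns true when every record validated (commit), false otherwise (roll back).
    static boolean process(List<String> records, int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<String>> results = new ArrayList<Future<String>>();
        for (final String r : records) {
            results.add(pool.submit(new Callable<String>() {
                public String call() {
                    return validate(r);
                }
            }));
        }
        pool.shutdown(); // no more submissions; queued tasks keep running
        try {
            for (Future<String> f : results) {
                f.get(); // a worker's exception surfaces here as ExecutionException
            }
            return true; // real code: connection.commit()
        } catch (ExecutionException e) {
            pool.shutdownNow(); // discards queued tasks and interrupts running ones
            return false; // real code: connection.rollback()
        }
    }
}
```

`shutdownNow()` is what empties the queue: tasks that have not started are dropped, and running tasks are interrupted. If workers insert on separate connections instead, a single rollback cannot undo their work, so validating everything first and inserting afterwards in one transaction may be the simpler design.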

Please!

J Harrison.
 