
processing huge file in multithreaded env

JH Harrison
Greenhorn

Joined: Jul 05, 2006
Posts: 4
Hi all,
The functionality I am trying to implement is:
1. Read a file.
2. Validate each record (line).
3. Store the record in the DB.

I want record processing to happen in parallel.
What I mean is: thread A reads the file and hands each line (record) to a sub-thread to validate and store in the DB. While the sub-thread is busy validating and storing, thread A continues to read the file.
Basically, what I don't want to happen is
reading, validating and storing each record in a sequential pattern.

My initial sketch is something like below;
1. Create a pool of threads.
2. Create a job queue.

As the main thread reads the file, it puts every record it fetches into the queue. As records become available in the queue, the second part of the process should take a record from the queue, validate it and store it, then pick up the next available record and continue until the queue is empty.

Is this the right way to do it, or is there a better way?
If so, can someone here please suggest one?
Any tools or open-source libraries with this kind of functionality are also welcome.
A bit of code to get started with would be much appreciated.

Please help.

Thanks
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18836
    

Sounds good.

BTW, thread pools and queues are built into Java 5, so this would be easier to implement with Java 5, without external libraries or tools.
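
As a rough illustration of the Java 5 approach, here is a minimal sketch using `java.util.concurrent`. The class name `RecordLoader`, the `isValid` rule and the in-memory `STORED` list are assumptions made for the example; a real version would validate against the actual record format and write through JDBC.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RecordLoader {

    // Stand-in for the database (assumption); a real version would use JDBC.
    static final List<String> STORED =
            Collections.synchronizedList(new ArrayList<String>());

    // Hypothetical validation rule: a record is valid if it is not blank.
    static boolean isValid(String record) {
        return record.trim().length() > 0;
    }

    // Reads lines on the calling thread and hands each one to the pool,
    // so reading continues while workers validate and store.
    static void load(BufferedReader reader, int workers)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                final String record = line;
                pool.execute(new Runnable() {
                    public void run() {
                        if (isValid(record)) {
                            STORED.add(record); // "store to DB"
                        }
                    }
                });
            }
        } finally {
            pool.shutdown(); // accept no new tasks; queued ones still run
        }
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws Exception {
        load(new BufferedReader(new StringReader("a\n\nb\nc\n")), 4);
        System.out.println(STORED.size() + " records stored");
    }
}
```

Note that `Executors.newFixedThreadPool` queues pending tasks without a bound, so for a really huge file the reader can run far ahead of the workers. Constructing a `ThreadPoolExecutor` directly with a bounded `ArrayBlockingQueue` and `ThreadPoolExecutor.CallerRunsPolicy` is one way to slow the reader down when the queue is full.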

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
JH Harrison
Greenhorn

Joined: Jul 05, 2006
Posts: 4
Thanks Henry,
We are using JDK 1.4.2 and not JDK 5, hence I am looking for good libraries / APIs.
Can you guide me with a little code snippet to get started? I am getting confused.

Thanks
J Harrison.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18836
    



The concurrent libraries in Java 5 are basically a port of the library that was developed by Doug Lea. That library runs on earlier platforms, and is available here.

Henry
[ July 06, 2006: Message edited by: Henry Wong ]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
This is an interesting problem and one that might have very surprising results. You might consider two thread pools so you can observe queue sizes and concurrent threads.

main thread reads a record, creates a massage task, puts it in queue

process thread gets a massage task, massages data a bit, creates an insert task, puts it in queue

update thread gets an insert task, executes SQL against the database

I'd be interested to see if the update threads keep up, or if database locks are roughly equivalent to synchronizing the update method. The update thread might get up to "n" tasks and build a batch update statement and then fire it off. It would also have to know when we're all done to fire the last partial batch and commit.
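
A minimal sketch of that pipeline, under some simplifying assumptions: one process pool plus a single update thread (rather than a second pool), an in-memory list standing in for the batch update statement, and a sentinel object to signal "all done". The names `BatchPipeline`, `DONE` and `run` are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchPipeline {

    // Sentinel telling the update thread that no more rows will arrive.
    // Compared by identity, so a real record with the same text is safe.
    private static final String DONE = new String("DONE");

    // Runs the pipeline and returns the batches the update thread "fired".
    static List<List<String>> run(List<String> lines, int workers, final int batchSize)
            throws InterruptedException {
        final BlockingQueue<String> insertQueue = new LinkedBlockingQueue<String>();
        final List<List<String>> batches = new ArrayList<List<String>>();

        Thread updater = new Thread(new Runnable() {
            public void run() {
                List<String> batch = new ArrayList<String>();
                try {
                    while (true) {
                        String row = insertQueue.take();
                        if (row == DONE) break;
                        batch.add(row);
                        if (batch.size() == batchSize) {
                            batches.add(batch); // stand-in for executeBatch()
                            batch = new ArrayList<String>();
                        }
                    }
                    if (!batch.isEmpty()) batches.add(batch); // last partial batch
                    // a real version would commit here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        updater.start();

        ExecutorService processPool = Executors.newFixedThreadPool(workers);
        for (final String line : lines) {
            processPool.execute(new Runnable() {
                public void run() {
                    insertQueue.add(line.trim().toUpperCase()); // "massage"
                }
            });
        }
        processPool.shutdown();
        processPool.awaitTermination(1, TimeUnit.MINUTES);
        insertQueue.put(DONE); // every massage task has finished by now
        updater.join();        // after join(), `batches` is safe to read
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> out = run(Arrays.asList(" a", "b ", "c", "d", "e"), 3, 2);
        System.out.println(out.size() + " batches: " + out);
    }
}
```

With five records and a batch size of two, the update thread fires two full batches and one partial batch; the order of rows inside the batches depends on thread scheduling.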

Let us know how this runs!


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
JH Harrison
Greenhorn

Joined: Jul 05, 2006
Posts: 4
Hello,
Thanks for your help and suggestions. I downloaded backport-util-concurrent for JDK 1.4 and am trying out some of the API. What I have done so far is:
1. Parse a file.
2. Submit each line to a sub-task for further processing.
Below is the code snippet that does this.


I just wanted some feedback to know whether I am heading in the right direction. I have never used these APIs before, and I have never worked with threading before.

Also, I wanted to know: how do I terminate or empty the queue if there is a problem/exception in the validation process?
The reason for asking is that our requirement says that if there are any errors in validation, file processing should stop and all the DB contents for that particular file should be rolled back.

Is this achievable? If so, can you please suggest how?

PLEASE!

J Harrison.
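
One possible shape for the stop-and-roll-back requirement, as a sketch only: validation runs in the pool, but the rows are staged by a single thread so that one transaction owns everything and can be undone as a whole. The class `AllOrNothingLoader`, the digits-only `validate` rule and the `staged` list (standing in for uncommitted JDBC inserts) are assumptions for the example. It is written against Java 5; with backport-util-concurrent on JDK 1.4 the generics and for-each loop would have to go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AllOrNothingLoader {

    // Hypothetical rule: a record is valid if it consists only of digits.
    static String validate(String record) {
        if (!record.matches("\\d+")) {
            throw new IllegalArgumentException("bad record: " + record);
        }
        return record;
    }

    // Returns the rows to commit, or an empty list if any record failed.
    static List<String> load(List<String> lines, int workers)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CompletionService<String> results =
                new ExecutorCompletionService<String>(pool);
        List<String> staged = new ArrayList<String>(); // uncommitted rows
        try {
            for (final String line : lines) {
                results.submit(new Callable<String>() {
                    public String call() {
                        return validate(line);
                    }
                });
            }
            for (int i = 0; i < lines.size(); i++) {
                // get() rethrows a validation failure as ExecutionException
                staged.add(results.take().get());
            }
            return staged; // a real version would call connection.commit()
        } catch (ExecutionException e) {
            pool.shutdownNow(); // drop queued tasks, interrupt running ones
            staged.clear();     // stand-in for connection.rollback()
            return staged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(load(Arrays.asList("1", "2", "3"), 2).size());
        System.out.println(load(Arrays.asList("1", "x", "3"), 2).size());
    }
}
```

`shutdownNow()` is what empties the queue: it discards tasks that have not started and interrupts the ones that are running. The rollback part only works if all inserts for the file go through one connection with auto-commit turned off, since a JDBC rollback undoes the current transaction on that connection only. For a very large file the submit loop and the result loop would need to be interleaved rather than run one after the other.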
 