
"Thread pool" for "file parsing" Pls suggest idea.

Ganesh Gowtham
Ranch Hand

Joined: Mar 30, 2005
Posts: 225

Hi All,
Could you please suggest the best approach for the requirement below?
I have an ArrayList that holds the fully qualified relative paths of all the files on the server.
My main program scans the hard disk based on "config.xml", which lists the extensions to scan for (such as doc or jpg),
so the ArrayList stores every file that satisfies the config file properties.

I need to create a thread pool that reads all the files in the ArrayList one by one without
ambiguity, so I want to create 4 threads initially.
These threads should read the files one by one.
When a thread finishes reading, it should go back to the pool, and then be checked out of the pool again to read another file.
That would make my processing easier.

Since I am new to threads, could you please suggest something?


Thanks


Thanks, Ganesh Gowtham
http://ganesh.gowtham.googlepages.com
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24166
    

The easiest thing to do would be to use the new BlockingQueue classes in JDK 1.5 -- i.e., see this. Use this class in place of your ArrayList, and then it's easy to coordinate multiple threads working on parsing the files.


[Jess in Action][AskingGoodQuestions]
Ganesh Gowtham
Ranch Hand

Joined: Mar 30, 2005
Posts: 225

Hi Ernest,
In my module there is a hard and fast rule to use a "ThreadPool" for this scenario. Also, I can't upgrade the JDK version now that the core module is about to be delivered.
Please suggest an idea or a link.

Thanks
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Could you make a static method to get the next file to work on from the array and have each thread call it when it needs another file:

ix and array are both static variables.
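
The code from this post is not preserved here. A minimal sketch of what such a method might look like follows. The class name FileNames and the load() helper are hypothetical, and the generics need Java 5 (on an older JDK, drop the type parameters and cast the result of get() to String).

```java
import java.util.ArrayList;
import java.util.List;

class FileNames {
    // Both fields are static, as the post says: one shared list, one shared index.
    private static List<String> array = new ArrayList<String>();
    private static int ix = 0;

    // Hypothetical helper: loads the list of paths and resets the index.
    static synchronized void load(List<String> paths) {
        array = new ArrayList<String>(paths);
        ix = 0;
    }

    // Hands out each name exactly once and returns null when the list is used up.
    // Because the method is synchronized, only one thread at a time can run it,
    // so two threads can never be given the same file.
    static synchronized String nextName() {
        if (ix >= array.size()) {
            return null;
        }
        return array.get(ix++);
    }
}
```

Only the hand-off of the next name is serialized. The parsing itself happens outside the method, so the threads still work on their files in parallel.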


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Ganesh Gowtham
Ranch Hand

Joined: Mar 30, 2005
Posts: 225

Thanks a lot,
I just wrote code to achieve the above scenario.
The thread pool has a maximum of 4 threads.
The 1st thread reads the 1st file in the ArrayList,
the 2nd thread reads the 2nd file in the ArrayList,
and so on, so the 4th thread reads the 4th file.
When my first thread finishes its job, I need to kill it and check it back in to my pool.

My threads will read the files in the ArrayList.
What I want is this:
once the 1st thread finishes its job it should notify me, so that I can check that thread back in to my pool and the same thread can read some other file.
Since I am new to this, I am not able to do this task.
Please suggest.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Even slicker, keep the thread running and just handle the next file. That's the benefit of a thread pool - we don't have to start up and tear down a thread per file, just the four we want running. Combined with the synchronized nextName() I sketched out above, each thread would do:
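
The loop from this post is not preserved here. One way it might look, as a sketch under assumptions: the class name FourWorkers, the parse() stand-in and the processAll() driver are all hypothetical, and nextName() is repeated so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FourWorkers {
    private static List<String> array = new ArrayList<String>();
    private static int ix = 0;

    // Stand-in for real parsing: it only records which files were handled.
    static final List<String> parsed =
            Collections.synchronizedList(new ArrayList<String>());

    static synchronized String nextName() {
        return ix < array.size() ? array.get(ix++) : null;
    }

    static void parse(String name) {
        parsed.add(name);
    }

    // Starts threadCount workers and waits until the whole list is processed.
    static void processAll(List<String> paths, int threadCount) {
        synchronized (FourWorkers.class) { // same lock as the static nextName()
            array = new ArrayList<String>(paths);
            ix = 0;
        }
        parsed.clear();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    String name;
                    // The thread stays alive and keeps asking for the next file.
                    while ((name = nextName()) != null) {
                        parse(name);
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (int i = 0; i < threadCount; i++) {
                workers[i].join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Each thread ends on its own when nextName() returns null, so nothing has to be killed, nulled or checked back in by hand.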
Ganesh Gowtham
Ranch Hand

Joined: Mar 30, 2005
Posts: 225

Hi James,
Combined with the synchronized nextName() I sketched out above
If nextName() is synchronized, how will my 4 threads get access to that method? Since it is an ArrayList, it will give you data even when it is accessed by many threads.

If I write the code the way you said, I think it violates the "thread pool" idea and becomes a normal program.

What I want is this: when the 1st thread has finished, it should notify me so that I can kill or interrupt it and set that thread to null, then create a new one and keep it in the "thread pool".

If anything is wrong, please suggest an idea.
Ganesh Gowtham
Ranch Hand

Joined: Mar 30, 2005
Posts: 225

Hi James,
Thanks, your code works fine.
But it is handled automatically by the JVM.
As per my module, though, I need to write code for:
checkIn()
{
// code: when the 1st thread has finished I need to manually
// nullify that object and remove that thread from the pool
}
checkOut()
{
(instance thread) createPool() // method which will create a thread
// code to add that returned instance to the pool
fileParser()
}
So how should I nullify that thread at the end?
Do you want me to assign null to all the threads at the end?


Please suggest.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
I think we are talking about a couple different techniques. Generic object pools are often built as a collection that you check objects out of and return them to. We just made a pool like that for MQ-Series connections. The Apache Commons has a nifty object pool, too. The ones I've seen were implemented as a "blocking queue" or some kind of auto-resizing collection.

The thread pools I've seen don't do that. Instead you fire up some number of threads and they pull work from a queue. I guess that is not the usual "pool" but it does the job. You create a small number of threads that are used and reused for any number of tasks. These use a blocking queue or auto-resizing collection again to feed tasks to the threads, but not to hold the threads.

I described something to solve this problem: Given an array of filenames, start four threads so that each thread processes one file at a time until all filenames in the array have been processed.

If your problem is more complex, the solution will be as well. To make the threads sit around and wait for work I'd study the commons thread pool design and the Java 5 APIs, then (for schoolwork) write original code that I could turn in. To actually manage thread objects through checkout and return is more work yet ... you might start with an object pool (blocking queue) to hold the threads.
[ July 04, 2005: Message edited by: Stan James ]
Ganesh Gowtham
Ranch Hand

Joined: Mar 30, 2005
Posts: 225

Hi James,
I think you said that reading files in run() until some condition becomes false makes my work simpler, whereas destroying the thread, removing it, creating a new thread and adding it to the pool again lowers performance. Thanks for your idea.

My requirement:
As you know, what my code does is once all file names have been dumped in the ArrayList my "ThreadPool" starts and does the work. Since dumping into the ArrayList takes so much time, in the mean while I want to read that list to scan.
It is something like semaphores, where one dumps and one should read simultaneously.

LIST_MAX is the maximum size of the ArrayList. Right now I can use size(), since by the time my pool starts my ArrayList is full. When I start both simultaneously, how do I implement this idea?
Please suggest.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
As you know, what my code does is once all file names have been dumped in the ArrayList my "ThreadPool" starts and does the work. Since dumping into the ArrayList takes so much time, in the mean while I want to read that list to scan.

I think this says two subtly conflicting things ... one is "once all file names" are ready which sounds like we build the complete list first, but "in the mean while" sounds like you want to read the list at the same time it is being built. The former is what I worked at before, but the latter is much more fun and is precisely the kind of problem thread pools like the Apache commons pool are built to solve.

Using a thread pool replaces your array with a queue of objects. The main thread produces filenames and puts them in the queue. Each worker thread consumes filenames from the queue and does the processing. The "blocking queue" is the key. The commons thread pool, the commons object pool and JDK 1.5 all have nice blocking queues. If a thread asks for the next filename from the queue and there are no filenames ready, the thread blocks, hence the name. If there is a name available, the thread gets it and goes to work immediately.
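
For a JDK older than 1.5, a blocking queue along these lines can be hand-rolled with wait() and notifyAll(). This is a minimal sketch, not the commons or JDK implementation. The class name NameQueue is hypothetical, and the generics need Java 5 (drop the type parameters and cast on older JDKs).

```java
import java.util.LinkedList;

class NameQueue {
    private final LinkedList<String> names = new LinkedList<String>();

    // Called by the producer (the thread that scans the disk).
    synchronized void put(String name) {
        names.addLast(name);
        notifyAll(); // wake any worker blocked in take()
    }

    // Called by the workers. Blocks while the queue is empty.
    // Returns null only if the waiting thread is interrupted.
    synchronized String take() {
        try {
            while (names.isEmpty()) {
                wait(); // releases the lock until put() calls notifyAll()
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return names.removeFirst();
    }
}
```

The while loop around wait() matters: a woken thread re-checks the condition, because another worker may already have taken the name that triggered the wake-up.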

It's easy to know when you're done building the filename list and you have put them all into the queue. It's a little harder to know when they've all been consumed. A counter held by a shared object might do the trick. What would you like to have happen when all the names have been processed?
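
A sketch of the shared counter idea, under assumptions: the class name PendingCounter and its method names are hypothetical. The producer would call added() for every name it queues, and each worker would call finished() after it has processed one.

```java
class PendingCounter {
    private int pending = 0;

    // Producer: one more name has been queued.
    synchronized void added() {
        pending++;
    }

    // Worker: one name has been fully processed.
    synchronized void finished() {
        pending--;
        if (pending == 0) {
            notifyAll(); // wake anyone waiting for the work to drain
        }
    }

    // Blocks until every added name has been finished.
    // Returns false if the waiting thread is interrupted.
    synchronized boolean awaitAllFinished() {
        try {
            while (pending > 0) {
                wait();
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    synchronized int pending() {
        return pending;
    }
}
```

awaitAllFinished() should only be called after the producer has queued its last name. Otherwise the count can touch zero while the list is still being built.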
Ganesh Gowtham
Ranch Hand

Joined: Mar 30, 2005
Posts: 225

James, thanks for your reply.
1. I can't upgrade to JDK 1.5 for some reasons.
Here is how my module works and what it does:

A) My module is something like Google Desktop crawling. Initially, when the program starts, it reads an XML file (config.xml), which says which file extensions to scan for and which folders to scan.
B) My singleton class then reads the XML file and stores all the relevant files (fully qualified paths) in an ArrayList. This may take a long time, maybe 5 hours.
C) In the mean while I want to read the ArrayList, processing each entry in the list with 4 threads, to make my work easy and fast too (it is something like the semaphore problem).
My Problem
----------------
Right now my code starts scanning (starting the threads) only once the ArrayList has been populated. Since that takes so much time in the production environment, I want to use those CPU cycles efficiently, so I want to start my threads reading the ArrayList while it is still being filled.

As of now I am doing it with arraylist.size(), which is not applicable when I start reading the ArrayList while it is filling.

Any ideas?
Let me know your personal email ID; it would be good for me in future.
Thanks
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
I don't think I have any *new* ideas to contribute. To summarize the old ones: I'd lose the ArrayList, put names in a queue and have each thread pull names from the queue. I might override "get next name from queue" to somehow shut the whole show down if all names have been read and the queue is empty.
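
A sketch of how that summary might look without JDK 1.5, using only wait() and notifyAll(). The class name ClosableNameQueue, the close() method and the crawl() driver are all hypothetical, and the parsing is replaced by a stand-in that just records each name. The generics need Java 5 (drop the type parameters and cast on older JDKs).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class ClosableNameQueue {
    private final LinkedList<String> names = new LinkedList<String>();
    private boolean closed = false;

    synchronized void put(String name) {
        if (closed) {
            throw new IllegalStateException("queue is closed");
        }
        names.addLast(name);
        notifyAll();
    }

    // The producer calls this once, after it has queued the last name.
    synchronized void close() {
        closed = true;
        notifyAll(); // release workers that are waiting on an empty queue
    }

    // Returns the next name, or null when the queue is closed and empty
    // (or the waiting thread is interrupted). null means "stop working".
    synchronized String next() {
        try {
            while (names.isEmpty() && !closed) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return names.isEmpty() ? null : names.removeFirst();
    }

    // Hypothetical driver: workers start first, then names arrive while they run.
    static List<String> crawl(List<String> found, int workerCount) {
        final ClosableNameQueue queue = new ClosableNameQueue();
        final List<String> parsed =
                Collections.synchronizedList(new ArrayList<String>());
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    String name;
                    while ((name = queue.next()) != null) {
                        parsed.add(name); // stand-in for parsing the file
                    }
                }
            });
            workers[i].start();
        }
        for (int i = 0; i < found.size(); i++) {
            queue.put(found.get(i)); // in the real module: as the scan finds each file
        }
        queue.close();
        try {
            for (int i = 0; i < workerCount; i++) {
                workers[i].join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parsed;
    }
}
```

Because the workers block instead of checking size(), it does not matter how far ahead or behind the scan is. They idle while the queue is empty and exit once it is both closed and drained.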
 