JavaRanch » Java Forums » Java » Threads and Synchronization
Reading data from tables using Java Threads.

harish sr
Greenhorn

Joined: Sep 02, 2011
Posts: 3
Hi,

I'm looking for some pseudocode for reading data from a table using Java threads. The idea is to read millions of rows in parallel.
Please suggest some good articles on this.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    

Do you mean reading data from a database table? To do this in parallel you'd need spare capacity in several resources. In my experience, the limiting bottleneck is often the network. In that case, reading data in parallel will not result in a speedup; in all probability it will just consume more resources in the database. A parallel fetch would speed things up only if the limiting resource was the CPU and the database had more CPUs available, which is certainly possible, but not the most common configuration in the real world.

In any case, setting the correct fetch size is critical for reading huge amounts of data. How to do this might depend on the database, but definitely check it out. A good setting might improve your performance considerably.
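
To make the fetch size point concrete, here is a minimal JDBC sketch. Statement.setFetchSize is the standard JDBC hint for how many rows the driver should retrieve per round trip; how (and whether) a given driver honours it varies by database. The class name, the method name, and the choice to simply count rows are illustrative assumptions, not something taken from this thread.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class FetchSizeSketch {
    // Hypothetical helper: streams a query's result and returns the number of rows read.
    static long countRows(Connection conn, String sql, int fetchSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            // Hint to the driver: retrieve this many rows per round trip.
            ps.setFetchSize(fetchSize);
            try (ResultSet rs = ps.executeQuery()) {
                long rows = 0;
                while (rs.next()) {
                    rows++; // real code would read the columns here
                }
                return rows;
            }
        }
    }
}
```

A reasonable way to use this is to time the same query with a few different fetch sizes against your own database and network, since the best value depends on both.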

If you need to process the data in parallel, I'd still suggest fetching it serially and then distributing it to worker threads in the application.
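
The "fetch serially, process in parallel" idea can be sketched roughly as follows. This is only an illustration under assumptions: the Iterator stands in for a loop over a JDBC ResultSet, and the class name, method name, and the work function are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class SerialFetchParallelProcess {
    // Reads rows one by one on the calling thread and hands each row to a worker pool.
    // 'rows' is a stand-in for iterating a ResultSet (an assumption of this sketch).
    static <R, T> List<T> processAll(Iterator<R> rows, Function<R, T> work, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<T>> pending = new ArrayList<>();
            while (rows.hasNext()) {
                R row = rows.next();                      // serial fetch
                Callable<T> task = () -> work.apply(row); // processing runs on a worker
                pending.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> f : pending) {
                results.add(f.get());                     // collected in fetch order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For millions of rows you would probably not want to hold one Future per row in memory like this; submitting rows in batches or using a bounded queue would be the obvious next step.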
harish sr
Greenhorn

Joined: Sep 02, 2011
Posts: 3
Yes, I meant reading from a database table. I was thinking of fetching the data using threads, but I'm not sure how to implement it.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    

Well, if you have enough resources for massively parallel reads, all you need is to split your data somehow into roughly equal-sized groups. How to do that depends on your data. If these are records spanning several years, for example, you might assign each year to one thread. You'd create a query that fetches just one group and then simply run these queries (one group per query, one query per thread) from worker threads. Some databases have built-in support that might help you divide the data into groups; I hazily recall that Oracle has a feature that does this based on rowids (so there is no logical relationship between the groups).
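
As a rough sketch of the one-group-per-query, one-query-per-thread approach, using the year example: the fetchYear callback below is hypothetical and stands in for code that would run a query restricted to a single year. The class and method names are likewise assumptions, and the sketch assumes each worker would open its own database connection rather than share one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

final class PartitionedRead {
    // Runs one "query" per year, each on a worker thread, and collects the groups by year.
    // 'fetchYear' is a hypothetical callback that would fetch the rows for one year only.
    static <T> Map<Integer, List<T>> readByYear(int firstYear, int lastYear,
                                                 IntFunction<List<T>> fetchYear, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<Integer, Future<List<T>>> pending = new LinkedHashMap<>();
            for (int year = firstYear; year <= lastYear; year++) {
                final int y = year;
                Callable<List<T>> query = () -> fetchYear.apply(y); // one group per query
                pending.put(y, pool.submit(query));                 // one query per task
            }
            Map<Integer, List<T>> result = new LinkedHashMap<>();
            for (Map.Entry<Integer, Future<List<T>>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get());         // wait for each group
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The total time is governed by the slowest group, which is why roughly equal-sized groups matter.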

Another consideration: what do you want to do with the data after you fetch it? If you could implement that logic in the database, it will in all probability be several orders of magnitude faster (no kidding). And given enough resources, this work could be done in parallel in the database too (depending on the database, of course). Fetching huge amounts of data across the network just to do some processing on it is not a good idea; processing the data, not only storing it, is what databases were built for in the first place. (I should have said this in my first response, of course.)
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
I've been thinking of doing something similar: submit the data requests on different threads, then join the threads to bring the results back together, to see if that speeds up data acquisition. My only suggestions are: 1) get the data requests working first; 2) put each request in a class by itself, together with the data that comes back; 3) I don't think there's any concurrency issue with data being overwritten, since the data space is not shared; 4) join the threads at the end, where presumably you are waiting for the longest data request to finish.
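
A minimal sketch of the suggestions above, using plain threads and join. The Supplier is a hypothetical stand-in for the real database call, and all names here are assumptions made for illustration. Each request object owns its own result, so nothing is shared between threads, and Thread.join makes each result visible to the joining thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class JoinedRequests {
    // One request per thread; the request object keeps the data that comes back.
    static final class DataRequest<T> implements Runnable {
        private final Supplier<List<T>> query; // hypothetical stand-in for a database call
        private List<T> result = new ArrayList<>();

        DataRequest(Supplier<List<T>> query) { this.query = query; }

        @Override public void run() { result = query.get(); }

        List<T> result() { return result; }
    }

    // Starts every request on its own thread, joins them all, then merges the results.
    // Error handling for failed queries is omitted in this sketch.
    static <T> List<T> runAll(List<Supplier<List<T>>> queries) {
        List<DataRequest<T>> requests = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (Supplier<List<T>> q : queries) {
            DataRequest<T> request = new DataRequest<>(q);
            Thread t = new Thread(request);
            requests.add(request);
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // overall wait is bounded by the longest request
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<T> all = new ArrayList<>();
        for (DataRequest<T> request : requests) {
            all.addAll(request.result()); // merged in submission order
        }
        return all;
    }
}
```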
 