
Reading data from tables using Java Threads.

 
harish sr
Greenhorn
Posts: 3
Hi,

I'm looking for some pseudocode for reading data from a table using Java threads. The idea is to read millions of rows in parallel.
Please suggest some good articles on this.
 
Martin Vajsar
Sheriff
Posts: 3752
Do you mean reading data from a database table? To do this in parallel, you'd need to have a variety of resources available. In my experience, the most limiting bottleneck is often the network. In that case, reading data in parallel will not result in a speedup; in all probability it will just consume more resources in the database. A parallel fetch would speed things up only if the most limiting resource was the CPU and there were more CPUs available in the database, which is certainly possible, but not the most common configuration in the real world.

In any case, setting the correct fetch size is critical for reading huge amounts of data. This might depend on the database, but definitely check it out. A good setting might improve your performance considerably.
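
As a minimal sketch of what setting the fetch size looks like with plain JDBC: `Statement.setFetchSize` is a standard JDBC call, but it is only a hint, and how a driver honours it varies, so check your driver's documentation. The class name `FetchSizeExample`, the `RowHandler` callback and the `readAll` helper are made up for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch: stream a large result set with an explicit JDBC fetch size. */
class FetchSizeExample {

    /** Hypothetical callback, invoked once per row. */
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /**
     * Runs the query, passes every row to the handler and
     * returns the number of rows read.
     */
    static long readAll(Connection conn, String sql, int fetchSize, RowHandler handler)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            // Hint to the driver: how many rows to pull per round trip.
            ps.setFetchSize(fetchSize);
            long count = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.handle(rs);
                    count++;
                }
            }
            return count;
        }
    }
}
```

A caller would pass an open `Connection`, the query text and a handler that reads columns from the current row. The right fetch size is something to measure rather than guess; trying a few values against the real table is usually the quickest way to find it.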

If you need to process the data in parallel, I'd still suggest fetching it serially and then distributing it to worker threads in the application.
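
A rough sketch of that arrangement, assuming the rows arrive through an `Iterator` (standing in for a loop over a `ResultSet`) and are handed to a fixed thread pool in batches. The names `SerialFetchParallelProcess` and `processAll`, and the batch size and worker count, are illustrative choices, not anything prescribed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Sketch: one thread reads rows serially, a pool processes them in batches. */
class SerialFetchParallelProcess {

    static <R, T> List<T> processAll(Iterator<R> rows, int batchSize, int workers,
                                     Function<List<R>, T> work) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<T>> pending = new ArrayList<>();
            List<R> batch = new ArrayList<>(batchSize);
            // The calling thread is the only reader of the row source.
            while (rows.hasNext()) {
                batch.add(rows.next());
                if (batch.size() == batchSize) {
                    pending.add(submit(pool, batch, work));
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                pending.add(submit(pool, batch, work)); // final partial batch
            }
            // Collect results in submission order.
            List<T> results = new ArrayList<>();
            for (Future<T> f : pending) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    private static <R, T> Future<T> submit(ExecutorService pool, List<R> batch,
                                           Function<List<R>, T> work) {
        Callable<T> task = () -> work.apply(batch);
        return pool.submit(task);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add(i); // stand-in for rows read from the database
        }
        List<Integer> sums = processAll(rows.iterator(), 4, 2,
                batch -> batch.stream().mapToInt(Integer::intValue).sum());
        System.out.println(sums);
    }
}
```

One caveat with this sketch: the pool's work queue is unbounded, so if the reader is much faster than the workers, batches pile up in memory. With millions of rows a bounded queue would be the safer design.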
 
harish sr
Greenhorn
Posts: 3
Yes, I meant reading from a database table. I was thinking of fetching the data using threads, but I'm not sure how to implement it.
 
Martin Vajsar
Sheriff
Posts: 3752
harish sr wrote: Yes, I meant reading from a database table. I was thinking of fetching the data using threads, but I'm not sure how to implement it.

Well, if you have enough resources for massively parallel reads, all you need is to split your data somehow into roughly equally sized groups. How to do that depends on your data. If these are records spanning several years, for example, you might assign each year to one thread. You'd create a query that fetches just one group and then simply run these queries from worker threads: one group per query, one query per thread. Some databases have built-in support that might help you divide the data into groups. I hazily recall that Oracle has a feature that does this based on rowids (so there is no logical relationship between the groups).
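
The splitting idea can be sketched as follows, assuming the table has a numeric key that can be cut into contiguous ranges. Equal-width ranges only give equal-sized groups when the keys are spread evenly. `PartitionedRead`, `Range`, `split` and `readAll` are hypothetical names, and `fetchGroup` stands for whatever code runs the per-group query (for example a `WHERE id >= ? AND id < ?` query on its own connection).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Sketch: one key range per query, one query per worker thread. */
class PartitionedRead {

    /** Half-open key range [from, to). */
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Splits [min, max) into at most `groups` contiguous ranges of near-equal width. */
    static List<Range> split(long min, long max, int groups) {
        List<Range> ranges = new ArrayList<>();
        long total = max - min;
        long base = total / groups;
        long extra = total % groups; // the first `extra` ranges get one more key
        long start = min;
        for (int i = 0; i < groups; i++) {
            long width = base + (i < extra ? 1 : 0);
            if (width == 0) {
                break; // fewer keys than groups
            }
            ranges.add(new Range(start, start + width));
            start += width;
        }
        return ranges;
    }

    /** Runs fetchGroup once per range, each on its own thread, and concatenates the results. */
    static <T> List<T> readAll(List<Range> ranges, Function<Range, List<T>> fetchGroup)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ranges.size()));
        try {
            List<Callable<List<T>>> tasks = new ArrayList<>();
            for (Range r : ranges) {
                tasks.add(() -> fetchGroup.apply(r));
            }
            List<T> all = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<List<T>> f : pool.invokeAll(tasks)) {
                all.addAll(f.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Range> ranges = split(0, 1_000_000, 4);
        // Stand-in for a real query: just report which range each task was given.
        List<String> labels = readAll(ranges,
                r -> Collections.singletonList("[" + r.from + ", " + r.to + ")"));
        System.out.println(labels);
    }
}
```

In a real version each task would need its own database connection, since a JDBC connection is generally not meant to be shared by threads running queries at the same time.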

Another consideration: what do you want to do with the data after you fetch it? If you could implement that logic in the database, it will in all probability be several orders of magnitude faster (no kidding). And given enough resources, this work could be done in parallel in the database too (depending on the db, of course). Fetching huge amounts of data across the network just to do some processing on it is not a good idea; processing the data, not only storing it, is what databases were built for in the first place. (This should have been said in my first response, of course.)
 
John Vorwald
Ranch Hand
Posts: 139
I've been thinking of doing something similar: submit data requests on different threads, then join the threads to bring the results back together, to see if that speeds up the data acquisition. My only suggestions are: 1) get the data requests to work first; 2) put each request in a class by itself, together with the data that's coming back; 3) I don't think there's any issue with concurrent overwrites, since the data space is not shared; 4) join the threads at the end, where presumably you are waiting for the longest data request to finish.
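
Those four points might look roughly like this with plain `Thread` and `join`. This is only a sketch: `JoinedRequests`, `DataRequest` and `runAll` are made-up names, and the `Supplier` stands in for the actual database call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Sketch: each request owns its own result; threads are joined at the end. */
class JoinedRequests {

    /** One data request plus the data it brings back. Nothing is shared between requests. */
    static final class DataRequest<T> implements Runnable {
        private final Supplier<List<T>> query;
        private List<T> result;             // written only by this request's thread
        private RuntimeException failure;   // remembered so the caller can see it

        DataRequest(Supplier<List<T>> query) {
            this.query = query;
        }

        @Override
        public void run() {
            try {
                result = query.get();
            } catch (RuntimeException e) {
                failure = e;
            }
        }

        List<T> result() {
            if (failure != null) {
                throw failure;
            }
            return result;
        }
    }

    static <T> List<T> runAll(List<DataRequest<T>> requests) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (DataRequest<T> request : requests) {
            Thread t = new Thread(request);
            threads.add(t);
            t.start();
        }
        // Total wait is roughly the duration of the slowest request.
        for (Thread t : threads) {
            t.join();
        }
        // After join() returns, the result fields are safely visible to this thread.
        List<T> combined = new ArrayList<>();
        for (DataRequest<T> request : requests) {
            combined.addAll(request.result());
        }
        return combined;
    }

    public static void main(String[] args) throws InterruptedException {
        List<DataRequest<String>> requests = new ArrayList<>();
        requests.add(new DataRequest<String>(() -> Arrays.asList("2021-a", "2021-b")));
        requests.add(new DataRequest<String>(() -> Arrays.asList("2022-a")));
        System.out.println(runAll(requests));
    }
}
```

An `ExecutorService` with `Future`s would do the same job with less bookkeeping; raw threads are used here only because they map directly onto the "join at the end" description.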
 