Multi-threading - Performance concept

karthick sambanghi
Greenhorn

Joined: Sep 25, 2013
Posts: 17
Hi,

I want to improve the performance of my Java code. My requirement is below.

I want to extract XML datatype values from an Oracle database and write them to a file.

The issue is that it is a huge-volume database. At present I am extracting 1 million records and writing them to a file in 17 minutes.

But the client requirement is to extract 2 million records and write them to a file within 5 minutes.

I have implemented multi-threading in my program.

I have created one Manager thread & worker thread. The input size of the thread is "10" for 1 million records, so it creates about "100" jobs with batchsize=10000.

Is there any other way to improve performance?

Any help will be much appreciated.

Thanks
Karthick
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7063

karthick sambanghi wrote: I want to extract XML datatype from oracle database & write it to a file.

Well, that right there strikes me as a complete waste of time, but there's no accounting for some clients.

the issue is it is huge volume database. At present i am extracting 1 million records & writing it to a file within 17 minutes time.
But the client requirement is to extract the data & write it to a file within 5 minutes for 2 million records....

First off: is this going out to one file or many? If only one, then threading is unlikely to make much difference.

I have created one Manager thread & worker thread. The input size of thread is "10" for 1 million records so that it will create some "100" jobs with batchsize=10000
Is there any other way to increase my performance ?

It sounds to me like you're diving into the mechanics of how you're going to do this before you've actually worked out what you need to do.

Where is the delay? Is it getting this stuff from the database? Or is it writing it out? Have you, for example, timed a single Thread doing each thing in isolation? You could time the latter by just bashing out 2 million arbitrary lines of roughly the right size to a file (or many files, if that's what you're going to need), and forget about the db altogether. I wouldn't be at all surprised if you might be looking at a couple of minutes right there.
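A minimal sketch of that write-only timing test, leaving the database out entirely. The class name `WriteTimer`, the 355-character filler line, and the temp file are assumptions for illustration; substitute a line length close to your real records.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteTimer {

    /** Writes lineCount filler lines of lineLength characters; returns elapsed milliseconds. */
    static long timeWrite(Path target, int lineCount, int lineLength) {
        // Build the filler line once so the loop measures I/O, not string building.
        StringBuilder sb = new StringBuilder(lineLength);
        for (int i = 0; i < lineLength; i++) {
            sb.append('x');
        }
        String line = sb.toString();

        long start = System.nanoTime();
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int i = 0; i < lineCount; i++) {
                out.write(line);
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the clock is read.
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("write-test", ".txt");
        long ms = timeWrite(target, 2_000_000, 355); // assumed record count and size
        System.out.println("Wrote " + Files.size(target) + " bytes in " + ms + " ms");
        Files.delete(target);
    }
}
```

If this alone takes a sizeable share of the 5 minute budget on the target machine, the file write is worth attention; if it finishes quickly, the time is probably going elsewhere.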

And, like I say, what you do will be very dependent on whether you're writing out to a single file or many of them - and indeed, whether you're getting all this stuff from a single database record or many of them.

Oh, and one last thing: I hope you're buffering your output, because that alone could have a major bearing on how long it takes.
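To see what buffering is worth on a given machine, something like the following compares the same writes with and without a buffer. This is only a sketch: `BufferingDemo`, the 64 KB buffer size, and the record contents are made up for the example, and the actual gap depends on the OS and disk.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferingDemo {

    /** Writes the same record count times to out; returns elapsed milliseconds. */
    static long writeRecords(OutputStream out, byte[] record, int count) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            out.write(record);
        }
        out.flush();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Every write() call goes straight to the operating system. */
    static long timeUnbuffered(Path target, byte[] record, int count) {
        try (OutputStream out = new FileOutputStream(target.toFile())) {
            return writeRecords(out, record, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes are collected in a 64 KB in-memory buffer (an assumed size) before reaching the OS. */
    static long timeBuffered(Path target, byte[] record, int count) {
        try (OutputStream out =
                 new BufferedOutputStream(new FileOutputStream(target.toFile()), 64 * 1024)) {
            return writeRecords(out, record, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] record = "<row>placeholder record</row>\n".getBytes(StandardCharsets.UTF_8);
        Path a = Files.createTempFile("unbuffered", ".txt");
        Path b = Files.createTempFile("buffered", ".txt");
        System.out.println("unbuffered: " + timeUnbuffered(a, record, 2_000_000) + " ms");
        System.out.println("buffered:   " + timeBuffered(b, record, 2_000_000) + " ms");
        Files.delete(a);
        Files.delete(b);
    }
}
```

Both variants produce identical files; the only difference is how many times the program hands data to the operating system.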

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 962

On my 5 year old desktop running the latest XUbuntu with JDK1.7.0_40, using a single thread, it takes 62 seconds to write a single file of 2,000,000 lines, each between 352 and 358 bytes long. It seems to me that you should profile a single-threaded version of your code to see just where the time is being taken; I bet it is in the DB access, and that using multiple threads to access the DB will make little or no real difference.
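Short of a real profiler, a rough first cut is to time the fetch and the write separately in a single-threaded loop. The sketch below is an assumption-laden illustration: `PhaseTimer` is a made-up name, and the `Iterator<String>` stands in for whatever reads rows from the database (for example a wrapper around a JDBC `ResultSet`), which is not shown.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Collections;
import java.util.Iterator;

public class PhaseTimer {

    /**
     * Copies every record from source to sink on one thread.
     * Returns { nanoseconds spent fetching, nanoseconds spent writing, record count }.
     */
    static long[] run(Iterator<String> source, Writer sink) {
        long fetchNanos = 0;
        long writeNanos = 0;
        long records = 0;
        try {
            while (true) {
                long t0 = System.nanoTime();
                boolean more = source.hasNext();
                String record = more ? source.next() : null;
                long t1 = System.nanoTime();
                fetchNanos += t1 - t0;
                if (!more) {
                    break;
                }
                sink.write(record);
                sink.write('\n');
                writeNanos += System.nanoTime() - t1;
                records++;
            }
            // Count the final flush as write time so buffered output is not under-reported.
            long t2 = System.nanoTime();
            sink.flush();
            writeNanos += System.nanoTime() - t2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { fetchNanos, writeNanos, records };
    }

    public static void main(String[] args) {
        // Placeholder source; real code would iterate over rows fetched from the database.
        Iterator<String> fake =
            Collections.nCopies(100_000, "<row>placeholder</row>").iterator();
        long[] r = run(fake, new StringWriter());
        System.out.printf("fetch %d ms, write %d ms, %d records%n",
                          r[0] / 1_000_000, r[1] / 1_000_000, r[2]);
    }
}
```

Calling `System.nanoTime()` per record adds some overhead of its own, so treat the numbers as a rough split between the two phases rather than a precise measurement.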
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 10916

I am no expert, but i do know that multi-threading is not a panacea. There are situations where making something multi-threaded can SLOW it down.

There simply is no magic bullet for performance. Each and every application will need a different fix - but the way to find it is what Winston (and pretty much everyone else in the world) says: find out WHERE your program is slow, and work to improve that. Don't assume you know. You are not right. Don't say "i know THIS always speeds up programs, so I'll do it". It won't.

If you wanted to make a vehicle go faster, you might say "i know that reducing the weight will speed it up", so you spend $20,000,000 designing and building some new parts out of a hi-tech polymer to reduce the weight of the car by 20 lbs. Your vehicle now goes 0.0001% faster. Yippee.

But someone else looks at all aspects of the car, including weight, air intake, fuel type, and air resistance. They realize that adding a $50 air spoiler to it will reduce drag by 20%, giving them a 5% speed boost.

You need to find where you can get the biggest bang for your buck.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2052

You might want to investigate native tools provided by Oracle outside of the database to extract data out of the database. It makes your code database dependent, but at the same time you don't have to reinvent wheels that Oracle developers have invented.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7063

Jayesh A Lalwani wrote: You might want to investigate native tools provided by Oracle outside of the database...

@karthick: Or indeed, ask your clients whether they really want to do this.

Clients aren't always right; and it sounds to me like they're not only telling you what to do, but how to do it - and that's a recipe for disaster.

Presumably these pieces of XML were put into the database for a reason: Why can't you use them directly from it?
Pulling them out, only to plough them back into some file, or set of files, strikes me as the quintessential definition of "noise work".

Winston
 