karthick sambanghi wrote:I want to extract XML datatype from oracle database & write it to a file.
Well, that right there strikes me as a complete waste of time, but there's no accounting for some clients.
The issue is that it is a huge-volume database. At present I am extracting 1 million records and writing them to a file in 17 minutes.
But the client requirement is to extract 2 million records and write them to a file within 5 minutes....
First off: is this going out to one file or many? If only one, then threading is unlikely to make much difference.
I have created one Manager thread & worker thread. The input size of thread is "10" for 1 million records so that it will create some "100" jobs with batchsize=10000
Is there any other way to increase my performance ?
It sounds to me like you're diving into the mechanics of how you're going to do this before you've actually worked out what you need to do.
Where is the delay? Is it getting this stuff from the database? Or is it writing it out? Have you, for example, timed a single Thread doing each thing in isolation? You could time the latter by just bashing out 2 million arbitrary lines of roughly the right size to a file (or many files, if that's what you're going to need), and forget about the db altogether. I wouldn't be at all surprised if you might be looking at a couple of minutes right there.
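A minimal sketch of that write-only timing test, with no database involved. The class name, the method name, and the 355-character line length are all placeholders I've made up; substitute whatever is "roughly the right size" for your records.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WriteBenchmark {

    // Writes lineCount identical lines of lineLength characters to the file
    // and returns the elapsed time in nanoseconds.
    public static long writeLines(Path file, int lineCount, int lineLength) throws IOException {
        char[] chars = new char[lineLength];
        Arrays.fill(chars, 'x');
        String line = new String(chars);

        long start = System.nanoTime();
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int i = 0; i < lineCount; i++) {
                out.write(line);
                out.newLine();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Default kept small; pass 2000000 as the first argument for the full-size test.
        int lineCount = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        Path file = Files.createTempFile("write-benchmark", ".txt");
        try {
            long nanos = writeLines(file, lineCount, 355); // 355 is an assumed line length
            System.out.printf("Wrote %d lines in %d ms%n", lineCount, nanos / 1_000_000);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Run it with `java WriteBenchmark 2000000` on the machine and disk you will actually be using; the number it prints is roughly the floor for the export, however fast the database turns out to be.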
And, like I say, what you do will be very dependent on whether you're writing out to a single file or many of them - and indeed, whether you're getting all this stuff from a single database record or many of them.
Oh, and one last thing: I hope you're buffering your output, because that alone could have a major bearing on how long it takes.
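In case it helps, here is roughly what I mean by buffering, as a sketch only. The class and method names are invented for illustration, and the 64 KB buffer size is a guess to be tuned by measurement, not a recommendation.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BufferedExport {

    // Assumed buffer size; measure before settling on a value.
    static final int BUFFER_SIZE = 64 * 1024;

    // Writes each record on its own line through a single buffered writer,
    // so the underlying stream sees a few large writes instead of many small ones.
    public static void writeRecords(Path file, Iterable<String> records) throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8),
                BUFFER_SIZE)) {
            for (String record : records) {
                out.write(record);
                out.write('\n');
            }
        } // closing the writer flushes whatever is left in the buffer
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffered-export", ".xml");
        try {
            writeRecords(file, Arrays.asList("<a/>", "<b/>", "<c/>"));
            System.out.println(Files.size(file) + " bytes written");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The thing to avoid is opening, flushing, or closing the file once per record; open one writer, push everything through it, and close it at the end.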
Isn't it funny how there's always time and money enough to do it WRONG?
On my 5 year old desktop running the latest XUbuntu with JDK1.7.0_40 using single thread it takes 62 seconds to write a single file of 2,000,000 lines each between 352 and 358 bytes long. It seems to me that you should profile a single threaded version of your code to see just where the time is being taken; I bet it is in the DB access and that using multiple threads to access the DB will make little or no real difference.
I am no expert, but I do know that multi-threading is not a panacea. There are situations where making something multi-threaded can SLOW it down.
There simply is no magic bullet for performance. Each and every application will need a different fix - but the way to find it is what Winston (and pretty much everyone else in the world) says: find out WHERE your program is slow, and work to improve that. Don't assume you know. You are not right. Don't say "I know THIS always speeds up programs, so I'll do it". It won't.
If you wanted to make a vehicle go faster, you might say "I know that reducing the weight will speed it up", so you spend $20,000,000 designing and building some new parts out of a hi-tech polymer to reduce the weight of the car by 20 lbs. Your vehicle now goes 0.0001% faster. Yippee.
But someone else looks at all aspects of the car, including weight, air intake, fuel type, and air resistance. They realize that adding a $50 air spoiler to it will reduce drag by 20%, giving them a 5% speed boost.
You need to find where you can get the biggest bang for your buck.
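If you don't have a profiler handy, even crude timing will tell you where to look. Something along these lines might do as a start; it is only a sketch, it assumes Java 8 or later for the lambdas, and the class name and the "fetch"/"write" phase names are made up. The lambdas in main are stand-ins for your real database and file code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PhaseTimer {

    // Total nanoseconds spent in each named phase, in first-seen order.
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();

    // Runs the work, adds its elapsed time to the phase total, and returns its result.
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    public Map<String, Long> totals() {
        return totalNanos;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int batch = 0; batch < 5; batch++) {
            final int b = batch;
            // Stand-in for the real database fetch.
            String data = timer.time("fetch", () -> "record-" + b);
            // Stand-in for the real file write.
            timer.time("write", () -> data.length());
        }
        for (Map.Entry<String, Long> e : timer.totals().entrySet()) {
            System.out.printf("%-6s %d ns%n", e.getKey(), e.getValue());
        }
    }
}
```

Whichever phase eats most of the time is the one worth working on; the other one is the 20 lbs of hi-tech polymer.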
There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
You might want to investigate the native tools that Oracle provides, outside of the database, for extracting data from it. It makes your code database-dependent, but at the same time you don't have to reinvent wheels that Oracle's developers have already invented.
Jayesh A Lalwani wrote:You might want to investigate native tools provided by Oracle outside of the database...
@karthick: Or indeed, ask your clients whether they really want to do this.
Clients aren't always right; and it sounds to me like they're not only telling you what to do, but how to do it - and that's a recipe for disaster.
Presumably these pieces of XML were put into the database for a reason: Why can't you use them directly from it?
Pulling them out, only to plough them back into some file, or set of files, strikes me as the quintessential definition of "noise work".