JavaRanch » Java Forums » Java » Performance
Author

I want to Read and copy 800 MB file. Any efficient way.

kapil patel
Ranch Hand

Joined: Sep 30, 2005
Posts: 33
I have an assignment where I have to read an 800 MB file and create another file of the same size. With I/O it takes 45 minutes, and I have been given a target to complete it in less time. Is there any other solution anyone can suggest?

Thank you.

"I love Java"
[ October 26, 2005: Message edited by: kapil patel ]
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
Do you need to read the entire file using I/O and write it out again using I/O, or can you copy the file using the OS copy command?


My blood is tested +ve for Java.
kapil patel
Ranch Hand

Joined: Sep 30, 2005
Posts: 33
Thank you chetan.

I have tried using Java I/O. It takes 45 minutes, but I have a target to complete it in less time. Can you suggest any other solution?

Moreover, as input I have a ResultSet or an ArrayList, and I have to write those out as a flat file with some formatting.
David O'Meara
Rancher

Joined: Mar 06, 2001
Posts: 13459

Did you want to show us what you have and we can offer suggestions? eg I'm assuming you have a buffered stream piped to another buffered stream, but as Chetan says, you may be using the OS copy command.
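For reference, the kind of buffered-stream copy being assumed here might look like the following (a minimal sketch; the class name is mine, and try-with-resources is newer than the Java of this thread):

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopy {
    // Copy src to dst through buffered streams, one chunk at a time,
    // so memory use stays bounded regardless of the file size.
    public static void copy(File src, File dst) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(src));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(dst))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```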
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
Kapil,

Can you just post what data you are fetching and what you are storing?

Post it in the manner below.

Fetching :
FirstName Lastname
Chetan Parekh
Kapil Patel

Storing:
My name is Chetan Parekh.
My name is Kapil Patel.
David O'Meara
Rancher

Joined: Mar 06, 2001
Posts: 13459

A quick internet search turned up some code that copied a 600 MB file in about 3 minutes, with other activities running in the background. I'd love to pass it on, but we'd like to see yours first...
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
Kapil, I just want to give an example of where we have done some optimization.

We had to generate an .xls file by running a query against the database. The query returned a huge amount of data, so both the query time and the .xls generation from the ResultSet were very high. We found that we were providing the same data to all users, and that to fulfil each user's request we were repeating the same process again and again.

We decided to generate the .xls file in a BOD and give users a download link to that file, rather than generating the file at runtime.

We reduced a huge amount of load by doing this.

This is just to give you a hint, in case your requirement is the same.

Post your requirement: why, and what, you want to save in the file.
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
Some Tips
JuanP barbancho
Ranch Hand

Joined: Oct 25, 2005
Posts: 52
Hi,

I attach some code that I use in a lot of programs:

kapil patel
Ranch Hand

Joined: Sep 30, 2005
Posts: 33
Thank you all.

My requirement is:

I am getting data from the database in a ResultSet as strings. We want to dump all of that data into one flat file; the result set is huge, around 800 MB of database data. We have to do this through a Java application only. Plain Java I/O takes 45 minutes; we want to reduce that to under 20 minutes.

Java NIO (New I/O) might be a possible solution. Has anyone tried writing String objects to the file system using NIO?

Or is there any other alternative?
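Writing String data through NIO, as asked above, can be sketched roughly as follows (not a benchmark; the class name, charset, and channel setup are assumptions):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class NioStringWriter {
    // Write each line to the file through a FileChannel,
    // encoding the String into a ByteBuffer first.
    public static void writeLines(String path, Iterable<String> lines) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(path);
             FileChannel ch = fos.getChannel()) {
            for (String line : lines) {
                ByteBuffer buf = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
                // A channel write may be partial, so loop until the buffer drains.
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
        }
    }
}
```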

Thanks again.

"I love Java"
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
These are some general tips to optimize your performance, but you will not see any dramatic improvement from them:

(1) Use a native database driver; they are fast.
(2) Set the record fetch size to an optimum level.
(3) If you have to format the data before dumping it to the file, try to do some of that formatting in the query.
Stefan Wagner
Ranch Hand

Joined: Jun 02, 2003
Posts: 1923

Are you sure that I/O is the bottleneck, and not the database query?

If your drive is already used by other tasks, you could speed up things by writing to a different drive.

Without seeing your code, it's hard to suggest improvements there.


http://home.arcor.de/hirnstrom/bewerbung
kapil patel
Ranch Hand

Joined: Sep 30, 2005
Posts: 33
Thank you all. I/O is the bottleneck in our case. We are trying NIO.

If I find a solution, I will let everyone know.
Thanks, Chetan; we will also consider your suggestion.

"I love Java"
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14352

You can find the code to copy a file using NIO here:

The Java Developers Almanac - Copying One File to Another (using nio)
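That approach boils down to a channel-to-channel transfer; a minimal sketch (the class name is mine; transferTo may move fewer bytes than requested, hence the loop):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class NioCopy {
    // Ask the OS to move bytes directly between the two channels,
    // avoiding a copy through a Java-side byte array.
    public static void copy(File src, File dst) throws IOException {
        try (FileChannel in = new FileInputStream(src).getChannel();
             FileChannel out = new FileOutputStream(dst).getChannel()) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}
```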


Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 8 API documentation
JuanP barbancho
Ranch Hand

Joined: Oct 25, 2005
Posts: 52
Hi,

Try CloverETL on SourceForge; it is a tool for reading data from a source, formatting it, and saving it.

It is very fast, multithreaded, and uses NIO.

I made a similar tool using 4 threads, but I needed my program to be faster than CloverETL.

I also made a tool in C with Pthreads, for Oracle only. It is faster than CloverETL, but I love Java, not C. If you want this tool, I can provide it.

Send me an email: barbyware@yahoo.com
JuanP barbancho
Ranch Hand

Joined: Oct 25, 2005
Posts: 52
Try reading the data in one thread and writing to the file in another thread.
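A sketch of that two-thread arrangement, using a bounded queue as the hand-off (the class name and batch handling are mine; a list stands in for the actual file output):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedWriter {
    // Sentinel object that tells the writer thread there is no more data.
    private static final List<String> POISON = new ArrayList<>();

    // The "reader" side hands batches of rows to a writer thread through
    // a bounded queue, so reading and writing can overlap.
    public static List<String> run(List<List<String>> batches) throws InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4);
        List<String> written = Collections.synchronizedList(new ArrayList<>());

        Thread writer = new Thread(() -> {
            try {
                List<String> batch;
                while ((batch = queue.take()) != POISON) {
                    written.addAll(batch); // stand-in for writing rows to a file
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        for (List<String> batch : batches) {
            queue.put(batch); // the "read" side; blocks if the writer falls behind
        }
        queue.put(POISON);    // signal end of data
        writer.join();
        return written;
    }
}
```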
JuanP barbancho
Ranch Hand

Joined: Oct 25, 2005
Posts: 52
Hi,

I have uploaded a very fast unload Java program; I hope to make it an open source project.

http://groups.yahoo.com/group/barbyware/files/
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
Originally posted by JuanP barbancho:
I expect make this an open source project.

Thx a lot.

Future is open, support Open Source.
[ October 27, 2005: Message edited by: Chetan Parekh ]
JuanP barbancho
Ranch Hand

Joined: Oct 25, 2005
Posts: 52
Hi,

I made this tool without NIO; it is possible you could improve it using NIO and an in-memory ByteBuffer.

I hope to work on this project as soon as possible.
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3716

Kapil-

I'm curious how you're storing the data in memory and how frequently you are writing to the file. For example, are you storing the entire set in memory, or writing to the file as soon as it is read from the result set? I think some clever memory management may help.

I would tend to think the database would be more of a bottleneck than the I/O, especially if it is across a network, but again, I think we'd need to see some example code to get a better idea of what you are doing.


My Blog: Down Home Country Coding with Scott Selikoff
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I agree with Scott, a DB across a network is most likely the problem here. 45 minutes for 800 MB is pretty bad, even for old-fashioned Java IO. Kapil, what makes you think the DB is not the bottleneck? Is it the fact that the execute() method returned quickly? That means nothing. DB drivers often return a ResultSet very quickly, even though the actual data is still being gathered. That's useful because it allows you to start processing data right away - but the real question is, how long does it take to get all of it?

You may find it interesting to see how long it takes to execute the following:
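A timing loop of the kind described might look like this (a sketch; it simply drains the ResultSet with no file I/O, and the class name is mine):

```java
import java.sql.ResultSet;
import java.sql.SQLException;

public class QueryTimer {
    // Walk the entire ResultSet without touching any file,
    // so the elapsed time measures only the database side.
    public static long drain(ResultSet rs) throws SQLException {
        long start = System.currentTimeMillis();
        long rows = 0;
        while (rs.next()) {
            rows++;
        }
        System.out.println(rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
        return rows;
    }
}
```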

If you find it takes something close to 45 minutes with no additional I/O at all - that's a pretty big clue that the problem really is in the DB.

You could also add a call to resultSet.getString(1) (JDBC column indexes start at 1), or whatever field(s) you need to access in each result, inside the while loop, again with no additional I/O code. I suspect that you won't need it, but it might make a difference.

Please let us know what results you get from this. Good luck.


"I'm not back." - Bill Harding, Twister
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
To add to what Jim Yingst said,

JuanP barbancho
Ranch Hand

Joined: Oct 25, 2005
Posts: 52
Hi,

I had the same problem. 45 minutes is very bad, but it is possible with large tables: 6,000,000 records with a lot of columns, 50 or more. If you use a lot of CHAR columns, then the problem may be the same one I had in the past.

JDBC is streaming; you need to read the data in a loop.

. Try to use a buffer of 1000 or more records.
. Try to use the smallest type (byte, short, int, long) that you can.
. Try to save the file in batches, using a large buffer.

If no tool works, I can provide a C tool for unloading the data.

Thanks
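The batching tips above can be sketched like this (the buffer size and class name are assumptions, not tuned values):

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

public class BatchedFlatFileWriter {
    // Push rows through a large writer buffer (1 MB here), so the disk
    // sees a few big writes instead of one small write per row.
    public static void write(File out, List<String> rows) throws IOException {
        try (BufferedWriter w = new BufferedWriter(new FileWriter(out), 1 << 20)) {
            for (String row : rows) {
                w.write(row);
                w.newLine();
            }
        }
    }
}
```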
JuanP barbancho
Ranch Hand

Joined: Oct 25, 2005
Posts: 52
I forgot if you want double database memory.

Thanks
Harald Kirsch
Ranch Hand

Joined: Oct 14, 2005
Posts: 37
Try to cheat: make a (soft) link.


Harald.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Kapil, some comments:

- The suggestions you get might be a little bit confusing, because your first description of the problem did sound like you wanted to make an exact copy of an existing file. Writing the formatted content of a DB to a file is a *very* different problem.

- It's hard to help you without knowing more. The current suggestions cannot be much more than wild guesses, and your conclusion that it's an I/O problem feels very much like a guess, too. I very much doubt that NIO will solve your problem, and I would strongly suggest first doing a deeper analysis of the problem, with our help, to really find the bottleneck.

- As this is a performance question, I'm moving it to our Performance forum...


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
jiju ka
Ranch Hand

Joined: Oct 12, 2004
Posts: 306
Loading 800 MB at a time and making a copy needs 1600 MB of RAM. If you don't have that much RAM, your operating system will page to create virtual memory, and most of the time will be lost in the paging process. In your 45-minute program this might be the issue.

The solution is to keep no more than x bytes in memory at a time. A fast algorithm will find a value for x at runtime, based on the allotted memory, other applications running, and so on. These x bytes are read from the input stream and written to the output stream. The value of x should be chosen so that paging does not happen.

Does this help?
JuanP barbancho
Ranch Hand

Joined: Oct 25, 2005
Posts: 52
Hi,

Try using a separate thread to save the data; you could use two buffer arrays.

Thanks you.